Researcher profile

Magne Thoresen

Magne Thoresen contributes to research discovery and scholarly infrastructure.

Researcher; affiliation not imported; open to collaborate

Trust snapshot

Quick read

Trust 19 (Unverified), Verification L1, Unclaimed author
5 works
0 followers
4 topics
4 close collaborators

Published work

5 published items

Preprint, 2021, arXiv

RaJIVE: Robust Angle Based JIVE for Integrating Noisy Multi-Source Data

With the increasing availability of high-dimensional, multi-source data, the identification of joint and data-specific patterns of variability has become a subject of interest in many research areas. Several matrix decomposition methods have been formulated for this purpose, for example JIVE (Joint and Individual Variation Explained) and its angle-based variation, aJIVE. Although the effect of data contamination on the estimated joint and individual components has not been considered in the literature, gross errors and outliers in the data can cause instability in such methods and lead to incorrect estimation of joint and individual variance components. We focus on the aJIVE factorization method and provide a thorough analysis of the effect of outliers on the resulting variation decomposition. After showing that this effect is not negligible when all data sources are contaminated, we propose a robust extension of aJIVE (RaJIVE) that integrates a robust formulation of the singular value decomposition into the aJIVE approach. The proposed RaJIVE is shown to provide correct decompositions even in the presence of outliers and to improve on the performance of aJIVE. We use extensive simulation studies with different levels of data contamination to compare the two methods. Finally, we describe an application of RaJIVE to a multi-omics breast cancer dataset from The Cancer Genome Atlas. We provide the R package RaJIVE with a ready-to-use implementation of the methods and documentation of code and examples.

Preprint, 2020, arXiv

A robust variable screening procedure for ultra-high dimensional data

Variable selection in ultra-high dimensional regression problems has become an important issue. In such situations, penalized regression models may face computational problems, and some pre-screening of the variables may be necessary. A number of procedures for such pre-screening have been developed; among them, sure independence screening (SIS) enjoys some popularity. However, SIS is vulnerable to outliers in the data, and in small samples in particular this may lead to faulty inference. In this paper, we develop a new robust screening procedure. We build on the density power divergence (DPD) estimation approach and introduce DPD-SIS and its extension, iterative DPD-SIS. We illustrate the behavior of the methods through extensive simulation studies and show that they are superior to both the original SIS and other robust methods when there are outliers in the data. We demonstrate the claimed robustness through the use of influence functions, and we discuss an appropriate choice of the tuning parameter $\alpha$. Finally, we illustrate the use of the procedure on a small dataset from a study on the regulation of lipid metabolism.
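
To make the baseline concrete, here is a minimal sketch of classical (non-robust) SIS, which ranks covariates by absolute marginal correlation with the response and keeps the top d. It does not implement DPD-SIS from the abstract above. The function names `marginal_correlation` and `sis`, the column-wise data layout, and the toy data are assumptions made for illustration.

```python
import math


def marginal_correlation(x, y):
    """Pearson correlation between one covariate x and the response y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable carries no marginal signal
    return sxy / math.sqrt(sxx * syy)


def sis(columns, y, d):
    """Return the (sorted) indices of the d covariates with the largest
    absolute marginal correlation with y.

    `columns` is a list of covariates, each a list of n observations
    (hypothetical layout chosen for this sketch).
    """
    scores = [abs(marginal_correlation(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:d])


# Toy data: covariates 0 and 2 are strongly related to y, 1 and 3 are not.
y = [1, 2, 3, 4, 5]
columns = [
    [1, 2, 3, 4, 5],
    [2, 1, 2, 1, 2],
    [5, 4, 3, 2, 1.5],
    [1, 0, 1, 0, 0],
]
kept = sis(columns, y, d=2)
```

Because the ranking relies on ordinary sample correlations, a single gross outlier in a covariate or in the response can change which variables are kept, which is the vulnerability the abstract describes.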

Preprint, 2020, arXiv

Consistent Fixed-Effects Selection in Ultra-high dimensional Linear Mixed Models with Error-Covariate Endogeneity

Recently, applied sciences, including longitudinal and clustered studies in biomedicine, have come to require the analysis of ultra-high dimensional linear mixed effects models, in which important fixed effect variables must be selected from a vast pool of available candidates. However, the existing literature assumes that all available covariates and random effect components are independent of the model error, an assumption that is often violated in practice (endogeneity). In this paper, we first investigate this important issue in ultra-high dimensional linear mixed effects models, with particular focus on fixed effects selection. We study the effects of different types of endogeneity on existing regularization methods and prove their inconsistency. Then, we propose a new profiled focused generalized method of moments (PFGMM) approach to consistently select fixed effects under 'error-covariate' endogeneity, i.e., in the presence of correlation between the model error and covariates. Our proposal is proved to be oracle consistent with probability tending to one and works well under most other types of endogeneity too. We also propose and illustrate a few consistent parameter estimators, including those of the variance components, along with variable selection through PFGMM. Empirical simulations and a real data example further support the claimed utility of our proposal.

Preprint, 2020, arXiv

On optimal two-stage testing of multiple mediators

Mediation analysis in high-dimensional settings often involves identifying potential mediators among a large number of measured variables. For this purpose, a two-step familywise error rate procedure called ScreenMin has recently been proposed (Djordjilović et al. 2019). In ScreenMin, variables are first screened, and only those that pass the screening are tested. The proposed threshold for selection has been shown to guarantee asymptotic familywise error rate control. In this work, we investigate the impact of the selection threshold on the finite sample familywise error rate. We derive a power-maximizing selection threshold and show that it is well approximated by an adaptive threshold of Wang et al. (2016). We illustrate the investigated procedures on a case-control study examining the effect of fish intake on the risk of colorectal adenoma.
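
The screen-then-test idea can be sketched as follows. This is an illustrative reading of the two-step procedure, assuming each candidate mediator comes with two p-values (one per link of the mediation path), that screening keeps candidates whose smaller p-value is at most a threshold `c`, and that testing applies a Bonferroni correction over the selected set to the larger p-value. The function name `screen_min`, the threshold value, and the p-values are hypothetical; the power-maximizing threshold derived in the paper is not computed here.

```python
def screen_min(p1, p2, c, alpha=0.05):
    """Two-step screen-then-test sketch.

    p1, p2 : per-candidate p-values for the two links of the mediation path
    c      : screening threshold applied to min(p1[i], p2[i])
    alpha  : target familywise error rate

    Returns (selected, rejected) as lists of candidate indices.
    """
    if len(p1) != len(p2):
        raise ValueError("p1 and p2 must have the same length")

    # Step 1 (screening): keep candidates whose smaller p-value passes c.
    selected = [i for i, (a, b) in enumerate(zip(p1, p2)) if min(a, b) <= c]
    if not selected:
        return [], []

    # Step 2 (testing): Bonferroni over the selected set only,
    # applied to the larger of the two p-values.
    level = alpha / len(selected)
    rejected = [i for i in selected if max(p1[i], p2[i]) <= level]
    return selected, rejected


# Made-up p-values for four candidate mediators.
p1 = [0.001, 0.20, 0.004, 0.60]
p2 = [0.002, 0.001, 0.30, 0.70]
selected, rejected = screen_min(p1, p2, c=0.01, alpha=0.05)
```

In this toy run, three candidates pass screening, so each is tested at level 0.05/3 rather than 0.05/4; the choice of `c` governs this trade-off between how many hypotheses are tested and how strict each test is.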

Preprint, 2020, arXiv

When and why are principal component scores a good tool for visualizing high-dimensional data?

Principal component analysis (PCA) is a popular dimension reduction technique often used to visualize high-dimensional data structures. In genomics, this can involve millions of variables, but only tens to hundreds of observations. Theoretically, such extreme high-dimensionality will cause biased or inconsistent eigenvector estimates, but in practice the principal component scores are used for visualization with great success. In this paper, we explore when and why the classical principal component scores can be used to visualize structures in high-dimensional data, even when there are few observations compared to the number of variables. Our argument is twofold. First, we argue that eigenvectors related to pervasive signals will have eigenvalues scaling linearly with the number of variables. Second, we prove that for linearly increasing eigenvalues, the sample component scores will asymptotically be scaled and rotated versions of the population scores. Thus, the visual information of the sample scores will be unchanged, even though the sample eigenvectors are biased. In the case of pervasive signals, the principal component scores can therefore be used to visualize the population structures, even in extreme high-dimensional situations.