Source author record

Aaron J. Molstad

Aaron J. Molstad appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Methodology Computation Machine Learning Applications

Catalog footprint

What is connected

8works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

A likelihood-based approach for multivariate categorical response regression in high dimensions

We propose a penalized likelihood method to fit the bivariate categorical response regression model. Our method allows practitioners to estimate which predictors are irrelevant, which predictors only affect the marginal distributions of the bivariate response, and which predictors affect both the marginal distributions and log odds ratios. To compute our estimator, we propose an efficient first order algorithm which we extend to settings where some subjects have only one response variable measured, i.e., the semi-supervised setting. We derive an asymptotic error bound which illustrates the performance of our estimator in high-dimensional settings. Generalizations to the multivariate categorical response regression model are proposed. Finally, simulation studies and an application in pan-cancer risk prediction demonstrate the usefulness of our method in terms of interpretability and prediction accuracy. An R package implementing the proposed method is available for download at github.com/ajmolstad/BvCategorical.

preprint2022arXiv

Mixed-type multivariate response regression with covariance estimation

We propose a new method for multivariate response regression and covariance estimation when elements of the response vector are of mixed types, for example some continuous and some discrete. Our method is based on a model which assumes the observable mixed-type response vector is connected to a latent multivariate normal response linear regression through a link function. We explore the properties of this model and show its parameters are identifiable under reasonable conditions. We impose no parametric restrictions on the covariance of the latent normal other than positive definiteness, thereby avoiding assumptions about unobservable variables which can be difficult to verify in practice. To accommodate this generality, we propose a novel algorithm for approximate maximum likelihood estimation that works "off-the-shelf" with many different combinations of response types, and which scales well in the dimension of the response vector. Our method typically gives better predictions and parameter estimates than fitting separate models for the different response types and allows for approximate likelihood ratio testing of relevant hypotheses such as independence of responses. The usefulness of the proposed method is illustrated in simulations; and one biomedical and one genomic data example.

preprint2022arXiv

New insights for the multivariate square-root lasso

We study the multivariate square-root lasso, a method for fitting the multivariate response linear regression model with dependent errors. This estimator minimizes the nuclear norm of the residual matrix plus a convex penalty. Unlike existing methods that require explicit estimates of the error precision (inverse covariance) matrix, the multivariate square-root lasso implicitly accounts for error dependence and is the solution to a convex optimization problem. We establish error bounds which reveal that like the univariate square-root lasso, the multivariate square-root lasso is pivotal with respect to the unknown error covariance matrix. In addition, we propose a variation of the alternating direction method of multipliers algorithm to compute the estimator and discuss an accelerated first order algorithm that can be applied in certain cases. In both simulation studies and a genomic data application, we show that the multivariate square-root lasso can outperform more computationally intensive methods that require explicit estimation of the error precision matrix.

preprint2022arXiv

Scalable algorithms for semiparametric accelerated failure time models in high dimensions

Semiparametric accelerated failure time (AFT) models are a useful alternative to Cox proportional hazards models, especially when the assumption of constant hazard ratios is untenable. However, rank-based criteria for fitting AFT models are often non-differentiable, which poses a computational challenge in high-dimensional settings. In this article, we propose a new alternating direction method of multipliers algorithm for fitting semiparametric AFT models by minimizing a penalized rank-based loss function. Our algorithm scales well in both the number of subjects and number of predictors; and can easily accommodate a wide range of popular penalties. To improve the selection of tuning parameters, we propose a new criterion which avoids some common problems in cross-validation with censored responses. Through extensive simulation studies, we show that our algorithm and software is much faster than existing methods (which can only be applied to special cases), and we show that estimators which minimize a penalized rank-based criterion often outperform alternative estimators which minimize penalized weighted least squares criteria. Application to nine cancer datasets further demonstrates that rank-based estimators of semiparametric AFT models are competitive with estimators assuming proportional hazards model in high-dimensional settings, whereas weighted least squares estimators are often not. A software package implementing the algorithm, along with a set of auxiliary functions, is available for download at github.com/ajmolstad/penAFT.

preprint2020arXiv

A covariance-enhanced approach to multi-tissue joint eQTL mapping with application to transcriptome-wide association studies

Transcriptome-wide association studies based on genetically predicted gene expression have the potential to identify novel regions associated with various complex traits. It has been shown that incorporating expression quantitative trait loci (eQTLs) corresponding to multiple tissue types can improve power for association studies involving complex etiology. In this article, we propose a new multivariate response linear regression model and method for predicting gene expression in multiple tissues simultaneously. Unlike existing methods for multi-tissue joint eQTL mapping, our approach incorporates tissue-tissue expression correlation, which allows us to more efficiently handle missing expression measurements and more accurately predict gene expression using a weighted summation of eQTL genotypes. We show through simulation studies that our approach performs better than the existing methods in many scenarios. We use our method to estimate eQTL weights for 29 tissues collected by GTEx, and show that our approach significantly improves expression prediction accuracy compared to competitors. Using our eQTL weights, we perform a multi-tissue-based S-MultiXcan transcriptome-wide association study and show that our method leads to more discoveries in novel regions and more discoveries overall than the existing methods. Estimated eQTL weights are available for download online at github.com/ajmolstad/MTeQTLResults.

preprint2020arXiv

Estimating Multiple Precision Matrices with Cluster Fusion Regularization

We propose a penalized likelihood framework for estimating multiple precision matrices from different classes. Most existing methods either incorporate no information on relationships between the precision matrices, or require this information be known a priori. The framework proposed in this article allows for simultaneous estimation of the precision matrices and relationships between the precision matrices, jointly. Sparse and non-sparse estimators are proposed, both of which require solving a non-convex optimization problem. To compute our proposed estimators, we use an iterative algorithm which alternates between a convex optimization problem solved by blockwise coordinate descent and a k-means clustering problem. Blockwise updates for computing the sparse estimator require solving an elastic net penalized precision matrix estimation problem, which we solve using a proximal gradient descent algorithm. We prove that this subalgorithm has a linear rate of convergence. In simulation studies and two real data applications, we show that our method can outperform competitors that ignore relevant relationships between precision matrices and performs similarly to methods which use prior information often uknown in practice.

preprint2016arXiv

A penalized likelihood method for classification with matrix-valued predictors

We propose a penalized likelihood method to fit the linear discriminant analysis model when the predictor is matrix valued. We simultaneously estimate the means and the precision matrix, which we assume has a Kronecker product decomposition. Our penalties encourage pairs of response category mean matrices to have equal entries and also encourage zeros in the precision matrix. To compute our estimators, we use a blockwise coordinate descent algorithm. To update the optimization variables corresponding to response category mean matrices, we use an alternating minimization algorithm that takes advantage of the Kronecker structure of the precision matrix. We show that our method can outperform relevant competitors in classification, even when our modeling assumptions are violated. We analyze an EEG dataset to demonstrate our method's interpretability and classification accuracy.

preprint2015arXiv

Indirect multivariate response linear regression

We propose a new class of estimators of the multivariate response linear regression coefficient matrix that exploits the assumption that the response and predictors have a joint multivariate Normal distribution. This allows us to indirectly estimate the regression coefficient matrix through shrinkage estimation of the parameters of the inverse regression, or the conditional distribution of the predictors given the responses. We establish a convergence rate bound for estimators in our class and we study two examples. The first example estimator exploits an assumption that the inverse regression's coefficient matrix is sparse. The second example estimator exploits an assumption that the inverse regression's coefficient matrix is rank deficient. These estimators do not require the popular assumption that the forward regression coefficient matrix is sparse or has small Frobenius norm. Using simulation studies, we show that our example estimators outperform relevant competitors for some data generating models.

Aaron J. Molstad

What is connected

Connect this record

See the researcher in context

Building this map preview

8 published item(s)

A likelihood-based approach for multivariate categorical response regression in high dimensions

Mixed-type multivariate response regression with covariance estimation

New insights for the multivariate square-root lasso

Scalable algorithms for semiparametric accelerated failure time models in high dimensions

A covariance-enhanced approach to multi-tissue joint eQTL mapping with application to transcriptome-wide association studies

Estimating Multiple Precision Matrices with Cluster Fusion Regularization

A penalized likelihood method for classification with matrix-valued predictors

Indirect multivariate response linear regression