Source author record

Lexin Li

Lexin Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: Methodology, math.ST, Statistics Theory, Machine Learning, Computation, Applications, eess.IV

Catalog footprint

What is connected

15 works

7 topics

4 close collaborators


preprint · 2026 · arXiv

Fairness-aware kidney exchange and kidney paired donation

The kidney paired donation (KPD) program provides an innovative solution to overcome incompatibility challenges in kidney transplants by matching incompatible donor-patient pairs and facilitating kidney exchanges. To address unequal access to transplant opportunities, there are two widely used fairness criteria: group fairness and individual fairness. However, these criteria do not consider protected patient features, which refer to characteristics legally or ethically recognized as needing protection from discrimination, such as race and gender. Motivated by the calibration principle in machine learning, we introduce a new fairness criterion: the matching outcome should be conditionally independent of the protected feature, given the sensitization level. We integrate this fairness criterion as a constraint within the KPD optimization framework and propose a computationally efficient solution using linearization strategies and column-generation methods. Theoretically, we analyze the associated price of fairness using random graph models. Empirically, we compare our fairness criterion with group fairness and individual fairness through both simulations and a real-data example.
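The criterion itself can be made concrete with a small audit. For a binary matching outcome, conditional independence from the protected feature given the sensitization level means that, within every sensitization level, the match rate is the same for all protected groups. The sketch below only checks this empirically on observed outcomes; it is not the authors' constrained KPD optimization, and the `(sensitization, group, matched)` record layout and function name are hypothetical.

```python
from collections import defaultdict

def conditional_match_gaps(records):
    """For each sensitization level, return the largest gap in match rate
    between protected groups. A gap of 0.0 at every level means the observed
    binary matching outcome is empirically conditionally independent of the
    protected feature given the sensitization level.

    `records` is an iterable of (sensitization, group, matched) tuples;
    this record layout is hypothetical, chosen for illustration.
    """
    # counts[level][group] = [number matched, number of patients]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for level, group, matched in records:
        cell = counts[level][group]
        cell[0] += int(matched)
        cell[1] += 1

    gaps = {}
    for level, by_group in counts.items():
        rates = [m / n for m, n in by_group.values()]
        gaps[level] = max(rates) - min(rates)
    return gaps


records = [
    ("low", "A", True), ("low", "A", False), ("low", "B", True), ("low", "B", False),
    ("high", "A", True), ("high", "A", False), ("high", "B", False), ("high", "B", False),
]
print(conditional_match_gaps(records))  # {'low': 0.0, 'high': 0.5}
```

In the paper the criterion enters the optimization as a constraint on the matching; a check like this would only evaluate a matching after the fact.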

preprint · 2022 · arXiv

Image Response Regression via Deep Neural Networks

Delineating the associations between images and a vector of covariates is of central interest in medical imaging studies. To tackle this problem of image response regression, we propose a novel nonparametric approach in the framework of spatially varying coefficient models, where the spatially varying functions are estimated through deep neural networks. Compared to existing solutions, the proposed method explicitly accounts for spatial smoothness and subject heterogeneity, has straightforward interpretations, and is highly flexible and accurate in capturing complex association patterns. A key idea in our approach is to treat the image voxels as the effective samples, which not only alleviates the limited sample size issue that haunts the majority of medical imaging studies, but also leads to more robust and reproducible results. Focusing on a broad family of piecewise smooth functions, we establish the estimation and selection consistency, and derive the asymptotic error bounds. We demonstrate the efficacy of the method through intensive simulations, and further illustrate its advantages with analyses of two functional magnetic resonance imaging datasets.

preprint · 2022 · arXiv

Kernel Knockoffs Selection for Nonparametric Additive Models

Thanks to its fine balance between model flexibility and interpretability, the nonparametric additive model has been widely used, and variable selection for this type of model has been frequently studied. However, none of the existing solutions can control the false discovery rate (FDR) unless the sample size tends to infinity. The knockoff framework is a recent proposal that can address this issue, but few knockoff solutions are directly applicable to nonparametric models. In this article, we propose a novel kernel knockoffs selection procedure for the nonparametric additive model. We integrate three key components: the knockoffs, the subsampling for stability, and the random feature mapping for nonparametric function approximation. We show that the proposed method is guaranteed to control the FDR for any sample size, and achieves a power that approaches one as the sample size tends to infinity. We demonstrate the efficacy of our method through intensive simulations and comparisons with the alternative solutions. Our proposal thus makes useful contributions to the methodology of nonparametric variable selection, FDR-based inference, as well as knockoffs.
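In knockoff procedures, finite-sample FDR control comes from a data-dependent threshold on feature statistics W_j, where a large positive W_j favors the original variable over its knockoff. A minimal sketch of the standard knockoff+ threshold from the original knockoff filter of Barber and Candès follows. It is only the generic selection step and assumes the W_j are already computed; the kernel, subsampling and random-feature components of the proposed procedure are not shown, and the function name is hypothetical.

```python
def knockoff_plus_select(W, q):
    """Return indices selected by the knockoff+ threshold at target FDR level q.

    T = min{ t in {|W_j| : W_j != 0} :
             (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q },
    and the selected set is {j : W_j >= T}. If no t qualifies, nothing is selected.
    """
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:  # ascending order, so the first hit is the minimum
        false_est = 1 + sum(1 for w in W if w <= -t)
        n_selected = max(1, sum(1 for w in W if w >= t))
        if false_est / n_selected <= q:
            return [j for j, w in enumerate(W) if w >= t]
    return []


W = [9, 8, 7, 6, 5, 4, -1, 0.5, 3, -0.5]
print(knockoff_plus_select(W, 0.2))  # [0, 1, 2, 3, 4, 5, 8]
```

The "+1" in the numerator is what makes the guarantee hold exactly; its price is that with few strong signals or a very small q the selected set can be empty.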

preprint · 2022 · arXiv

Sliced Inverse Regression in Metric Spaces

In this article, we propose a general nonlinear sufficient dimension reduction (SDR) framework when both the predictor and response lie in some general metric spaces. We construct reproducing kernel Hilbert spaces whose kernels are fully determined by the distance functions of the metric spaces, then leverage the inherent structures of these spaces to define a nonlinear SDR framework. We adapt the classical sliced inverse regression of Li (1991) within this framework for the metric space data. We build the estimator based on the corresponding linear operators, and show it recovers the regression information unbiasedly. We derive the estimator at both the operator level and under a coordinate system, and also establish its convergence rate. We illustrate the proposed method with both synthetic and real datasets exhibiting non-Euclidean geometry.
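For reference, the classical Euclidean sliced inverse regression that this framework adapts can be sketched briefly: sort by the response, cut the sample into slices, average the predictors within each slice, and take the leading eigenvector of the weighted covariance of the slice means. This is a minimal sketch, assuming the predictors are already standardized to zero mean and identity covariance (so the usual whitening step is skipped) and extracting only one direction by power iteration; it does not implement the kernel-based metric-space estimator, and the function name is hypothetical.

```python
import math

def sir_leading_direction(X, y, n_slices=2, n_iter=200):
    """Classical SIR, first direction only.

    Assumes the rows of X are already standardized (mean 0, identity
    covariance). Returns a unit vector approximating the leading
    eigenvector of M = sum_h (n_h / n) * m_h m_h^T, where m_h is the
    mean of X within slice h.
    """
    n, p = len(X), len(X[0])
    order = sorted(range(n), key=lambda i: y[i])  # sort observations by response
    size = math.ceil(n / n_slices)
    M = [[0.0] * p for _ in range(p)]
    for start in range(0, n, size):
        idx = order[start:start + size]
        m = [sum(X[i][k] for i in idx) / len(idx) for k in range(p)]  # slice mean
        w = len(idx) / n  # slice weight n_h / n
        for a in range(p):
            for b in range(p):
                M[a][b] += w * m[a] * m[b]

    # Power iteration from a uniform start vector for the leading eigenvector
    # of the symmetric PSD matrix M (a full eigendecomposition would give all directions).
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        u = [sum(M[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            break
        v = [x / norm for x in u]
    return v


X = [[-1, 1], [-1, -1], [1, 1], [1, -1]]  # standardized: mean 0, identity covariance
y = [-1, -1, 1, 1]                         # response depends on the first coordinate only
print(sir_leading_direction(X, y))         # approximately [1.0, 0.0]
```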

preprint · 2021 · arXiv

Testing Mediation Effects Using Logic of Boolean Matrices

Mediation analysis is becoming an increasingly important tool in scientific studies. A central question in high-dimensional mediation analysis is to infer the significance of individual mediators. The main challenge is the sheer number of possible paths that go through all combinations of mediators. Most existing mediation inference solutions either explicitly impose that the mediators are conditionally independent given the exposure, or ignore any potential directed paths among the mediators. In this article, we propose a novel hypothesis testing procedure to evaluate individual mediation effects, while taking into account potential interactions among the mediators. Our proposal thus fills a crucial gap, and greatly extends the scope of existing mediation tests. Our key idea is to construct the test statistic using the logic of Boolean matrices, which enables us to establish the proper limiting distribution under the null hypothesis. We further employ screening, data splitting, and decorrelated estimation to reduce the bias and increase the power of the test. We show our test can control both the size and false discovery rate asymptotically, and the power of the test approaches one, meanwhile allowing the number of mediators to diverge to infinity with the sample size. We demonstrate the efficacy of our method through both simulations and a neuroimaging study of Alzheimer's disease.
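The abstract does not spell out the test statistic, but the Boolean-matrix bookkeeping it refers to can be illustrated generically. With a 0/1 adjacency matrix of a directed graph, Boolean matrix products (OR of ANDs) give reachability, which tells whether a mediator lies on some directed path from the exposure to the outcome, including paths that pass through other mediators. The node layout (node 0 is the exposure, the last node is the outcome) and all function names are assumptions for illustration; this is not the paper's test.

```python
def bool_matmul(A, B):
    """Boolean matrix product: C[i][j] = OR_k (A[i][k] AND B[k][j])."""
    n, m, p = len(A), len(B), len(B[0])
    return [[any(A[i][k] and B[k][j] for k in range(m)) for j in range(p)] for i in range(n)]

def reachability(A):
    """Transitive closure: R[i][j] is True iff there is a directed path i -> j
    of length >= 1, computed as A OR A^2 OR ... OR A^n with Boolean products."""
    n = len(A)
    R = [[bool(x) for x in row] for row in A]
    power = R
    for _ in range(n - 1):
        power = bool_matmul(power, A)
        R = [[R[i][j] or power[i][j] for j in range(n)] for i in range(n)]
    return R

def mediators_on_paths(A):
    """Mediators (nodes 1..n-2) on a directed path exposure (node 0) -> outcome (node n-1).
    The node layout is a hypothetical convention for this sketch."""
    n = len(A)
    R = reachability(A)
    return [j for j in range(1, n - 1) if R[0][j] and R[j][n - 1]]


# Edges: 0->1, 1->2, 2->4, 0->3; mediator 3 has no path to the outcome.
A = [
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(mediators_on_paths(A))  # [1, 2]
```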

preprint · 2020 · arXiv

Learning from Binary Multiway Data: Probabilistic Tensor Decomposition and its Statistical Optimality

We consider the problem of decomposing a higher-order tensor with binary entries. Such data problems arise frequently in applications such as neuroimaging, recommendation systems, topic modeling, and sensor network localization. We propose a multilinear Bernoulli model, develop a rank-constrained likelihood-based estimation method, and obtain theoretical accuracy guarantees. In contrast to continuous-valued problems, the binary tensor problem exhibits an interesting phase transition phenomenon according to the signal-to-noise ratio. The error bound for the parameter tensor estimation is established, and we show that the obtained rate is minimax optimal under the considered model. Furthermore, we develop an alternating optimization algorithm with convergence guarantees. The efficacy of our approach is demonstrated through both simulations and analyses of multiple data sets on the tasks of tensor completion and clustering.

preprint · 2020 · arXiv

Statistical Inference for High-Dimensional Vector Autoregression with Measurement Error

High-dimensional vector autoregression with measurement error is frequently encountered in a large variety of scientific and business applications. In this article, we study statistical inference of the transition matrix under this model. While there has been a large body of literature studying sparse estimation of the transition matrix, there is a paucity of inference solutions, especially in the high-dimensional scenario. We develop inferential procedures for both the global and simultaneous testing of the transition matrix. We first develop a new sparse expectation-maximization algorithm to estimate the model parameters, and carefully characterize their estimation precision. We then construct a Gaussian matrix, after proper bias and variance corrections, from which we derive the test statistics. Finally, we develop the testing procedures and establish their asymptotic guarantees. We study the finite-sample performance of our tests through intensive simulations, and illustrate with a brain connectivity analysis example.

preprint · 2015 · arXiv

Hypothesis Testing of Matrix Graph Model with Application to Brain Connectivity Analysis

Brain connectivity analysis is now at the forefront of neuroscience research. A connectivity network is characterized by a graph, where nodes represent neural elements such as neurons and brain regions, and links represent statistical dependences that are often encoded in terms of partial correlations. Such a graph is inferred from matrix-valued neuroimaging data such as electroencephalography and functional magnetic resonance imaging. There have been a good number of successful proposals for sparse precision matrix estimation under normal or matrix normal distributions; however, this family of solutions does not offer a statistical significance quantification for the estimated links. In this article, we adopt a matrix normal distribution framework and formulate the brain connectivity analysis as a precision matrix hypothesis testing problem. Based on the separable spatial-temporal dependence structure, we develop oracle and data-driven procedures to test the global hypothesis that all spatial locations are conditionally independent, which are shown to be particularly powerful against the sparse alternatives. In addition, simultaneous tests for identifying conditionally dependent spatial locations with false discovery rate control are proposed in both oracle and data-driven settings. Theoretical results show that the data-driven procedures perform asymptotically as well as the oracle procedures and enjoy certain optimality properties. The empirical finite-sample performance of the proposed tests is studied via simulations, and the new tests are applied to a real electroencephalography data analysis.

preprint · 2015 · arXiv

Parsimonious Tensor Response Regression

Motivated by abundant scientific and engineering data that are not only high-dimensional but also complexly structured, we study the regression problem with a multidimensional array (tensor) response and a vector predictor. Applications include, among others, comparing tensor images across groups after adjusting for additional covariates, which is of central interest in neuroimaging analysis. We propose parsimonious tensor response regression, adopting a generalized sparsity principle. It models all voxels of the tensor response jointly, while accounting for the inherent structural information among the voxels. It effectively reduces the number of free parameters, leading to feasible computation and improved interpretation. We achieve model estimation through a nascent technique called the envelope method, which identifies the immaterial information and focuses estimation on the material information in the tensor response. We demonstrate that the resulting estimator is asymptotically efficient and enjoys competitive finite-sample performance. We also illustrate the new method on two real neuroimaging studies.

preprint · 2014 · arXiv

Tensor Generalized Estimating Equations for Longitudinal Imaging Analysis

In an increasing number of neuroimaging studies, brain images, which are in the form of multidimensional arrays (tensors), have been collected on multiple subjects at multiple time points. Of scientific interest is to analyze such massive and complex longitudinal images to diagnose neurodegenerative disorders and to identify disease-relevant brain regions. In this article, we treat those problems in a unifying regression framework with image predictors, and propose tensor generalized estimating equations (GEE) for longitudinal imaging analysis. The GEE approach takes into account intra-subject correlation of responses, whereas a low-rank tensor decomposition of the coefficient array enables effective estimation and prediction with limited sample size. We propose an efficient estimation algorithm, study the asymptotics in both fixed $p$ and diverging $p$ regimes, and also investigate tensor GEE with regularization that is particularly useful for region selection. The efficacy of the proposed tensor GEE is demonstrated on both simulated data and a real data set from the Alzheimer's Disease Neuroimaging Initiative (ADNI).

preprint · 2013 · arXiv

High-dimensional influence measure

Influence diagnosis is important since the presence of influential observations could lead to distorted analysis and misleading interpretations. This is particularly so for high-dimensional data, as the increased dimensionality and complexity may amplify both the chance of an observation being influential and its potential impact on the analysis. In this article, we propose a novel high-dimensional influence measure for regressions with the number of predictors far exceeding the sample size. Our proposal can be viewed as a high-dimensional counterpart to the classical Cook's distance. However, whereas Cook's distance quantifies an individual observation's influence on the least squares regression coefficient estimate, our new diagnostic measure captures the influence on the marginal correlations, which in turn exerts serious influence on downstream analysis, including coefficient estimation, variable selection and screening. Moreover, we establish the asymptotic distribution of the proposed influence measure by letting the predictor dimension go to infinity. The availability of this asymptotic distribution leads to a principled rule for determining the critical value for influential observation detection. Both simulations and real data analysis demonstrate the usefulness of the new influence diagnosis measure.
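The idea of measuring influence through marginal correlations can be sketched as a leave-one-out computation: for each observation k, recompute every marginal correlation between a predictor X_j and the response Y with observation k deleted, and average the squared changes over the p predictors. The averaged-squared-difference form, the function names, and the toy data are assumptions of this illustration; the paper's exact normalization and its asymptotic critical value are not reproduced.

```python
import math

def corr(a, b):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def marginal_corr_influence(X, y):
    """Return D[k] = (1/p) * sum_j (rho_j - rho_j^{(-k)})^2, where rho_j is the
    marginal correlation of column j with y and rho_j^{(-k)} is the same
    quantity computed without observation k (form assumed for illustration)."""
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    rho = [corr(c, y) for c in cols]
    D = []
    for k in range(n):
        keep = [i for i in range(n) if i != k]
        y_k = [y[i] for i in keep]
        rho_k = [corr([c[i] for i in keep], y_k) for c in cols]
        D.append(sum((r - rk) ** 2 for r, rk in zip(rho, rho_k)) / p)
    return D


X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
y = [1, 2, 3, 4, 5, -10]  # observation 5 has an outlying response
D = marginal_corr_influence(X, y)
print(max(range(len(D)), key=D.__getitem__))  # 5
```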

preprint · 2013 · arXiv

Tucker Tensor Regression and Neuroimaging Analysis

Large-scale neuroimaging studies have been collecting brain images of study individuals, which take the form of two-dimensional, three-dimensional, or higher-dimensional arrays, also known as tensors. Addressing scientific questions arising from such data demands new regression models that take multidimensional arrays as covariates. Simply turning an image array into a long vector causes extremely high dimensionality that compromises classical regression methods and, more seriously, destroys the inherent spatial structure of array data, which possesses a wealth of information. In this article, we propose a family of generalized linear tensor regression models based upon the Tucker decomposition of regression coefficient arrays. Effectively exploiting the low-rank structure of tensor covariates brings the ultrahigh dimensionality to a manageable level that leads to efficient estimation. We demonstrate numerically that the new model can provide a sound recovery of even high-rank signals, and show asymptotically that the model consistently estimates the best Tucker structure approximation to the full array model in the sense of Kullback-Leibler distance. The new model is also compared to a recently proposed tensor regression model that relies upon an alternative CANDECOMP/PARAFAC (CP) decomposition.
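To make the Tucker structure concrete for a three-way coefficient array: each entry is B[i1,i2,i3] = Σ G[r1,r2,r3] U1[i1,r1] U2[i2,r2] U3[i3,r3], with a small core array G and factor matrices U1, U2, U3. The sketch below evaluates one entry and counts free parameters. The function names are hypothetical, and the count ignores the identifiability constraints that reduce it somewhat in practice.

```python
from itertools import product
from math import prod

def tucker_entry(G, U1, U2, U3, i1, i2, i3):
    """Entry (i1, i2, i3) of B = G x_1 U1 x_2 U2 x_3 U3 for a 3-way core G."""
    R1, R2, R3 = len(G), len(G[0]), len(G[0][0])
    return sum(
        G[r1][r2][r3] * U1[i1][r1] * U2[i2][r2] * U3[i3][r3]
        for r1, r2, r3 in product(range(R1), range(R2), range(R3))
    )

def tucker_param_count(dims, ranks):
    """Core size prod(R_d) plus factor sizes sum(p_d * R_d), before identifiability constraints."""
    return prod(ranks) + sum(p * r for p, r in zip(dims, ranks))


print(tucker_param_count((64, 64, 64), (2, 2, 2)))  # 392, versus 64**3 = 262144 entries
```

Unlike CP, the Tucker form lets each mode have its own rank R_d, which is one reason it can represent some signals more flexibly at a comparable parameter budget.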

preprint · 2012 · arXiv

Principal support vector machines for linear and nonlinear sufficient dimension reduction

We introduce a principal support vector machine (PSVM) approach that can be used for both linear and nonlinear sufficient dimension reduction. The basic idea is to divide the response variables into slices and use a modified form of support vector machine to find the optimal hyperplanes that separate them. These optimal hyperplanes are then aligned by the principal components of their normal vectors. It is proved that the aligned normal vectors provide an unbiased, $\sqrt{n}$-consistent, and asymptotically normal estimator of the sufficient dimension reduction space. The method is then generalized to nonlinear sufficient dimension reduction using the reproducing kernel Hilbert space. In that context, the aligned normal vectors become functions and it is proved that they are unbiased in the sense that they are functions of the true nonlinear sufficient predictors. We compare PSVM with other sufficient dimension reduction methods by simulation and in real data analysis, and through both comparisons firmly establish its practical advantages.

preprint · 2012 · arXiv

Regularized Matrix Regression

Modern technologies are producing a wealth of data with complex structures. For instance, in two-dimensional digital imaging, flow cytometry, and electroencephalography, matrix-type covariates frequently arise when measurements are obtained for each combination of two underlying variables. To address scientific questions arising from those data, new regression methods that take matrices as covariates are needed, and sparsity or other forms of regularization are crucial due to the ultrahigh dimensionality and complex structure of the matrix data. The popular lasso and related regularization methods hinge upon the sparsity of the true signal in terms of the number of its nonzero coefficients. However, for matrix data, the true signal often has, or can be well approximated by, a low-rank structure. As such, the sparsity is frequently in the form of low rank of the matrix parameters, which may seriously violate the assumption of the classical lasso. In this article, we propose a class of regularized matrix regression methods based on spectral regularization. A highly efficient and scalable estimation algorithm is developed, and a degrees-of-freedom formula is derived to facilitate model selection along the regularization path. The superior performance of the proposed method is demonstrated on both synthetic and real examples.

preprint · 2012 · arXiv

Tensor Regression with Applications in Neuroimaging Data Analysis

Classical regression methods treat covariates as a vector and estimate a corresponding vector of regression coefficients. Modern applications in medical imaging generate covariates of more complex form such as multidimensional arrays (tensors). Traditional statistical and computational methods are proving insufficient for analysis of these high-throughput data due to their ultrahigh dimensionality as well as complex structure. In this article, we propose a new family of tensor regression models that efficiently exploit the special structure of tensor covariates. Under this framework, ultrahigh dimensionality is reduced to a manageable level, resulting in efficient estimation and prediction. A fast and highly scalable estimation algorithm is proposed for maximum likelihood estimation and its associated asymptotic properties are studied. Effectiveness of the new methods is demonstrated on both synthetic and real MRI imaging data.
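A minimal sketch of the structure such tensor regression models exploit, assuming a rank-R CANDECOMP/PARAFAC (CP) coefficient B = Σ_r a_r ∘ b_r ∘ c_r for a three-way covariate X: the systematic component <B, X> can be computed from the factor vectors without forming B, and the parameter count drops from p1·p2·p3 to R(p1 + p2 + p3). The function names are hypothetical, and only the linear predictor is shown, not the likelihood or the estimation algorithm.

```python
from itertools import product

def cp_inner(factors, X):
    """<B, X> for B = sum_r a_r o b_r o c_r, given factors = [(a_r, b_r, c_r), ...].

    B is never materialized: each rank-one term contributes
    sum_{i,j,k} X[i][j][k] * a_r[i] * b_r[j] * c_r[k].
    """
    total = 0.0
    for a, b, c in factors:
        for i, j, k in product(range(len(a)), range(len(b)), range(len(c))):
            total += X[i][j][k] * a[i] * b[j] * c[k]
    return total

def cp_param_count(dims, rank):
    """Free parameters in a rank-R CP coefficient, before identifiability constraints."""
    return rank * sum(dims)


print(cp_param_count((64, 64, 64), 3))  # 576, versus 64**3 = 262144 entries
```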
