Source author record

Menggang Yu

Menggang Yu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Methodology, Applications, Machine Learning

Catalog footprint

What is connected

6 works

3 topics

4 close collaborators


Preprint, 2022, arXiv

Entropy Balancing for Causal Generalization with Target Sample Summary Information

In this paper, we focus on estimating the average treatment effect (ATE) of a target population when individual-level data from a source population and summary-level data (e.g., first or second moments of certain covariates) from the target population are available. In the presence of heterogeneous treatment effects, the ATE of the target population can differ from that of the source population when the distributions of treatment effect modifiers are dissimilar in the two populations, a phenomenon also known as covariate shift. Many methods have been developed to adjust for covariate shift, but most require individual-level covariates from a representative target sample. We develop a weighting approach based on summary-level information from the target sample to adjust for possible covariate shift in effect modifiers. In particular, weights of the treated and control groups within a source sample are calibrated to the summary-level information of the target sample. Our approach also seeks additional covariate balance between the treated and control groups in the source sample. We study the asymptotic behavior of the corresponding weighted estimator for the target population ATE under a wide range of conditions. The theoretical implications are confirmed in simulation studies and a real data application.
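The calibration idea in this abstract can be illustrated with a minimal sketch. It is not the authors' estimator: it assumes plain entropy balancing (exponential tilting of uniform weights) against target first moments only, solved by gradient descent on the dual problem. The function name `entropy_balancing_weights` and the toy data are hypothetical, and the target means must lie strictly inside the range spanned by the source covariates.

```python
import math

def entropy_balancing_weights(X, target_means, iters=20000, tol=1e-10):
    """Weights w (summing to 1) with sum_i w_i * X[i][j] close to target_means[j].

    Assumed form: w_i proportional to exp(lam . (x_i - m)), the solution that
    stays closest to uniform weights in KL divergence under the moment constraints.
    """
    n, p = len(X), len(target_means)
    # Center each covariate at its target moment.
    Z = [[X[i][j] - target_means[j] for j in range(p)] for i in range(n)]
    # max ||z_i||^2 bounds the curvature of the dual, so 1/L is a safe step size.
    L = max(sum(z * z for z in zi) for zi in Z)
    step = 1.0 / L if L > 0 else 1.0
    lam = [0.0] * p
    w = [1.0 / n] * n
    for _ in range(iters):
        scores = [sum(l * z for l, z in zip(lam, zi)) for zi in Z]
        mx = max(scores)  # shift for numerical stability
        e = [math.exp(s - mx) for s in scores]
        tot = sum(e)
        w = [v / tot for v in e]
        # Gradient of the dual = weighted mean of the centered covariates.
        grad = [sum(w[i] * Z[i][j] for i in range(n)) for j in range(p)]
        if max(abs(g) for g in grad) < tol:
            break
        lam = [l - step * g for l, g in zip(lam, grad)]
    return w

# Hypothetical toy data: one effect modifier per source unit, in each arm.
treated = [[0.0], [1.0], [2.0], [4.0]]
control = [[0.0], [2.0], [3.0], [4.0]]
target_mean = [3.0]  # summary-level information from the target sample

w_treated = entropy_balancing_weights(treated, target_mean)
w_control = entropy_balancing_weights(control, target_mean)
```

Calibrating each arm to the same target moments means the weighted treated and control groups both resemble the target population on those moments. A weighted difference in outcome means would then be one simple plug-in estimate of the target ATE under these assumptions.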

Preprint, 2022, arXiv

Non-convex SVM for cancer diagnosis based on morphologic features of tumor microenvironment

The surroundings of a cancerous tumor affect how it grows and develops in humans. New data from early breast cancer patients contain information on the collagen fibers surrounding the tumorous tissue -- offering hope of finding additional biomarkers for diagnosis and prognosis -- but pose two challenges for typical analysis. Each image section contains information on hundreds of fibers, and each tissue has multiple image sections contributing to a single prediction of tumor vs. non-tumor. This nested relationship of fibers within image spots within tissue samples requires a specialized analysis approach. We devise a novel support vector machine (SVM)-based predictive algorithm for this data structure. By treating the collection of fibers as a probability distribution, we can measure similarities between the collections through a flexible kernel approach. Because of the assumed relationship of tumor status between image sections and tissue samples, the constructed SVM problem is non-convex and traditional algorithms cannot be applied. We propose two algorithms that trade off computational accuracy against efficiency to manage data of all sizes. The predictive performance of both algorithms is evaluated on the collagen fiber data set and additional simulation scenarios. We offer reproducible implementations of both algorithms in the R package mildsvm.

Preprint, 2022, arXiv

Policy Learning for Optimal Individualized Dose Intervals

We study the problem of learning individualized dose intervals using observational data. There are very few previous works on policy learning with continuous treatment, and all of them focus on recommending an optimal dose rather than an optimal dose interval. In this paper, we propose a new method to estimate such an optimal dose interval, named the probability dose interval (PDI). The potential outcomes for doses in the PDI are guaranteed to be better than a pre-specified threshold with a given probability (e.g., 50%). The associated nonconvex optimization problem can be efficiently solved by the Difference-of-Convex functions (DC) algorithm. We prove that our estimated policy is consistent, and its risk converges to that of the best-in-class policy at a root-n rate. Numerical simulations show the advantage of the proposed method over outcome-modeling-based benchmarks. We further demonstrate the performance of our method in determining individualized Hemoglobin A1c (HbA1c) control intervals for elderly patients with diabetes.

Preprint, 2021, arXiv

Studentized Permutation Method for Comparing Restricted Mean Survival Times with Small Sample from Randomized Trials

Recent observations, especially in cancer immunotherapy clinical trials with time-to-event outcomes, show that the commonly used proportional hazards assumption is often not justifiable, hampering an appropriate analysis of the data by hazard ratios. An attractive alternative is the restricted mean survival time (RMST), which does not rely on any model assumption and can always be interpreted intuitively. As pointed out recently by Horiguchi and Uno (2020), methods for the RMST based on asymptotic theory suffer from inflated type-I error under small sample sizes. To overcome this problem, they suggested a permutation strategy leading to more convincing results in simulations. However, their proposal requires an exchangeable data set-up between comparison groups, which may be limiting in practice. In addition, it is not possible to invert their testing procedure to obtain valid confidence intervals, which can provide more in-depth information. In this paper, we address these limitations by proposing a studentized permutation test as well as the corresponding permutation-based confidence intervals. In our extensive simulation study, we demonstrate the advantage of our new method, especially in situations with relatively small sample sizes and unbalanced groups. Finally, we illustrate the application of the proposed method by re-analysing data from a recent lung cancer clinical trial.
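For readers unfamiliar with the RMST, it is the area under the survival curve up to a truncation time tau. The sketch below computes it from a Kaplan-Meier estimate. It illustrates only the quantity being compared, not the studentized permutation test proposed in the paper; the function name `rmst` and the toy inputs are hypothetical.

```python
def rmst(times, events, tau):
    """Area under the Kaplan-Meier curve on [0, tau].

    times  : observed follow-up times
    events : 1 if the event was observed at that time, 0 if censored
    tau    : truncation time
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, last_t, area = 1.0, 0.0, 0.0
    i = 0
    while i < len(data) and data[i][0] <= tau:
        t = data[i][0]
        deaths = leaving = 0
        # Group tied observations at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        # The curve is flat between event times: add one rectangle.
        area += surv * (t - last_t)
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
        n_at_risk -= leaving
        last_t = t
    # Final rectangle from the last observed time up to tau.
    area += surv * (tau - last_t)
    return area

# Hypothetical two-arm comparison: the estimand is the RMST difference.
arm_a = rmst([1, 2, 3, 4], [1, 1, 1, 1], tau=4)
arm_b = rmst([1, 2, 3], [1, 0, 1], tau=3)
```

Without censoring, the result equals the sample mean of min(T, tau), which gives a quick sanity check on the implementation.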

Preprint, 2020, arXiv

A Semiparametric Approach to Model Effect Modification

One fundamental statistical question in research areas such as precision medicine and health disparities is how to discover effect modification of treatment or exposure by observed covariates. We propose a semiparametric framework for identifying such effect modification. Instead of using traditional outcome models, we directly posit semiparametric models on contrasts, or expected differences of the outcome under different treatment choices or exposures. Through semiparametric estimation theory, all valid estimating equations, including the efficient scores, are derived. Besides doubly robust loss functions, our approach also enables dimension reduction in the presence of many covariates. The asymptotic and non-asymptotic properties of the proposed methods are explored via a unified statistical and algorithmic analysis. Comparison with existing methods in both simulation and real data analysis demonstrates the superiority of our estimators, especially for an efficiency-improved version.

Preprint, 2013, arXiv

Group variable selection via convex Log-Exp-Sum penalty with application to a breast cancer survivor study

In many scientific and engineering applications, covariates are naturally grouped. When group structures are available among covariates, people are usually interested in identifying both important groups and important variables within the selected groups. Among existing successful group variable selection methods, some fail to conduct within-group selection. Others are able to conduct both group and within-group selection, but the corresponding objective functions are non-convex. Such non-convexity may require extra numerical effort. In this paper, we propose a novel Log-Exp-Sum (LES) penalty for group variable selection. The LES penalty is strictly convex. It can identify important groups as well as select important variables within the group. We develop an efficient group-level coordinate descent algorithm to fit the model. We also derive non-asymptotic error bounds and asymptotic group selection consistency for our method in the high-dimensional setting where the number of covariates can be much larger than the sample size. Numerical results demonstrate the good performance of our method in both variable selection and prediction. We applied the proposed method to an American Cancer Society breast cancer survivor dataset. The findings are clinically meaningful and lead immediately to testable clinical hypotheses.
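The abstract does not state the penalty's formula. As a rough illustration, the sketch below assumes the form suggested by the name: for each group, a weighted log of the sum of exponentials of the scaled absolute coefficients. The function name `les_penalty`, the scale `alpha`, and the group weights are assumptions for illustration, not the paper's notation, and the fitting algorithm is not shown.

```python
import math

def les_penalty(beta_groups, alpha=1.0, weights=None):
    """Assumed form: sum_k w_k * log(sum_j exp(alpha * |beta_kj|)).

    beta_groups : list of groups, each a list of coefficients
    alpha       : assumed scale parameter inside the exponential
    weights     : assumed per-group weights (default 1 for every group)
    """
    if weights is None:
        weights = [1.0] * len(beta_groups)
    total = 0.0
    for w, group in zip(weights, beta_groups):
        a = [alpha * abs(b) for b in group]
        m = max(a)  # shift for a numerically stable log-sum-exp
        total += w * (m + math.log(sum(math.exp(v - m) for v in a)))
    return total

# Midpoint convexity check on two hypothetical coefficient vectors.
x = [[1.0, -2.0]]
y = [[-3.0, 0.5]]
mid = [[(a + b) / 2 for a, b in zip(gx, gy)] for gx, gy in zip(x, y)]
is_midpoint_convex = les_penalty(mid) <= (les_penalty(x) + les_penalty(y)) / 2
```

Under this assumed form, the penalty at zero equals the sum of the weighted logs of the group sizes, a constant that does not affect the minimizer. Convexity follows because log-sum-exp is convex and nondecreasing in each argument, and the absolute value is convex.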
