Source author record

Liangyuan Hu

Liangyuan Hu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher: unclaimed source record

Methodology, Applications, Computation

Catalog footprint

What is connected

5 works

3 topics

4 close collaborators

Preprint, 2022, arXiv

A flexible approach for causal inference with multiple treatments and clustered survival outcomes

When drawing causal inferences about the effects of multiple treatments on clustered survival outcomes using observational data, we need to address implications of the multilevel data structure, multiple treatments, censoring and unmeasured confounding for causal analyses. Few off-the-shelf causal inference tools are available to simultaneously tackle these issues. We develop a flexible random-intercept accelerated failure time model, in which we use Bayesian additive regression trees to capture arbitrarily complex relationships between censored survival times and pre-treatment covariates and use the random intercepts to capture cluster-specific main effects. We develop an efficient Markov chain Monte Carlo algorithm to draw posterior inferences about the population survival effects of multiple treatments and examine the variability in cluster-level effects. We further propose an interpretable sensitivity analysis approach to evaluate the sensitivity of drawn causal inferences about treatment effects to the potential magnitude of departure from the causal assumption of no unmeasured confounding. Expansive simulations empirically validate and demonstrate good practical operating characteristics of our proposed methods. Applying the proposed methods to a dataset on older high-risk localized prostate cancer patients drawn from the National Cancer Database, we evaluate the comparative effects of three treatment approaches on patient survival, and assess the ramifications of potential unmeasured confounding. The methods developed in this work are readily available in the R package riAFTBART.
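
As a reading aid, the random-intercept accelerated failure time model described in this abstract can be sketched as follows. The notation is an assumption introduced here, not taken from the abstract: T_ik is the survival time of individual i in cluster k, A_ik the treatment, X_ik the pre-treatment covariates, f an unknown function represented by a sum of regression trees, and b_k the cluster-specific random intercept. The normal distributions shown for b_k and the residual are a common modeling choice and are likewise an assumption.

```latex
\log T_{ik} = f(A_{ik}, X_{ik}) + b_k + \epsilon_{ik},
\qquad b_k \sim N(0, \tau^2),
\qquad \epsilon_{ik} \sim N(0, \sigma^2)
```

Under this sketch, f carries the possibly nonlinear dependence of log survival time on treatment and covariates, while b_k shifts the log survival time of every individual in cluster k by the same amount.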

Preprint, 2022, arXiv

A flexible approach for variable selection in large-scale healthcare database studies with missing covariate and outcome data

Prior work has shown that combining bootstrap imputation with tree-based machine learning variable selection methods can recover the performance achievable on fully observed data when covariate and outcome data are missing at random (MAR). This approach, however, is computationally expensive, especially on large-scale datasets. We propose an inference-based method, called RR-BART, which leverages the likelihood-based Bayesian machine learning technique, Bayesian additive regression trees, and uses Rubin's rule to combine the estimates and variances of the variable importance measures on multiply imputed datasets for variable selection in the presence of MAR data. We conduct a representative simulation study to investigate the practical operating characteristics of RR-BART, and compare it with the bootstrap-imputation-based methods. We further demonstrate the methods via a case study of risk factors for 3-year incidence of metabolic syndrome among middle-aged women using data from the Study of Women's Health Across the Nation (SWAN). The simulation study suggests that even in complex conditions of nonlinearity and nonadditivity with a large percentage of missingness, RR-BART can reasonably recover both the prediction and variable selection performance achievable on the fully observed data. RR-BART provides the best performance that the bootstrap-imputation-based methods can achieve with the optimal selection threshold value. In addition, RR-BART demonstrates a substantially stronger ability to detect discrete predictors. Furthermore, RR-BART offers substantial computational savings. When implemented on the SWAN data, RR-BART adds to the literature by selecting a set of predictors that had been less commonly identified as risk factors but had substantial biological justification.
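
The pooling step named in this abstract, Rubin's rule, can be illustrated with a minimal sketch. It covers only the generic combination of one scalar estimate and its variance across m imputed datasets; the function name `pool_rubin` and its inputs are hypothetical, and this is not the RR-BART implementation or its variable importance measure.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one scalar quantity across m imputed datasets with Rubin's rules.

    estimates: per-imputation point estimates (hypothetical inputs)
    variances: per-imputation variances of those estimates
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    pooled = mean(estimates)        # average of the point estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance across imputations (m - 1 denominator)

    # Total variance: within + (1 + 1/m) * between
    total = within + (1 + 1 / m) * between
    return pooled, total


if __name__ == "__main__":
    est, var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print(est, var)
```

The between-imputation term is what reflects the extra uncertainty due to missing data; with identical estimates in every imputed dataset it vanishes and the total variance reduces to the within-imputation average.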

Preprint, 2022, arXiv

CIMTx: An R package for causal inference with multiple treatments using observational data

CIMTx provides efficient and unified functions to implement modern methods for causal inferences with multiple treatments using observational data with a focus on binary outcomes. The methods include regression adjustment, inverse probability of treatment weighting, Bayesian additive regression trees, regression adjustment with multivariate spline of the generalized propensity score, vector matching and targeted maximum likelihood estimation. In addition, CIMTx illustrates ways in which users can simulate data adhering to the complex data structures in the multiple treatment setting. Furthermore, the CIMTx package offers a unique set of features to address the key causal assumptions: positivity and ignorability. For the positivity assumption, CIMTx demonstrates techniques to identify the common support region for retaining inferential units using inverse probability of treatment weighting, Bayesian additive regression trees and vector matching. To handle the ignorability assumption, CIMTx provides a flexible Monte Carlo sensitivity analysis approach to evaluate how causal conclusions would be altered in response to different magnitudes of departure from ignorable treatment assignment.

Preprint, 2020, arXiv

Estimation of Causal Effects of Multiple Treatments in Observational Studies with a Binary Outcome

There is a dearth of robust methods to estimate the causal effects of multiple treatments when the outcome is binary. This paper uses two unique sets of simulations to propose and evaluate the use of Bayesian Additive Regression Trees (BART) in such settings. First, we compare BART to several approaches that have been proposed for continuous outcomes, including inverse probability of treatment weighting (IPTW), targeted maximum likelihood estimator (TMLE), vector matching and regression adjustment. Results suggest that under conditions of non-linearity and non-additivity of both the treatment assignment and outcome generating mechanisms, BART, TMLE and IPTW using generalized boosted models (GBM) provide better bias reduction and smaller root mean squared error. BART and TMLE provide more consistent 95 per cent CI coverage and better large-sample convergence property. Second, we supply BART with a strategy to identify a common support region for retaining inferential units and for avoiding extrapolating over areas of the covariate space where common support does not exist. BART retains more inferential units than the generalized propensity score based strategy, and shows lower bias, compared to TMLE or GBM, in a variety of scenarios differing by the degree of covariate overlap. A case study examining the effects of three surgical approaches for non-small cell lung cancer demonstrates the methods.
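
One of the comparison methods named in this abstract, inverse probability of treatment weighting (IPTW), can be sketched for a binary outcome and several treatments. The sketch assumes the generalized propensity scores are already estimated and supplied as input (the abstract mentions generalized boosted models for that step, which is not reproduced here). The function names `iptw_outcome_means` and `risk_differences` are hypothetical, and the normalized (Hajek-type) weighting is one common variant rather than the paper's exact estimator.

```python
def iptw_outcome_means(y, a, gps):
    """Weighted mean outcome under each treatment level.

    y:   binary outcomes (0 or 1), one per unit
    a:   treatment labels coded 0..K-1, one per unit
    gps: per-unit list of K generalized propensity scores
         (probability of receiving each treatment given covariates)
    Returns a list of K weighted means; NaN for a level with no units.
    """
    k = len(gps[0])
    num = [0.0] * k
    den = [0.0] * k
    for yi, ai, pi in zip(y, a, gps):
        w = 1.0 / pi[ai]   # weight: inverse probability of the treatment received
        num[ai] += w * yi
        den[ai] += w
    return [n / d if d > 0 else float("nan") for n, d in zip(num, den)]


def risk_differences(means):
    """Pairwise differences in weighted mean outcome, keyed by (j, k) with j < k."""
    return {
        (j, k): means[j] - means[k]
        for j in range(len(means))
        for k in range(j + 1, len(means))
    }


if __name__ == "__main__":
    y = [1, 0, 1, 1, 0, 0]
    a = [0, 0, 1, 1, 2, 2]
    gps = [[1 / 3, 1 / 3, 1 / 3] for _ in y]   # toy scores: equal assignment probabilities
    means = iptw_outcome_means(y, a, gps)
    print(means, risk_differences(means))
```

Units with a small probability of receiving their observed treatment get large weights, which is why the abstract's discussion of covariate overlap and common support matters for weighting-based estimators.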

Preprint, 2020, arXiv

The Estimation of Causal Effects of Multiple Treatments in Observational Studies Using Bayesian Additive Regression Trees

There is currently a dearth of appropriate methods to estimate the causal effects of multiple treatments when the outcome is binary. For such settings, we propose the use of nonparametric Bayesian modeling, Bayesian Additive Regression Trees (BART). We conduct an extensive simulation study to compare BART to several existing, propensity score-based methods and to identify its operating characteristics when estimating average treatment effects on the treated. BART consistently demonstrates low bias and mean-squared errors. We illustrate the use of BART through a comparative effectiveness analysis of a large dataset, drawn from the latest SEER-Medicare linkage, on patients who were operated on via robotic-assisted surgery, video-assisted thoracic surgery or open thoracotomy.
