Source author record

Donglin Zeng

Donglin Zeng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Methodology

Catalog footprint

What is connected

10works

1topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Semiparametric Analysis of Interval-Censored Data Subject to Inaccurate Diagnoses with A Terminal Event

Interval-censoring frequently occurs in studies of chronic diseases where disease status is inferred from intermittently collected biomarkers. Although many methods have been developed to analyze such data, they typically assume perfect disease diagnosis, which often does not hold in practice due to the inherent imperfect clinical diagnosis of cognitive functions or measurement errors of biomarkers such as cerebrospinal fluid. In this work, we introduce a semiparametric modeling framework using the Cox proportional hazards model to address interval-censored data in the presence of inaccurate disease diagnosis. Our model incorporates sensitivity and specificity of the diagnosis to account for uncertainty in whether the interval truly contains the disease onset. Furthermore, the framework accommodates scenarios involving a terminal event and when diagnosis is accurate, such as through postmortem analysis. We propose a nonparametric maximum likelihood estimation method for inference and develop an efficient EM algorithm to ensure computational feasibility. The regression coefficient estimators are shown to be asymptotically normal, achieving semiparametric efficiency bounds. We further validate our approach through extensive simulation studies and an application assessing Alzheimer's disease (AD) risk. We find that amyloid-beta is significantly associated with AD, but Tau is predictive of both AD and mortality.

preprint2022arXiv

Analyzing Clustered Continuous Response Variables with Ordinal Regression Models

Continuous response variables often need to be transformed to meet regression modeling assumptions; however, finding the optimal transformation is challenging and results may vary with the choice of transformation. When a continuous response variable is measured repeatedly for a subject or the continuous responses arise from clusters, it is more challenging to model the continuous response data due to correlation within clusters. We extend a widely used ordinal regression model, the cumulative probability model (CPM), to fit clustered continuous response variables based on generalized estimating equation (GEE) methods for ordinal responses. With our approach, estimates of marginal parameters, cumulative distribution functions (CDFs), expectations, and quantiles conditional on covariates can be obtained without pre-transformation of the potentially skewed continuous response data. Computational challenges arise with large numbers of distinct values of the continuous response variable, and we propose two feasible and computationally efficient approaches to fit CPMs for clustered continuous response variables with different working correlation structures. We study finite sample operating characteristics of the estimators via simulation, and illustrate their implementation with two data examples. One studies predictors of CD4:CD8 ratios in an HIV study. The other uses data from The Lung Health Study to investigate the contribution of a single nucleotide polymorphism to lung function decline.

preprint2021arXiv

Evaluating Effectiveness of Public Health Intervention Strategies for Mitigating COVID-19 Pandemic

Coronavirus disease 2019 (COVID-19) pandemic is an unprecedented global public health challenge. In the United States (US), state governments have implemented various non-pharmaceutical interventions (NPIs), such as physical distance closure (lockdown), stay-at-home order, mandatory facial mask in public in response to the rapid spread of COVID-19. To evaluate the effectiveness of these NPIs, we propose a nested case-control design with propensity score weighting under the quasi-experiment framework to estimate the average intervention effect on disease transmission across states. We further develop a method to test for factors that moderate intervention effect to assist precision public health intervention. Our method takes account of the underlying dynamics of disease transmission and balance state-level pre-intervention characteristics. We prove that our estimator provides causal intervention effect under assumptions. We apply this method to analyze US COVID-19 incidence cases to estimate the effects of six interventions. We show that lockdown has the largest effect on reducing transmission and reopening bars significantly increase transmission. States with a higher percentage of non-white population are at greater risk of increased $R_t$ associated with reopening bars.

preprint2021arXiv

Improving trial generalizability using observational studies

Complementary features of randomized controlled trials (RCTs) and observational studies (OSs) can be used jointly to estimate the average treatment effect of a target population. We propose a calibration weighting estimator that enforces the covariate balance between the RCT and OS, therefore improving the trial-based estimator's generalizability. Exploiting semiparametric efficiency theory, we propose a doubly robust augmented calibration weighting estimator that achieves the efficiency bound derived under the identification assumptions. A nonparametric sieve method is provided as an alternative to the parametric approach, which enables the robust approximation of the nuisance functions and data-adaptive selection of outcome predictors for calibration. We establish asymptotic results and confirm the finite sample performances of the proposed estimators by simulation experiments and an application on the estimation of the treatment effect of adjuvant chemotherapy for early-stage non-small cell lung patients after surgery.

preprint2020arXiv

Assessing Impact of Unobserved Confounders with Sensitivity Index Probabilities through Pseudo-Experiments

Unobserved confounders are a long-standing issue in causal inference using propensity score methods. This study proposed nonparametric indices to quantify the impact of unobserved confounders through pseudo-experiments with an application to real-world data. The study finding suggests that the proposed indices can reflect the true impact of confounders. It is hoped that this study will lead to further discussion on this important issue and help move the science of causal inference forward.

preprint2016arXiv

Maximum Likelihood Estimation for Semiparametric Transformation Models with Interval-Censored Data

Interval censoring arises frequently in clinical, epidemiological, financial, and sociological studies, where the event or failure of interest is known only to occur within an interval induced by periodic monitoring. We formulate the effects of potentially time-dependent covariates on the interval-censored failure time through a broad class of semiparametric transformation models that encompasses proportional hazards and proportional odds models. We consider nonparametric maximum likelihood estimation for this class of models with an arbitrary number of monitoring times for each subject. We devise an EM-type algorithm that converges stably, even in the presence of time-dependent covariates, and show that the estimators for the regression parameters are consistent, asymptotically normal, and asymptotically efficient with an easily estimated covariance matrix. Finally, we demonstrate the performance of our procedures through extensive simulation studies and application to an HIV/AIDS study conducted in Thailand.

preprint2016arXiv

Robust Hybrid Learning for Estimating Personalized Dynamic Treatment Regimens

Dynamic treatment regimens (DTRs) are sequential decision rules tailored at each stage by potentially time-varying patient features and intermediate outcomes observed in previous stages. The complexity, patient heterogeneity and chronicity of many diseases and disorders call for learning optimal DTRs which best dynamically tailor treatment to each individual's response over time. Proliferation of personalized data (e.g., genetic and imaging data) provides opportunities for deep tailoring as well as new challenges for statistical methodology. In this work, we propose a robust hybrid approach referred as Augmented Multistage Outcome-Weighted Learning (AMOL) to integrate outcome-weighted learning and Q-learning to identify optimal DTRs from the Sequential Multiple Assignment Randomization Trials (SMARTs). We generalize outcome weighted learning (O-learning; Zhao et al.~2012) to allow for negative outcomes; we propose methods to reduce variability of weights in O-learning to achieve numeric stability and higher efficiency; finally, for multiple-stage SMART studies, we introduce doubly robust augmentation to machine learning based O-learning to improve efficiency by drawing information from regression model-based Q-learning at each stage. The proposed AMOL remains valid even if the Q-learning model is misspecified. We establish the theoretical properties of AMOL, including the consistency of the estimated rules and the rates of convergence to the optimal value function. The comparative advantage of AMOL over existing methods is demonstrated in extensive simulation studies and applications to two SMART data sets: a two-stage trial for attention deficit and hyperactive disorder (ADHD) and the STAR*D trial for major depressive disorder (MDD).

preprint2016arXiv

Semiparametric Regression Analysis of Interval-Censored Competing Risks Data

Interval-censored competing risks data arise when each study subject may experience an event or failure from one of several causes and the failure time is not observed exactly but rather known to lie in an interval between two successive examinations. We formulate the effects of possibly time-varying covariates on the cumulative incidence or sub-distribution function (i.e., the marginal probability of failure from a particular cause) of competing risks through a broad class of semiparametric regression models that captures both proportional and non-proportional hazards structures for the sub-distribution. We allow each subject to have an arbitrary number of examinations and accommodate missing information on the cause of failure. We consider nonparametric maximum likelihood estimation and devise a fast and stable EM-type algorithm for its computation. We then establish the consistency, asymptotic normality, and semiparametric efficiency of the resulting estimators by appealing to modern empirical process theory. In addition, we show through extensive simulation studies that the proposed methods perform well in realistic situations. Finally, we provide an application to a study on HIV-1 infection with different viral subtypes.

preprint2012arXiv

Efficient Semiparametric Estimation of Short-term and Long-term Hazard Ratios with Right-Censored Data

The proportional hazards assumption in the commonly used Cox model for censored failure time data is often violated in scientific studies. Yang and Prentice (2005) proposed a novel semiparametric two-sample model that includes the proportional hazards model and the proportional odds model as sub-models, and accommodates crossing survival curves. The model leaves the baseline hazard unspecified and the two model parameters can be interpreted as the short-term and long-term hazard ratios. Inference procedures were developed based on a pseudo score approach. Although extension to accommodate covariates was mentioned, no formal procedures have been provided or proved. Furthermore, the pseudo score approach may not be asymptotically efficient. We study the extension of the short-term and long-term hazard ratio model of Yang and Prentice (2005) to accommodate potentially time-dependent covariates. We develop efficient likelihood-based estimation and inference procedures. The nonparametric maximum likelihood estimators are shown to be consistent, asymptotically normal, and asymptotically efficient. Extensive simulation studies demonstrate that the proposed methods perform well in practical settings. The proposed method captured the phenomenon of crossing hazards in a cancer clinical trial and identified a genetic marker with significant long-term effect missed by using the proportional hazards model on age-at-onset of alcoholism in a genetic study.

preprint2011arXiv

Penalized Q-Learning for Dynamic Treatment Regimes

A dynamic treatment regime effectively incorporates both accrued information and long-term effects of treatment from specially designed clinical trials. As these become more and more popular in conjunction with longitudinal data from clinical studies, the development of statistical inference for optimal dynamic treatment regimes is a high priority. This is very challenging due to the difficulties arising form non-regularities in the treatment effect parameters. In this paper, we propose a new reinforcement learning framework called penalized Q-learning (PQ-learning), under which the non-regularities can be resolved and valid statistical inference established. We also propose a new statistical procedure---individual selection---and corresponding methods for incorporating individual selection within PQ-learning. Extensive numerical studies are presented which compare the proposed methods with existing methods, under a variety of non-regular scenarios, and demonstrate that the proposed approach is both inferentially and computationally superior. The proposed method is demonstrated with the data from a depression clinical trial study.

Donglin Zeng

What is connected

Connect this record

See the researcher in context

Building this map preview

10 published item(s)

Semiparametric Analysis of Interval-Censored Data Subject to Inaccurate Diagnoses with A Terminal Event

Analyzing Clustered Continuous Response Variables with Ordinal Regression Models

Evaluating Effectiveness of Public Health Intervention Strategies for Mitigating COVID-19 Pandemic

Improving trial generalizability using observational studies

Assessing Impact of Unobserved Confounders with Sensitivity Index Probabilities through Pseudo-Experiments

Maximum Likelihood Estimation for Semiparametric Transformation Models with Interval-Censored Data

Robust Hybrid Learning for Estimating Personalized Dynamic Treatment Regimens

Semiparametric Regression Analysis of Interval-Censored Competing Risks Data

Efficient Semiparametric Estimation of Short-term and Long-term Hazard Ratios with Right-Censored Data

Penalized Q-Learning for Dynamic Treatment Regimes