Source author record

Yajuan Si

Yajuan Si appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Methodology Applications Computation

Catalog footprint

What is connected

10works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Multilevel Regression and Poststratification Interface: An Application to Track Community-level COVID-19 Viral Transmission

We present a novel Bayesian workflow for multilevel regression and poststratification (MRP), introducing extensions to time-varying data and granular geography and publicly available open-source computation tools, facilitating broad research adoption and reproducibility. In the absence of comprehensive or random testing throughout the COVID-19 pandemic, we have developed a proxy method for synthetic random sampling to estimate community-level viral incidence, based on viral RNA testing of asymptomatic patients who present for elective procedures within a hospital system. The approach collects routine testing data on SARS-CoV-2 exposure among outpatients and performs statistical adjustments of sample representation using MRP, a procedure that adjusts for nonrepresentativeness of the sample and yields stable small group estimates. We illustrate the MRP interface with an application to track community-level COVID-19 viral transmission in the state of Michigan.

preprint2022arXiv

A Case Study of Nonresponse Bias Analysis In Educational Assessment Surveys

Nonresponse bias is a widely prevalent problem for data on education. We develop a ten-step exemplar to guide nonresponse bias analysis (NRBA) in cross-sectional studies and apply these steps to the Early Childhood Longitudinal Study, Kindergarten Class of 2010-11. A key step is the construction of indices of nonresponse bias based on proxy pattern-mixture models for survey variables of interest. A novel feature is to characterize the strength of evidence about nonresponse bias contained in these indices, based on the strength of the relationship between the characteristics in the nonresponse adjustment and the key survey variables. Our NRBA improves existing methods by incorporating both missing at random and missing not at random mechanisms, and all analyses can be done straightforwardly with standard statistical software.

preprint2022arXiv

Beyond Vaccination Rates: A Synthetic Random Proxy Metric of Total SARS-CoV-2 Immunity Seroprevalence in the Community

Explicit knowledge of total community-level immune seroprevalence is critical to developing policies to mitigate the social and clinical impact of SARS-CoV-2. Publicly available vaccination data are frequently cited as a proxy for population immunity, but this metric ignores the effects of naturally-acquired immunity, which varies broadly throughout the country and world. Without broad or random sampling of the population, accurate measurement of persistent immunity post natural infection is generally unavailable. To enable tracking of both naturally-acquired and vaccine-induced immunity, we set up a synthetic random proxy based on routine hospital testing for estimating total Immunoglobulin G (IgG) prevalence in the sampled community. Our approach analyzes viral IgG testing data of asymptomatic patients who present for elective procedures within a hospital system. We apply multilevel regression and poststratification to adjust for demographic and geographic discrepancies between the sample and the community population. We then apply state-based vaccination data to categorize immune status as driven by natural infection or by vaccine. We have validated the model using verified clinical metrics of viral and symptomatic disease incidence to show the expected biological correlation of these entities with the timing, rate, and magnitude of seroprevalence. In mid-July 2021, the estimated immunity level was 74% with the administered vaccination rate of 45% in the two counties. The metric improves real-time understanding of immunity to COVID-19 as it evolves and the coordination of policy responses to the disease, toward an inexpensive and easily operational surveillance system that transcends the limits of vaccination datasets alone.

preprint2020arXiv

Bayesian hierarchical weighting adjustment and survey inference

We combine Bayesian prediction and weighted inference as a unified approach to survey inference. The general principles of Bayesian analysis imply that models for survey outcomes should be conditional on all variables that affect the probability of inclusion. We incorporate the weighting variables under the framework of multilevel regression and poststratification, as a byproduct generating model-based weights after smoothing. We investigate deep interactions and introduce structured prior distributions for smoothing and stability of estimates. The computation is done via Stan and implemented in the open source R package "rstanarm" ready for public use. Simulation studies illustrate that model-based prediction and weighting inference outperform classical weighting. We apply the proposal to the New York Longitudinal Study of Wellbeing. The new approach generates robust weights and increases efficiency for finite population inference, especially for subsets of the population.

preprint2020arXiv

Bayesian Profiling Multiple Imputation for Missing Electronic Health Records

Electronic health records (EHRs) are increasingly used for clinical and comparative effectiveness research, but suffer from missing data. Motivated by health services research on diabetes care, we seek to increase the quality of EHRs by focusing on missing values of longitudinal glycosylated hemoglobin (A1c), a key risk factor for diabetes complications and adverse events. Under the framework of multiple imputation (MI), we propose an individualized Bayesian latent profiling approach to capture A1c measurement trajectories subject to missingness. The proposed method is applied to EHRs of adult patients with diabetes in a large academic Midwestern health system between 2003 and 2013 and had Medicare A and B coverage. We combine MI inferences to evaluate the association of A1c levels with the incidence of acute adverse health events and examine patient heterogeneity across identified patient profiles. We investigate different missingness mechanisms and perform imputation diagnostics. Our approach is computationally efficient and fits flexible models that provide useful clinical insights.

preprint2019arXiv

Bayes-raking: Bayesian Finite Population Inference with Known Margins

Raking is widely used in categorical data modeling and survey practice but faced with methodological and computational challenges. We develop a Bayesian paradigm for raking by incorporating the marginal constraints as a prior distribution via two main strategies: 1) constructing the solution subspaces via basis functions or projection matrix and 2) modeling soft constraints. The proposed Bayes-raking estimation integrates the models for the margins, the sample selection and response mechanism, and the outcome, with the capability to propagate all sources of uncertainty. Computation is done via Stan, and codes are ready for public use. Simulation studies show that Bayes-raking can perform as well as raking with large samples and outperform in terms of validity and efficiency gains, especially with a sparse contingency table or dependent raking factors. We apply the new method to the Longitudinal Study of Wellbeing study and demonstrate that model-based approaches significantly improve inferential reliability and substantive findings as a unified survey inference framework.

preprint2017arXiv

Bayesian Inference under Cluster Sampling with Probability Proportional to Size

Cluster sampling is common in survey practice, and the corresponding inference has been predominantly design-based. We develop a Bayesian framework for cluster sampling and account for the design effect in the outcome modeling. We consider a two-stage cluster sampling design where the clusters are first selected with probability proportional to cluster size, and then units are randomly sampled inside selected clusters. Challenges arise when the sizes of nonsampled cluster are unknown. We propose nonparametric and parametric Bayesian approaches for predicting the unknown cluster sizes, with this inference performed simultaneously with the model for survey outcome. Simulation studies show that the integrated Bayesian approach outperforms classical methods with efficiency gains. We use Stan for computing and apply the proposal to the Fragile Families and Child Wellbeing study as an illustration of complex survey inference in health surveys.

preprint2015arXiv

Bayesian Latent Pattern Mixture Models for Handling Attrition in Panel Studies With Refreshment Samples

Many panel studies collect refreshment samples---new, randomly sampled respondents who complete the questionnaire at the same time as a subsequent wave of the panel. With appropriate modeling, these samples can be leveraged to correct inferences for biases caused by non-ignorable attrition. We present such a model when the panel includes many categorical survey variables. The model relies on a Bayesian latent pattern mixture model, in which an indicator for attrition and the survey variables are modeled jointly via a latent class model. We allow the multinomial probabilities within classes to depend on the attrition indicator, which offers additional flexibility over standard applications of latent class models. We present results of simulation studies that illustrate the benefits of this flexibility. We apply the model to correct attrition bias in an analysis of data from the 2007-2008 Associated Press/Yahoo News election panel study.

preprint2015arXiv

Bayesian Nonparametric Weighted Sampling Inference

It has historically been a challenge to perform Bayesian inference in a design-based survey context. The present paper develops a Bayesian model for sampling inference in the presence of inverse-probability weights. We use a hierarchical approach in which we model the distribution of the weights of the nonsampled units in the population and simultaneously include them as predictors in a nonparametric Gaussian process regression. We use simulation studies to evaluate the performance of our procedure and compare it to the classical design-based estimator. We apply our method to the Fragile Family and Child Wellbeing Study. Our studies find the Bayesian nonparametric finite population estimator to be more robust than the classical design-based estimator without loss in efficiency, which works because we induce regularization for small cells and thus this is a way of automatically smoothing the highly variable weights.

preprint2013arXiv

Handling Attrition in Longitudinal Studies: The Case for Refreshment Samples

Panel studies typically suffer from attrition, which reduces sample size and can result in biased inferences. It is impossible to know whether or not the attrition causes bias from the observed panel data alone. Refreshment samples - new, randomly sampled respondents given the questionnaire at the same time as a subsequent wave of the panel - offer information that can be used to diagnose and adjust for bias due to attrition. We review and bolster the case for the use of refreshment samples in panel studies. We include examples of both a fully Bayesian approach for analyzing the concatenated panel and refreshment data, and a multiple imputation approach for analyzing only the original panel. For the latter, we document a positive bias in the usual multiple imputation variance estimator. We present models appropriate for three waves and two refreshment samples, including nonterminal attrition. We illustrate the three-wave analysis using the 2007-2008 Associated Press-Yahoo! News Election Poll.

Yajuan Si

What is connected

Connect this record

See the researcher in context

Building this map preview

10 published item(s)

Multilevel Regression and Poststratification Interface: An Application to Track Community-level COVID-19 Viral Transmission

A Case Study of Nonresponse Bias Analysis In Educational Assessment Surveys

Beyond Vaccination Rates: A Synthetic Random Proxy Metric of Total SARS-CoV-2 Immunity Seroprevalence in the Community

Bayesian hierarchical weighting adjustment and survey inference

Bayesian Profiling Multiple Imputation for Missing Electronic Health Records

Bayes-raking: Bayesian Finite Population Inference with Known Margins

Bayesian Inference under Cluster Sampling with Probability Proportional to Size

Bayesian Latent Pattern Mixture Models for Handling Attrition in Panel Studies With Refreshment Samples

Bayesian Nonparametric Weighted Sampling Inference

Handling Attrition in Longitudinal Studies: The Case for Refreshment Samples