Researcher profile

Michael R. Elliott

Michael R. Elliott contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 19 - Unverified | Verification L1 | Unclaimed author
5 works
0 followers
2 topics
4 close collaborators

Published work

5 published items

preprint | 2022 | arXiv

Robust and Efficient Bayesian Inference for Non-Probability Samples

Declining response rates in probability surveys, along with the widespread availability of unstructured data, have led to growing research into non-probability samples. Existing robust approaches are not well developed for non-Gaussian outcomes and may perform poorly in the presence of influential pseudo-weights. Furthermore, their variance estimators lack a unified framework and often rely on asymptotic theory. To address these gaps, we propose an alternative Bayesian approach using a partially linear Gaussian process regression that utilizes a prediction model with a flexible function of the pseudo-inclusion probabilities to impute the outcome variable for the reference survey. By efficiency, we mean not only computational scalability but also superiority with respect to variance. We also show that Gaussian process regression behaves as a kernel matching technique based on the estimated propensity scores, which yields double robustness and lowers sensitivity to influential pseudo-weights. Using the simulated posterior predictive distribution, one can directly quantify the uncertainty of the proposed estimator and derive associated 95% credible intervals. We assess the repeated sampling properties of our method in two simulation studies. The application in this study involves modeling count data with varying exposures in a non-probability sample setting.

preprint | 2022 | arXiv

Robust Bayesian Inference for Big Data: Combining Sensor-based Records with Traditional Survey Data

Big Data often presents as massive non-probability samples. Not only is the selection mechanism often unknown, but larger data volume amplifies the relative contribution of selection bias to total error. Existing bias adjustment approaches assume that the conditional mean structures have been correctly specified for the selection indicator or key substantive measures. In the presence of a reference probability sample, these methods rely on a pseudo-likelihood method to account for the sampling weights of the reference sample, which is parametric in nature. Under a Bayesian framework, handling the sampling weights is an even bigger hurdle. To further protect against model misspecification, we expand the idea of double robustness such that more flexible non-parametric methods, as well as Bayesian models, can be used for prediction. In particular, we employ Bayesian additive regression trees, which not only capture non-linear associations automatically but also permit direct quantification of the uncertainty of point estimates through their posterior predictive draws. We apply our method to sensor-based naturalistic driving data from the second Strategic Highway Research Program using the 2017 National Household Travel Survey as a benchmark.

preprint | 2022 | arXiv

Robust Model-based Inference for Non-Probability Samples

With the ubiquitous availability of unstructured data, growing attention is being paid to how to adjust for selection bias in such non-probability samples. The majority of the robust estimators proposed in prior literature are either fully or partially design-based, which may lead to inefficient estimates if outlying (pseudo-)weights are present. In addition, correctly reflecting the uncertainty of the adjusted estimator remains a challenge when the available reference survey has a complex sample design. This article proposes a fully model-based method for inference using non-probability samples where the goal is to predict the outcome variable for all population units. We employ a Bayesian bootstrap method with Rubin's combining rules to derive the adjusted point and interval estimates. Using Gaussian process regression, our method allows for kernel matching between the non-probability sample units and population units based on the estimated selection propensities when the outcome model is misspecified. The repeated sampling properties of our method are evaluated through two Monte Carlo simulation studies. Finally, we apply it to a real-world non-probability sample with the aim of estimating crash-attributed injury rates in different body regions in the United States.
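
For readers unfamiliar with Rubin's combining rules mentioned in this abstract, the following is a minimal sketch of the standard textbook form used in multiple imputation. It is not taken from the paper: the function name `rubin_combine` and the example numbers are hypothetical, and the abstract does not say whether the authors use this exact variant with their Bayesian bootstrap.

```python
from statistics import mean, variance


def rubin_combine(estimates, variances):
    """Pool M replicate estimates with the standard Rubin's rules.

    estimates: point estimate from each replicate (hypothetical inputs)
    variances: within-replicate variance of each estimate
    Returns (pooled point estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two replicates with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # average within-replicate variance
    b = variance(estimates)        # between-replicate variance (M - 1 denominator)
    t = w + (1 + 1 / m) * b        # total variance
    return q_bar, t


# Illustrative numbers only, not results from the paper.
pooled, total_var = rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The total variance adds the between-replicate spread, inflated by (1 + 1/M), to the average within-replicate variance, so uncertainty from the resampling step carries through to the interval estimate.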

preprint | 2022 | arXiv

Variance as a predictor of health outcomes: Subject-level trajectories and variability of sex hormones to predict body fat changes in peri- and post-menopausal women

Longitudinal biomarker data and cross-sectional outcomes are routinely collected in modern epidemiology studies, often with the goal of informing tailored early intervention decisions. For example, hormones such as estradiol (E2) and follicle-stimulating hormone (FSH) may predict changes in women's health during midlife. Most existing methods focus on constructing predictors from mean marker trajectories. However, subject-level biomarker variability may also provide critical information about disease risks and health outcomes. In this paper, we develop a joint model that estimates subject-level means and variances of longitudinal biomarkers to predict a cross-sectional health outcome. Simulations demonstrate excellent recovery of true model parameters. The proposed method provides less biased and more efficient estimates, relative to alternative approaches that either ignore subject-level differences in variances or perform two-stage estimation where estimated marker variances are treated as observed. Analyses of women's health data reveal that larger variability of E2 or larger variability of FSH was associated with higher levels of fat mass change and higher levels of lean mass change across the menopausal transition.

preprint | 2020 | arXiv

Using Multiple Imputation to Classify Potential Outcomes Subgroups

With medical tests becoming increasingly available, concerns about over-testing and over-treatment have increased dramatically. Hence, it is important to understand the influence of testing on treatment selection in general practice. Most statistical methods focus on average effects of testing on treatment decisions. However, this may be ill-advised, particularly for patient subgroups that tend not to benefit from such tests. Furthermore, missing data are common, representing large and often unaddressed threats to the validity of statistical methods. Finally, it is desirable to conduct analyses that can be interpreted causally. We propose to classify patients into four potential outcomes subgroups, defined by whether or not a patient's treatment selection is changed by the test result and by the direction in which the test result changes treatment selection. This subgroup classification naturally captures the differential influence of medical testing on treatment selections for different patients, which can suggest targets to improve the utilization of medical tests. We can then examine patient characteristics associated with patients' potential outcomes subgroup memberships. We used multiple imputation methods to simultaneously impute the missing potential outcomes as well as regular missing values. This approach can also provide estimates of many traditional causal quantities. We find that explicitly incorporating causal inference assumptions into the multiple imputation process can improve the precision of some causal estimates of interest. We also find that bias can occur when the potential outcomes conditional independence assumption is violated; sensitivity analyses are proposed to assess the impact of this violation. We applied the proposed methods to examine the influence of the 21-gene assay, the most commonly used genomic test, on chemotherapy selection among breast cancer patients.
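
The four-subgroup idea in this abstract can be illustrated with a small sketch. It assumes a binary treatment selection and two potential selections per patient, one under each of two test conditions; the abstract does not spell out these details, so the function name, argument names and subgroup labels below are all hypothetical. In practice only one potential selection is observed per patient, which is why the paper imputes the missing one.

```python
def classify_subgroup(treat_a: int, treat_b: int) -> str:
    """Label a patient by a pair of potential treatment selections.

    treat_a: selection (0 or 1) under test condition A (assumed coding)
    treat_b: selection (0 or 1) under test condition B (assumed coding)
    """
    if treat_a not in (0, 1) or treat_b not in (0, 1):
        raise ValueError("potential selections must be coded 0 or 1")
    if treat_a == treat_b:
        # The test result does not change the selection.
        return "unchanged: treated" if treat_b == 1 else "unchanged: untreated"
    # The test result changes the selection; record the direction.
    return "changed: toward treatment" if treat_b == 1 else "changed: away from treatment"


# Enumerate all four subgroups for the two binary potential selections.
subgroups = {(a, b): classify_subgroup(a, b) for a in (0, 1) for b in (0, 1)}
```

Two binary potential selections give exactly four combinations, which matches the count of subgroups named in the abstract.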