Researcher profile

Jared S. Murray

Jared S. Murray contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
8 works
0 followers
3 topics
4 close collaborators

Published work

8 published items

preprint · 2022 · arXiv

Adaptive Conditional Distribution Estimation with Bayesian Decision Tree Ensembles

We present a Bayesian nonparametric model for conditional distribution estimation using Bayesian additive regression trees (BART). The generative model we use is based on rejection sampling from a base model. Typical of BART models, our model is flexible, has a default prior specification, and is computationally convenient. To address the distinguished role of the response in the BART model we propose, we further introduce an approach to targeted smoothing which is possibly of independent interest for BART models. We study the proposed model theoretically and provide sufficient conditions for the posterior distribution to concentrate at close to the minimax optimal rate adaptively over smoothness classes in the high-dimensional regime in which many predictors are irrelevant. To fit our model we propose a data augmentation algorithm which allows for existing BART samplers to be extended with minimal effort. We illustrate the performance of our methodology on simulated data and use it to study the relationship between education and body mass index using data from the medical expenditure panel survey (MEPS).
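To illustrate the phrase "rejection sampling from a base model", here is a minimal sketch of a conditional sampler of that general shape. The abstract does not give the model's details, so everything specific here is an assumption made for illustration: the standard normal base model, the probit acceptance rule, and the hypothetical `accept_score` function (a fixed stand-in for whatever a tree ensemble would learn). It is not the authors' implementation.

```python
import math
import random


def std_normal_cdf(z):
    """Standard normal CDF, used here as an assumed probit acceptance link."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def accept_score(y, x):
    """Hypothetical stand-in for a learned function of the response y and predictor x."""
    return 2.0 * (y - x)


def sample_conditional(x, rng, max_tries=10_000):
    """Draw y given x: propose from the base model, accept with probability Phi(score)."""
    for _ in range(max_tries):
        y = rng.gauss(0.0, 1.0)  # proposal from the assumed N(0, 1) base model
        if rng.random() < std_normal_cdf(accept_score(y, x)):
            return y
    raise RuntimeError("no proposal accepted within max_tries")


rng = random.Random(1)
draws_high = [sample_conditional(1.0, rng) for _ in range(2000)]
draws_low = [sample_conditional(-1.0, rng) for _ in range(2000)]
```

Under these assumptions the accepted draws have density proportional to the base density times the acceptance probability, so changing `x` reshapes the conditional distribution of `y` even though the base model itself ignores `x`.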

preprint · 2022 · arXiv

Bayesian inference for treatment effects under nested subsets of controls

When constructing a model to estimate the causal effect of a treatment, it is necessary to control for other factors which may have confounding effects. Because the ignorability assumption is not testable, however, it is usually unclear which minimal set of controls is appropriate -- as is their appropriate functional form in the model -- and effect estimation can be sensitive to these choices. A common approach in this case is to fit several models, each with a different control specification (under the assumption that the available controls are sufficient but possibly not all necessary to deconfound the treatment effect), but it is difficult to reconcile inference for the treatment effect under the multiple resulting posterior distributions. Therefore we propose a two-stage approach to measure the sensitivity of effect estimation with respect to control specification. In the first stage, a model is fit with all available controls using a prior carefully selected to adjust for confounding. In the second stage, posterior distributions are calculated for the treatment effect under submodels of nested sets of controls using projected posteriors under the full model, providing valid Bayesian inference. We demonstrate how our approach can be used to detect influential confounders in a dataset, and apply it in a sensitivity analysis of an observational study measuring the effect of legalized abortion on crime rates.

preprint · 2020 · arXiv

A Bayesian Hierarchical Model for Evaluating Forensic Footwear Evidence

When a latent shoeprint is discovered at a crime scene, forensic analysts inspect it for distinctive patterns of wear such as scratches and holes (known as accidentals) on the source shoe's sole. If its accidentals correspond to those of a suspect's shoe, the print can be used as forensic evidence to place the suspect at the crime scene. The strength of this evidence depends on the random match probability---the chance that a shoe chosen at random would match the crime scene print's accidentals. Evaluating random match probabilities requires an accurate model for the spatial distribution of accidentals on shoe soles. A recent report by the President's Council of Advisors on Science and Technology criticized existing models in the literature, calling for new empirically validated techniques. We respond to this request with a new spatial point process model for accidental locations, developed within a hierarchical Bayesian framework. We treat the tread pattern of each shoe as a covariate, allowing us to pool information across large heterogeneous databases of shoes. Existing models ignore this information; our results show that including it leads to significantly better model fit. We demonstrate this by fitting our model to one such database.

preprint · 2020 · arXiv

Estimating heterogeneous effects of continuous exposures using Bayesian tree ensembles: revisiting the impact of abortion rates on crime

In estimating the causal effect of a continuous exposure or treatment, it is important to control for all confounding factors. However, most existing methods require parametric specification for how control variables influence the outcome or generalized propensity score, and inference on treatment effects is usually sensitive to this choice. Additionally, it is often the goal to estimate how the treatment effect varies across observed units. To address this gap, we propose a semiparametric model using Bayesian tree ensembles for estimating the causal effect of a continuous treatment or exposure which (i) does not require a priori parametric specification of the influence of control variables, and (ii) allows for identification of effect modification by pre-specified moderators. The main parametric assumption we make is that the effect of the exposure on the outcome is linear, with the steepness of this relationship determined by a nonparametric function of the moderators, and we provide heuristics to diagnose the validity of this assumption. We apply our methods to revisit a 2001 study of how abortion rates affect incidence of crime.

preprint · 2020 · arXiv

Invited Discussion of "A Unified Framework for De-Duplication and Population Size Estimation"

Invited Discussion of "A Unified Framework for De-Duplication and Population Size Estimation", published in Bayesian Analysis. My discussion focuses on two main themes: providing a more nuanced picture of the costs and benefits of joint models for record linkage and the "downstream task" (i.e. whatever we might want to do with the linked and de-duplicated files), and how we should measure performance.

preprint · 2020 · arXiv

Model interpretation through lower-dimensional posterior summarization

Nonparametric regression models have recently surged in their power and popularity, accompanying the trend of increasing dataset size and complexity. While these models have proven their predictive ability in empirical settings, they are often difficult to interpret and do not address the underlying inferential goals of the analyst or decision maker. In this paper, we propose a modular two-stage approach for creating parsimonious, interpretable summaries of complex models which allow freedom in the choice of modeling technique and the inferential target. In the first stage a flexible model is fit which is believed to be as accurate as possible. In the second stage, lower-dimensional summaries are constructed by projecting draws from the distribution onto simpler structures. These summaries naturally come with valid Bayesian uncertainty estimates. Further, since we use the data only once to move from prior to posterior, these uncertainty estimates remain valid across multiple summaries and after iteratively refining a summary. We apply our method and demonstrate its strengths across a range of simulated and real datasets. Code to reproduce the examples shown is available at github.com/spencerwoody/ghost
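The second stage described above can be pictured with a small sketch: each posterior draw of the fitted function is projected onto a straight line by least squares, and the spread of the projected coefficients across draws gives the uncertainty of the summary. This is an illustration under stated assumptions and not the code from the linked repository. The "posterior draws" are simulated stand-ins rather than output from a fitted model, the linear summary is one arbitrary choice of simpler structure, and the helper names `project_to_line` and `interval` are hypothetical.

```python
import math
import random


def project_to_line(x, f_draw):
    """Least-squares projection of one draw of fitted values onto intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    f_bar = sum(f_draw) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxf = sum((xi - x_bar) * (fi - f_bar) for xi, fi in zip(x, f_draw))
    slope = sxf / sxx
    return f_bar - slope * x_bar, slope


def interval(values, level=0.9):
    """Rough central interval taken from sorted draws (no interpolation)."""
    s = sorted(values)
    lo = int((1 - level) / 2 * (len(s) - 1))
    hi = int((1 + level) / 2 * (len(s) - 1))
    return s[lo], s[hi]


# Stand-in for stage one: simulated draws of a mildly nonlinear fitted function.
rng = random.Random(0)
x = [i / 10 for i in range(21)]
draws = []
for _ in range(200):
    a = rng.gauss(1.0, 0.1)
    b = rng.gauss(2.0, 0.2)
    draws.append([a + b * xi + 0.3 * math.sin(3 * xi) for xi in x])

# Stage two: project every draw, then summarize the projected slopes.
summaries = [project_to_line(x, d) for d in draws]
slope_lo, slope_hi = interval([slope for _, slope in summaries])
```

Because the projection is applied draw by draw, the data enter only through the stage-one posterior, which is the property the abstract relies on when it says the uncertainty estimates remain valid across multiple summaries.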

preprint · 2020 · arXiv

Scaling Bayesian Probabilistic Record Linkage with Post-Hoc Blocking: An Application to the California Great Registers

Probabilistic record linkage (PRL) is the process of determining which records in two databases correspond to the same underlying entity in the absence of a unique identifier. Bayesian solutions to this problem provide a powerful mechanism for propagating uncertainty due to uncertain links between records (via the posterior distribution). However, computational considerations severely limit the practical applicability of existing Bayesian approaches. We propose a new computational approach, providing both a fast algorithm for deriving point estimates of the linkage structure that properly account for one-to-one matching and a restricted MCMC algorithm that samples from an approximate posterior distribution. Our advances make it possible to perform Bayesian PRL for larger problems, and to assess the sensitivity of results to varying prior specifications. We demonstrate the methods on a subset of an OCR'd dataset, the California Great Registers, a collection of 57 million voter registrations from 1900 to 1968 that comprise the only panel data set of party registration collected before the advent of scientific surveys.

preprint · 2020 · arXiv

Targeted Smooth Bayesian Causal Forests: An analysis of heterogeneous treatment effects for simultaneous versus interval medical abortion regimens over gestation

We introduce Targeted Smooth Bayesian Causal Forests (tsBCF), a nonparametric Bayesian approach for estimating heterogeneous treatment effects which vary smoothly over a single covariate in the observational data setting. The tsBCF method induces smoothness by parameterizing terminal tree nodes with smooth functions, and allows for separate regularization of treatment effects versus prognostic effect of control covariates. Smoothing parameters for prognostic and treatment effects can be chosen to reflect prior knowledge or tuned in a data-dependent way. We use tsBCF to analyze a new clinical protocol for early medical abortion. Our aim is to assess relative effectiveness of simultaneous versus interval administration of mifepristone and misoprostol over the first nine weeks of gestation. The model reflects our expectation that the relative effectiveness varies smoothly over gestation, but not necessarily over other covariates. We demonstrate the performance of the tsBCF method on benchmarking experiments. Software for tsBCF is available at https://github.com/jestarling/tsbcf/.