Researcher profile

Nandita Mitra

Nandita Mitra contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
7 works
0 followers
3 topics
4 close collaborators

Published work

7 published items

preprint, 2025, arXiv

A Causal Framework for Evaluating Drivers of Policy Effect Heterogeneity Using Difference-in-Differences

Policymakers and researchers often seek to understand how a policy differentially affects a population and the pathways driving this heterogeneity. For example, when studying an excise tax on sweetened beverages, researchers might assess the roles of cross-border shopping, economic competition, and store-level price changes on beverage sales trends. However, traditional policy evaluation tools, like the difference-in-differences (DiD) approach, primarily target average effects of the observed intervention rather than the underlying drivers of effect heterogeneity. Common approaches to evaluate sources of heterogeneity often lack a causal framework, making it difficult to determine whether observed outcome differences are truly driven by the proposed source of heterogeneity or by other confounding factors. In this paper, we present a framework for evaluating such policy drivers by representing questions of effect heterogeneity under hypothetical interventions and use it to evaluate drivers of the Philadelphia sweetened beverage tax policy effects. Building on recent advancements in estimating causal effect curves under DiD designs, we provide tools to assess policy effect heterogeneity while addressing practical challenges including confounding and neighborhood dynamics.
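
For orientation, the average-effect DiD estimand that the abstract contrasts with can be sketched as below. This is the textbook two-group, two-period contrast and not the heterogeneity framework proposed in the paper. The function name and the sales figures are hypothetical and chosen only for illustration.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Canonical 2x2 difference-in-differences on group means.

    Returns (change in treated group) minus (change in control group).
    Under parallel trends this targets the average effect on the treated.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical store-level beverage sales, before and after a tax.
treated_pre = [100.0, 110.0, 90.0]
treated_post = [70.0, 80.0, 60.0]
control_pre = [95.0, 105.0, 100.0]
control_post = [90.0, 100.0, 95.0]

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(effect)
```

A single number like this summarizes the average effect only. It says nothing about which pathways (for example cross-border shopping) drive differences across stores, which is the gap the abstract describes.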

preprint, 2023, arXiv

Hierarchical Bayesian Bootstrap for Heterogeneous Treatment Effect Estimation

A major focus of causal inference is the estimation of heterogeneous average treatment effects (HTE) - average treatment effects within strata of another variable of interest such as levels of a biomarker, education, or age strata. Inference involves estimating a stratum-specific regression and integrating it over the distribution of confounders in that stratum - which itself must be estimated. Standard practice involves estimating these stratum-specific confounder distributions independently (e.g. via the empirical distribution or Rubin's Bayesian bootstrap), which becomes problematic for sparsely populated strata with few observed confounder vectors. In this paper, we develop a nonparametric hierarchical Bayesian bootstrap (HBB) prior over the stratum-specific confounder distributions for HTE estimation. The HBB partially pools the stratum-specific distributions, thereby allowing principled borrowing of confounder information across strata when sparsity is a concern. We show that posterior inference under the HBB can yield efficiency gains over standard marginalization approaches while avoiding strong parametric assumptions about the confounder distribution. We use our approach to estimate the adverse event risk of proton versus photon chemoradiotherapy across various cancer types.
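
The "standard practice" baseline mentioned in the abstract, Rubin's Bayesian bootstrap applied independently within one stratum, can be sketched as follows. This is not the hierarchical (HBB) prior the paper develops. It only shows the unpooled version, using the standard fact that normalized Exponential(1) draws give Dirichlet(1, ..., 1) weights. The function name and the confounder values are hypothetical.

```python
import random


def bayesian_bootstrap_means(values, n_draws=2000, seed=0):
    """Posterior draws of a stratum mean under Rubin's Bayesian bootstrap.

    Each draw reweights the observed values with Dirichlet(1, ..., 1)
    weights, generated here by normalizing independent Exponential(1) draws.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        raw = [rng.expovariate(1.0) for _ in values]
        total = sum(raw)
        draws.append(sum(w * v for w, v in zip(raw, values)) / total)
    return draws


# A sparsely populated stratum with only three observed confounder values.
stratum_values = [0.2, 1.5, 3.1]
draws = bayesian_bootstrap_means(stratum_values)
print(min(draws), max(draws))
```

With so few observations the draws can never leave the range of the three observed values, which illustrates why independent estimation is fragile in sparse strata and why borrowing information across strata is attractive.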

preprint, 2022, arXiv

Addressing Positivity Violations in Causal Effect Estimation using Gaussian Process Priors

In observational studies, causal inference relies on several key identifying assumptions. One identifiability condition is the positivity assumption, which requires the probability of treatment be bounded away from 0 and 1. That is, for every covariate combination, it should be possible to observe both treated and control subjects, i.e., the covariate distributions should overlap between treatment arms. If the positivity assumption is violated, population-level causal inference necessarily involves some extrapolation. Ideally, a greater amount of uncertainty about the causal effect estimate should be reflected in such situations. With that goal in mind, we construct a Gaussian process model for estimating treatment effects in the presence of practical violations of positivity. Advantages of our method include minimal distributional assumptions, a cohesive model for estimating treatment effects, and more uncertainty associated with areas in the covariate space where there is less overlap. We assess the performance of our approach with respect to bias and efficiency using simulation studies. The method is then applied to a study of critically ill female patients to examine the effect of undergoing right heart catheterization.
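
The positivity condition described above can be checked in practice with a simple diagnostic on estimated propensity scores. The sketch below is a generic check and not the Gaussian process method from the paper. The function name, the threshold `eps=0.05`, and the scores are all assumptions made for illustration.

```python
def positivity_violations(propensity_scores, eps=0.05):
    """Return indices of units whose estimated treatment probability
    falls outside [eps, 1 - eps], i.e. is close to 0 or 1."""
    return [
        i for i, p in enumerate(propensity_scores)
        if p < eps or p > 1.0 - eps
    ]


# Hypothetical estimated propensity scores for four patients.
scores = [0.50, 0.01, 0.97, 0.30]
flagged = positivity_violations(scores)
print(flagged)
```

Flagged units sit in regions of the covariate space with little overlap between treatment arms, where any population-level estimate relies on extrapolation.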

preprint, 2021, arXiv

A regression framework for a probabilistic measure of cost-effectiveness

To make informed health policy decisions regarding a treatment, we must consider both its cost and its clinical effectiveness. In past work, we introduced the net benefit separation (NBS) as a novel measure of cost-effectiveness. The NBS is a probabilistic measure that characterizes the extent to which a treated patient will be more likely to experience benefit as compared to an untreated patient. Due to variation in treatment response across patients, uncovering factors that influence cost-effectiveness can assist policy makers in population-level decisions regarding resource allocation. In this paper, we introduce a regression framework for NBS in order to estimate covariate-specific NBS and find determinants of variation in NBS. Our approach is able to accommodate informative cost censoring through inverse probability weighting techniques, and addresses confounding through a semiparametric standardization procedure. Through simulations, we show that NBS regression performs well in a variety of common scenarios. We apply our proposed regression procedure to a realistic simulated data set as an illustration of how our approach could be used to investigate the association between cancer stage, comorbidities and cost-effectiveness when comparing adjuvant radiation therapy and chemotherapy in post-hysterectomy endometrial cancer patients.

preprint, 2021, arXiv

Analysis of survival data with non-proportional hazards: A comparison of propensity score weighted methods

One of the most common ways researchers compare survival outcomes across treatments when confounding is present is using Cox regression. This model is limited by its underlying assumption of proportional hazards; in some cases, substantial violations may occur. Here we present and compare approaches which attempt to address this issue, including Cox models with time-varying hazard ratios; parametric accelerated failure time models; Kaplan-Meier curves; and pseudo-observations. To adjust for differences between treatment groups, we use Inverse Probability of Treatment Weighting based on the propensity score. We examine clinically meaningful outcome measures that can be computed and directly compared across each method, namely, survival probability at time T, median survival, and restricted mean survival. We conduct simulation studies under a range of scenarios, and determine the biases, coverages, and standard errors of the Average Treatment Effects for each method. We then apply these approaches to two published observational studies of survival after cancer treatment. The first examines chemotherapy in sarcoma, where survival is very similar initially, but after two years the chemotherapy group shows a benefit. The other study is a comparison of surgical techniques for kidney cancer, where survival differences are attenuated over time.
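
The weighting step named in the abstract, Inverse Probability of Treatment Weighting, uses the standard weights 1/e(x) for treated units and 1/(1 - e(x)) for controls, where e(x) is the propensity score. A minimal sketch, assuming the propensity scores have already been estimated, is shown below. The function name and the example values are hypothetical, and the survival estimators compared in the paper are not reproduced here.

```python
def iptw_weights(treatments, propensity_scores):
    """Unstabilized IPTW weights targeting the average treatment effect.

    treatments: iterable of 0/1 treatment indicators.
    propensity_scores: estimated P(treated | covariates), strictly in (0, 1).
    """
    weights = []
    for a, e in zip(treatments, propensity_scores):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        weights.append(1.0 / e if a == 1 else 1.0 / (1.0 - e))
    return weights


# Hypothetical treatment indicators and estimated propensity scores.
treatments = [1, 0, 1, 0]
scores = [0.5, 0.5, 0.25, 0.2]
weights = iptw_weights(treatments, scores)
print(weights)
```

These weights would then be passed to a weighted estimator (for example a weighted Kaplan-Meier curve) so that the treatment groups are comparable on measured confounders.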

preprint, 2020, arXiv

Bayesian Nonparametric Cost-Effectiveness Analyses: Causal Estimation and Adaptive Subgroup Discovery

Cost-effectiveness analyses (CEAs) are at the center of health economic decision making. While these analyses help policy analysts and economists determine coverage, inform policy, and guide resource allocation, they are statistically challenging for several reasons. Cost and effectiveness are correlated and follow complex joint distributions which are difficult to capture parametrically. Effectiveness (often measured as increased survival time) and accumulated cost tend to be right-censored in many applications. Moreover, CEAs are often conducted using observational data with non-random treatment assignment. Policy-relevant causal estimation therefore requires robust confounding control. Finally, current CEA methods do not address cost-effectiveness heterogeneity in a principled way - often presenting population-averaged estimates even though significant effect heterogeneity may exist. Motivated by these challenges, we develop a nonparametric Bayesian model for joint cost-survival distributions in the presence of censoring. Our approach utilizes a joint Enriched Dirichlet Process prior on the covariate effects of cost and survival time, while using a Gamma Process prior on the baseline survival time hazard. Causal CEA estimands, with policy-relevant interpretations, are identified and estimated via a Bayesian nonparametric g-computation procedure. Finally, we outline how the induced clustering of the Enriched Dirichlet Process can be used to adaptively detect the presence of subgroups with different cost-effectiveness profiles. We outline an MCMC procedure for full posterior inference and evaluate frequentist properties via simulations. We use our model to assess the cost-efficacy of chemotherapy versus radiation adjuvant therapy for treating endometrial cancer in the SEER-Medicare database.

preprint, 2019, arXiv

A Bayesian Nonparametric Model for Zero-Inflated Outcomes: Prediction, Clustering, and Causal Estimation

Researchers are often interested in predicting outcomes, conducting clustering analysis to detect distinct subgroups of their data, or computing causal treatment effects. Pathological data distributions that exhibit skewness and zero-inflation complicate these tasks - requiring highly flexible, data-adaptive modeling. In this paper, we present a fully nonparametric Bayesian generative model for continuous, zero-inflated outcomes that simultaneously predicts structural zeros, captures skewness, and clusters patients with similar joint data distributions. The flexibility of our approach yields predictions that capture the joint data distribution better than commonly used zero-inflated methods. Moreover, we demonstrate that our model can be coherently incorporated into a standardization procedure for computing causal effect estimates that are robust to such data pathologies. Uncertainty at all levels of this model flows through to the causal effect estimates of interest - allowing easy point estimation, interval estimation, and posterior predictive checks verifying positivity, a required causal identification assumption. Our simulation results show point estimates to have low bias and interval estimates to have close to nominal coverage under complicated data settings. Under simpler settings, these results hold while incurring lower efficiency loss than comparator methods. Lastly, we use our proposed method to analyze zero-inflated inpatient medical costs among endometrial cancer patients receiving either chemotherapy or radiation therapy in the SEER-Medicare database.