Source author record

Francesca Dominici

Francesca Dominici appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Methodology Applications Artificial Intelligence eess.SP Machine Learning math.ST Statistics Theory

Catalog footprint

What is connected

7works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

A Statistical Framework for Algorithmic Collective Action with Multiple Collectives

As learning systems increasingly shape everyday decisions, Algorithmic Collective Action (ACA), i.e., users coordinating changes to shared data to steer model behavior, offers a complement to regulator-side policy and corporate model design. Real-world collective actions have traditionally been decentralized and fragmented into multiple collectives, despite sharing overarching objectives, with each collective differing in size, strategy, and actionable goals. However, most of the ACA literature focuses on single collective settings. To address this, we propose the first comprehensive statistical framework for ACA with multiple collectives acting on the same system. In particular, we focus on collective action in classification, studying how multiple collectives can influence a classifier's behavior. We provide quantitative statistical bounds on the success of the collectives, considering the role and the interplay of the collectives' sizes and the alignment of their goals. We make such bounds computable by each collective with only partial knowledge of other collectives' sizes and strategies. Finally, we numerically illustrate our framework on simulations inspired by interventions for climate adaptation in smart cities, demonstrating the usefulness of our bounds.

preprint2026arXiv

Consistent Geometric Deep Learning via Hilbert Bundles and Cellular Sheaves

Modern deep learning architectures increasingly contend with sophisticated signals that are natively infinite-dimensional, such as time series, probability distributions, or operators, and are defined over irregular domains. Yet, a unified learning theory for these settings has been lacking. To start addressing this gap, we introduce a novel convolutional learning framework for possibly infinite-dimensional signals supported on a manifold. Namely, we use the connection Laplacian associated with a Hilbert bundle as a convolutional operator, and we derive filters and neural networks, dubbed as \textit{HilbNets}. We make HilbNets and, more generally, the convolution operation, implementable via a two-stage sampling procedure. First, we show that sampling the manifold induces a Hilbert Cellular Sheaf, a generalized graph structure with Hilbert feature spaces and edge-wise coupling rules, and we prove that its sheaf Laplacian converges in probability to the underlying connection Laplacian as the sampling density increases. Notably, this result is a generalization to the infinite-dimensional bundle setting of the Belkin \& Niyogi \cite{BELKIN20081289} convergence result for the graph Laplacian to the manifold Laplacian, a theoretical cornerstone of geometric learning methods. Second, we discretize the signals and prove that the discretized (implementable) HilbNets converge to the underlying continuous architectures and are transferable across different samplings of the same bundle, providing consistency for learning. Finally, we validate our framework on synthetic and real-world tasks. Overall, our results broaden the scope of geometric learning as a whole by lifting classical Laplacian-based frameworks to settings where the signal at each point lives in its own Hilbert space.

preprint2022arXiv

Assessing the causal effects of a stochastic intervention in time series data: Are heat alerts effective in preventing deaths and hospitalizations?

The methodological development of this paper is motivated by the need to address the following scientific question: does the issuance of heat alerts prevent adverse health effects? Our goal is to address this question within a causal inference framework in the context of time series data. A key challenge is that causal inference methods require the overlap assumption to hold: each unit (i.e., a day) must have a positive probability of receiving the treatment (i.e., issuing a heat alert on that day). In our motivating example, the overlap assumption is often violated: the probability of issuing a heat alert on a cooler day is zero. To overcome this challenge, we propose a stochastic intervention for time series data which is implemented via an incremental time-varying propensity score (ItvPS). The ItvPS intervention is executed by multiplying the probability of issuing a heat alert on day $t$ -- conditional on past information up to day $t$ -- by an odds ratio $δ_t$. First, we introduce a new class of causal estimands that relies on the ItvPS intervention. We provide theoretical results to show that these causal estimands can be identified and estimated under a weaker version of the overlap assumption. Second, we propose nonparametric estimators based on the ItvPS and derive an upper bound for the variances of these estimators. Third, we extend this framework to multi-site time series using a spatial meta-analysis approach. Fourth, we show that the proposed estimators perform well in terms of bias and root mean squared error via simulations. Finally, we apply our proposed approach to estimate the causal effects of increasing the probability of issuing heat alerts on each warm-season day in reducing deaths and hospitalizations among Medicare enrollees in $2,837$ U.S. counties.

preprint2021arXiv

Accounting for recall bias in case-control studies: a causal inference approach

A case-control study is designed to help determine if an exposure is associated with an outcome. However, since case-control studies are retrospective, they are often subject to recall bias. Recall bias can occur when study subjects do not remember previous events accurately. In this paper, we first define the estimand of interest: the causal odds ratio (COR) for a case-control study. Second, we develop estimation approaches for the COR and present estimates as a function of recall bias. Third, we define a new quantity called the \textit{R-factor}, which denotes the minimal amount of recall bias that leads to altering the initial conclusion. We show that a failure to account for recall bias can significantly bias estimation of the COR. Finally, we apply the proposed framework to a case-control study of the causal effect of childhood physical abuse on adulthood mental health.

preprint2020arXiv

A causal exposure response function with local adjustment for confounding: Estimating health effects of exposure to low levels of ambient fine particulate matter

The Clean Air Act mandates that the National Ambient Air Quality Standards (NAAQS) must be routinely assessed to protect populations based on the latest science. Therefore, researchers should continue to address whether exposure to levels of air pollution below the NAAQS is harmful to human health. The contentious nature surrounding environmental regulations urges us to cast this question within a causal inference framework. Parametric and semi-parametric regression approaches have been used to estimate the exposure-response (ER) curve between ambient air pollution and health outcomes. Most of these approaches are not formulated within a causal framework, adjust for the same covariates across all levels of exposure, and do not account for model uncertainty. We introduce a Bayesian framework for the estimation of a causal ER curve called LERCA (Local Exposure Response Confounding Adjustment), which allows for different confounders and different strength of confounding at the different exposure levels; and propagates uncertainty regarding confounders' selection and the shape of the ER. LERCA provides a principled way of assessing the covariates' confounding importance at different exposure levels, providing researchers with information regarding the variables to adjust for in regression models. Using simulations, we show that state of the art approaches perform poorly in estimating the ER curve in the presence of local confounding. LERCA is used to evaluate the relationship between exposure to ambient PM2.5 and cardiovascular hospitalizations for 5,362 zip codes in the US, while adjusting for a potentially varying set of confounders across the exposure range. Ambient PM2.5 leads to an increase in cardiovascular hospitalization rates when focusing at the low exposure range. Our results indicate that there is no threshold for the effect of PM2.5 on cardiovascular hospitalizations.

preprint2015arXiv

Generalized Quantile Treatment Effect: A Flexible Bayesian Approach Using Quantile Ratio Smoothing

We propose a new general approach for estimating the effect of a binary treatment on a continuous and potentially highly skewed response variable, the generalized quantile treatment effect (GQTE). The GQTE is defined as the difference between a function of the quantiles under the two treatment conditions. As such, it represents a generalization over the standard approaches typically used for estimating a treatment effect (i.e., the average treatment effect and the quantile treatment effect) because it allows the comparison of any arbitrary characteristic of the outcome's distribution under the two treatments. Following Dominici et al. (2005), we assume that a pre-specified transformation of the two quantiles is modeled as a smooth function of the percentiles. This assumption allows us to link the two quantile functions and thus to borrow information from one distribution to the other. The main theoretical contribution we provide is the analytical derivation of a closed form expression for the likelihood of the model. Exploiting this result we propose a novel Bayesian inferential methodology for the GQTE. We show some finite sample properties of our approach through a simulation study which confirms that in some cases it performs better than other nonparametric methods. As an illustration we finally apply our methodology to the 1987 National Medicare Expenditure Survey data to estimate the difference in the single hospitalization medical cost distributions between cases (i.e., subjects affected by smoking attributable diseases) and controls.

preprint2010arXiv

A smoothing approach for masking spatial data

Individual-level health data are often not publicly available due to confidentiality; masked data are released instead. Therefore, it is important to evaluate the utility of using the masked data in statistical analyses such as regression. In this paper we propose a data masking method which is based on spatial smoothing techniques. The proposed method allows for selecting both the form and the degree of masking, thus resulting in a large degree of flexibility. We investigate the utility of the masked data sets in terms of the mean square error (MSE) of regression parameter estimates when fitting a Generalized Linear Model (GLM) to the masked data. We also show that incorporating prior knowledge on the spatial pattern of the exposure into the data masking may reduce the bias and MSE of the parameter estimates. By evaluating both utility and disclosure risk as functions of the form and the degree of masking, our method produces a risk-utility profile which can facilitate the selection of masking parameters. We apply the method to a study of racial disparities in mortality rates using data on more than 4 million Medicare enrollees residing in 2095 zip codes in the Northeast region of the United States.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint