Source author record

Rina Foygel Barber

Rina Foygel Barber appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

math.ST Statistics Theory Methodology Machine Learning Information Theory math.IT math.OC physics.med-ph

Catalog footprint

What is connected

25works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

False positive control in time series coincidence detection

We study the problem of coincidence detection in time series data, where we aim to determine whether the appearance of simultaneous or near-simultaneous events in two time series is indicative of some shared underlying signal or synchronicity, or might simply be due to random chance. This problem arises across many applications, such as astrophysics (e.g., detecting astrophysical events such as gravitational waves, with two or more detectors) and neuroscience (e.g., detecting synchronous firing patterns between two or more neurons). In this work, we consider methods based on time-shifting, where the timeline of one data stream is randomly shifted relative to another, to mimic the types of coincidences that could occur by random chance. Our theoretical results establish rigorous finite-sample guarantees controlling the probability of false positives, under weak assumptions that allow for dependence within the time series data, providing reassurance that time-shifting methods are a reliable tool for inference in this setting. Empirical results with simulated and real data validate the strong performance of time-shifting methods in dependent-data settings.

preprint2026arXiv

Predictive inference for time series: why is split conformal effective despite temporal dependence?

We consider the problem of uncertainty quantification for prediction in a time series: if we use past data to forecast the next time point, can we provide valid prediction intervals around our forecasts? To avoid placing distributional assumptions on the data, in recent years the conformal prediction method has been a popular approach for predictive inference, since it provides distribution-free coverage for any iid or exchangeable data distribution. However, in the time series setting, the strong empirical performance of conformal prediction methods is not well understood, since even short-range temporal dependence is a strong violation of the exchangeability assumption. Using predictors with "memory" -- i.e., predictors that utilize past observations, such as autoregressive models -- further exacerbates this problem. In this work, we examine the theoretical properties of split conformal prediction in the time series setting, including the case where predictors may have memory. Our results bound the loss of coverage of these methods in terms of a new "switch coefficient", measuring the extent to which temporal dependence within the time series creates violations of exchangeability. Our characterization of the coverage probability is sharp over the class of stationary, $β$-mixing processes. Along the way, we introduce tools that may prove useful in analyzing other predictive inference methods for dependent data.

preprint2026arXiv

Stability and Accuracy Trade-offs in Statistical Estimation

Algorithmic stability is a central concept in statistics and learning theory that measures how sensitive an algorithm's output is to small changes in the training data. Stability plays a crucial role in understanding generalization, robustness, and replicability, and a variety of stability notions have been proposed in different learning settings. However, while stability entails desirable properties, it is typically not sufficient on its own for statistical learning -- and indeed, it may be at odds with accuracy, since an algorithm that always outputs a constant function is perfectly stable but statistically meaningless. Thus, it is essential to understand the potential statistical cost of stability. In this work, we address this question by adopting a statistical decision-theoretic perspective, treating stability as a constraint in estimation. Focusing on two representative notions-worst-case stability and average-case stability-we first establish general lower bounds on the achievable estimation accuracy under each type of stability constraint. We then develop optimal stable estimators for four canonical estimation problems, including several mean estimation and regression settings. Together, these results characterize the optimal trade-offs between stability and accuracy across these tasks. Our findings formalize the intuition that average-case stability imposes a qualitatively weaker restriction than worst-case stability, and they further reveal that the gap between these two can vary substantially across different estimation problems.

preprint2023arXiv

Training-conditional coverage for distribution-free predictive inference

The field of distribution-free predictive inference provides tools for provably valid prediction without any assumptions on the distribution of the data, which can be paired with any regression algorithm to provide accurate and reliable predictive intervals. The guarantees provided by these methods are typically marginal, meaning that predictive accuracy holds on average over both the training data set and the test point that is queried. However, it may be preferable to obtain a stronger guarantee of training-conditional coverage, which would ensure that most draws of the training data set result in accurate predictive accuracy on future test points. This property is known to hold for the split conformal prediction method. In this work, we examine the training-conditional coverage properties of several other distribution-free predictive inference methods, and find that training-conditional coverage is achieved by some methods but is impossible to guarantee without further assumptions for others.

preprint2022arXiv

Distribution-free inference for regression: discrete, continuous, and in between

In data analysis problems where we are not able to rely on distributional assumptions, what types of inference guarantees can still be obtained? Many popular methods, such as holdout methods, cross-validation methods, and conformal prediction, are able to provide distribution-free guarantees for predictive inference, but the problem of providing inference for the underlying regression function (for example, inference on the conditional mean $\mathbb{E}[Y|X]$) is more challenging. In the setting where the features $X$ are continuously distributed, recent work has established that any confidence interval for $\mathbb{E}[Y|X]$ must have non-vanishing width, even as sample size tends to infinity. At the other extreme, if $X$ takes only a small number of possible values, then inference on $\mathbb{E}[Y|X]$ is trivial to achieve. In this work, we study the problem in settings in between these two extremes. We find that there are several distinct regimes in between the finite setting and the continuous setting, where vanishing-width confidence intervals are achievable if and only if the effective support size of the distribution of $X$ is smaller than the square of the sample size.

preprint2022arXiv

Half-Trek Criterion for Identifiability of Latent Variable Models

We consider linear structural equation models with latent variables and develop a criterion to certify whether the direct causal effects between the observable variables are identifiable based on the observed covariance matrix. Linear structural equation models assume that both observed and latent variables solve a linear equation system featuring stochastic noise terms. Each model corresponds to a directed graph whose edges represent the direct effects that appear as coefficients in the equation system. Prior research has developed a variety of methods to decide identifiability of direct effects in a latent projection framework, in which the confounding effects of the latent variables are represented by correlation among noise terms. This approach is effective when the confounding is sparse and effects only small subsets of the observed variables. In contrast, the new latent-factor half-trek criterion (LF-HTC) we develop in this paper operates on the original unprojected latent variable model and is able to certify identifiability in settings, where some latent variables may also have dense effects on many or even all of the observables. Our LF-HTC is an effective sufficient criterion for rational identifiability, under which the direct effects can be uniquely recovered as rational functions of the joint covariance matrix of the observed random variables. When restricting the search steps in LF-HTC to consider subsets of latent variables of bounded size, the criterion can be verified in time that is polynomial in the size of the graph.

preprint2020arXiv

Conformal Prediction Under Covariate Shift

We extend conformal prediction methodology beyond the case of exchangeable data. In particular, we show that a weighted version of conformal prediction can be used to compute distribution-free prediction intervals for problems in which the test and training covariate distributions differ, but the likelihood ratio between these two distributions is known---or, in practice, can be estimated accurately with access to a large set of unlabeled data (test covariate points). Our weighted extension of conformal prediction also applies more generally, to settings in which the data satisfies a certain weighted notion of exchangeability. We discuss other potential applications of our new conformal methodology, including latent variable and missing data problems.

preprint2020arXiv

Convex and Non-convex Approaches for Statistical Inference with Class-Conditional Noisy Labels

We study the problem of estimation and testing in logistic regression with class-conditional noise in the observed labels, which has an important implication in the Positive-Unlabeled (PU) learning setting. With the key observation that the label noise problem belongs to a special sub-class of generalized linear models (GLM), we discuss convex and non-convex approaches that address this problem. A non-convex approach based on the maximum likelihood estimation produces an estimator with several optimal properties, but a convex approach has an obvious advantage in optimization. We demonstrate that in the low-dimensional setting, both estimators are consistent and asymptotically normal, where the asymptotic variance of the non-convex estimator is smaller than the convex counterpart. We also quantify the efficiency gap which provides insight into when the two methods are comparable. In the high-dimensional setting, we show that both estimation procedures achieve $\ell_2$-consistency at the minimax optimal $\sqrt{s\log p/n}$ rates under mild conditions. Finally, we propose an inference procedure using a de-biasing approach. We validate our theoretical findings through simulations and a real-data example.

preprint2020arXiv

Predictive inference with the jackknife+

This paper introduces the jackknife+, which is a novel method for constructing predictive confidence intervals. Whereas the jackknife outputs an interval centered at the predicted response of a test point, with the width of the interval determined by the quantiles of leave-one-out residuals, the jackknife+ also uses the leave-one-out predictions at the test point to account for the variability in the fitted regression function. Assuming exchangeable training samples, we prove that this crucial modification permits rigorous coverage guarantees regardless of the distribution of the data points, for any algorithm that treats the training points symmetrically. Such guarantees are not possible for the original jackknife and we demonstrate examples where the coverage rate may actually vanish. Our theoretical and empirical analysis reveals that the jackknife and the jackknife+ intervals achieve nearly exact coverage and have similar lengths whenever the fitting algorithm obeys some form of stability. Further, we extend the jackknife+ to K-fold cross validation and similarly establish rigorous coverage properties. Our methods are related to cross-conformal prediction proposed by Vovk [2015] and we discuss connections.

preprint2020arXiv

The bias of isotonic regression

We study the bias of the isotonic regression estimator. While there is extensive work characterizing the mean squared error of the isotonic regression estimator, relatively little is known about the bias. In this paper, we provide a sharp characterization, proving that the bias scales as $O(n^{-β/3})$ up to log factors, where $1 \leq β\leq 2$ is the exponent corresponding to H{ö}lder smoothness of the underlying mean. Importantly, this result only requires a strictly monotone mean and that the noise distribution has subexponential tails, without relying on symmetric noise or other restrictive assumptions.

preprint2020arXiv

The limits of distribution-free conditional predictive inference

We consider the problem of distribution-free predictive inference, with the goal of producing predictive coverage guarantees that hold conditionally rather than marginally. Existing methods such as conformal prediction offer marginal coverage guarantees, where predictive coverage holds on average over all possible test points, but this is not sufficient for many practical applications where we would like to know that our predictions are valid for a given individual, not merely on average over a population. On the other hand, exact conditional inference guarantees are known to be impossible without imposing assumptions on the underlying distribution. In this work we aim to explore the space in between these two, and examine what types of relaxations of the conditional coverage property would alleviate some of the practical concerns with marginal coverage guarantees while still being possible to achieve in a distribution-free setting.

preprint2016arXiv

Accumulation tests for FDR control in ordered hypothesis testing

Multiple testing problems arising in modern scientific applications can involve simultaneously testing thousands or even millions of hypotheses, with relatively few true signals. In this paper, we consider the multiple testing problem where prior information is available (for instance, from an earlier study under different experimental conditions), that can allow us to test the hypotheses as a ranked list in order to increase the number of discoveries. Given an ordered list of n hypotheses, the aim is to select a data-dependent cutoff k and declare the first k hypotheses to be statistically significant while bounding the false discovery rate (FDR). Generalizing several existing methods, we develop a family of "accumulation tests" to choose a cutoff k that adapts to the amount of signal at the top of the ranked list. We introduce a new method in this family, the HingeExp method, which offers higher power to detect true signals compared to existing techniques. Our theoretical results prove that these methods control a modified FDR on finite samples, and characterize the power of the methods in the family. We apply the tests to simulated data, including a high-dimensional model selection problem for linear regression. We also compare accumulation tests to existing methods for multiple testing on a real data problem of identifying differential gene expression over a dosage gradient.

preprint2016arXiv

EigenPrism: Inference for High-Dimensional Signal-to-Noise Ratios

Consider the following three important problems in statistical inference, namely, constructing confidence intervals for (1) the error of a high-dimensional ($p>n$) regression estimator, (2) the linear regression noise level, and (3) the genetic signal-to-noise ratio of a continuous-valued trait (related to the heritability). All three problems turn out to be closely related to the little-studied problem of performing inference on the $\ell_2$-norm of the signal in high-dimensional linear regression. We derive a novel procedure for this, which is asymptotically correct when the covariates are multivariate Gaussian and produces valid confidence intervals in finite samples as well. The procedure, called EigenPrism, is computationally fast and makes no assumptions on coefficient sparsity or knowledge of the noise level. We investigate the width of the EigenPrism confidence intervals, including a comparison with a Bayesian setting in which our interval is just 5% wider than the Bayes credible interval. We are then able to unify the three aforementioned problems by showing that the EigenPrism procedure with only minor modifications is able to make important contributions to all three. We also investigate the robustness of coverage and find that the method applies in practice and in finite samples much more widely than just the case of multivariate Gaussian covariates. Finally, we apply EigenPrism to a genetic dataset to estimate the genetic signal-to-noise ratio for a number of continuous phenotypes.

preprint2016arXiv

MOCCA: mirrored convex/concave optimization for nonconvex composite functions

Many optimization problems arising in high-dimensional statistics decompose naturally into a sum of several terms, where the individual terms are relatively simple but the composite objective function can only be optimized with iterative algorithms. In this paper, we are interested in optimization problems of the form F(Kx) + G(x), where K is a fixed linear transformation, while F and G are functions that may be nonconvex and/or nondifferentiable. In particular, if either of the terms are nonconvex, existing alternating minimization techniques may fail to converge; other types of existing approaches may instead be unable to handle nondifferentiability. We propose the MOCCA (mirrored convex/concave) algorithm, a primal/dual optimization approach that takes local convex approximation to each term at every iteration. Inspired by optimization problems arising in computed tomography (CT) imaging, this algorithm can handle a range of nonconvex composite optimization problems, and offers theoretical guarantees for convergence when the overall problem is approximately convex (that is, any concavity in one term is balanced out by convexity in the other term). Empirical results show fast convergence for several structured signal recovery problems.

preprint2016arXiv

Selective Inference for Group-Sparse Linear Models

We develop tools for selective inference in the setting of group sparsity, including the construction of confidence intervals and p-values for testing selected groups of variables. Our main technical result gives the precise distribution of the magnitude of the projection of the data onto a given subspace, and enables us to develop inference procedures for a broad class of group-sparse selection methods, including the group lasso, iterative hard thresholding, and forward stepwise regression. We give numerical results to illustrate these tools on simulated data and on health record data.

preprint2016arXiv

The Function-on-Scalar LASSO with Applications to Longitudinal GWAS

We present a new methodology for simultaneous variable selection and parameter estimation in function-on-scalar regression with an ultra-high dimensional predictor vector. We extend the LASSO to functional data in both the $\textit{dense}$ functional setting and the $\textit{sparse}$ functional setting. We provide theoretical guarantees which allow for an exponential number of predictor variables. Simulations are carried out which illustrate the methodology and compare the sparse/functional methods. Using the Framingham Heart Study, we demonstrate how our tools can be used in genome-wide association studies, finding a number of genetic mutations which affect blood pressure and are therefore important for cardiovascular health.

preprint2016arXiv

The knockoff filter for FDR control in group-sparse and multitask regression

We propose the group knockoff filter, a method for false discovery rate control in a linear regression setting where the features are grouped, and we would like to select a set of relevant groups which have a nonzero effect on the response. By considering the set of true and false discoveries at the group level, this method gains power relative to sparse regression methods. We also apply our method to the multitask regression problem where multiple response variables share similar sparsity patterns across the set of possible features. Empirically, the group knockoff filter successfully controls false discoveries at the group level in both settings, with substantially more discoveries made by leveraging the group structure.

preprint2016arXiv

The p-filter: multi-layer FDR control for grouped hypotheses

In many practical applications of multiple hypothesis testing using the False Discovery Rate (FDR), the given hypotheses can be naturally partitioned into groups, and one may not only want to control the number of false discoveries (wrongly rejected null hypotheses), but also the number of falsely discovered groups of hypotheses (we say a group is falsely discovered if at least one hypothesis within that group is rejected, when in reality the group contains only nulls). In this paper, we introduce the p-filter, a procedure which unifies and generalizes the standard FDR procedure by Benjamini and Hochberg and global null testing procedure by Simes. We first prove that our proposed method can simultaneously control the overall FDR at the finest level (individual hypotheses treated separately) and the group FDR at coarser levels (when such groups are user-specified). We then generalize the p-filter procedure even further to handle multiple partitions of hypotheses, since that might be natural in many applications. For example, in neuroscience experiments, we may have a hypothesis for every (discretized) location in the brain, and at every (discretized) timepoint: does the stimulus correlate with activity in location x at time t after the stimulus was presented? In this setting, one might want to group hypotheses by location and by time. Importantly, our procedure can handle multiple partitions which are nonhierarchical (i.e. one partition may arrange p-values by voxel, and another partition arranges them by time point; neither one is nested inside the other). We prove that our procedure controls FDR simultaneously across these multiple lay- ers, under assumptions that are standard in the literature: we do not need the hypotheses to be independent, but require a nonnegative dependence condition known as PRDS.

preprint2016arXiv

Trimmed Conformal Prediction for High-Dimensional Models

In regression, conformal prediction is a general methodology to construct prediction intervals in a distribution-free manner. Although conformal prediction guarantees strong statistical property for predictive inference, its inherent computational challenge has attracted the attention of researchers in the community. In this paper, we propose a new framework, called Trimmed Conformal Prediction (TCP), based on two stage procedure, a trimming step and a prediction step. The idea is to use a preliminary trimming step to substantially reduce the range of possible values for the prediction interval, and then applying conformal prediction becomes far more efficient. As is the case of conformal prediction, TCP can be applied to any regression method, and further offers both statistical accuracy and computational gains. For a specific example, we also show how TCP can be implemented in the sparse regression setting. The experiments on both synthetic and real data validate the empirical performance of TCP.

preprint2015arXiv

An algorithm for constrained one-step inversion of spectral CT data

We develop a primal-dual algorithm that allows for one-step inversion of spectral CT transmission photon counts data to a basis map decomposition. The algorithm allows for image constraints to be enforced on the basis maps during the inversion. The derivation of the algorithm makes use of a local upper bounding quadratic approximation to generate descent steps for non-convex spectral CT data discrepancy terms, combined with a new convex-concave optimization algorithm. Convergence of the algorithm is demonstrated on simulated spectral CT data. Simulations with noise and anthropomorphic phantoms show examples of how to employ the constrained one-step algorithm for spectral CT data.

preprint2015arXiv

Controlling the false discovery rate via knockoffs

In many fields of science, we observe a response variable together with a large number of potential explanatory variables, and would like to be able to discover which variables are truly associated with the response. At the same time, we need to know that the false discovery rate (FDR) - the expected fraction of false discoveries among all discoveries - is not too high, in order to assure the scientist that most of the discoveries are indeed true and replicable. This paper introduces the knockoff filter, a new variable selection procedure controlling the FDR in the statistical linear model whenever there are at least as many observations as variables. This method achieves exact FDR control in finite sample settings no matter the design or covariates, the number of variables in the model, or the amplitudes of the unknown regression coefficients, and does not require any knowledge of the noise level. As the name suggests, the method operates by manufacturing knockoff variables that are cheap - their construction does not require any new data - and are designed to mimic the correlation structure found within the existing variables, in a way that allows for accurate FDR control, beyond what is possible with permutation-based methods. The method of knockoffs is very general and flexible, and can work with a broad class of test statistics. We test the method in combination with statistics from the Lasso for sparse regression, and obtain empirical results showing that the resulting method has far more power than existing selection rules when the proportion of null variables is high.

preprint2015arXiv

High-dimensional Ising model selection with Bayesian information criteria

We consider the use of Bayesian information criteria for selection of the graph underlying an Ising model. In an Ising model, the full conditional distributions of each variable form logistic regression models, and variable selection techniques for regression allow one to identify the neighborhood of each node and, thus, the entire graph. We prove high-dimensional consistency results for this pseudo-likelihood approach to graph selection when using Bayesian information criteria for the variable selection problems in the logistic regressions. The results pertain to scenarios of sparsity and following related prior work the information criteria we consider incorporate an explicit prior that encourages sparsity.

preprint2015arXiv

Laplace Approximation in High-dimensional Bayesian Regression

We consider Bayesian variable selection in sparse high-dimensional regression, where the number of covariates $p$ may be large relative to the samples size $n$, but at most a moderate number $q$ of covariates are active. Specifically, we treat generalized linear models. For a single fixed sparse model with well-behaved prior distribution, classical theory proves that the Laplace approximation to the marginal likelihood of the model is accurate for sufficiently large sample size $n$. We extend this theory by giving results on uniform accuracy of the Laplace approximation across all models in a high-dimensional scenario in which $p$ and $q$, and thus also the number of considered models, may increase with $n$. Moreover, we show how this connection between marginal likelihood and Laplace approximation can be used to obtain consistency results for Bayesian approaches to variable selection in high-dimensional regression.

preprint2014arXiv

Privacy and Statistical Risk: Formalisms and Minimax Bounds

We explore and compare a variety of definitions for privacy and disclosure limitation in statistical estimation and data analysis, including (approximate) differential privacy, testing-based definitions of privacy, and posterior guarantees on disclosure risk. We give equivalence results between the definitions, shedding light on the relationships between different formalisms for privacy. We also take an inferential perspective, where---building off of these definitions---we provide minimax risk bounds for several estimation problems, including mean estimation, estimation of the support of a distribution, and nonparametric density estimation. These bounds highlight the statistical consequences of different definitions of privacy and provide a second lens for evaluating the advantages and disadvantages of different techniques for disclosure limitation.

preprint2014arXiv

The log-shift penalty for adaptive estimation of multiple Gaussian graphical models

Sparse Gaussian graphical models characterize sparse dependence relationships between random variables in a network. To estimate multiple related Gaussian graphical models on the same set of variables, we formulate a hierarchical model, which leads to an optimization problem with a nonconvex log-shift penalty function. We show that under mild conditions the optimization problem is convex despite the inclusion of a nonconvex penalty, and derive an efficient optimization algorithm. Experiments on both synthetic and real data show that the proposed method is able to achieve good selection and estimation performance simultaneously, because the nonconvexity of the log-shift penalty allows for weak signals to be thresholded to zero without excessive shrinkage on the strong signals.

Rina Foygel Barber

What is connected

Connect this record

See the researcher in context

Building this map preview

25 published item(s)

False positive control in time series coincidence detection

Predictive inference for time series: why is split conformal effective despite temporal dependence?

Stability and Accuracy Trade-offs in Statistical Estimation

Training-conditional coverage for distribution-free predictive inference

Distribution-free inference for regression: discrete, continuous, and in between

Half-Trek Criterion for Identifiability of Latent Variable Models

Conformal Prediction Under Covariate Shift

Convex and Non-convex Approaches for Statistical Inference with Class-Conditional Noisy Labels

Predictive inference with the jackknife+

The bias of isotonic regression

The limits of distribution-free conditional predictive inference

Accumulation tests for FDR control in ordered hypothesis testing

EigenPrism: Inference for High-Dimensional Signal-to-Noise Ratios

MOCCA: mirrored convex/concave optimization for nonconvex composite functions

Selective Inference for Group-Sparse Linear Models

The Function-on-Scalar LASSO with Applications to Longitudinal GWAS

The knockoff filter for FDR control in group-sparse and multitask regression

The p-filter: multi-layer FDR control for grouped hypotheses

Trimmed Conformal Prediction for High-Dimensional Models

An algorithm for constrained one-step inversion of spectral CT data

Controlling the false discovery rate via knockoffs

High-dimensional Ising model selection with Bayesian information criteria

Laplace Approximation in High-dimensional Bayesian Regression

Privacy and Statistical Risk: Formalisms and Minimax Bounds

The log-shift penalty for adaptive estimation of multiple Gaussian graphical models