Source author record

Victor M. Panaretos

Victor M. Panaretos appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Methodology, math.ST, Statistics Theory, Applications, Computation, hep-ex, physics.data-an, math.FA

Catalog footprint

What is connected

16 works

8 topics

4 close collaborators

preprint, 2022, arXiv

Functional Estimation of Anisotropic Covariance and Autocovariance Operators on the Sphere

We propose nonparametric estimators for the second-order central moments of possibly anisotropic spherical random fields, within a functional data analysis context. We consider a measurement framework where each random field among an identically distributed collection of spherical random fields is sampled at a few random directions, possibly subject to measurement error. The collection of random fields could be i.i.d. or serially dependent. Though similar setups have already been explored for random functions defined on the unit interval, the nonparametric estimators proposed in the literature often rely on local polynomials, which do not readily extend to the (product) spherical setting. We therefore formulate our estimation procedure as a variational problem involving a generalized Tikhonov regularization term. The latter favours smooth covariance/autocovariance functions, where the smoothness is specified by means of suitable Sobolev-like pseudo-differential operators. Using the machinery of reproducing kernel Hilbert spaces, we establish representer theorems that fully characterize the form of our estimators. We determine their uniform rates of convergence as the number of random fields diverges, both for the dense (increasing number of spatial samples) and sparse (bounded number of spatial samples) regimes. We moreover demonstrate the computational feasibility and practical merits of our estimation procedure in a simulation setting, assuming a fixed number of samples per random field. Our numerical estimation procedure leverages the sparsity and second-order Kronecker structure of our setup to reduce the computational and memory requirements by approximately three orders of magnitude compared to a naive implementation.

preprint, 2022, arXiv

Minimax Rate for Optimal Transport Regression Between Distributions

Distribution-on-distribution regression considers the problem of formulating and estimating a regression relationship where both covariate and response are probability distributions. The optimal transport distributional regression model postulates that the conditional Fréchet mean of the response distribution is linked to the covariate distribution via an optimal transport map. We establish the minimax rate of estimation of such a regression function by deriving a lower bound that matches the convergence rate attained by the Fréchet least squares estimator.

preprint, 2022, arXiv

On the rate of convergence for the autocorrelation operator in functional autoregression

We consider the problem of estimating the autocorrelation operator of an autoregressive Hilbertian process. By means of a Tikhonov approach, we establish a general result that yields the convergence rate of the estimated autocorrelation operator as a function of the rate of convergence of the estimated lag zero and lag one autocovariance operators. The result is general in that it can accommodate any consistent estimators of the lagged autocovariances. Consequently it can be applied to processes under any mode of observation: complete, discrete, sparse, and/or with measurement errors. An appealing feature is that the result does not require delicate spectral decay assumptions on the autocovariances but instead rests on natural source conditions. The result is illustrated by application to important special cases.
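
As a rough illustration of the Tikhonov idea mentioned in this abstract, the sketch below works on a finite grid, where the lag-zero and lag-one autocovariance operators become matrices `C0` and `C1`. It assumes the convention `C1 = rho @ C0` and the plain ridge-type form `C1 @ inv(C0 + alpha * I)`; the function name, the toy matrices and the choice of `alpha` are all hypothetical, and the paper's exact estimator may differ.

```python
import numpy as np

def tikhonov_autocorrelation(C0, C1, alpha):
    """Regularised solve of rho @ C0 = C1, returning C1 @ inv(C0 + alpha * I).

    C0: symmetric lag-zero autocovariance matrix (assumed given or estimated).
    C1: lag-one autocovariance matrix, under the assumed convention C1 = rho @ C0.
    alpha: positive regularisation parameter (hypothetical tuning choice).
    """
    d = C0.shape[0]
    regularised = C0 + alpha * np.eye(d)
    # C0 is symmetric, so C1 @ inv(regularised) equals solve(regularised, C1.T).T
    return np.linalg.solve(regularised, C1.T).T

# Toy check with exact (population) matrices on a 3-point grid.
rho_true = np.array([[0.5, 0.1, 0.0],
                     [0.0, 0.4, 0.1],
                     [0.0, 0.0, 0.3]])
C0 = np.array([[2.0, 0.5, 0.1],
               [0.5, 1.0, 0.2],
               [0.1, 0.2, 0.5]])
C1 = rho_true @ C0

rho_hat = tikhonov_autocorrelation(C0, C1, alpha=1e-8)     # nearly unregularised
rho_shrunk = tikhonov_autocorrelation(C0, C1, alpha=1.0)   # heavily regularised
```

In practice `C0` and `C1` would be replaced by whatever consistent estimators the observation scheme allows, which is the situation the abstract's general rate result is stated for.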

preprint, 2022, arXiv

Separable Expansions for Covariance Estimation

The non-parametric estimation of covariance lies at the heart of functional data analysis, whether for curve or surface-valued data. The case of a two-dimensional domain poses both statistical and computational challenges, which are typically alleviated by assuming separability. However, separability is often questionable, sometimes even demonstrably inadequate. We propose a framework for the analysis of covariance operators of random surfaces that generalises separability, while retaining its major advantages. Our approach is based on the expansion of the covariance into a series of separable terms. The expansion is valid for any covariance over a two-dimensional domain. Leveraging the key notion of the partial inner product, we extend the power iteration method to general Hilbert spaces and show how the aforementioned expansion can be efficiently constructed in practice. Truncation of the expansion and retention of the leading terms automatically induces a non-parametric estimator of the covariance, whose parsimony is dictated by the truncation level. The resulting estimator can be calculated, stored and manipulated with little computational overhead relative to separability. Consistency and rates of convergence are derived under mild regularity assumptions, illustrating the trade-off between bias and variance regulated by the truncation level. The merits and practical performance of the proposed methodology are demonstrated in a comprehensive simulation study and on classification of EEG signals.
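
The power iteration with partial inner products described in this abstract can be sketched on a grid, where the covariance of a random surface is a 4-way array `C[i, j, k, l]` and a separable term is `A[i, k] * B[j, l]`. This is a minimal sketch, not the authors' implementation: the function names, the starting value and the fixed iteration count are assumptions, and only the leading term of the expansion is computed.

```python
import numpy as np

def partial_product_over_second(C, B):
    """Contract C[i, j, k, l] with B[j, l], leaving a matrix indexed by (i, k)."""
    return np.einsum("ijkl,jl->ik", C, B)

def partial_product_over_first(C, A):
    """Contract C[i, j, k, l] with A[i, k], leaving a matrix indexed by (j, l)."""
    return np.einsum("ijkl,ik->jl", C, A)

def leading_separable_term(C, n_iter=100):
    """Alternating power iteration for the leading term sigma * (A x B).

    Assumes C is nonzero and not orthogonal to the identity start in the
    second factor; both hold for nonzero covariances.
    """
    K2 = C.shape[1]
    B = np.eye(K2) / np.sqrt(K2)          # unit-norm starting value (a choice)
    sigma = 0.0
    for _ in range(n_iter):
        A = partial_product_over_second(C, B)
        A = A / np.linalg.norm(A)         # Frobenius normalisation
        B = partial_product_over_first(C, A)
        sigma = np.linalg.norm(B)
        B = B / sigma
    return sigma, A, B

# Toy example: an exactly separable covariance is recovered by one term.
A0 = np.array([[2.0, 1.0], [1.0, 2.0]])
B0 = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]])
C = np.einsum("ik,jl->ijkl", A0, B0)
sigma, A, B = leading_separable_term(C)
C_rank1 = sigma * np.einsum("ik,jl->ijkl", A, B)
```

Further terms could be obtained by repeating the iteration on the residual `C - C_rank1`, with the number of retained terms playing the role of the truncation level discussed in the abstract.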

preprint, 2022, arXiv

The Completion of Covariance Kernels

We consider the problem of positive-semidefinite continuation: extending a partially specified covariance kernel from a subdomain $Ω$ of a rectangular domain $I\times I$ to a covariance kernel on the entire domain $I\times I$. For a broad class of domains $Ω$ called serrated domains, we are able to present a complete theory. Namely, we demonstrate that a canonical completion always exists and can be explicitly constructed. We characterise all possible completions as suitable perturbations of the canonical completion, and determine necessary and sufficient conditions for a unique completion to exist. We interpret the canonical completion via the graphical model structure it induces on the associated Gaussian process. Furthermore, we show how the estimation of the canonical completion reduces to the solution of a system of linear statistical inverse problems in the space of Hilbert-Schmidt operators, and derive rates of convergence. We conclude by providing extensions of our theory to more general forms of domains, and by demonstrating how our results can be used to construct covariance estimators from sample path fragments of the associated stochastic process. Our results are illustrated numerically by way of a simulation study and a real example.

preprint, 2020, arXiv

Functional Lagged Regression with Sparse Noisy Observations

A functional (lagged) time series regression model involves the regression of a scalar response time series on a time series of regressors that consists of a sequence of random functions. In practice, the underlying regressor curve time series are not always directly accessible, but are latent processes observed (sampled) only at discrete measurement locations. In this paper, we consider the so-called sparse observation scenario where only a relatively small number of measurement locations have been observed, possibly different for each curve. The measurements can be further contaminated by additive measurement error. A spectral approach to the estimation of the model dynamics is considered. The spectral density of the regressor time series and the cross-spectral density between the regressors and response time series are estimated by kernel smoothing methods from the sparse observations. The impulse response regression coefficients of the lagged regression model are then estimated by means of ridge regression (Tikhonov regularisation) or PCA regression (spectral truncation). The latent functional time series are then recovered by means of prediction, conditioning on all the observed data. The performance and implementation of our methods are illustrated by means of a simulation study and the analysis of meteorological data.

preprint, 2020, arXiv

Functional Registration and Local Variations: Identifiability, Rank, and Tuning

We develop theory and methodology for the problem of nonparametric registration of functional data that have been subjected to random deformation (warping) of their time scale. The separation of this phase variation ("horizontal" variation) from the amplitude variation ("vertical" variation) is crucial in order to properly conduct further analyses, which otherwise can be severely distorted. We determine precise nonparametric conditions under which the two forms of variation are identifiable. These show that the identifiability delicately depends on the underlying rank. By means of several counterexamples, we demonstrate that our conditions are sharp if one wishes a genuinely nonparametric setup; and in doing so we caution that popular remedies such as structural assumptions or roughness penalties can easily fail. We then propose a nonparametric registration method based on a "local variation measure", the main element in elucidating identifiability. A key advantage of the method is that it is free of any tuning or penalisation parameters regulating the amount of alignment, thus circumventing the problem of over/under-registration often encountered in practice. We provide asymptotic theory for the resulting estimators under the identifiable regime, but also under mild departures from identifiability, quantifying the resulting bias in terms of the amplitude variation's spectral gap.

preprint, 2020, arXiv

Spectral Simulation of Functional Time Series

We develop methodology for simulating a stationary functional time series defined by means of its spectral density operators. Our framework is general, in that it encompasses any such stationary functional time series, whether linear or not. The methodology manifests particularly significant computational gains if the spectral density operators are specified by means of their eigendecomposition or as a filtering of white noise. In the special case of linear processes, we determine the analytical expressions for the spectral density operators of functional autoregressive (fractionally integrated) moving average processes, and leverage these as part of our spectral approach, leading to substantial improvements over time-domain simulation methods in some cases. The methods are implemented as an R package (specsimfts) accompanied by several demo files that are easy to modify and can be easily used by researchers aiming to probe the finite-sample performance of their functional time series methodology by means of simulation.

preprint, 2020, arXiv

Testing for the Rank of a Covariance Operator

How can we discern whether the covariance operator of a stochastic process is of reduced rank, and if so, what its precise rank is? And how can we do so at a given level of confidence? This question is central to a great many methods for functional data, which require low-dimensional representations whether by functional PCA or other methods. The difficulty is that the determination is to be made on the basis of i.i.d. replications of the process observed discretely and with measurement error contamination. This adds a ridge to the empirical covariance, obfuscating the underlying dimension. We build a matrix-completion inspired test statistic that circumvents this issue by measuring the best possible least squares fit of the empirical covariance's off-diagonal elements, optimised over covariances of given finite rank. For a fixed grid of sufficiently large size, we determine the statistic's asymptotic null distribution as the number of replications grows. We then use it to construct a bootstrap implementation of a stepwise testing procedure controlling the family-wise error rate corresponding to the collection of hypotheses formalising the question at hand. Under minimal regularity assumptions we prove that the procedure is consistent and that its bootstrap implementation is valid. The procedure circumvents smoothing and associated smoothing parameters, is indifferent to measurement error heteroskedasticity, and does not assume a low-noise regime. An extensive simulation study reveals an excellent practical performance, stably across a wide range of settings, and the procedure is further illustrated by means of two data analyses.

preprint, 2019, arXiv

Sparsely Observed Functional Time Series: Estimation and Prediction

Functional time series analysis, whether based on time or frequency domain methodology, has traditionally been carried out under the assumption of complete observation of the constituent series of curves, assumed stationary. Nevertheless, as is often the case with independent functional data, it may well happen that the data available to the analyst are not the actual sequence of curves, but relatively few and noisy measurements per curve, potentially at different locations in each curve's domain. Under this sparse sampling regime, neither the established estimators of the time series' dynamics, nor their corresponding theoretical analysis will apply. The subject of this paper is to tackle the problem of estimating the dynamics and of recovering the latent process of smooth curves in the sparse regime. Assuming smoothness of the latent curves, we construct a consistent nonparametric estimator of the series' spectral density operator and use it to develop a frequency-domain recovery approach that predicts the latent curve at a given time by borrowing strength from the (estimated) dynamic correlations in the series across time. Further to predicting the latent curves from their noisy point samples, the method fills in gaps in the sequence (curves nowhere sampled), denoises the data, and serves as a basis for forecasting. Means of providing corresponding confidence bands are also investigated. A simulation study interestingly suggests that sparse observation for a longer time period may provide better performance than dense observation for a shorter period, in the presence of smoothness. The methodology is further illustrated by application to an environmental data set on fair-weather atmospheric electricity, which naturally leads to a sparse functional time series.

preprint, 2016, arXiv

Amplitude and phase variation of point processes

We develop a canonical framework for the study of the problem of registration of multiple point processes subjected to warping, known as the problem of separation of amplitude and phase variation. The amplitude variation of a real random function $\{Y(x):x\in[0,1]\}$ corresponds to its random oscillations in the $y$-axis, typically encapsulated by its (co)variation around a mean level. In contrast, its phase variation refers to fluctuations in the $x$-axis, often caused by random time changes. We formalise similar notions for a point process, and nonparametrically separate them based on realisations of i.i.d. copies $\{Π_i\}$ of the phase-varying point process. A key element in our approach is to demonstrate that when the classical phase variation assumptions of Functional Data Analysis (FDA) are applied to the point process case, they become equivalent to conditions interpretable through the prism of the theory of optimal transportation of measure. We demonstrate that these induce a natural Wasserstein geometry tailored to the warping problem, including a formal notion of bias expressing over-registration. Within this framework, we construct nonparametric estimators that tend to avoid over-registration in finite samples. We show that they consistently estimate the warp maps, consistently estimate the structural mean, and consistently register the warped point processes, even in a sparse sampling regime. We also establish convergence rates, and derive $\sqrt{n}$-consistency and a central limit theorem in the Cox process case under dense sampling, showing rate optimality of our structural mean estimator in that case.

preprint, 2015, arXiv

Statistical unfolding of elementary particle spectra: Empirical Bayes estimation and bias-corrected uncertainty quantification

We consider the high energy physics unfolding problem where the goal is to estimate the spectrum of elementary particles given observations distorted by the limited resolution of a particle detector. This important statistical inverse problem arising in data analysis at the Large Hadron Collider at CERN consists in estimating the intensity function of an indirectly observed Poisson point process. Unfolding typically proceeds in two steps: one first produces a regularized point estimate of the unknown intensity and then uses the variability of this estimator to form frequentist confidence intervals that quantify the uncertainty of the solution. In this paper, we propose forming the point estimate using empirical Bayes estimation which enables a data-driven choice of the regularization strength through marginal maximum likelihood estimation. Observing that neither Bayesian credible intervals nor standard bootstrap confidence intervals succeed in achieving good frequentist coverage in this problem due to the inherent bias of the regularized point estimate, we introduce an iteratively bias-corrected bootstrap technique for constructing improved confidence intervals. We show using simulations that this enables us to achieve nearly nominal frequentist coverage with only a modest increase in interval length. The proposed methodology is applied to unfolding the $Z$ boson invariant mass spectrum as measured in the CMS experiment at the Large Hadron Collider.

preprint, 2014, arXiv

Empirical Bayes unfolding of elementary particle spectra at the Large Hadron Collider

We consider the so-called unfolding problem in experimental high energy physics, where the goal is to estimate the true spectrum of elementary particles given observations distorted by measurement error due to the limited resolution of a particle detector. This is an important statistical inverse problem arising in the analysis of data at the Large Hadron Collider at CERN. Mathematically, the problem is formalized as one of estimating the intensity function of an indirectly observed Poisson point process. Particle physicists are particularly keen on unfolding methods that feature a principled way of choosing the regularization strength and allow for the quantification of the uncertainty inherent in the solution. Though there are many approaches that have been considered by experimental physicists, it can be argued that few -- if any -- of these deal with these two key issues in a satisfactory manner. In this paper, we propose to attack the unfolding problem within the framework of empirical Bayes estimation: we consider Bayes estimators of the coefficients of a basis expansion of the unknown intensity, using a regularizing prior; and employ a Monte Carlo expectation-maximization algorithm to find the marginal maximum likelihood estimate of the hyperparameter controlling the strength of the regularization. Due to the data-driven choice of the hyperparameter, credible intervals derived using the empirical Bayes posterior lose their subjective Bayesian interpretation. Since the properties and meaning of such intervals are poorly understood, we explore instead the use of bootstrap resampling for constructing purely frequentist confidence bands for the true intensity. The performance of the proposed methodology is demonstrated using both simulations and real data from the Large Hadron Collider.

preprint, 2013, arXiv

Fourier analysis of stationary time series in function space

We develop the basic building blocks of a frequency domain framework for drawing statistical inferences on the second-order structure of a stationary sequence of functional data. The key element in such a context is the spectral density operator, which generalises the notion of a spectral density matrix to the functional setting, and characterises the second-order dynamics of the process. Our main tool is the functional Discrete Fourier Transform (fDFT). We derive an asymptotic Gaussian representation of the fDFT, thus allowing the transformation of the original collection of dependent random functions into a collection of approximately independent complex-valued Gaussian random functions. Our results are then employed in order to construct estimators of the spectral density operator based on smoothed versions of the periodogram kernel, the functional generalisation of the periodogram matrix. The consistency and asymptotic law of these estimators are studied in detail. As immediate consequences, we obtain central limit theorems for the mean and the long-run covariance operator of a stationary functional time series. Our results do not depend on structural modelling assumptions, but only functional versions of classical cumulant mixing conditions, and are shown to be stable under discrete observation of the individual curves.
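
To make the objects named in this abstract concrete, the sketch below computes a functional DFT and a periodogram kernel for curves stored on a common grid as an array of shape `(T, n_grid)`. The `(2 * pi * T) ** -0.5` normalisation, the restriction to Fourier frequencies `2 * pi * k / T`, and the flat averaging window used for smoothing are assumptions made for illustration; the paper's estimators use general kernel weights, and all function names here are hypothetical.

```python
import numpy as np

def functional_dft(X):
    """fDFT at the Fourier frequencies 2*pi*k/T for curves X of shape (T, n_grid)."""
    T = X.shape[0]
    # np.fft.fft sums X[t] * exp(-2j*pi*k*t/T) over t = 0, ..., T-1
    return np.fft.fft(X, axis=0) / np.sqrt(2.0 * np.pi * T)

def periodogram_kernel(X, k):
    """Periodogram kernel at frequency 2*pi*k/T as an (n_grid, n_grid) matrix."""
    x_tilde = functional_dft(X)[k]
    return np.outer(x_tilde, np.conj(x_tilde))

def smoothed_spectral_density(X, k, half_width=2):
    """Flat-window average of periodogram kernels around frequency index k.

    A simple stand-in for kernel-weighted smoothing (an assumption).
    """
    T = X.shape[0]
    x_tilde = functional_dft(X)
    neighbours = [(k + j) % T for j in range(-half_width, half_width + 1)]
    return np.mean([np.outer(x_tilde[m], np.conj(x_tilde[m])) for m in neighbours],
                   axis=0)

# Toy data: 64 white-noise "curves" observed on a 10-point grid.
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 10))
P = periodogram_kernel(X, k=3)
F_hat = smoothed_spectral_density(X, k=3, half_width=4)
```

A single periodogram kernel has rank one, which is why some form of averaging across nearby frequencies is needed before it can serve as an estimate of the spectral density operator.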

preprint, 2012, arXiv

A Conversation with David R. Brillinger

David Ross Brillinger was born on the 27th of October 1937, in Toronto, Canada. In 1955, he entered the University of Toronto, graduating with a B.A. with Honours in Pure Mathematics in 1959, while also serving as a Lieutenant in the Royal Canadian Naval Reserve. He was one of the five winners of the Putnam mathematical competition in 1958. He then went on to obtain his M.A. and Ph.D. in Mathematics at Princeton University, in 1960 and 1961, the latter under the guidance of John W. Tukey. During the period 1962--1964 he held halftime appointments as a Lecturer in Mathematics at Princeton, and a Member of Technical Staff at Bell Telephone Laboratories, Murray Hill, New Jersey. In 1964, he was appointed Lecturer and, two years later, Reader in Statistics at the London School of Economics. After spending a sabbatical year at Berkeley in 1967--1968, he returned to become Professor of Statistics in 1970, and has been there ever since. During his 40 years (and counting) as a faculty member at Berkeley, he has supervised 40 doctoral theses. He has a record of academic and professional service and has received a number of honors and awards.

preprint, 2012, arXiv

Sparse approximations of protein structure from noisy random projections

Single-particle electron microscopy is a modern technique that biophysicists employ to learn the structure of proteins. It yields data that consist of noisy random projections of the protein structure in random directions, with the added complication that the projection angles cannot be observed. In order to reconstruct a three-dimensional model, the projection directions need to be estimated by use of an ad-hoc starting estimate of the unknown particle. In this paper we propose a methodology that does not rely on knowledge of the projection angles, to construct an objective data-dependent low-resolution approximation of the unknown structure that can serve as such a starting estimate. The approach assumes that the protein admits a suitable sparse representation, and employs discrete $L^1$-regularization (LASSO) as well as notions from shape theory to tackle the peculiar challenges involved in the associated inverse problem. We illustrate the approach by application to the reconstruction of an E. coli protein component called the Klenow fragment.
