Researcher profile

Cristina Butucea

Cristina Butucea contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
13 works
0 followers
6 topics
4 close collaborators

Published work

13 published item(s)

preprint · 2022 · arXiv

Interactive versus non-interactive locally differentially private estimation: Two elbows for the quadratic functional

Local differential privacy has recently received increasing attention from the statistics community as a valuable tool to protect the privacy of individual data owners without the need for a trusted third party. Similar to the classical notion of randomized response, the idea is that data owners randomize their true information locally and only release the perturbed data. Many different protocols for such local perturbation procedures can be designed. In most estimation problems studied in the literature so far, however, no significant difference in terms of minimax risk could be observed between purely non-interactive protocols and protocols that allow for some amount of interaction between individual data providers. In this paper we show that for estimating the integrated square of a density, sequentially interactive procedures improve substantially over the best possible non-interactive procedure in terms of minimax rate of estimation. In particular, in the non-interactive scenario we identify an elbow in the minimax rate at $s=\frac34$, whereas in the sequentially interactive scenario the elbow is at $s=\frac12$. This is markedly different both from the case of direct observations, where the elbow is well known to be at $s=\frac14$, and from the case where Laplace noise is added to the original data, where an elbow at $s= \frac94$ is obtained. We also provide adaptive estimators that achieve the optimal rate up to log-factors, draw connections to non-parametric goodness-of-fit testing and estimation of more general integral functionals, and conduct a series of numerical experiments. The fact that a particular locally differentially private, but interactive, mechanism improves over the simple non-interactive one is also of great importance for practical implementations of local differential privacy.
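The abstract above refers to the classical notion of randomized response. As a rough illustration only, the sketch below shows binary randomized response with a privacy parameter `alpha`: each data owner reports the true bit with probability $e^{α}/(1+e^{α})$ and the flipped bit otherwise, and the analyst debiases the average of the reports. This is not the interactive mechanism or the quadratic-functional estimator of the paper; the function names, the sample data and the choice of `alpha` are all assumptions made for the example.

```python
import math
import random


def keep_probability(alpha):
    """Probability of reporting the true bit (hypothetical helper)."""
    return math.exp(alpha) / (1.0 + math.exp(alpha))


def randomize_bit(bit, alpha, rng):
    """Report the true bit with probability p, the flipped bit otherwise."""
    p = keep_probability(alpha)
    return bit if rng.random() < p else 1 - bit


def debiased_mean(reports, alpha):
    """Unbiased estimate of the true proportion of ones.

    E[report] = (1 - p) + pi * (2p - 1), so we invert that affine map.
    Requires alpha > 0 so that 2p - 1 is non-zero.
    """
    p = keep_probability(alpha)
    raw = sum(reports) / len(reports)
    return (raw - (1.0 - p)) / (2.0 * p - 1.0)


# Illustrative data: 30% of the owners hold the value 1.
rng = random.Random(0)
true_bits = [1] * 3000 + [0] * 7000
reports = [randomize_bit(b, 1.0, rng) for b in true_bits]
estimate = debiased_mean(reports, 1.0)
```

The analyst only ever sees `reports`, never `true_bits`. A smaller `alpha` gives stronger privacy but a noisier `estimate`, because the debiasing step divides by $2p-1$, which shrinks towards zero.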

preprint · 2022 · arXiv

Phase transitions for support recovery under local differential privacy

We address the problem of variable selection in a high-dimensional but sparse mean model, under the additional constraint that only privatised data are available for inference. The original data are vectors with independent entries having a symmetric, strongly log-concave distribution on $\mathbb{R}$. For this purpose, we adopt a recent generalisation of classical minimax theory to the framework of local $α$-differential privacy. We provide lower and upper bounds on the rate of convergence for the expected Hamming loss over classes of at most $s$-sparse vectors whose non-zero coordinates are separated from $0$ by a constant $a>0$. As corollaries, we derive necessary and sufficient conditions (up to log factors) for exact recovery and for almost full recovery. When we restrict our attention to non-interactive mechanisms that act independently on each coordinate, our lower bound shows that, contrary to the non-private setting, both exact and almost full recovery are impossible whatever the value of $a$ in the high-dimensional regime such that $nα^2/d^2\lesssim 1$. However, in the regime $nα^2/d^2\gg \log(d)$ we can exhibit a critical value $a^*$ (up to a logarithmic factor) such that exact and almost full recovery are possible for all $a\gg a^*$ and impossible for $a\leq a^*$. We show that these results can be improved when allowing for all non-interactive locally $α$-differentially private mechanisms (including those that act globally on all coordinates), in the sense that phase transitions occur at lower levels.

preprint · 2021 · arXiv

Fast Non-Asymptotic Testing And Support Recovery For Large Sparse Toeplitz Covariance Matrices

We consider $n$ independent $p$-dimensional Gaussian vectors with covariance matrix having Toeplitz structure. We test that these vectors have independent components against a stationary distribution with sparse Toeplitz covariance matrix, and also select the support of non-zero entries. We assume that the non-zero values can occur in the recent past (time-lag less than $p/2$). We build test procedures that combine sum-type and scan-type procedures, but are computationally fast, and show their non-asymptotic behaviour in both one-sided (only positive correlations) and two-sided alternatives, respectively. We also exhibit a selector of significant lags and bound the Hamming-loss risk of the estimated support. These results can be extended to the case of nearly Toeplitz covariance structure and to sub-Gaussian vectors. Numerical results illustrate the excellent behaviour of both test procedures and support selectors: the larger the dimension $p$, the faster the rates.

preprint · 2021 · arXiv

Variable selection, monotone likelihood ratio and group sparsity

In the pivotal variable selection problem, we derive the exact non-asymptotic minimax selector over the class of all $s$-sparse vectors, which is also the Bayes selector with respect to the uniform prior. While this optimal selector is, in general, not realizable in polynomial time, we show that its tractable counterpart (the scan selector) attains the minimax expected Hamming risk to within a factor of 2, and is also exact minimax with respect to the probability of wrong recovery. As a consequence, we establish explicit lower bounds under the monotone likelihood ratio property and we obtain a tight characterization of the minimax risk in terms of the best separable selector risk. We apply these general results to derive necessary and sufficient conditions of exact and almost full recovery in the location model with light tail distributions and in the problem of group variable selection under Gaussian noise.

preprint · 2020 · arXiv

Locally private non-asymptotic testing of discrete distributions is faster using interactive mechanisms

We find separation rates for testing multinomial or more general discrete distributions under the constraint of local differential privacy. We construct efficient randomized algorithms and test procedures, in both the case where only non-interactive privacy mechanisms are allowed and also in the case where all sequentially interactive privacy mechanisms are allowed. The separation rates are faster in the latter case. We prove general information theoretical bounds that allow us to establish the optimality of our algorithms among all pairs of privacy mechanisms and test procedures, in most usual cases. Considered examples include testing uniform, polynomially and exponentially decreasing distributions.

preprint · 2014 · arXiv

Semiparametric topographical mixture models with symmetric errors

Motivated by the analysis of Positron Emission Tomography (PET) imaging data considered in Bowen et al. (2012), we introduce a semiparametric topographical mixture model able to capture the characteristics of dichotomous shifted response-type experiments. We propose a local estimation procedure, based on the symmetry of the local noise, for the proportion and location functions involved in the proposed model. We establish under mild conditions the minimax properties and asymptotic normality of our estimators, and Monte Carlo simulations are conducted to examine their finite sample performance. Finally, a statistical analysis of the PET imaging data in Bowen et al. (2012) illustrates the proposed method.

preprint · 2013 · arXiv

Detection of a sparse submatrix of a high-dimensional noisy matrix

We observe an $N\times M$ matrix $Y_{ij}=s_{ij}+ξ_{ij}$ with $ξ_{ij}\sim {\mathcal {N}}(0,1)$ i.i.d. in $i,j$, and $s_{ij}\in \mathbb {R}$. We test the null hypothesis $s_{ij}=0$ for all $i,j$ against the alternative that there exists some submatrix of size $n\times m$ with significant elements in the sense that $s_{ij}\ge a>0$. We propose a test procedure and compute the asymptotic detection boundary $a$ so that the maximal testing risk tends to 0 as $M\to\infty$, $N\to\infty$, $p=n/N\to0$, $q=m/M\to0$. We prove that this boundary is asymptotically sharp minimax under some additional constraints. Relations with other testing problems are discussed. We propose a testing procedure which adapts to unknown $(n,m)$ within some given set and compute the adaptive sharp rates. The implementation of our test procedure on synthetic data shows excellent behavior for sparse, not necessarily square matrices. We extend our sharp minimax results in different directions: first, to Gaussian matrices with unknown variance, next, to matrices of random variables having a distribution from an exponential family (non-Gaussian) and, finally, to a two-sided alternative for matrices with Gaussian elements.
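For readers who prefer the testing problem in display form, the hypotheses described in the abstract above can be written as follows. The index sets $A$ and $B$ are notation introduced here for illustration and do not appear in the abstract; this is only a restatement of the stated setup, not of the paper's test procedure.

```latex
\[
  H_0:\; s_{ij} = 0 \quad \text{for all } (i,j)
\]
\[
  H_1:\; \exists\, A \subset \{1,\dots,N\},\ |A| = n,\quad
         \exists\, B \subset \{1,\dots,M\},\ |B| = m,\quad
         \text{such that } s_{ij} \ge a > 0 \ \text{for all } (i,j) \in A \times B
\]
```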

preprint · 2013 · arXiv

Maximum entropy copula with given diagonal section

We consider copulas with a given diagonal section and compute the explicit density of the unique optimal copula which maximizes the entropy. In this sense, this copula is the least informative among the copulas with a given diagonal section. We give an explicit criterion on the diagonal section for the existence of the optimal copula and give a closed formula for its entropy. We also provide examples for some diagonal sections of usual bivariate copulas and illustrate the differences between them and the maximum entropy copula with the same diagonal section.

preprint · 2013 · arXiv

Sharp detection of smooth signals in a high-dimensional sparse matrix with indirect observations

We consider a matrix-valued Gaussian sequence model, that is, we observe a sequence of high-dimensional $M \times N$ matrices of heterogeneous Gaussian random variables $x_{ij,k}$ for $i \in\{1,...,M\}$, $j \in \{1,...,N\}$ and $k \in \mathbb{Z}$. The standard deviation of our observations is $\varepsilon k^s$ for some $\varepsilon >0$ and $s \geq 0$. We give sharp rates for the detection of a sparse submatrix of size $m \times n$ with active components. A component $(i,j)$ is said to be active if the sequence $\{x_{ij,k}\}_k$ has mean $\{θ_{ij,k}\}_k$ within a Sobolev ellipsoid of smoothness $τ>0$ and total energy $\sum_k θ^2_{ij,k}$ larger than some $r^2_\varepsilon$. Our rates involve relationships between $m,\, n, \, M$ and $N$ tending to infinity, with $m/M$, $n/N$ and $\varepsilon$ tending to 0, under which a test procedure that we construct has asymptotic minimax risk tending to 0. We prove corresponding lower bounds under additional assumptions on the relative size of the submatrix in the large matrix of observations. Except for these additional conditions our rates are asymptotically sharp. Lower bounds for hypothesis testing problems mean that no test procedure can distinguish between the null hypothesis (no signal) and the alternative, i.e. the minimax risk for testing tends to 1.

preprint · 2013 · arXiv

Sharp Variable Selection of a Sparse Submatrix in a High-Dimensional Noisy Matrix

We observe an $N\times M$ matrix of independent, identically distributed Gaussian random variables which are centered except for elements of some submatrix of size $n\times m$ where the mean is larger than some $a>0$. The submatrix is sparse in the sense that $n/N$ and $m/M$ tend to 0, whereas $n,\, m, \, N$ and $M$ tend to infinity. We consider the problem of selecting the random variables with significantly large mean values. We give sufficient conditions on $a$ as a function of $n,\, m,\,N$ and $M$ and construct a uniformly consistent procedure in order to do sharp variable selection. We also prove the minimax lower bounds under necessary conditions which are complementary to the previous conditions. The critical values $a^*$ separating the necessary and sufficient conditions are sharp (we show exact constants). We note a gap between the critical values $a^*$ for selection of variables and that of detecting that such a submatrix exists given by Butucea and Ingster (2012). When $a^*$ is in this gap, consistent detection is possible but no consistent selector of the corresponding variables can be found.

preprint · 2011 · arXiv

Semiparametric mixtures of symmetric distributions

We consider in this paper the semiparametric mixture of two distributions equal up to a shift parameter. The model is said to be semiparametric in the sense that the mixed distribution is not assumed to belong to a parametric family. In order to ensure the identifiability of the model it is assumed that the mixed distribution is symmetric, the model being then defined by the mixing proportion, two location parameters, and the probability density function of the mixed distribution. We propose a new class of M-estimators of these parameters based on a Fourier approach, and prove that they are square root consistent under mild regularity conditions. Their finite-sample properties are illustrated by a Monte Carlo study and a benchmark real dataset is also studied with our method.

preprint · 2010 · arXiv

Quantum U-statistics

The notion of a $U$-statistic for an $n$-tuple of identical quantum systems is introduced in analogy to the classical (commutative) case: given a selfadjoint 'kernel' $K$ acting on $(\mathbb{C}^{d})^{\otimes r}$ with $r<n$, we define the symmetric operator $U_{n}= {n \choose r}^{-1} \sum_{β} K^{(β)}$ with $K^{(β)}$ being the kernel acting on the subset $β$ of $\{1,\dots ,n\}$. If the systems are prepared in the i.i.d. state $ρ^{\otimes n}$ it is shown that the sequence of properly normalised $U$-statistics converges in moments to a linear combination of Hermite polynomials in canonical variables of a CCR algebra defined through the Quantum Central Limit Theorem. In the special cases of non-degenerate kernels and kernels of order $2$ it is shown that the convergence holds in the stronger distribution sense. Two types of applications in quantum statistics are described: testing beyond the two simple hypotheses scenario, and quantum metrology with interacting hamiltonians.
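The abstract above builds on the classical (commutative) $U$-statistic, which averages a symmetric kernel over all $r$-element subsets of a sample. The sketch below covers only that classical case, as a reference point for the analogy; it does not implement the quantum operator. The function names and the sample data are assumptions chosen for the example. With the kernel $h(x,y)=(x-y)^2/2$, the resulting $U$-statistic coincides with the unbiased sample variance.

```python
from itertools import combinations
from math import comb
from statistics import variance


def u_statistic(sample, kernel, r):
    """Average a symmetric kernel over all r-element subsets of the sample."""
    n = len(sample)
    total = sum(kernel(*subset) for subset in combinations(sample, r))
    return total / comb(n, r)


def half_squared_difference(x, y):
    """Order-2 kernel whose U-statistic is the unbiased sample variance."""
    return 0.5 * (x - y) ** 2


# Illustrative data only.
data = [2.0, 4.0, 4.0, 5.0, 7.0]
u_var = u_statistic(data, half_squared_difference, r=2)
reference = variance(data)  # unbiased sample variance from the standard library
```

Here `u_var` and `reference` agree up to floating-point error, which makes this a convenient sanity check on the normalisation by the number of subsets.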

preprint · 2007 · arXiv

Minimax and adaptive estimation of the Wigner function in quantum homodyne tomography with noisy data

We estimate the quantum state of a light beam from results of quantum homodyne measurements performed on identically prepared quantum systems. The state is represented through the Wigner function, a generalized probability density on $\mathbb{R}^2$ which may take negative values and must respect intrinsic positivity constraints imposed by quantum physics. The effect of the losses due to detection inefficiencies, which are always present in a real experiment, is the addition to the tomographic data of independent Gaussian noise. We construct a kernel estimator for the Wigner function, prove that it is minimax efficient for the pointwise risk over a class of infinitely differentiable functions, and implement it for numerical results. We construct adaptive estimators, that is, which do not depend on the smoothness parameters, and prove that in some setups they attain the minimax rates for the corresponding smoothness class.