Researcher profile

Angelika Rohde

Angelika Rohde contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
8 works
0 followers
4 topics
4 close collaborators

Published work

8 published items

preprint · 2022 · arXiv

A central limit theorem concerning uncertainty in estimates of individual admixture

The concept of individual admixture (IA) assumes that the genome of individuals is composed of alleles inherited from $K$ ancestral populations. Each copy of each allele has the same chance $q_k$ to originate from population $k$, and together with the allele frequencies $p$ in all populations at all $M$ markers, comprises the admixture model. Here, we assume a supervised scheme, i.e., allele frequencies $p$ are given through a reference database of size $N$, and $q$ is estimated via maximum likelihood for a single sample. We study laws of large numbers and central limit theorems describing effects of finiteness of both $M$ and $N$ on the estimate of $q$. We recall results for the effect of finite $M$, provide a central limit theorem for the effect of finite $N$, introduce a new way to express the uncertainty in estimates in standard barplots, give simulation results, and discuss applications in forensic genetics.
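The supervised maximum-likelihood step described in this abstract can be illustrated with a minimal sketch. It assumes biallelic markers, reference frequencies strictly between 0 and 1, and an EM iteration, which is one common way to maximize this kind of mixture likelihood and not necessarily the procedure used in the paper. The function names and the toy data are hypothetical.

```python
def allele_likelihood(p_km, allele):
    """Probability of `allele` (1 or 0) when allele 1 has frequency p_km."""
    return p_km if allele == 1 else 1.0 - p_km


def estimate_admixture(alleles, freqs, n_iter=200):
    """EM iterations for the ML estimate of q with known allele frequencies.

    alleles: list of (marker_index, allele) pairs, one per observed allele copy.
    freqs:   freqs[k][m] is the frequency of allele 1 at marker m in
             population k (assumed to lie strictly between 0 and 1).
    """
    K = len(freqs)
    q = [1.0 / K] * K  # start from uniform admixture proportions
    for _ in range(n_iter):
        totals = [0.0] * K
        for m, a in alleles:
            # E-step: posterior probability that this copy came from population k
            weights = [q[k] * allele_likelihood(freqs[k][m], a) for k in range(K)]
            norm = sum(weights)
            for k in range(K):
                totals[k] += weights[k] / norm
        # M-step: q_k is the average responsibility over all allele copies
        q = [t / len(alleles) for t in totals]
    return q


# Toy example (made-up numbers): K = 2 populations, M = 3 markers, 2 copies each.
toy_freqs = [[0.9, 0.8, 0.7], [0.1, 0.2, 0.3]]
toy_alleles = [(0, 1), (0, 1), (1, 1), (1, 0), (2, 1), (2, 1)]
q_hat = estimate_admixture(toy_alleles, toy_freqs)
```

The estimate stays on the probability simplex at every iteration, because each allele copy contributes responsibilities that sum to one.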

preprint · 2022 · arXiv

Interactive versus non-interactive locally differentially private estimation: Two elbows for the quadratic functional

Local differential privacy has recently received increasing attention from the statistics community as a valuable tool to protect the privacy of individual data owners without the need for a trusted third party. Similar to the classical notion of randomized response, the idea is that data owners randomize their true information locally and only release the perturbed data. Many different protocols for such local perturbation procedures can be designed. In most estimation problems studied in the literature so far, however, no significant difference in terms of minimax risk between purely non-interactive protocols and protocols that allow for some amount of interaction between individual data providers could be observed. In this paper we show that for estimating the integrated square of a density, sequentially interactive procedures improve substantially over the best possible non-interactive procedure in terms of minimax rate of estimation. In particular, in the non-interactive scenario we identify an elbow in the minimax rate at $s=\frac34$, whereas in the sequentially interactive scenario the elbow is at $s=\frac12$. This is markedly different both from the case of direct observations, where the elbow is well known to be at $s=\frac14$, and from the case where Laplace noise is added to the original data, where an elbow at $s=\frac94$ is obtained. We also provide adaptive estimators that achieve the optimal rate up to log-factors, draw connections to non-parametric goodness-of-fit testing and estimation of more general integral functionals, and conduct a series of numerical experiments. The fact that a particular locally differentially private, but interactive, mechanism improves over the simple non-interactive one is also of great importance for practical implementations of local differential privacy.
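The classical randomized response idea mentioned in this abstract can be sketched as follows. This is only the textbook binary mechanism with a debiased mean estimate, shown to illustrate local perturbation; it is not the paper's estimator for the quadratic functional, and the function names are hypothetical. The privacy parameter `epsilon` is assumed to be strictly positive.

```python
import math
import random


def keep_probability(epsilon):
    """Probability of reporting the true bit under binary randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    return bit if rng.random() < keep_probability(epsilon) else 1 - bit


def debiased_mean(reports, epsilon):
    """Unbiased estimate of the true proportion of ones from the noisy reports."""
    p = keep_probability(epsilon)
    observed = sum(reports) / len(reports)
    # E[report] = p * mu + (1 - p) * (1 - mu), solved for mu
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


# Each data owner perturbs their own bit locally; only the reports are released.
rng = random.Random(0)
true_bits = [1] * 6000 + [0] * 14000
reports = [randomize(b, 1.0, rng) for b in true_bits]
estimate = debiased_mean(reports, 1.0)
```

Every owner randomizes independently of the others here, so this is a non-interactive protocol in the sense of the abstract.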

preprint · 2022 · arXiv

Non-uniform bounds and Edgeworth expansions in self-normalized limit theorems

We study Edgeworth expansions in limit theorems for self-normalized sums. Non-uniform bounds for expansions in the central limit theorem are established while only imposing minimal moment conditions. Within this result, we address the case of non-integer moments leading to a reduced remainder. Furthermore, we provide non-uniform bounds for expansions in local limit theorems. The enhanced tail-accuracy of our non-uniform bounds allows for deriving an Edgeworth-type expansion in the entropic central limit theorem as well as a central limit theorem in total variation distance for self-normalized sums.

preprint · 2016 · arXiv

Adaptation to lowest density regions with application to support recovery

A scheme for locally adaptive bandwidth selection is proposed which sensitively shrinks the bandwidth of a kernel estimator at lowest density regions such as the support boundary which are unknown to the statistician. In case of a Hölder continuous density, this locally minimax-optimal bandwidth is shown to be smaller than the usual rate, even in case of homogeneous smoothness. A new type of risk bound with respect to a density-dependent standardized loss of this estimator is established. This bound is fully nonasymptotic and allows one to deduce convergence rates at lowest density regions that can be substantially faster than $n^{-1/2}$. It is complemented by a weighted minimax lower bound which splits into two regimes depending on the value of the density. The new estimator adapts into the second regime, and it is shown that simultaneous adaptation into the fastest regime is not possible in principle as long as the Hölder exponent is unknown. Consequences on plug-in rules for support recovery are worked out in detail. In contrast to those with classical density estimators, the plug-in rules based on the new construction are minimax-optimal, up to some logarithmic factor.

preprint · 2016 · arXiv

Spectral analysis of high-dimensional sample covariance matrices with missing observations

We study high-dimensional sample covariance matrices based on independent random vectors with missing coordinates. The presence of missing observations is common in modern applications such as climate studies or gene expression micro-arrays. A weak approximation on the spectral distribution in the "large dimension $d$ and large sample size $n$" asymptotics is derived for possibly different observation probabilities in the coordinates. The spectral distribution turns out to be strongly influenced by the missingness mechanism. In the null case under the missing at random scenario where each component is observed with the same probability $p$, the limiting spectral distribution is a Marčenko-Pastur law shifted by $(1-p)/p$ to the left. As $d/n\rightarrow y< 1$, the almost sure convergence of the extremal eigenvalues to the respective boundary points of the support of the limiting spectral distribution is proved, which are explicitly given in terms of $y$ and $p$. Eventually, the sample covariance matrix is positive definite if $p$ is larger than $1-\left(1-\sqrt{y}\right)^2$, whereas this is not true any longer if $p$ is smaller than this quantity.
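The two explicit quantities in this abstract, the shift $(1-p)/p$ and the threshold $1-(1-\sqrt{y})^2$ for the observation probability, are easy to evaluate numerically. The sketch below is only a calculator for those two formulas as stated in the abstract; the function names are hypothetical, and the input checks reflect the assumptions $0 \le y < 1$ and $0 < p \le 1$.

```python
import math


def observation_probability_threshold(y):
    """Return 1 - (1 - sqrt(y))^2 for a limiting aspect ratio 0 <= y < 1."""
    if not 0.0 <= y < 1.0:
        raise ValueError("y must satisfy 0 <= y < 1")
    return 1.0 - (1.0 - math.sqrt(y)) ** 2


def spectrum_shift(p):
    """Return the leftward shift (1 - p) / p for observation probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must satisfy 0 < p <= 1")
    return (1.0 - p) / p


# With d/n -> 0.25, the abstract's threshold for p evaluates to 0.75.
threshold = observation_probability_threshold(0.25)
shift = spectrum_shift(0.5)
```

For example, with $y = 0.25$ the threshold is $1 - (1 - 0.5)^2 = 0.75$, so according to the abstract an observation probability above 0.75 eventually yields a positive definite sample covariance matrix.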

preprint · 2012 · arXiv

Accuracy of empirical projections of high-dimensional Gaussian matrices

Let $X=C+\mathrm{E}$ with a deterministic matrix $C\in\mathbb{R}^{M\times M}$ and $\mathrm{E}$ some centered Gaussian $M\times M$-matrix whose entries are independent with variance $\sigma^2$. In the present work, the accuracy of reduced-rank projections of $X$ is studied. Non-asymptotic universal upper and lower bounds are derived, and favorable and unfavorable prototypes of matrices $C$ in terms of the accuracy of approximation are characterized. The approach does not involve analytic perturbation theory of linear operators and allows for multiplicities in the singular value spectrum. Our main result is some general non-asymptotic upper bound on the accuracy of approximation which involves explicitly the singular values of $C$, and which is shown to be sharp in various regimes of $C$. The results are accompanied by lower bounds under diverse assumptions. Consequences on statistical estimation problems, in particular in the recent area of low-rank matrix recovery, are discussed.

preprint · 2011 · arXiv

Estimation of high-dimensional low-rank matrices

Suppose that we observe entries or, more generally, linear combinations of entries of an unknown $m\times T$-matrix $A$ corrupted by noise. We are particularly interested in the high-dimensional setting where the number $mT$ of unknown entries can be much larger than the sample size $N$. Motivated by several applications, we consider estimation of matrix $A$ under the assumption that it has small rank. This can be viewed as dimension reduction or sparsity assumption. In order to shrink toward a low-rank representation, we investigate penalized least squares estimators with a Schatten-$p$ quasi-norm penalty term, $p\leq1$. We study these estimators under two possible assumptions: a modified version of the restricted isometry condition and a uniform bound on the ratio "empirical norm induced by the sampling operator/Frobenius norm." The main results are stated as nonasymptotic upper bounds on the prediction risk and on the Schatten-$q$ risk of the estimators, where $q\in[p,2]$. The rates that we obtain for the prediction risk are of the form $rm/N$ (for $m=T$), up to logarithmic factors, where $r$ is the rank of $A$. The particular examples of multi-task learning and matrix completion are worked out in detail. The proofs are based on tools from the theory of empirical processes. As a by-product, we derive bounds for the $k$th entropy numbers of the quasi-convex Schatten class embeddings $S_p^M\hookrightarrow S_2^M$, $p<1$, which are of independent interest.

preprint · 2011 · arXiv

Uniform Central Limit Theorems for Multidimensional Diffusions

It has recently been shown that there are substantial differences in the regularity behavior of the empirical process based on scalar diffusions as compared to the classical empirical process, due to the existence of diffusion local time. Besides establishing strong parallels to classical theory such as Ossiander's bracketing CLT and the general Giné-Zinn CLT for uniformly bounded families of functions, we find increased regularity also for multivariate ergodic diffusions, assuming that the invariant measure is finite with Lebesgue density $\pi$. The effect is diminishing for growing dimension but always present. The fine differences to the classical iid setting are worked out using exponential inequalities for martingales and additive functionals of continuous Markov processes as well as the characterization of the sample path behavior of Gaussian processes by means of the generic chaining bound. To uncover the phenomenon, we study a smoothed version of the empirical diffusion process. It turns out that uniform weak convergence of the smoothed empirical diffusion process under necessary and sufficient conditions can take place with even exponentially small bandwidth in dimension $d=2$, and with strongly undersmoothing bandwidth choice for parameters $\beta > d/2$ in case $d\geq 3$, assuming that the coordinates of drift and diffusion coefficient belong to some Hölder ball with parameter $\beta$.