Researcher profile

Arnak Dalalyan

Arnak Dalalyan contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
9 works
0 followers
4 topics
4 close collaborators


Published work

9 published items

preprint, 2013, arXiv

Minimax testing of a composite null hypothesis defined via a quadratic functional in the model of regression

We consider the problem of testing a particular type of composite null hypothesis under a nonparametric multivariate regression model. For a given quadratic functional $Q$, the null hypothesis states that the regression function $f$ satisfies the constraint $Q[f]=0$, while the alternative corresponds to the functions for which $Q[f]$ is bounded away from zero. On the one hand, we provide minimax rates of testing and the exact separation constants, along with a sharp-optimal testing procedure, for diagonal and nonnegative quadratic functionals. We consider smoothness classes of ellipsoidal form and check that our conditions are fulfilled in the particular case of ellipsoids corresponding to anisotropic Sobolev classes. In this case, we present a closed form of the minimax rate and the separation constant. On the other hand, minimax rates for quadratic functionals that are neither positive nor negative exhibit two different regimes: "regular" and "irregular". In the "regular" case, the minimax rate is equal to $n^{-1/4}$, while in the "irregular" case, the rate depends on the smoothness class and is slower than in the "regular" case. We apply this to the issue of testing the equality of norms of two functions observed in noisy environments.

preprint, 2013, arXiv

Sharp Oracle Inequalities for Aggregation of Affine Estimators

We consider the problem of combining a (possibly uncountably infinite) set of affine estimators in a nonparametric regression model with heteroscedastic Gaussian noise. Focusing on the exponentially weighted aggregate, we prove a PAC-Bayesian type inequality that leads to sharp oracle inequalities in discrete but also in continuous settings. The framework is general enough to cover the combinations of various procedures such as least squares regression, kernel ridge regression, shrinking estimators and many other estimators used in the literature on statistical inverse problems. As a consequence, we show that the proposed aggregate provides an adaptive estimator in the exact minimax sense without either discretizing the range of tuning parameters or splitting the set of observations. We also illustrate numerically the good performance achieved by the exponentially weighted aggregate.

preprint, 2013, arXiv

Statistical inference in compound functional models

We consider a general nonparametric regression model called the compound model. It includes, as special cases, sparse additive regression and nonparametric (or linear) regression with many covariates but possibly a small number of relevant covariates. The compound model is characterized by three main parameters: the structure parameter describing the "macroscopic" form of the compound function, the "microscopic" sparsity parameter indicating the maximal number of relevant covariates in each component, and the usual smoothness parameter corresponding to the complexity of the members of the compound. We find the non-asymptotic minimax rate of convergence of estimators in such a model as a function of these three parameters. We also show that this rate can be attained in an adaptive way.

preprint, 2013, arXiv

Tight conditions for consistency of variable selection in the context of high dimensionality

We address the issue of variable selection in the regression model with very high ambient dimension, that is, when the number of variables is very large. The main focus is on the situation where the number of relevant variables, called intrinsic dimension, is much smaller than the ambient dimension d. Without assuming any parametric form of the underlying regression function, we get tight conditions making it possible to consistently estimate the set of relevant variables. These conditions relate the intrinsic dimension to the ambient dimension and to the sample size. The procedure that is provably consistent under these tight conditions is based on comparing quadratic functionals of the empirical Fourier coefficients with appropriately chosen threshold values. The asymptotic analysis reveals the presence of two quite different regimes. The first regime is when the intrinsic dimension is fixed. In this case the situation in nonparametric regression is the same as in linear regression, that is, consistent variable selection is possible if and only if log d is small compared to the sample size n. The picture is different in the second regime, that is, when the number of relevant variables, denoted by s, tends to infinity as $n\to\infty$. Then we prove that consistent variable selection in the nonparametric set-up is possible only if s + log log d is small compared to log n. We apply these results to derive minimax separation rates for the problem of variable

preprint, 2011, arXiv

Tight conditions for consistent variable selection in high dimensional nonparametric regression

We address the issue of variable selection in the regression model with very high ambient dimension, i.e., when the number of covariates is very large. The main focus is on the situation where the number of relevant covariates, called intrinsic dimension, is much smaller than the ambient dimension. Without assuming any parametric form of the underlying regression function, we get tight conditions making it possible to consistently estimate the set of relevant variables. These conditions relate the intrinsic dimension to the ambient dimension and to the sample size. The procedure that is provably consistent under these tight conditions is simple and is based on comparing the empirical Fourier coefficients with an appropriately chosen threshold value.

preprint, 2010, arXiv

Second-order asymptotic expansion for a non-synchronous covariation estimator

In this paper, we consider the problem of estimating the covariation of two diffusion processes when observations are subject to non-synchronicity. Building on recent papers \cite{Hay-Yos03, Hay-Yos04}, we derive second-order asymptotic expansions for the distribution of the Hayashi-Yoshida estimator in a fairly general setup including random sampling schemes and non-anticipative random drifts. The key steps leading to our results are a second-order decomposition of the estimator's distribution in the Gaussian set-up, a stochastic decomposition of the estimator itself and an accurate evaluation of the Malliavin covariance. To give a concrete example, we compute the constants involved in the resulting expansions for the particular case of a sampling scheme generated by two independent Poisson processes.
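
For readers unfamiliar with the estimator named in this abstract, the following is a minimal sketch of the standard Hayashi-Yoshida covariation estimator: the sum of products of increments of the two processes over all pairs of observation intervals that overlap. The function name `hayashi_yoshida` and its argument names are illustrative choices, not taken from the paper, and the sketch does not implement the second-order expansion the paper derives.

```python
import numpy as np

def hayashi_yoshida(times_x, x, times_y, y):
    """Hayashi-Yoshida estimate of the covariation of X and Y.

    times_x, times_y: increasing observation times of each process.
    x, y: observed values at those times (same lengths as the time grids).
    Illustrative sketch; names are assumptions, not the paper's notation.
    """
    tx = np.asarray(times_x, dtype=float)
    ty = np.asarray(times_y, dtype=float)
    dx = np.diff(np.asarray(x, dtype=float))  # increments of X
    dy = np.diff(np.asarray(y, dtype=float))  # increments of Y

    # Interval i of X is (tx[i], tx[i+1]], interval j of Y is (ty[j], ty[j+1]].
    # Two such intervals overlap iff max(left ends) < min(right ends).
    left = np.maximum(tx[:-1, None], ty[None, :-1])
    right = np.minimum(tx[1:, None], ty[None, 1:])
    overlap = (left < right).astype(float)

    # Sum of dx[i] * dy[j] over overlapping pairs (i, j).
    return float(dx @ overlap @ dy)
```

When both processes are observed on the same grid, only matching intervals overlap, so the estimate reduces to the usual realized covariance (the sum of products of simultaneous increments).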

preprint, 2010, arXiv

Sparse Regression Learning by Aggregation and Langevin Monte-Carlo

We consider the problem of regression learning for deterministic design and independent random errors. We start by proving a sharp PAC-Bayesian type bound for the exponentially weighted aggregate (EWA) under the expected squared empirical loss. For a broad class of noise distributions the presented bound is valid whenever the temperature parameter $\beta$ of the EWA is larger than or equal to $4\sigma^2$, where $\sigma^2$ is the noise variance. A remarkable feature of this result is that it is valid even for unbounded regression functions and the choice of the temperature parameter depends exclusively on the noise level. Next, we apply this general bound to the problem of aggregating the elements of a finite-dimensional linear space spanned by a dictionary of functions $\phi_1,\ldots,\phi_M$. We allow $M$ to be much larger than the sample size $n$ but we assume that the true regression function can be well approximated by a sparse linear combination of functions $\phi_j$. Under this sparsity scenario, we propose an EWA with a heavy-tailed prior and we show that it satisfies a sparsity oracle inequality with leading constant one. Finally, we propose several Langevin Monte-Carlo algorithms to approximately compute such an EWA when the number $M$ of aggregated functions can be large. We discuss in some detail the convergence of these algorithms and present numerical experiments that confirm our theoretical findings.
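
As a rough illustration of the exponentially weighted aggregate described in this abstract, the sketch below handles the simplest case only: a finite set of fixed candidate predictions on the design points, a prior over them, and the temperature $\beta = 4\sigma^2$ mentioned above. It is not the paper's method for the sparse setting; the heavy-tailed prior and the Langevin Monte-Carlo computation are omitted, and the names `ewa_weights` and `ewa_predict` are assumptions made for this example.

```python
import numpy as np

def ewa_weights(y, candidates, sigma2, prior=None, beta=None):
    """Weights of an exponentially weighted aggregate over M fixed candidates.

    y: observations at the n design points.
    candidates: array of shape (M, n), candidate predictions at those points.
    sigma2: noise variance; the default temperature is beta = 4 * sigma2.
    Illustrative sketch for a finite candidate set (an assumption here).
    """
    y = np.asarray(y, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    m = candidates.shape[0]
    if prior is None:
        prior = np.full(m, 1.0 / m)  # uniform prior over candidates
    if beta is None:
        beta = 4.0 * sigma2
    # Residual sum of squares of each candidate.
    rss = np.sum((candidates - y) ** 2, axis=1)
    # Weight_j is proportional to prior_j * exp(-rss_j / beta).
    log_w = np.log(prior) - rss / beta
    log_w -= log_w.max()  # subtract the max for numerical stability
    w = np.exp(log_w)
    return w / w.sum()

def ewa_predict(y, candidates, sigma2, prior=None, beta=None):
    """Aggregate prediction: weighted average of the candidate predictions."""
    w = ewa_weights(y, candidates, sigma2, prior=prior, beta=beta)
    return w @ np.asarray(candidates, dtype=float)
```

Candidates with smaller residual sum of squares receive exponentially larger weight, and a larger temperature flattens the weights toward the prior.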

preprint, 2005, arXiv

Asymptotic statistical equivalence for ergodic diffusions: the multidimensional case

Asymptotic local equivalence in the sense of Le Cam is established for inference on the drift in multidimensional ergodic diffusions and an accompanying sequence of Gaussian shift experiments. The nonparametric local neighbourhoods can be attained for any dimension, provided the regularity of the drift is sufficiently large. In addition, a heteroskedastic Gaussian regression experiment is given, which is also locally asymptotically equivalent and which does not depend on the centre of localisation. For one direction of the equivalence an explicit Markov kernel is constructed.