Source author record

Vladimir Spokoiny

Vladimir Spokoiny appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.ST, Statistics Theory, math.OC, Machine Learning, Methodology, Computation, math.NA, math.PR, Numerical Analysis

Catalog footprint

What is connected

25 works

9 topics

4 close collaborators


preprint · 2026 · arXiv

High-Dimensional Change Point Detection using Graph Spanning Ratio

Inspired by graph-based methodologies, we introduce a novel graph-spanning algorithm designed to identify changes in both offline and online data across low to high dimensions. This versatile approach is applicable to Euclidean and graph-structured data with unknown distributions, while maintaining control over error probabilities. Theoretically, we demonstrate that the algorithm achieves high detection power when the magnitude of the change surpasses the lower bound of the minimax separation rate, which scales on the order of $\sqrt{nd}$. Our method outperforms other techniques in terms of accuracy for both Gaussian and non-Gaussian data. Notably, it maintains strong detection power even with small observation windows, making it particularly effective for online environments where timely and precise change detection is critical.

preprint · 2023 · arXiv

Accelerated gradient methods with absolute and relative noise in the gradient

In this paper, we investigate accelerated first-order methods for smooth convex optimization problems under inexact information on the gradient of the objective. The noise in the gradient is considered to be additive with two possibilities: absolute noise bounded by a constant, and relative noise proportional to the norm of the gradient. We investigate the accumulation of the errors in the convex and strongly convex settings with the main difference with most of the previous works being that the feasible set can be unbounded. The key to the latter is to prove a bound on the trajectory of the algorithm. We also give a stopping criterion for the algorithm and consider extensions to the cases of stochastic optimization and composite nonsmooth problems.

preprint · 2022 · arXiv

Adaptive Manifold Clustering

Clustering methods seek to partition data such that elements are more similar to elements in the same cluster than to elements in different clusters. The main challenge in this task is the lack of a unified definition of a cluster, especially for high dimensional data. Different methods and approaches have been proposed to address this problem. This paper continues the study originated by Efimov, Adamyan and Spokoiny (2019) where a novel approach to adaptive nonparametric clustering called Adaptive Weights Clustering (AWC) was offered. The method allows analyzing high-dimensional data with an unknown number of unbalanced clusters of arbitrary shape under very weak modeling assumptions. The procedure demonstrates a state-of-the-art performance and is very efficient even for large data dimension D. However, the theoretical study in Efimov, Adamyan and Spokoiny (2019) is very limited and did not really address the question of efficiency. This paper makes a significant step in understanding the remarkable performance of the AWC procedure, particularly in high dimension. The approach is based on combining the ideas of adaptive clustering and manifold learning. The manifold hypothesis means that high dimensional data can be well approximated by a d-dimensional manifold for small d helping to overcome the curse of dimensionality problem and to get sharp bounds on the cluster separation which only depend on the intrinsic dimension d. We also address the problem of parameter tuning. Our general theoretical results are illustrated by some numerical experiments.

preprint · 2022 · arXiv

Adaptive Weights Community Detection

Due to the technological progress of the last decades, Community Detection has become a major topic in machine learning. However, there is still a huge gap between practical and theoretical results, as theoretically optimal procedures often lack a feasible implementation and vice versa. This paper aims to close this gap and presents a novel algorithm that is both numerically and statistically efficient. Our procedure uses a test of homogeneity to compute adaptive weights describing local communities. The approach was inspired by the Adaptive Weights Community Detection (AWCD) algorithm by Adamyan et al. (2019). This algorithm delivered some promising results on artificial and real-life data, but our theoretical analysis reveals its performance to be suboptimal on a stochastic block model. In particular, the involved estimators are biased and the procedure does not work for sparse graphs. We propose significant modifications, addressing both shortcomings and achieving a nearly optimal rate of strong consistency on the stochastic block model. Our theoretical results are illustrated and validated by numerical experiments.

preprint · 2022 · arXiv

High dimensional change-point detection: a complete graph approach

The aim of online change-point detection is the accurate, timely discovery of structural breaks. As the data dimension outgrows the number of observations, online detection becomes challenging. Existing methods typically test only for a change of mean and thus omit the practically relevant change of variance. We propose a complete graph-based change-point detection algorithm to detect changes of mean and variance in low- to high-dimensional online data with a variable scanning window. Inspired by the complete graph structure, we introduce graph-spanning ratios to map high-dimensional data into metrics, and then test statistically whether a change of mean or a change of variance occurs. Theoretical study shows that our approach has the desirable pivotal property and is powerful with prescribed error probabilities. We demonstrate that this framework outperforms other methods in terms of detection power. Our approach has high detection power with small and multiple scanning windows, which allows timely detection of change-points in the online setting. Finally, we applied the method to financial data to detect change-points in S&P 500 stocks.

preprint · 2022 · arXiv

Reinforced optimal control

Least squares Monte Carlo methods are a popular numerical approximation method for solving stochastic control problems. Based on dynamic programming, their key feature is the approximation of the conditional expectation of future rewards by linear least squares regression. Hence, the choice of basis functions is crucial for the accuracy of the method. Earlier work by some of us (Belomestny, Schoenmakers, Spokoiny, Zharkynbay, Commun. Math. Sci., 18(1):109-121, 2020; arXiv:1808.02341) proposes to reinforce the basis functions in the case of optimal stopping problems by already computed value functions for later times, thereby considerably improving the accuracy with limited additional computational cost. We extend the reinforced regression method to a general class of stochastic control problems, while considerably improving the method's efficiency, as demonstrated by substantial numerical examples as well as theoretical analysis.

preprint · 2022 · arXiv

Structure-adaptive manifold estimation

We consider a problem of manifold estimation from noisy observations. Many manifold learning procedures locally approximate a manifold by a weighted average over a small neighborhood. However, in the presence of large noise, the assigned weights become so corrupted that the averaged estimate shows very poor performance. We suggest a structure-adaptive procedure, which simultaneously reconstructs a smooth manifold and estimates projections of the point cloud onto this manifold. The proposed approach iteratively refines the weights on each step, using the structural information obtained at previous steps. After several iterations, we obtain nearly "oracle" weights, so that the final estimates are nearly efficient even in the presence of relatively large noise. In our theoretical study, we establish tight lower and upper bounds proving asymptotic optimality of the method for manifold estimation under the Hausdorff loss, provided that the noise degrades to zero fast enough.

preprint · 2020 · arXiv

Accuracy of Gaussian approximation in nonparametric Bernstein -- von Mises Theorem

The prominent Bernstein -- von Mises (BvM) result claims that the posterior distribution, after centering by the efficient estimator and standardizing by the square root of the total Fisher information, is nearly standard normal. In particular, the prior completely washes out from the asymptotic posterior distribution. This fact is fundamental and justifies the Bayes approach from the frequentist viewpoint. In the nonparametric setup the situation changes dramatically and the impact of the prior becomes essential even for the contraction of the posterior; see [vdV2008], [Bo2011], [CaNi2013, CaNi2014] for different models like Gaussian regression or the i.i.d. model in different weak topologies. This paper offers another non-asymptotic approach to studying the behavior of the posterior for a special but rather popular and useful class of statistical models and for Gaussian priors. First we derive tight finite sample bounds on posterior contraction in terms of the so-called effective dimension of the parameter space. Our main results describe the accuracy of Gaussian approximation of the posterior. In particular, we show that restricting to the class of all centrally symmetric credible sets around the penalized maximum likelihood estimator (pMLE) allows one to obtain Gaussian approximation up to order $n^{-1}$. We also show that the posterior distribution mimics well the distribution of the pMLE and reduce the question of reliability of credible sets to consistency of the pMLE-based confidence sets. The obtained results are specified for nonparametric log-density estimation and generalized regression.

preprint · 2020 · arXiv

Adaptive Gradient Descent for Convex and Non-Convex Stochastic Optimization

In this paper we propose several adaptive gradient methods for stochastic optimization. Unlike AdaGrad-type methods, our algorithms are based on Armijo-type line search and they simultaneously adapt to the unknown Lipschitz constant of the gradient and the variance of the stochastic approximation for the gradient. We consider accelerated and non-accelerated gradient descent for convex problems and gradient descent for non-convex problems. In the experiments we demonstrate the superiority of our methods over existing adaptive methods, e.g. AdaGrad and Adam.
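To make the idea of an Armijo-type line search concrete, here is a minimal deterministic sketch in Python. It is not the algorithm from the paper (which handles stochastic gradients and also adapts to their variance); it only illustrates how a backtracking rule can estimate an unknown Lipschitz constant of the gradient. The function name `adaptive_gradient_descent` and all of its parameters are hypothetical choices made for this illustration.

```python
import numpy as np

def adaptive_gradient_descent(f, grad, x0, L0=1.0, n_iters=1000,
                              tol=1e-6, max_backtracks=60):
    """Gradient descent with an Armijo-type backtracking estimate L of the
    (unknown) Lipschitz constant of the gradient. Illustrative sketch only."""
    x = np.asarray(x0, dtype=float)
    L = float(L0)
    for _ in range(n_iters):
        g = grad(x)
        if np.linalg.norm(g) <= tol:      # stop once the gradient is small
            break
        fx = f(x)
        L = L / 2.0                       # optimistic: first try a larger step
        for _ in range(max_backtracks):
            x_new = x - g / L
            # Sufficient decrease test for step size 1/L; it always holds
            # once L is at least the true Lipschitz constant.
            if f(x_new) <= fx - np.dot(g, g) / (2.0 * L):
                break
            L *= 2.0                      # step too long: increase L, retry
        x = x_new                         # last trial point (accepted or cap hit)
    return x, L

# Usage on an ill-conditioned quadratic f(x) = 0.5 x^T A x - b^T x
A = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])
x_min, L_est = adaptive_gradient_descent(
    lambda x: 0.5 * x @ A @ x - b @ x,   # objective
    lambda x: A @ x - b,                 # gradient
    np.zeros(2),
)
```

Halving `L` at the start of every iteration lets the step size grow again in flat regions, so the estimate tracks local rather than worst-case curvature.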

preprint · 2020 · arXiv

Bayesian inference for nonlinear inverse problems

Bayesian methods are actively used for parameter identification and uncertainty quantification when solving nonlinear inverse problems with random noise. However, there are only a few theoretical results justifying the Bayesian approach. Recent papers, see e.g. [Nickl2017, lu2017bernsteinvon] and references therein, illustrate the main difficulties and challenges in studying the properties of the posterior distribution in the nonparametric setup. This paper offers a new approach to studying the frequentist properties of nonparametric Bayes procedures. The idea of the approach is to relax the nonlinear structural equation by introducing an auxiliary functional parameter, replacing the structural equation with a penalty, and imposing a prior on the auxiliary parameter. For such an extended model, we state sharp bounds on posterior concentration, on the accuracy of the penalized MLE, and on Gaussian approximation of the posterior, together with a number of further results. All the bounds are given in terms of effective dimension, and we show that the proposed calming device does not significantly affect this value.

preprint · 2016 · arXiv

Efficient numerical algorithms for regularized regression problem with applications to traffic matrix estimations

In this work we collect and compare many different numerical methods for the regularized regression problem and for the problem of projection onto a hyperplane. Such problems arise, for example, as a subproblem of demand matrix estimation in IP networks. In this special case the matrix of affine constraints has a special structure: all elements are 0 or 1 and the matrix is sufficiently sparse. We have to deal with a huge-scale convex optimization problem of a special type. Using the properties of the problem we try "to look inside the black-box" and to see how the best modern methods work when applied to this problem.
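For orientation, the simplest instance of the projection problem mentioned in the abstract, a single affine constraint $a^\top y = b$, has a closed-form solution: $x - \frac{a^\top x - b}{\|a\|^2}\,a$. The sketch below only illustrates that textbook formula, with a 0/1 constraint vector as in the traffic-matrix setting; it is not one of the large-scale methods compared in the paper, and the function name `project_onto_hyperplane` is a hypothetical choice.

```python
import numpy as np

def project_onto_hyperplane(x, a, b):
    """Euclidean projection of x onto the hyperplane {y : a^T y = b}.
    Assumes a is a nonzero vector."""
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    # Move x along the normal direction a by the signed, scaled residual.
    return x - (a @ x - b) / (a @ a) * a

# Usage: a 0/1 constraint vector that sums the first two coordinates.
a = np.array([1.0, 1.0, 0.0])
p = project_onto_hyperplane([2.0, 2.0, 5.0], a, 1.0)
# p satisfies the constraint: a @ p == 1.0
```

Coordinates outside the support of `a` are left unchanged, which is one reason sparse 0/1 constraint matrices are convenient in this setting.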

preprint · 2016 · arXiv

Universal method with inexact oracle and its applications for searching equillibriums in multistage transport problems

In this paper we propose a new efficient approach for the numerical calculation of equilibria in multistage transport problems. At the very core of our approach lies a proper combination of the Universal Gradient Method proposed by Yu. Nesterov (2013) and the concept of an inexact oracle (Devolder--Glineur--Nesterov, 2011). In particular, our technique allows us to calculate Wasserstein barycenters quickly (this result generalizes M. Cuturi et al. (2014)).

preprint · 2015 · arXiv

Bootstrap confidence sets under model misspecification

A multiplier bootstrap procedure for the construction of likelihood-based confidence sets is considered for finite samples and a possible model misspecification. Theoretical results justify the bootstrap validity for a small or moderate sample size and allow one to control the impact of the parameter dimension $p$: the bootstrap approximation works if $p^3/n$ is small. The main result about bootstrap validity continues to apply even if the underlying parametric model is misspecified under the so-called small modelling bias condition. In the case when the true model deviates significantly from the considered parametric family, the bootstrap procedure is still applicable but it becomes a bit conservative: the size of the constructed confidence sets is increased by the modelling bias. We illustrate the results with numerical examples for misspecified linear and logistic regressions.
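The following Python sketch illustrates the general shape of a multiplier bootstrap for likelihood-based confidence sets. It is an illustration under stated assumptions, not the paper's exact procedure: the working model is assumed to be Gaussian linear regression with unit noise variance (so the MLE is ordinary least squares), the random weights are assumed to be i.i.d. exponential with mean 1 and variance 1, and the function names are hypothetical.

```python
import numpy as np

def multiplier_bootstrap_quantile(X, y, alpha=0.1, n_boot=500, seed=0):
    """Estimate the (1 - alpha) quantile of the likelihood-ratio statistic
    by reweighting each observation's log-likelihood term at random."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # MLE = OLS here
    resid_hat = y - X @ theta_hat
    stats = np.empty(n_boot)
    for b in range(n_boot):
        w = rng.exponential(1.0, size=n)                # mean 1, variance 1
        sw = np.sqrt(w)
        # Maximizer of the weighted log-likelihood = weighted least squares.
        theta_b, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        resid_b = y - X @ theta_b
        # Bootstrap likelihood ratio: L_b(theta_b) - L_b(theta_hat) >= 0.
        stats[b] = 0.5 * np.sum(w * (resid_hat**2 - resid_b**2))
    return theta_hat, np.quantile(stats, 1.0 - alpha)

def in_confidence_set(theta, theta_hat, X, z):
    """Check L(theta_hat) - L(theta) <= z; for least squares this difference
    equals 0.5 * ||X (theta - theta_hat)||^2."""
    d = X @ (np.asarray(theta, dtype=float) - theta_hat)
    return 0.5 * float(d @ d) <= z
```

The bootstrap quantile `z` serves as a data-driven critical value, so the confidence set is the region where the log-likelihood drops by at most `z` below its maximum.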

preprint · 2015 · arXiv

Bootstrap tuning in ordered model selection

In the problem of model selection for a given family of linear estimators, ordered by their variance, we offer a new "smallest accepted" approach motivated by Lepski's method and multiple testing theory. The procedure selects the smallest model which satisfies an acceptance rule based on comparison with all larger models. The method is completely data-driven and does not use any prior information about the variance structure of the noise: its parameters are adjusted to the underlying possibly heterogeneous noise by the so-called "propagation condition" using a wild bootstrap method. The validity of the bootstrap calibration is proved for finite samples with an explicit error bound. We provide a comprehensive theoretical study of the method and describe in detail the set of possible values of the selector $\hat{m}$. We also establish some precise oracle error bounds for the corresponding estimator $\hat{\theta} = \tilde{\theta}_{\hat{m}}$ which apply equally to estimation of the whole parameter vector, a subvector or linear mapping, as well as the estimation of a linear functional.

preprint · 2015 · arXiv

Penalized maximum likelihood estimation and effective dimension

This paper extends some prominent statistical results, including the Fisher theorem and the Wilks phenomenon, to penalized maximum likelihood estimation with a quadratic penalization. It appears that sharp expansions for the penalized MLE $\tilde{\boldsymbol{\theta}}_{G}$ and for the penalized maximum likelihood can be obtained without involving any asymptotic arguments; the results rely only on smoothness and regularity properties of the considered log-likelihood function. The error of estimation is specified in terms of the effective dimension $p_G$ of the parameter set, which can be much smaller than the true parameter dimension and even allows an infinite dimensional functional parameter. In the i.i.d. case, the Fisher expansion for the penalized MLE can be established under the constraint "$p_G^{2}/n$ is small" while the remainder in the Wilks result is of order $p_G^{3}/n$.
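The abstract does not spell out how $p_G$ is defined. As a rough illustration of why a quadratic penalty can shrink the dimension that matters, the Python sketch below uses one common formalization, assumed here and not taken from the source: $p_G = \operatorname{tr}\big(D^2 (D^2 + G^2)^{-1}\big)$, where $D^2$ is the Fisher information matrix and $G^2$ is the penalty matrix. The function name `effective_dimension` is hypothetical.

```python
import numpy as np

def effective_dimension(D2, G2):
    """Assumed formalization: p_G = trace((D^2 + G^2)^{-1} D^2), with D2 the
    Fisher information and G2 the quadratic penalty matrix (both symmetric)."""
    D2 = np.asarray(D2, dtype=float)
    G2 = np.asarray(G2, dtype=float)
    # solve() avoids forming the explicit inverse of D2 + G2.
    return float(np.trace(np.linalg.solve(D2 + G2, D2)))

# Usage: information decays across three directions; the penalty is the identity.
D2 = np.diag([4.0, 1.0, 0.25])
p_no_penalty = effective_dimension(D2, np.zeros((3, 3)))  # full dimension
p_penalized = effective_dimension(D2, np.eye(3))          # 4/5 + 1/2 + 1/5
```

Under this assumed definition, each direction contributes a value between 0 and 1, so directions where the penalty dominates the information barely count toward $p_G$.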

preprint · 2015 · arXiv

Two convergence results for an alternation maximization procedure

Andresen and Spokoiny's (2013) "Critical dimension in semiparametric estimation" provides a technique for the finite sample analysis of profile M-estimators. This paper uses very similar ideas to derive two convergence results for the alternating procedure to approximate the maximizer of random functionals such as the realized log-likelihood in maximum likelihood estimation. We manage to show that the sequence attains the same deviation properties as shown for the profile M-estimator in Andresen and Spokoiny (2013), i.e. a finite sample Wilks and Fisher theorem. Further, under slightly stronger smoothness constraints on the random functional, we can show nearly linear convergence to the global maximizer if the starting point for the procedure is well chosen.

preprint · 2014 · arXiv

Bernstein - von Mises Theorem for growing parameter dimension

This paper revisits the prominent Fisher, Wilks, and Bernstein -- von Mises (BvM) results from different viewpoints. Particular issues to address are: nonasymptotic framework with just one finite sample, possible model misspecification, and a large parameter dimension. In particular, in the case of an i.i.d. sample, the mentioned results can be stated for any smooth parametric family provided that the dimension $p $ of the parameter space satisfies the condition "$p^{2}/n $ is small" for the Fisher expansion, while the Wilks and the BvM results require "$p^{3}/n $ is small".

preprint · 2014 · arXiv

Critical dimension in profile semiparametric estimation

This paper revisits the classical inference results for profile quasi maximum likelihood estimators (profile MLE) in the semiparametric estimation problem. We mainly focus on two prominent theorems: the Wilks phenomenon and the Fisher expansion for the profile MLE are stated in a new fashion allowing finite samples and model misspecification. The method of study is also essentially different from the usual analysis of the semiparametric problem based on the notion of the hardest parametric submodel. Instead we derive finite sample deviation bounds for the linear approximation error for the gradient of the log-likelihood. This novel approach in particular allows one to address the important issue of the effective target and nuisance dimension. The obtained nonasymptotic results are surprisingly sharp and yield the classical asymptotic statements including the asymptotic normality and efficiency of the profile MLE. The general results are specialized to the important case of an i.i.d. sample and the analysis is exemplified with a single index model.

preprint · 2014 · arXiv

Finite Sample Bernstein -- von Mises Theorem for Semiparametric Problems

The classical parametric and semiparametric Bernstein -- von Mises (BvM) results are reconsidered in a non-classical setup allowing finite samples and model misspecification. In the case of a finite dimensional nuisance parameter we obtain an upper bound on the error of Gaussian approximation of the posterior distribution for the target parameter which is explicit in the dimension of the nuisance and target parameters. This helps to identify the so-called critical dimension $p$ of the full parameter for which the BvM result is applicable. In the important i.i.d. case, we show that the condition "$p^{3}/n$ is small" is sufficient for the BvM result to be valid under general assumptions on the model. We also provide an example of a model with the phase transition effect: the statement of the BvM theorem fails when the dimension $p$ approaches $n^{1/3}$. The results are extended to the case of infinite dimensional parameters with the nuisance parameter from a Sobolev class. In particular we show near normality of the posterior if the smoothness parameter $s$ exceeds 3/2.

preprint · 2013 · arXiv

Concentration inequalities for smooth random fields

In this note we derive a sharp concentration inequality for the supremum of a smooth random field over a finite dimensional set. It is shown that this supremum can be bounded with high probability by the value of the field at some deterministic point plus an intrinsic dimension of the optimisation problem. As an application we prove an exponential inequality for a function of the maximal eigenvalue of a random matrix.

preprint · 2013 · arXiv

Parametric estimation. Finite sample theory

The paper aims at reconsidering the famous Le Cam LAN theory. The main features of the approach which make it different from the classical one are as follows: (1) the study is nonasymptotic, that is, the sample size is fixed and does not tend to infinity; (2) the parametric assumption is possibly misspecified and the underlying data distribution can lie beyond the given parametric family. These two features make it possible to bridge the gap between parametric and nonparametric theory and to build a unified framework for statistical estimation. The main results include large deviation bounds for the (quasi) maximum likelihood and the local quadratic bracketing of the log-likelihood process. The latter yields a number of important corollaries for statistical inference: concentration, confidence and risk bounds, expansion of the maximum likelihood estimate, etc. All these corollaries are stated in a nonclassical way admitting a model misspecification and finite samples. However, the classical asymptotic results including the efficiency bounds can be easily derived as corollaries of the obtained nonasymptotic statements. At the same time, the new bracketing device works well in situations with large or growing parameter dimension in which the classical parametric theory fails. The general results are illustrated for the i.i.d. setup as well as for generalized linear and median estimation. The results apply for any dimension of the parameter space and provide a quantitative lower bound on the sample size yielding the root-n accuracy.

preprint · 2013 · arXiv

Sharp deviation bounds for quadratic forms

This note presents sharp inequalities for the deviation probability of a general quadratic form of a random vector $\boldsymbol{\xi}$ with finite exponential moments. The obtained deviation bounds are similar to the case of a Gaussian random vector. The results are stated under general conditions and do not suppose any special structure of the vector $\boldsymbol{\xi}$. The obtained bounds are exact (non-asymptotic), all constants are explicit and the leading terms in the bounds are sharp.

preprint · 2012 · arXiv

Local Quantile Regression

Quantile regression is a technique to estimate conditional quantile curves. It provides a comprehensive picture of a response contingent on explanatory variables. In a flexible modeling framework, a specific form of the conditional quantile curve is not a priori fixed. Indeed, the majority of applications do not per se require specific functional forms. This motivates a local parametric rather than a global fixed model fitting approach. A nonparametric smoothing estimator of the conditional quantile curve requires balancing local curvature against stochastic variability. In this paper, we suggest a local model selection technique that provides an adaptive estimator of the conditional quantile regression curve at each design point. Theoretical results claim that the proposed adaptive procedure performs as well as an oracle which would minimize the local estimation risk for the problem at hand. We illustrate the performance of the procedure by an extensive simulation study and consider a couple of applications: to tail dependence analysis for the Hong Kong stock market and to analysis of the distributions of the risk factors of temperature dynamics.

preprint · 2012 · arXiv

Sparse Non Gaussian Component Analysis by Semidefinite Programming

Sparse non-Gaussian component analysis (SNGCA) is an unsupervised method of extracting a linear structure from high-dimensional data based on estimating a low-dimensional non-Gaussian data component. In this paper we discuss a new approach to direct estimation of the projector on the target space based on semidefinite programming, which improves the method's sensitivity to a broad variety of deviations from normality. We also discuss procedures which allow one to recover the structure when its effective dimension is unknown.

preprint · 2012 · arXiv

Spatially Adaptive Density Estimation by Localised Haar Projections

Given a random sample from some unknown density $f_0: \mathbb R \to [0, \infty)$ we devise Haar wavelet estimators for $f_0$ with variable resolution levels constructed from localised test procedures (as in Lepski, Mammen, and Spokoiny (1997, Ann. Statist.)). We show that these estimators adapt to spatially heterogeneous smoothness of $f_0$, simultaneously for every point $x$ in a fixed interval, in sup-norm loss. The thresholding constants involved in the test procedures can be chosen in practice under the idealised assumption that the true density is locally constant in a neighborhood of the point $x$ of estimation, and an information theoretic justification of this practice is given.
