Topic overview

math.ST

3384 works · 5596 researchers

preprint · 2017 · arXiv

On exact and optimal recovering of missing values for sequences

The paper studies recoverability of missing values for sequences in a pathwise setting without probabilistic assumptions. This setting targets situations where the underlying sequence is considered as a sole sequence rather than a member of an ensemble with known statistical properties. Sufficient conditions of recoverability are obtained; it is shown that sequences are recoverable if there is a certain degree of degeneracy of the Z-transforms. We found that, in some cases, this degree can be measured as the number of derivatives of the Z-transform that vanish at a point. For processes with non-degenerate Z-transform, an optimal recovery based on projection onto a set of recoverable sequences is suggested. Some robustness of the solution with respect to noise contamination and truncation is established.

preprint · 2016 · arXiv

A framework for statistical network modeling

Basic principles of statistical inference are commonly violated in network data analysis. Under the current approach, it is often impossible to identify a model that accommodates known empirical behaviors, possesses crucial inferential properties, and accurately models the data generating process. In the absence of one or more of these properties, sensible inference from network data cannot be assured. Our proposed framework decomposes every network model into a (relatively) exchangeable data generating process and a sampling mechanism that relates observed data to the population network. This framework, which encompasses all models in current use as well as many new models, such as edge exchangeable and relationally exchangeable models, that lie outside the existing paradigm, offers a sound context within which to develop theory and methods for network analysis.

preprint · 2017 · arXiv

Regularization and the small-ball method I: sparse recovery

We obtain bounds on estimation error rates for regularization procedures of the form \begin{equation*} \hat f \in \operatorname*{argmin}_{f\in F}\left(\frac{1}{N}\sum_{i=1}^N\left(Y_i-f(X_i)\right)^2+\lambda\Psi(f)\right) \end{equation*} when $\Psi$ is a norm and $F$ is convex. Our approach gives a common framework that may be used in the analysis of learning problems and regularization problems alike. In particular, it sheds some light on the role various notions of sparsity have in regularization and on their connection with the size of subdifferentials of $\Psi$ in a neighbourhood of the true minimizer. As a proof of concept, we extend the known estimates for the LASSO, SLOPE and trace norm regularization.
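As a rough illustration of the objective above, the following sketch takes the LASSO special case: $F$ is the class of linear functions $f(x) = \langle \beta, x \rangle$ and $\Psi$ is the $\ell_1$ norm of $\beta$. The coordinate descent solver and the names `soft_threshold` and `lasso_coordinate_descent` are assumptions made for this example. They are not taken from the paper, which concerns error bounds rather than algorithms.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold` (the proximal map of |.|)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimize (1/N) * sum_i (y_i - <beta, x_i>)^2 + lam * ||beta||_1.

    X is a list of N rows (each a list of d floats), y a list of N floats.
    Illustrative solver only: cyclic coordinate descent, fixed sweep count.
    """
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    for _ in range(n_sweeps):
        for j in range(d):
            rho = 0.0  # (1/N) * sum_i x_ij * (residual without coordinate j)
            z = 0.0    # (1/N) * sum_i x_ij^2
            for i in range(n):
                pred_without_j = sum(
                    X[i][k] * beta[k] for k in range(d) if k != j
                )
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            # One-dimensional minimizer of -2*rho*b + z*b^2 + lam*|b|.
            beta[j] = soft_threshold(rho, lam / 2.0) / z if z > 0 else 0.0
    return beta
```

With a design whose columns do not overlap, each coordinate reduces to a single soft-thresholding step, so small coefficients are set exactly to zero. This is the sparsity effect the abstract refers to.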

preprint · 2016 · arXiv

Compressed sensing and optimal denoising of monotone signals

We consider the problems of compressed sensing and optimal denoising for signals $\mathbf{x_0}\in\mathbb{R}^N$ that are monotone, i.e., $\mathbf{x_0}(i+1) \geq \mathbf{x_0}(i)$, and sparsely varying, i.e., $\mathbf{x_0}(i+1) > \mathbf{x_0}(i)$ only for a small number $k$ of indices $i$. We approach the compressed sensing problem by minimizing the total variation norm restricted to the class of monotone signals subject to equality constraints obtained from a number of measurements $A\mathbf{x_0}$. For random Gaussian sensing matrices $A\in\mathbb{R}^{m\times N}$ we derive a closed form expression for the number of measurements $m$ required for successful reconstruction with high probability. We show that the probability undergoes a phase transition as $m$ varies, and depends not only on the number of change points, but also on their location. For denoising we regularize with the same norm and derive a formula for the optimal regularizer weight that depends only mildly on $\mathbf{x_0}$. We obtain our results using the statistical dimension tool.

preprint · 2017 · arXiv

Denoising Flows on Trees

We study the estimation of flows on trees, a structured generalization of isotonic regression. A tree flow is defined recursively as a positive flow value into a node that is partitioned into an outgoing flow to the children nodes, with some amount of the flow possibly leaking outside. We study the behavior of the least squares estimator for flows, and the associated minimax lower bounds. We characterize the risk of the least squares estimator in two regimes. In the first regime the diameter of the tree grows at most logarithmically with the number of nodes. In the second regime, the tree contains long paths. The results are compared with known risk bounds for isotonic regression.
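The recursive definition of a tree flow can be made concrete with a small validity check. This is a sketch under assumptions: the tree is a dictionary mapping each node to its list of children, flows are stored per node, and flow values are allowed to be zero. The function name `is_tree_flow` is hypothetical.

```python
def is_tree_flow(children, flow, root):
    """Check the tree-flow constraints on a rooted tree.

    children: dict mapping a node to a list of its child nodes.
    flow: dict mapping every node to the flow entering it.
    A flow is valid here if every value is non-negative and the flows
    into the children of a node sum to at most the flow into that node
    (the difference is the amount that leaks out).
    """
    stack = [root]
    while stack:
        node = stack.pop()
        if flow[node] < 0:
            return False
        kids = children.get(node, [])
        if sum(flow[k] for k in kids) > flow[node]:
            return False
        stack.extend(kids)
    return True
```

On a tree that is a single path, these constraints say that the flow is non-negative and non-increasing along the path, which is how the connection to isotonic regression arises.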

preprint · 2016 · arXiv

Spatial risk measure for Gaussian processes

In this paper, we study the quantitative behavior of a spatial risk measure corresponding to a damage function and a region, taking into account the spatial dependence of the underlying process. This kind of risk measure has already been introduced and studied for some max-stable processes in [Koch2015]. In this paper, we consider isotropic Gaussian processes and the excess damage function over a threshold. We performed a simulation study and a real data study.

preprint · 2016 · arXiv

Conditional Central Limit Theorems for Gaussian Projections

This paper addresses the question of when projections of a high-dimensional random vector are approximately Gaussian. This problem has been studied previously in the context of high-dimensional data analysis, where the focus is on low-dimensional projections of high-dimensional point clouds. The focus of this paper is on the typical behavior when the projections are generated by an i.i.d. Gaussian projection matrix. The main results are bounds on the deviation between the conditional distribution of the projections and a Gaussian approximation, where the conditioning is on the projection matrix. The bounds are given in terms of the quadratic Wasserstein distance and relative entropy and are stated explicitly as a function of the number of projections and certain key properties of the random vector. The proof uses Talagrand's transportation inequality and a general integral-moment inequality for mutual information. Applications to random linear estimation and compressed sensing are discussed.

preprint · 2016 · arXiv

Data driven estimation of Laplace-Beltrami operator

Approximations of Laplace-Beltrami operators on manifolds through graph Laplacians have become popular tools in data analysis and machine learning. These discretized operators usually depend on bandwidth parameters whose tuning remains a theoretical and practical problem. In this paper, we address this problem for the unnormalized graph Laplacian by establishing an oracle inequality that opens the door to a well-founded data-driven procedure for the bandwidth selection. Our approach relies on recent results by Lacour and Massart [LM15] on the so-called Lepski's method.
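To show where the bandwidth enters, here is a minimal sketch of an unnormalized graph Laplacian built from a point cloud. The Gaussian kernel, its scaling, and the sign convention $L = D - W$ are assumptions chosen for illustration, and the paper may use a different kernel and normalization. The bandwidth selection procedure itself is not reproduced.

```python
import math


def graph_laplacian(points, h):
    """Unnormalized graph Laplacian L = D - W with bandwidth h.

    points: list of equal-length coordinate tuples.
    W[i][j] = exp(-||x_i - x_j||^2 / (2 h^2)) for i != j, and D is the
    diagonal matrix of row sums of W (illustrative conventions).
    """
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-sq_dist / (2.0 * h * h))
    L = [[-W[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        L[i][i] = sum(W[i])  # degree of node i
    return L
```

A small `h` makes the weights decay quickly, so the operator only sees very local structure; a large `h` averages over distant points. This trade-off is what makes the choice of bandwidth delicate.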

preprint · 2016 · arXiv

Robust regression estimation and inference in the presence of cellwise and casewise contamination

Cellwise outliers are likely to occur together with casewise outliers in modern data sets with relatively large dimension. Recent work has shown that traditional robust regression methods may fail for data sets in this paradigm. The proposed method, called three-step regression, proceeds as follows: first, it uses a consistent univariate filter to detect and eliminate extreme cellwise outliers; second, it applies a robust estimator of multivariate location and scatter to the filtered data to down-weight casewise outliers; third, it computes robust regression coefficients from the estimates obtained in the second step. The three-step estimator is shown to be consistent and asymptotically normal at the central model under some assumptions on the tail distributions of the continuous covariates. The estimator is extended to handle both numerical and dummy covariates using an iterative algorithm. Extensive simulation results show that the three-step estimator is resilient to cellwise outliers. It also performs well under casewise contamination when compared with traditional high breakdown point estimators.

preprint · 2016 · arXiv

On the concept of Bernoulliness

The first part of this paper is another English translation of a 1986 note. It gives a natural definition of a finite Bernoulli sequence (i.e., a typical realization of a finite sequence of binary IID trials) and compares it with the Kolmogorov--Martin-Löf definition, which is interpreted as defining exchangeable sequences. The appendix gives the historical background and proofs.

preprint · 2016 · arXiv

Uniform in bandwidth consistency for the transformation kernel estimator of copulas

In this paper we establish the uniform in bandwidth consistency for the transformation kernel estimator of copulas introduced in [Omelka et al. (2009)]. To this end, we first prove a uniform in bandwidth law of the iterated logarithm for the maximal deviation of this estimator from its expectation. We then show that, as n goes to infinity, the bias of the estimator converges to zero uniformly in the bandwidth h, varying over a suitable interval. A practical method of selecting the optimal bandwidth is also presented. Finally, we conduct simulation experiments that show the performance of the estimator in finite samples.

preprint · 2016 · arXiv

Outliers, the Law of Large Numbers, Index of Stability and Heavy Tails

We attempt to give a mathematically correct definition of outliers. Our approach is based on the distance between the last two order statistics and appears to be connected to the law of large numbers. Key words: outliers, law of large numbers, heavy tails, stability index.

preprint · 2016 · arXiv

Multivariate Location and Scatter Matrix Estimation Under Cellwise and Casewise Contamination

We consider the problem of multivariate location and scatter matrix estimation when the data contain cellwise and casewise outliers. Agostinelli et al. (2015) propose a two-step approach to deal with this problem: first, apply a univariate filter to remove cellwise outliers and second, apply a generalized S-estimator to downweight casewise outliers. We improve this proposal in three main directions. First, we introduce a consistent bivariate filter to be used in combination with the univariate filter in the first step. Second, we propose a new fast subsampling procedure to generate starting points for the generalized S-estimator in the second step. Third, we consider a non-monotonic weight function for the generalized S-estimator to better deal with casewise outliers in high dimension. A simulation study and real data example show that, unlike the original two-step procedure, the modified two-step approach performs and scales well for high dimension. Moreover, the modified procedure outperforms the original one and other state-of-the-art robust procedures under cellwise and casewise data contamination.

preprint · 2016 · arXiv

Change point estimation based on Wilcoxon tests in the presence of long-range dependence

We consider an estimator for the location of a shift in the mean of long-range dependent sequences. The estimation is based on the two-sample Wilcoxon statistic. Consistency and the rate of convergence for the estimated change point are established. In the case of a constant shift height, the $1/n$ convergence rate (with $n$ denoting the number of observations), which is typical under the assumption of independent observations, is also achieved for long memory sequences. It is proved that if the change point height decreases to $0$ with a certain rate, the suitably standardized estimator converges in distribution to a functional of a fractional Brownian motion. The estimator is tested on two well-known data sets. Finite sample behaviors are investigated in a Monte Carlo simulation study.
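A Wilcoxon-type change-point estimator can be sketched as follows. The form used here, maximizing the absolute two-sample Wilcoxon statistic over all split points, is a common formulation and is an assumption: the abstract does not spell out the exact statistic or its normalization. The function name `wilcoxon_change_point` is hypothetical.

```python
def wilcoxon_change_point(x):
    """Estimate the location of a mean shift in the sequence x.

    For each split k (first k observations versus the rest), compute
        W_k = sum_{i < k} sum_{j >= k} (1{x_i <= x_j} - 1/2)
    and return the k with the largest |W_k|. The estimate means that
    the shift occurs after the k-th observation. Ties keep the first k.
    Quadratic-per-split implementation, meant for short sequences only.
    """
    n = len(x)
    best_k, best_stat = None, -1.0
    for k in range(1, n):
        w = sum(
            (1.0 if x[i] <= x[j] else 0.0) - 0.5
            for i in range(k)
            for j in range(k, n)
        )
        if abs(w) > best_stat:
            best_k, best_stat = k, abs(w)
    return best_k
```

Because the statistic depends only on the ordering of the observations, a single extreme value cannot dominate it, which is one reason rank-based statistics are attractive for change-point problems.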

preprint · 2016 · arXiv

A Mathematical Framework for Feature Selection from Real-World Data with Non-Linear Observations

In this paper, we study the challenge of feature selection based on a relatively small collection of sample pairs $\{(x_i, y_i)\}_{1 \leq i \leq m}$. The observations $y_i \in \mathbb{R}$ are thereby supposed to follow a noisy single-index model, depending on a certain set of signal variables. A major difficulty is that these variables usually cannot be observed directly, but rather arise as hidden factors in the actual data vectors $x_i \in \mathbb{R}^d$ (feature variables). We will prove that a successful variable selection is still possible in this setup, even when the applied estimator does not have any knowledge of the underlying model parameters and only takes the 'raw' samples $\{(x_i, y_i)\}_{1 \leq i \leq m}$ as input. The model assumptions of our results will be fairly general, allowing for non-linear observations, arbitrary convex signal structures as well as strictly convex loss functions. This is particularly appealing for practical purposes, since in many applications, already standard methods, e.g., the Lasso or logistic regression, yield surprisingly good outcomes. Apart from a general discussion of the practical scope of our theoretical findings, we will al

preprint · 2016 · arXiv

Fully bilinear generic and lifted random processes comparisons

In our companion paper [Stojnicgscomp16] we introduce a collection of fairly powerful statistical comparison results. They relate to a general comparison concept and an upgrade of it that we call the lifting procedure. Here we provide a different generic principle (which we call fully bilinear) that in certain cases turns out to be stronger than the corresponding one from [Stojnicgscomp16]. Moreover, we also show how the principle that we introduce here can be pushed through the lifting machinery of [Stojnicgscomp16]. Finally, as was the case in [Stojnicgscomp16], here we also show how the well known Slepian's max and Gordon's minmax comparison principles can be obtained as special cases of the mechanisms that we present here. We also create their lifted upgrades, which happen to be stronger than the corresponding ones in [Stojnicgscomp16]. A fairly large collection of results obtained through numerical experiments is also provided. It is observed that these results are in excellent agreement with what the theory predicts.

preprint · 2016 · arXiv

Generic and lifted probabilistic comparisons -- max replaces minmax

In this paper we introduce a collection of powerful statistical comparison results. We first present the results that we obtained while developing a general comparison concept. After that we introduce a separate lifting procedure that is a comparison concept on its own. We then show how in certain scenarios the lifting procedure basically represents a substantial upgrade over the general strategy. We complement the introduced results with a fairly large collection of numerical experiments that are in overwhelming agreement with what the theory predicts. We also show how many well known comparison results (e.g. Slepian's max and Gordon's minmax principle) can be obtained as special cases. Moreover, it turns out that the minmax principle can be viewed as a single max principle as well. The range of applications is enormous. It starts with revisiting many of the results we created in recent years in various mathematical fields and recognizing that they are fully self-contained as their starting blocks are specialized variants of the concepts introduced here. Further upgrades relate to core comparison extensions on the one side and more practically oriented modifications on

preprint · 2016 · arXiv

Parametric inference of hidden discrete-time diffusion processes by deconvolution

We study a new parametric approach for hidden discrete-time diffusion models. This method is based on contrast minimization and deconvolution and allows estimation of a large class of stochastic models with nonlinear drift and nonlinear diffusion. It can be applied, for example, to ecological and financial state space models. After proving consistency and asymptotic normality of the estimator, leading to asymptotic confidence intervals, we provide a thorough numerical study, which compares many classical methods used in practice (Non Linear Least Square estimator, Monte Carlo Expectation Maximization Likelihood estimator and Bayesian estimators) for estimating a stochastic volatility model. We prove that our estimator clearly outperforms the Maximum Likelihood Estimator in terms of computing time, but also most of the other methods. We also show that this contrast method is the most stable and does not need any tuning parameter.

preprint · 2016 · arXiv

Convex clustering via $\ell_1$ fusion penalization

We study the large sample behavior of a convex clustering framework, which minimizes the sample within cluster sum of squares under an $\ell_1$ fusion constraint on the cluster centroids. This recently proposed approach has been gaining in popularity; however, its asymptotic properties have remained mostly unknown. Our analysis is based on a novel representation of the sample clustering procedure as a sequence of cluster splits determined by a sequence of maximization problems. We use this representation to provide a simple and intuitive formulation for the population clustering procedure. We then demonstrate that the sample procedure consistently estimates its population analog, and derive the corresponding rates of convergence. The proof conducts a careful simultaneous analysis of a collection of M-estimation problems, whose cardinality grows together with the sample size. Based on the new perspectives gained from the asymptotic investigation, we propose a key post-processing modification of the original clustering framework. We show, both theoretically and empirically, that the resulting approach can be successfully used to estimate the number of clusters in the population. Usin

preprint · 2016 · arXiv

On a test of normality based on the empirical moment generating function

We provide the lacking theory for a test of normality based on the empirical moment generating function.
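For orientation, the empirical moment generating function of a sample $X_1, \dots, X_n$ is $M_n(t) = \frac{1}{n}\sum_{i=1}^n e^{tX_i}$, and the standard normal moment generating function is $e^{t^2/2}$. The sketch below only evaluates these two quantities. The abstract does not state the test statistic, so no test is implemented, and the function names are hypothetical.

```python
import math


def empirical_mgf(sample, t):
    """Empirical moment generating function M_n(t) = mean of exp(t * x)."""
    return sum(math.exp(t * x) for x in sample) / len(sample)


def standard_normal_mgf(t):
    """Moment generating function of N(0, 1), namely exp(t^2 / 2)."""
    return math.exp(0.5 * t * t)
```

A test of normality in this spirit would compare `empirical_mgf` on standardized data with `standard_normal_mgf` over a range of `t`; how the discrepancy is measured is left open here.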

preprint · 2016 · arXiv

Bayes estimator for multinomial parameters and Bhattacharyya distances

We derive the Bayes estimator for the parameters of a multinomial distribution under two loss functions ($1-B$ and $1-B^2$) that are based on the Bhattacharyya coefficient $B(\vec{p},\vec{q}) = \sum{\sqrt{p_kq_k}}$. We formulate a non-commutative generalization relevant to quantum probability theory as an open problem. As an example application, we use our solution to find minimax estimators for a binomial parameter under Bhattacharyya loss ($1-B^2$).
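The Bhattacharyya coefficient and the two loss functions named in the abstract are simple to evaluate. The sketch below computes them for probability vectors given as lists. The function names are assumptions, and the Bayes and minimax estimators derived in the paper are not reproduced.

```python
import math


def bhattacharyya(p, q):
    """Bhattacharyya coefficient B(p, q) = sum_k sqrt(p_k * q_k)."""
    return sum(math.sqrt(pk * qk) for pk, qk in zip(p, q))


def loss_one_minus_b(p, q):
    """Loss 1 - B(p, q); zero when p equals q."""
    return 1.0 - bhattacharyya(p, q)


def loss_one_minus_b_squared(p, q):
    """Loss 1 - B(p, q)^2; zero when p equals q."""
    return 1.0 - bhattacharyya(p, q) ** 2
```

For probability vectors, $B$ lies between 0 and 1, equals 1 when the two vectors coincide, and equals 0 when their supports are disjoint, so both losses range from 0 (perfect estimate) to 1.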

preprint · 2016 · arXiv

Spectral algorithms for tensor completion

In the tensor completion problem, one seeks to estimate a low-rank tensor based on a random sample of revealed entries. In terms of the required sample size, earlier work revealed a large gap between estimation with unbounded computational resources (using, for instance, tensor nuclear norm minimization) and polynomial-time algorithms. Among the latter, the best statistical guarantees have been proved, for third-order tensors, using the sixth level of the sum-of-squares (SOS) semidefinite programming hierarchy (Barak and Moitra, 2014). However, the SOS approach does not scale well to large problem instances. By contrast, spectral methods, based on unfolding or matricizing the tensor, are attractive for their low complexity, but have been believed to require a much larger sample size. This paper presents two main contributions. First, we propose a new unfolding-based method, which outperforms naive ones for symmetric $k$-th order tensors of rank $r$. For this result we study singular space estimation for partially revealed matrices of large aspect ratio, which may be of independent interest. For third-order tensors, our algorithm matches the SOS method in terms of sa

preprint · 2016 · arXiv

Note on information bias and efficiency of composite likelihood

Does the asymptotic variance of the maximum composite likelihood estimator of a parameter of interest always decrease when the nuisance parameters are known? Will a composite likelihood necessarily become more efficient by incorporating additional independent component likelihoods, or by using component likelihoods with higher dimension? In this note we show through illustrative examples that the answer to both questions is no, and indeed the opposite direction might be observed. The role of information bias is highlighted to understand the occurrence of these paradoxical phenomena.

preprint · 2016 · arXiv

Detecting Sparse Mixtures: Rate of Decay of Error Probability

We study the rate of decay of the probability of error for distinguishing a sparse signal with noise, modeled as a sparse mixture, from pure noise. This problem has many applications in signal processing, evolutionary biology, bioinformatics, astrophysics and feature selection for machine learning. We let the mixture probability tend to zero as the number of observations tends to infinity and derive oracle rates at which the error probability can be driven to zero for a general class of signal and noise distributions via the likelihood ratio test. In contrast to the problem of detection of non-sparse signals, we see that the log-probability of error decays sublinearly rather than linearly and is characterized through the $\chi^2$-divergence rather than the Kullback-Leibler divergence for "weak" signals, and can be independent of divergence for "strong" signals. Our contribution is the first characterization of the rate of decay of the error probability for this problem for both the false alarm and miss probabilities.

602 works