Source author record

Roman Vershynin

Roman Vershynin appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

math.PR, math.FA, math.ST, Statistics Theory, Information Theory, math.IT, math.NA, Machine Learning, Social and Information Networks, Cryptography and Security, math.CO, Neural and Evolutionary Computing, Artificial Intelligence, Data Structures and Algorithms, math.OC, Methodology, Numerical Analysis, physics.soc-ph

Catalog footprint

What is connected

50 works

18 topics

4 close collaborators

preprint 2023 arXiv

Covariance loss, Szemeredi regularity, and differential privacy

We show how randomized rounding based on Grothendieck's identity can be used to prove a nearly tight bound on the covariance loss--the amount of covariance that is lost by taking conditional expectation. This result yields a new type of weak Szemeredi regularity lemma for positive semidefinite matrices and kernels. Moreover, it can be used to construct differentially private synthetic data.

preprint 2022 arXiv

Covariance's Loss is Privacy's Gain: Computationally Efficient, Private and Accurate Synthetic Data

The protection of private information is of vital importance in data-driven research, business, and government. The conflict between privacy and utility has triggered intensive research in the computer science and statistics communities, who have developed a variety of methods for privacy-preserving data release. Among the main concepts that have emerged are anonymity and differential privacy. Today, another solution is gaining traction, synthetic data. However, the road to privacy is paved with NP-hard problems. In this paper we focus on the NP-hard challenge to develop a synthetic data generation method that is computationally efficient, comes with provable privacy guarantees, and rigorously quantifies data utility. We solve a relaxed version of this problem by studying a fundamental, but at first glance completely unrelated, problem in probability concerning the concept of covariance loss. Namely, we find a nearly optimal and constructive answer to the question how much information is lost when we take conditional expectation. Surprisingly, this excursion into theoretical probability produces mathematical techniques that allow us to derive constructive, approximately optimal solutions to difficult applied problems concerning microaggregation, privacy, and synthetic data.

preprint 2022 arXiv

The Quarks of Attention

Attention plays a fundamental role in both natural and artificial intelligence systems. In deep learning, attention-based neural architectures, such as transformer architectures, are widely used to tackle problems in natural language processing and beyond. Here we investigate the fundamental building blocks of attention and their computational properties. Within the standard model of deep learning, we classify all possible fundamental building blocks of attention in terms of their source, target, and computational mechanism. We identify and study three most important mechanisms: additive activation attention, multiplicative output attention (output gating), and multiplicative synaptic attention (synaptic gating). The gating mechanisms correspond to multiplicative extensions of the standard model and are used across all current attention-based deep learning architectures. We study their functional properties and estimate the capacity of several attentional building blocks in the case of linear and polynomial threshold gates. Surprisingly, additive activation attention plays a central role in the proofs of the lower bounds. Attention mechanisms reduce the depth of certain basic circuits and leverage the power of quadratic activations without incurring their full cost.

preprint 2021 arXiv

A theory of capacity and sparse neural encoding

Motivated by biological considerations, we study sparse neural maps from an input layer to a target layer with sparse activity, and specifically the problem of storing $K$ input-target associations $(x,y)$, or memories, when the target vectors $y$ are sparse. We mathematically prove that $K$ undergoes a phase transition and that in general, and somewhat paradoxically, sparsity in the target layers increases the storage capacity of the map. The target vectors can be chosen arbitrarily, including in random fashion, and the memories can be both encoded and decoded by networks trained using local learning rules, including the simple Hebb rule. These results are robust under a variety of statistical assumptions on the data. The proofs rely on elegant properties of random polytopes and sub-gaussian random vector variables. Open problems and connections to capacity theories and polynomial threshold maps are discussed.

preprint 2021 arXiv

Marchenko-Pastur law with relaxed independence conditions

We prove the Marchenko-Pastur law for the eigenvalues of $p \times p$ sample covariance matrices in two new situations where the data does not have independent coordinates. In the first scenario - the block-independent model - the $p$ coordinates of the data are partitioned into blocks in such a way that the entries in different blocks are independent, but the entries from the same block may be dependent. In the second scenario - the random tensor model - the data is the homogeneous random tensor of order $d$, i.e. the coordinates of the data are all $\binom{n}{d}$ different products of $d$ variables chosen from a set of $n$ independent random variables. We show that Marchenko-Pastur law holds for the block-independent model as long as the size of the largest block is $o(p)$ and for the random tensor model as long as $d = o(n^{1/3})$. Our main technical tools are new concentration inequalities for quadratic forms in random variables with block-independent coordinates, and for random tensors.

preprint 2021 arXiv

Private sampling: a noiseless approach for generating differentially private synthetic data

In a world where artificial intelligence and data science become omnipresent, data sharing is increasingly locking horns with data-privacy concerns. Differential privacy has emerged as a rigorous framework for protecting individual privacy in a statistical database, while releasing useful statistical information about the database. The standard way to implement differential privacy is to inject a sufficient amount of noise into the data. However, in addition to other limitations of differential privacy, this process of adding noise will affect data accuracy and utility. Another approach to enable privacy in data sharing is based on the concept of synthetic data. The goal of synthetic data is to create an as-realistic-as-possible dataset, one that not only maintains the nuances of the original data, but does so without risk of exposing sensitive information. The combination of differential privacy with synthetic data has been suggested as a best-of-both-worlds solution. In this work, we propose the first noisefree method to construct differentially private synthetic data; we do this through a mechanism called "private sampling". Using the Boolean cube as benchmark data model, we derive explicit bounds on accuracy and privacy of the constructed synthetic data. The key mathematical tools are hypercontractivity, duality, and empirical processes. A core ingredient of our private sampling mechanism is a rigorous "marginal correction" method, which has the remarkable property that importance reweighting can be utilized to exactly match the marginals of the sample to the marginals of the population.

preprint 2020 arXiv

Memory capacity of neural networks with threshold and ReLU activations

Overwhelming theoretical and empirical evidence shows that mildly overparametrized neural networks -- those with more connections than the size of the training data -- are often able to memorize the training data with $100\%$ accuracy. This was rigorously proved for networks with sigmoid activation functions and, very recently, for ReLU activations. Addressing a 1988 open question of Baum, we prove that this phenomenon holds for general multilayered perceptrons, i.e. neural networks with threshold activation functions, or with any mix of threshold and ReLU activations. Our construction is probabilistic and exploits sparsity.

preprint 2016 arXiv

A simple tool for bounding the deviation of random matrices on geometric sets

Let $A$ be an isotropic, sub-gaussian $m \times n$ matrix. We prove that the process $Z_x := \|Ax\|_2 - \sqrt m \|x\|_2$ has sub-gaussian increments. Using this, we show that for any bounded set $T \subseteq \mathbb{R}^n$, the deviation of $\|Ax\|_2$ around its mean is uniformly bounded by the Gaussian complexity of $T$. We also prove a local version of this theorem, which allows for unbounded sets. These theorems have various applications, some of which are reviewed in this paper. In particular, we give a new result regarding model selection in the constrained linear model.
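
The quantity studied in this abstract can be probed numerically. The sketch below is illustrative only and is not taken from the paper: it assumes a Gaussian matrix (one particular isotropic, sub-gaussian choice), a small finite set `T` of unit vectors, and hypothetical helper names. It reports the largest value of $|\,\|Ax\|_2 - \sqrt m \|x\|_2|$ over `T` next to $\sqrt m$ for scale.

```python
import math
import random


def random_unit_vector(n, rng):
    """Draw a uniformly random direction on the unit sphere of R^n."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v]


def deviation_on_set(points, m, rng):
    """Max over x in points of | ||Ax||_2 - sqrt(m) ||x||_2 | for a Gaussian A."""
    n = len(points[0])
    # Assumption: i.i.d. standard normal entries, so the rows are isotropic
    # and sub-gaussian.
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    worst = 0.0
    for x in points:
        ax_norm = math.sqrt(
            sum(sum(a * xi for a, xi in zip(row, x)) ** 2 for row in A)
        )
        x_norm = math.sqrt(sum(xi * xi for xi in x))
        worst = max(worst, abs(ax_norm - math.sqrt(m) * x_norm))
    return worst


rng = random.Random(0)
T = [random_unit_vector(20, rng) for _ in range(10)]
dev = deviation_on_set(T, m=400, rng=rng)
print("max deviation over T:", dev, "versus sqrt(m) =", math.sqrt(400))
```

For a small set like this one, the deviation should stay of constant order while $\sqrt m$ grows, which is the qualitative behavior the abstract describes.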

preprint 2016 arXiv

Concentration and regularization of random graphs

This paper studies how close random graphs are typically to their expectations. We interpret this question through the concentration of the adjacency and Laplacian matrices in the spectral norm. We study inhomogeneous Erdős-Rényi random graphs on $n$ vertices, where edges form independently and possibly with different probabilities $p_{ij}$. Sparse random graphs whose expected degrees are $o(\log n)$ fail to concentrate; the obstruction is caused by vertices with abnormally high and low degrees. We show that concentration can be restored if we regularize the degrees of such vertices, and one can do this in various ways. As an example, let us reweight or remove enough edges to make all degrees bounded above by $O(d)$ where $d=\max np_{ij}$. Then we show that the resulting adjacency matrix $A'$ concentrates with the optimal rate: $\|A' - \mathbb{E} A\| = O(\sqrt{d})$. Similarly, if we make all degrees bounded below by $d$ by adding weight $d/n$ to all edges, then the resulting Laplacian concentrates with the optimal rate: $\|L(A') - L(\mathbb{E} A')\| = O(1/\sqrt{d})$. Our approach is based on Grothendieck-Pietsch factorization, using which we construct a new decomposition of random graphs. We illustrate the concentration results with an application to the community detection problem in the analysis of networks.

preprint 2016 arXiv

High-dimensional estimation with geometric constraints

Consider measuring an n-dimensional vector x through the inner product with several measurement vectors, a_1, a_2, ..., a_m. It is common in both signal processing and statistics to assume the linear response model y_i = <a_i, x> + e_i, where e_i is a noise term. However, in practice the precise relationship between the signal x and the observations y_i may not follow the linear model, and in some cases it may not even be known. To address this challenge, in this paper we propose a general model where it is only assumed that each observation y_i may depend on a_i only through <a_i, x>. We do not assume that the dependence is known. This is a form of the semiparametric single index model, and it includes the linear model as well as many forms of the generalized linear model as special cases. We further assume that the signal x has some structure, and we formulate this as a general assumption that x belongs to some known (but arbitrary) feasible set K. We carefully detail the benefit of using the signal structure to improve estimation. The theory is based on the mean width of K, a geometric parameter which can be used to understand its effective dimension in estimation problems. We determine a simple, efficient two-step procedure for estimating the signal based on this model -- a linear estimation followed by metric projection onto K. We give general conditions under which the estimator is minimax optimal up to a constant. This leads to the intriguing conclusion that in the high noise regime, an unknown non-linearity in the observations does not significantly reduce one's ability to determine the signal, even when the non-linearity may be non-invertible. Our results may be specialized to understand the effect of non-linearities in compressed sensing.
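
The two-step procedure named in this abstract (a linear estimation followed by metric projection onto K) can be sketched as follows. This is a minimal illustration under assumptions that are not in the source: Gaussian measurement vectors, sign observations as the unknown non-linearity, K taken to be the set of s-sparse unit vectors, and hypothetical function names.

```python
import math
import random


def sign(t):
    return 1.0 if t >= 0 else -1.0


def linear_estimate(A, y):
    """Step 1: average of y_i * a_i over the m measurements."""
    m, n = len(A), len(A[0])
    return [sum(y[i] * A[i][j] for i in range(m)) / m for j in range(n)]


def project_onto_sparse_sphere(v, s):
    """Step 2: nearest s-sparse unit vector to v.

    Keeping the s largest-magnitude entries and normalizing maximizes the
    inner product with v over K, which is the same as minimizing distance.
    """
    keep = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)[:s]
    z = [0.0] * len(v)
    for j in keep:
        z[j] = v[j]
    norm = math.sqrt(sum(t * t for t in z))
    return [t / norm for t in z] if norm > 0 else z


rng = random.Random(1)
n, s, m = 50, 3, 2000

# Assumed ground truth: a unit vector supported on three coordinates.
x = [0.0] * n
for j in (4, 17, 30):
    x[j] = 1.0 / math.sqrt(s)

A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
# The estimator never uses the fact that the link function is sign().
y = [sign(sum(a * xi for a, xi in zip(row, x))) for row in A]

x_hat = project_onto_sparse_sphere(linear_estimate(A, y), s)
cosine = sum(a * b for a, b in zip(x, x_hat))
print("cosine similarity between x and its estimate:", round(cosine, 3))
```

Sign observations discard the magnitude of x, so the sketch only compares directions; a different choice of K would require a different projection step.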

preprint 2015 arXiv

Community detection in sparse networks via Grothendieck's inequality

We present a simple and flexible method to prove consistency of semidefinite optimization problems on random graphs. The method is based on Grothendieck's inequality. Unlike the previous uses of this inequality that lead to constant relative accuracy, we achieve any given relative accuracy by leveraging randomness. We illustrate the method with the problem of community detection in sparse networks, those with bounded average degrees. We demonstrate that even in this regime, various simple and natural semidefinite programs can be used to recover the community structure up to an arbitrarily small fraction of misclassified vertices. The method is general; it can be applied to a variety of stochastic models of networks and semidefinite programs.

preprint 2015 arXiv

No-gaps delocalization for general random matrices

We prove that with high probability, every eigenvector of a random matrix is delocalized in the sense that any subset of its coordinates carries a non-negligible portion of its $\ell_2$ norm. Our results pertain to a wide class of random matrices, including matrices with independent entries, symmetric and skew-symmetric matrices, as well as some other naturally arising ensembles. The matrices can be real and complex; in the latter case we assume that the real and imaginary parts of the entries are independent.

preprint 2015 arXiv

On the Effective Measure of Dimension in the Analysis Cosparse Model

Many applications have benefited remarkably from low-dimensional models in the recent decade. The fact that many signals, though high dimensional, are intrinsically low dimensional has given the possibility to recover them stably from a relatively small number of their measurements. For example, in compressed sensing with the standard (synthesis) sparsity prior and in matrix completion, the number of measurements needed is proportional (up to a logarithmic factor) to the signal's manifold dimension. Recently, a new natural low-dimensional signal model has been proposed: the cosparse analysis prior. In the noiseless case, it is possible to recover signals from this model, using a combinatorial search, from a number of measurements proportional to the signal's manifold dimension. However, if we ask for stability to noise or an efficient (polynomial complexity) solver, all the existing results demand a number of measurements which is far removed from the manifold dimension, sometimes far greater. Thus, it is natural to ask whether this gap is a deficiency of the theory and the solvers, or if there exists a real barrier in recovering the cosparse signals by relying only on their manifold dimension. Is there an algorithm which, in the presence of noise, can accurately recover a cosparse signal from a number of measurements proportional to the manifold dimension? In this work, we prove that there is no such algorithm. Further, we show through numerical simulations that even in the noiseless case convex relaxations fail when the number of measurements is comparable to the manifold dimension. This gives a practical counter-example to the growing literature on compressed acquisition of signals based on manifold dimension.

preprint 2015 arXiv

Optimization via Low-rank Approximation for Community Detection in Networks

Community detection is one of the fundamental problems of network analysis, for which a number of methods have been proposed. Most model-based or criteria-based methods have to solve an optimization problem over a discrete set of labels to find communities, which is computationally infeasible. Some fast spectral algorithms have been proposed for specific methods or models, but only on a case-by-case basis. Here we propose a general approach for maximizing a function of a network adjacency matrix over discrete labels by projecting the set of labels onto a subspace approximating the leading eigenvectors of the expected adjacency matrix. This projection onto a low-dimensional space makes the feasible set of labels much smaller and the optimization problem much easier. We prove a general result about this method and show how to apply it to several previously proposed community detection criteria, establishing its consistency for label estimation in each case and demonstrating the fundamental connection between spectral properties of the network and various model-based approaches to community detection. Simulations and applications to real-world data are included to demonstrate our method performs well for multiple problems over a wide range of parameters.

preprint 2015 arXiv

Smoothed analysis of symmetric random matrices with continuous distributions

We study invertibility of matrices of the form $D+R$ where $D$ is an arbitrary symmetric deterministic matrix, and $R$ is a symmetric random matrix whose independent entries have continuous distributions with bounded densities. We show that $\|(D+R)^{-1}\| = O(n^2)$ with high probability. The bound is completely independent of $D$. No moment assumptions are placed on $R$; in particular the entries of $R$ can be arbitrarily heavy-tailed.

preprint 2015 arXiv

Sparse random graphs: regularization and concentration of the Laplacian

We study random graphs with possibly different edge probabilities in the challenging sparse regime of bounded expected degrees. Unlike in the dense case, neither the graph adjacency matrix nor its Laplacian concentrate around their expectations due to the highly irregular distribution of node degrees. It has been empirically observed that simply adding a constant of order $1/n$ to each entry of the adjacency matrix substantially improves the behavior of Laplacian. Here we prove that this regularization indeed forces Laplacian to concentrate even in sparse graphs. As an immediate consequence in network analysis, we establish the validity of one of the simplest and fastest approaches to community detection -- regularized spectral clustering, under the stochastic block model. Our proof of concentration of regularized Laplacian is based on Grothendieck's inequality and factorization, combined with paving arguments.

preprint 2015 arXiv

The generalized Lasso with non-linear observations

We study the problem of signal estimation from non-linear observations when the signal belongs to a low-dimensional set buried in a high-dimensional space. A rough heuristic often used in practice postulates that non-linear observations may be treated as noisy linear observations, and thus the signal may be estimated using the generalized Lasso. This is appealing because of the abundance of efficient, specialized solvers for this program. Just as noise may be diminished by projecting onto the lower dimensional space, the error from modeling non-linear observations with linear observations will be greatly reduced when using the signal structure in the reconstruction. We allow general signal structure, only assuming that the signal belongs to some set K in R^n. We consider the single-index model of non-linearity. Our theory allows the non-linearity to be discontinuous, not one-to-one and even unknown. We assume a random Gaussian model for the measurement matrix, but allow the rows to have an unknown covariance matrix. As special cases of our results, we recover near-optimal theory for noisy linear observations, and also give the first theoretical accuracy guarantee for 1-bit compressed sensing with unknown covariance matrix of the measurement vectors.

preprint 2014 arXiv

Delocalization of eigenvectors of random matrices with independent entries

We prove that an n by n random matrix G with independent entries is completely delocalized. Suppose the entries of G have zero means, variances uniformly bounded below, and a uniform tail decay of exponential type. Then with high probability all unit eigenvectors of G have all coordinates of magnitude O(n^{-1/2}), modulo logarithmic corrections. This comes as a consequence of a new, geometric, approach to delocalization for random matrices.

preprint 2014 arXiv

Estimation in high dimensions: a geometric perspective

This tutorial provides an exposition of a flexible geometric framework for high dimensional estimation problems with constraints. The tutorial develops geometric intuition about high dimensional sets, justifies it with some results of asymptotic convex geometry, and demonstrates connections between geometric results and estimation problems. The theory is illustrated with applications to sparse recovery, matrix completion, quantization, linear and logistic regression and generalized linear models.

preprint 2014 arXiv

Small ball probabilities for linear images of high dimensional distributions

We study concentration properties of random vectors of the form $AX$, where $X = (X_1, ..., X_n)$ has independent coordinates and $A$ is a given matrix. We show that the distribution of $AX$ is well spread in space whenever the distributions of $X_i$ are well spread on the line. Specifically, assume that the probability that $X_i$ falls in any given interval of length $T$ is at most $p$. Then the probability that $AX$ falls in any given ball of radius $T \|A\|_{HS}$ is at most $(Cp)^{0.9 r(A)}$, where $r(A)$ denotes the stable rank of $A$ and $C$ is an absolute constant.

preprint 2013 arXiv

Covariance estimation for distributions with $2+\varepsilon$ moments

We study the minimal sample size N=N(n) that suffices to estimate the covariance matrix of an n-dimensional distribution by the sample covariance matrix in the operator norm, with an arbitrary fixed accuracy. We establish the optimal bound N=O(n) for every distribution whose k-dimensional marginals have uniformly bounded $2+\varepsilon$ moments outside the sphere of radius $O(\sqrt{k})$. In the specific case of log-concave distributions, this result provides an alternative approach to the Kannan-Lovasz-Simonovits problem, which was recently solved by Adamczak et al. [J. Amer. Math. Soc. 23 (2010) 535-561]. Moreover, a lower estimate on the covariance matrix holds under a weaker assumption - uniformly bounded $2+\varepsilon$ moments of one-dimensional marginals. Our argument consists of randomizing the spectral sparsifier, a deterministic tool developed recently by Batson, Spielman and Srivastava [SIAM J. Comput. 41 (2012) 1704-1721]. The new randomized method allows one to control the spectral edges of the sample covariance matrix via the Stieltjes transform evaluated at carefully chosen random points.

preprint 2013 arXiv

Dimension reduction by random hyperplane tessellations

Given a subset K of the unit Euclidean sphere, we estimate the minimal number m = m(K) of hyperplanes that generate a uniform tessellation of K, in the sense that the fraction of the hyperplanes separating any pair x, y in K is nearly proportional to the Euclidean distance between x and y. Random hyperplanes prove to be almost ideal for this problem; they achieve the almost optimal bound m = O(w(K)^2) where w(K) is the Gaussian mean width of K. Using the map that sends x in K to the sign vector with respect to the hyperplanes, we conclude that every bounded subset K of R^n embeds into the Hamming cube {-1, 1}^m with a small distortion in the Gromov-Hausdorff metric. Since for many sets K one has m = m(K) << n, this yields a new discrete mechanism of dimension reduction for sets in Euclidean spaces.
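
The sign-vector map described in this abstract is easy to simulate. The sketch below is an illustration under stated assumptions, not the paper's construction: hyperplanes are drawn with Gaussian normal vectors through the origin, the dimensions are arbitrary, and the function names are hypothetical. It relies on the classical fact that a random hyperplane through the origin separates two unit vectors with probability equal to their angle divided by π, a quantity comparable to their Euclidean distance on the sphere.

```python
import math
import random


def random_unit_vector(n, rng):
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v]


def sign_embedding(x, hyperplanes):
    """Map x to its vector of signs relative to each hyperplane normal."""
    return [
        1 if sum(g * xi for g, xi in zip(normal, x)) >= 0 else -1
        for normal in hyperplanes
    ]


def hamming_fraction(u, v):
    """Fraction of coordinates where two sign vectors disagree."""
    return sum(1 for a, b in zip(u, v) if a != b) / len(u)


rng = random.Random(2)
n, m = 10, 4000
x = random_unit_vector(n, rng)
y = random_unit_vector(n, rng)
hyperplanes = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]

frac = hamming_fraction(sign_embedding(x, hyperplanes),
                        sign_embedding(y, hyperplanes))
dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(x, y))))
angle_over_pi = math.acos(dot) / math.pi
print("separating fraction:", frac, "angle / pi:", angle_over_pi)
```

This checks a single pair of points; the uniform statement over all pairs in K is what requires m on the order of w(K)^2.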

preprint 2013 arXiv

Hanson-Wright inequality and sub-gaussian concentration

In this expository note, we give a modern proof of Hanson-Wright inequality for quadratic forms in sub-gaussian random variables. We deduce a useful concentration inequality for sub-gaussian random vectors. Two examples are given to illustrate these results: a concentration of distances between random vectors and subspaces, and a bound on the norms of products of random and deterministic matrices.
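
For reference, the inequality this abstract refers to is commonly stated in the following form. The notation here is an assumption rather than a quotation from the note: $X = (X_1, \dots, X_n)$ has independent, mean-zero coordinates with sub-gaussian norms $\|X_i\|_{\psi_2} \le K$, $A$ is a fixed $n \times n$ matrix, $\|A\|_{HS}$ and $\|A\|$ are its Hilbert-Schmidt and operator norms, and $c > 0$ is an absolute constant.

```latex
\[
  \mathbb{P}\Bigl\{ \bigl| X^{\mathsf T} A X - \mathbb{E}\, X^{\mathsf T} A X \bigr| > t \Bigr\}
  \;\le\;
  2 \exp\!\left[ -c \,\min\!\left( \frac{t^{2}}{K^{4}\,\|A\|_{HS}^{2}},\;
                                   \frac{t}{K^{2}\,\|A\|} \right) \right],
  \qquad t \ge 0 .
\]
```

The two terms in the minimum correspond to a sub-gaussian regime for small $t$ and a sub-exponential regime for large $t$.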

preprint 2013 arXiv

Invertibility of random matrices: unitary and orthogonal perturbations

We show that a perturbation of any fixed square matrix D by a random unitary matrix is well invertible with high probability. A similar result holds for perturbations by random orthogonal matrices; the only notable exception is when D is close to orthogonal. As an application, these results completely eliminate a hard-to-check condition from the Single Ring Theorem by Guionnet, Krishnapur and Zeitouni.

preprint 2013 arXiv

One-bit compressed sensing with non-Gaussian measurements

In one-bit compressed sensing, previous results state that sparse signals may be robustly recovered when the measurements are taken using Gaussian random vectors. In contrast to standard compressed sensing, these results are not extendable to natural non-Gaussian distributions without further assumptions, as can be demonstrated by simple counter-examples. We show that approximately sparse signals that are not extremely sparse can be accurately reconstructed from single-bit measurements sampled according to a sub-gaussian distribution, and the reconstruction comes as the solution to a convex program.

preprint 2012 arXiv

Approximating the moments of marginals of high-dimensional distributions

For probability distributions on $\mathbb{R}^n$, we study the optimal sample size N = N(n,p) that suffices to uniformly approximate the pth moments of all one-dimensional marginals. Under the assumption that the marginals have bounded 4p moments, we obtain the optimal bound $N=O(n^{p/2})$ for p > 2. This bound goes in the direction of bridging the two recent results: a theorem of Guedon and Rudelson [Adv. Math. 208 (2007) 798-823] which has an extra logarithmic factor in the sample size, and a result of Adamczak et al. [J. Amer. Math. Soc. 23 (2010) 535-561] which requires stronger subexponential moment assumptions.

preprint 2012 arXiv

Invertibility of symmetric random matrices

We study n by n symmetric random matrices H, possibly discrete, with iid above-diagonal entries. We show that H is singular with probability at most exp(-n^c), and the spectral norm of the inverse of H is O(\sqrt{n}). Furthermore, the spectrum of H is delocalized on the optimal scale o(n^{-1/2}). These results improve upon a polynomial singularity bound due to Costello, Tao and Vu, and they generalize, up to constant factors, results of Tao and Vu, and Erdős, Schlein and Yau.

preprint 2012 arXiv

One-bit compressed sensing by linear programming

We give the first computationally tractable and almost optimal solution to the problem of one-bit compressed sensing, showing how to accurately recover an s-sparse vector x in R^n from the signs of O(s log^2(n/s)) random linear measurements of x. The recovery is achieved by a simple linear program. This result extends to approximately sparse vectors x. Our result is universal in the sense that with high probability, one measurement scheme will successfully recover all sparse vectors simultaneously. The argument is based on solving an equivalent geometric problem on random hyperplane tessellations.

preprint 2012 arXiv

Robust 1-bit compressed sensing and sparse logistic regression: A convex programming approach

This paper develops theoretical results regarding noisy 1-bit compressed sensing and sparse binomial regression. We show that a single convex program gives an accurate estimate of the signal, or coefficient vector, for both of these models. We demonstrate that an s-sparse signal in R^n can be accurately estimated from m = O(s log(n/s)) single-bit measurements using a simple convex program. This remains true even if each measurement bit is flipped with probability nearly 1/2. Worst-case (adversarial) noise can also be accounted for, and uniform results that hold for all sparse inputs are derived as well. In the terminology of sparse logistic regression, we show that O(s log(n/s)) Bernoulli trials are sufficient to estimate a coefficient vector in R^n which is approximately s-sparse. Moreover, the same convex program works for virtually all generalized linear models, in which the link function may be unknown. To our knowledge, these are the first results that tie together the theory of sparse logistic regression to 1-bit compressed sensing. Our results apply to general signal structures aside from sparsity; one only needs to know the size of the set K where signals reside. The size is given by the mean width of K, a computable quantity whose square serves as a robust extension of the dimension.

preprint 2011 arXiv

Introduction to the non-asymptotic analysis of random matrices

This is a tutorial on some basic non-asymptotic methods and concepts in random matrix theory. The reader will learn several tools for the analysis of the extreme singular values of random matrices with independent rows or columns. Many of these methods sprung off from the development of geometric functional analysis since the 1970's. They have applications in several fields, most notably in theoretical computer science, statistics and signal processing. A few basic applications are covered in this text, particularly for the problem of estimating covariance matrices in statistics and for validating probabilistic constructions of measurement matrices in compressed sensing. These notes are written particularly for graduate students and beginning researchers in different areas, including functional analysts, probabilists, theoretical statisticians, electrical engineers, and theoretical computer scientists.

preprint 2011 arXiv

Partial estimation of covariance matrices

A classical approach to accurately estimating the covariance matrix Σ of a p-variate normal distribution is to draw a sample of size n > p and form a sample covariance matrix. However, many modern applications operate with much smaller sample sizes, thus calling for estimation guarantees in the regime n << p. We show that a sample of size n = O(m log^6 p) is sufficient to accurately estimate in operator norm an arbitrary symmetric part of Σ consisting of m < n entries per row. This follows from a general result on estimating Hadamard products M.Σ, where M is an arbitrary symmetric matrix.

preprint 2010 arXiv

How close is the sample covariance matrix to the actual covariance matrix?

Given a probability distribution in R^n with general (non-white) covariance, a classical estimator of the covariance matrix is the sample covariance matrix obtained from a sample of N independent points. What is the optimal sample size N = N(n) that guarantees estimation with a fixed accuracy in the operator norm? Suppose the distribution is supported in a centered Euclidean ball of radius \sqrt{n}. We conjecture that the optimal sample size is N = O(n) for all distributions with finite fourth moment, and we prove this up to an iterated logarithmic factor. This problem is motivated by the optimal theorem of Rudelson which states that N = O(n \log n) for distributions with finite second moment, and a recent result of Adamczak, Litvak, Pajor and Tomczak-Jaegermann which guarantees that N = O(n) for sub-exponential distributions.

preprint 2010 arXiv

Non-asymptotic theory of random matrices: extreme singular values

The classical random matrix theory is mostly focused on asymptotic spectral properties of random matrices as their dimensions grow to infinity. At the same time many recent applications from convex geometry to functional analysis to information theory operate with random matrices in fixed dimensions. This survey addresses the non-asymptotic theory of extreme singular values of random matrices with independent entries. We focus on recently developed geometric methods for estimating the hard edge of random matrices (the smallest singular value).

preprint2010arXiv

Spectral norm of products of random and deterministic matrices

We study the spectral norm of matrices M that can be factored as M=BA, where A is a random matrix with independent mean zero entries, and B is a fixed matrix. Under the (4+epsilon)-th moment assumption on the entries of A, we show that the spectral norm of such an m by n matrix M is bounded by \sqrt{m} + \sqrt{n}, which is sharp. In other words, in regard to the spectral norm, products of random and deterministic matrices behave similarly to random matrices with independent entries. This result along with the previous work of M. Rudelson and the author implies that the smallest singular value of a random m times n matrix with i.i.d. mean zero entries and bounded (4+epsilon)-th moment is bounded below by \sqrt{m} - \sqrt{n-1} with high probability.

preprint2010arXiv

Uncertainty Principles and Vector Quantization

Given a frame in C^n which satisfies a form of the uncertainty principle (as introduced by Candes and Tao), it is shown how to quickly convert the frame representation of every vector into a more robust Kashin's representation whose coefficients all have the smallest possible dynamic range O(1/\sqrt{n}). The information tends to spread evenly among these coefficients. As a consequence, Kashin's representations have a great power for reduction of errors in their coefficients, including coefficient losses and distortions.

preprint2009arXiv

On the Role of Sparsity in Compressed Sensing and Random Matrix Theory

We discuss applications of some concepts of Compressed Sensing in the recent work on invertibility of random matrices due to Rudelson and the author. We sketch an argument leading to the optimal bound N^{-1/2} on the median of the smallest singular value of an N by N matrix with random independent entries. We highlight the parts of the argument where sparsity ideas played a key role.

preprint2009arXiv

The smallest singular value of a random rectangular matrix

We prove an optimal estimate on the smallest singular value of a random subgaussian matrix, valid for all fixed dimensions. For an N by n matrix A with independent and identically distributed subgaussian entries, the smallest singular value of A is at least of the order \sqrt{N} - \sqrt{n-1} with high probability. A sharp estimate on the probability is also obtained.

preprint2008arXiv

Beyond Hirsch Conjecture: walks on random polytopes and smoothed complexity of the simplex method

The smoothed analysis of algorithms is concerned with the expected running time of an algorithm under slight random perturbations of arbitrary inputs. Spielman and Teng proved that the shadow-vertex simplex method has polynomial smoothed complexity. On a slight random perturbation of an arbitrary linear program, the simplex method finds the solution after a walk on polytope(s) with expected length polynomial in the number of constraints n, the number of variables d and the inverse standard deviation of the perturbation 1/sigma. We show that the length of walk in the simplex method is actually polylogarithmic in the number of constraints n. Spielman-Teng's bound on the walk was O(n^{86} d^{55} sigma^{-30}), up to logarithmic factors. We improve this to O(log^7 n (d^9 + d^3 sigma^{-4})). This shows that the tight Hirsch conjecture n-d on the length of walk on polytopes is not a limitation for the smoothed Linear Programming. Random perturbations create short paths between vertices. We propose a randomized phase-I for solving arbitrary linear programs, which is of independent interest. Instead of finding a vertex of a feasible set, we add a vertex at random to the feasible set. This does not affect the solution of the linear program with constant probability. This overcomes one of the major difficulties of smoothed analysis of the simplex method -- one can now statistically decouple the walk from the smoothed linear program. This yields a much better reduction of the smoothed complexity to a geometric quantity -- the size of planar sections of random polytopes. We also improve upon the known estimates for that size, showing that it is polylogarithmic in the number of vertices.

preprint2008arXiv

The least singular value of a random square matrix is O(n^{-1/2})

Let A be a matrix whose entries are real i.i.d. centered random variables with unit variance and suitable moment assumptions. Then the smallest singular value of A is of order n^{-1/2} with high probability. The lower estimate of this type was proved recently by the authors; in this note we establish the matching upper estimate.

preprint2008arXiv

The Littlewood-Offord Problem and invertibility of random matrices

We prove two basic conjectures on the distribution of the smallest singular value of random n times n matrices with independent entries. Under minimal moment assumptions, we show that the smallest singular value is of order n^{-1/2}, which is optimal for Gaussian matrices. Moreover, we give an optimal estimate on the tail probability. This comes as a consequence of a new and essentially sharp estimate in the Littlewood-Offord problem: for i.i.d. random variables X_k and real numbers a_k, determine the probability p that the sum of a_k X_k lies near some number v. For arbitrary coefficients a_k of the same order of magnitude, we show that they essentially lie in an arithmetic progression of length 1/p.

preprint2006arXiv

Random sets of isomorphism of linear operators on Hilbert space

This note deals with a problem of the probabilistic Ramsey theory in functional analysis. Given a linear operator $T$ on a Hilbert space with an orthogonal basis, we define the isomorphic structure $Σ(T)$ as the family of all subsets of the basis so that $T$ restricted to their span is a nice isomorphism. Our main result is a dimension-free optimal estimate of the size of $Σ(T)$. It improves and extends in several ways the principle of restricted invertibility due to Bourgain and Tzafriri. With an appropriate notion of randomness, we obtain a randomized principle of restricted invertibility.

preprint2006arXiv

Sampling from large matrices: an approach through geometric functional analysis

We study random submatrices of a large matrix A. We show how to approximately compute A from its random submatrix of the smallest possible size O(r log r) with a small error in the spectral norm, where r = ||A||_F^2 / ||A||_2^2 is the numerical rank of A. The numerical rank is always bounded by, and is a stable relaxation of, the rank of A. This yields an asymptotically optimal guarantee in an algorithm for computing low-rank approximations of A. We also prove asymptotically optimal estimates on the spectral norm and the cut-norm of random submatrices of A. The result for the cut-norm yields a slight improvement on the best known sample complexity for an approximation algorithm for MAX-2CSP problems. We use methods of Probability in Banach spaces, in particular the law of large numbers for operator-valued random variables.
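
The numerical rank r = ||A||_F^2 / ||A||_2^2 defined in the abstract above can be computed directly. The following is a minimal sketch, assuming A is a nonzero matrix given as a list of rows; the spectral norm is approximated by plain power iteration on A^T A, which is a convenience choice here and not a method taken from the paper. The function names and the iteration count are hypothetical.

```python
import math


def frobenius_sq(a):
    """Squared Frobenius norm: the sum of all squared entries."""
    return sum(v * v for row in a for v in row)


def spectral_sq(a, iters=500):
    """Approximate squared spectral norm by power iteration on A^T A."""
    n = len(a[0])
    x = [1.0 + 0.1 * j for j in range(n)]  # generic start vector (assumption)
    lam = 0.0
    for _ in range(iters):
        ax = [sum(row[j] * x[j] for j in range(n)) for row in a]  # A x
        y = [sum(a[i][j] * ax[i] for i in range(len(a))) for j in range(n)]  # A^T (A x)
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            return 0.0
        lam = norm / math.sqrt(sum(v * v for v in x))  # estimate of the top eigenvalue
        x = [v / norm for v in y]
    return lam


def numerical_rank(a):
    """r = ||A||_F^2 / ||A||_2^2; assumes A is not the zero matrix."""
    return frobenius_sq(a) / spectral_sq(a)


print(numerical_rank([[1.0, 0.0], [0.0, 0.1]]))
```

The diagonal example has rank 2 but numerical rank close to 1.01, which illustrates the sense in which the numerical rank is bounded by the rank and is stable: a tiny second singular value barely changes it.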

preprint2006arXiv

Sparse reconstruction by convex relaxation: Fourier and Gaussian measurements

We want to exactly reconstruct a sparse signal f (a vector in R^n of small support) from few linear measurements of f (inner products with some fixed vectors). A nice and intuitive reconstruction by Linear Programming has been advocated since the 1980s by Dave Donoho and his collaborators. Namely, one can relax the reconstruction problem, which is highly nonconvex, to a convex problem -- and, moreover, to a linear program. However, exactly when the reconstruction problem is equivalent to its convex relaxation is an open question. Recent work of many authors shows that the number of measurements k(r,n) needed to exactly reconstruct any r-sparse signal f of length n (a vector in R^n of support r) from its linear measurements with the convex relaxation method is usually O(r polylog(n)). However, known estimates of the number of measurements k(r,n) involve huge constants, in spite of very good performance of the algorithms in practice. In this paper, we consider random Gaussian measurements and random Fourier measurements (a frequency sample of f). For Gaussian measurements, we prove the first guarantees with reasonable constants: k(r,n) < 12 r (2 + log(n/r)), which is optimal up to constants. For Fourier measurements, we prove the best known bound k(r,n) = O(r log(n) log^2(r) log(r log n)), which is optimal within the log log n and log^3 r factors. Our arguments are based on the technique of Geometric Functional Analysis and Probability in Banach spaces.
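
To get a feel for the Gaussian bound k(r,n) < 12 r (2 + log(n/r)) quoted above, it can simply be evaluated for a few sparsity levels. This is a small arithmetic sketch only. Reading log as the natural logarithm is an assumption, and the function name and the sample values of r and n are hypothetical.

```python
import math


def gaussian_measurement_bound(r, n):
    """Evaluate 12 r (2 + log(n / r)); natural logarithm is assumed."""
    if not 0 < r <= n:
        raise ValueError("need 0 < r <= n")
    return 12 * r * (2 + math.log(n / r))


# Compare the bound with the ambient dimension n for a few illustrative cases.
for r, n in [(10, 1000), (10, 10**6), (100, 10**6)]:
    print(r, n, math.ceil(gaussian_measurement_bound(r, n)))
```

The expression grows only logarithmically in n for fixed r, but it exceeds n itself when r is a large fraction of n, so it is informative only for genuinely sparse signals.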

preprint2005arXiv

Geometric approach to error correcting codes and reconstruction of signals

We develop an approach through geometric functional analysis to error correcting codes and to reconstruction of signals from few linear measurements. An error correcting code encodes an n-letter word x into an m-letter word y in such a way that x can be decoded correctly when any r letters of y are corrupted. We prove that most linear orthogonal transformations Q from R^n into R^m form efficient and robust error correcting codes over reals. The decoder (which corrects the corrupted components of y) is the metric projection onto the range of Q in the L_1 norm. An equivalent problem arises in signal processing: how to reconstruct a signal that belongs to a small class from few linear measurements? We prove that for most sets of Gaussian measurements, all signals of small support can be exactly reconstructed by the L_1 norm minimization. This is a substantial improvement of recent results of Donoho and of Candes and Tao. An equivalent problem in combinatorial geometry is the existence of a polytope with fixed number of facets and maximal number of lower-dimensional facets. We prove that most sections of the cube form such polytopes.

preprint2004arXiv

Combinatorics of random processes and sections of convex bodies

We find a sharp combinatorial bound for the metric entropy of sets in R^n and general classes of functions. This solves two basic combinatorial conjectures on the empirical processes. 1. A class of functions satisfies the uniform Central Limit Theorem if the square root of its combinatorial dimension is integrable. 2. The uniform entropy is equivalent to the combinatorial dimension under minimal regularity. Our method also constructs a nicely bounded coordinate section of a symmetric convex body in R^n. In the operator theory, this essentially proves for all normed spaces the restricted invertibility principle of Bourgain and Tzafriri.

preprint2004arXiv

Integer cells in convex sets

Every convex body K in R^n has a coordinate projection PK that contains at least vol(0.1 K) cells of the integer lattice PZ^n, provided this volume is at least one. Our proof of this counterpart of Minkowski's theorem is based on an extension of the combinatorial density theorem of Sauer, Shelah and Vapnik-Chervonenkis to Z^n. This leads to a new approach to sections of convex bodies. In particular, fundamental results of the asymptotic convex geometry such as the Volume Ratio Theorem and Milman's duality of the diameters admit natural versions for coordinate sections.

preprint2004arXiv

Isoperimetry of waists and local versus global asymptotic convex geometries

Existence of nicely bounded sections of two symmetric convex bodies K and L implies that the intersection of random rotations of K and L is nicely bounded. For L = subspace, this main result immediately yields the unexpected phenomenon: "If K has one nicely bounded section, then most sections of K are nicely bounded". This 'existence implies randomness' consequence was proved independently in [Giannopoulos, Milman and Tsolomitis]. The main result represents a new connection between the local asymptotic convex geometry (study of sections of convex bodies) and the global asymptotic convex geometry (study of convex bodies as a whole). The method relies on the new 'isoperimetry of waists' on the sphere due to Gromov.

preprint2004arXiv

On random intersections of two convex bodies. Appendix to: "Isoperimetry of waists and local versus global asymptotic convex geometries" by R.Vershynin

In the paper "Isoperimetry of waists and local versus global asymptotic convex geometries", it was proved that the existence of nicely bounded sections of two symmetric convex bodies K and L implies that the intersection of randomly rotated K and L is nicely bounded. In this appendix, we achieve a polynomial bound on the diameter of that intersection (in the ratio of the dimensions of the sections).

preprint2004arXiv

Small ball probability and Dvoretzky theorem

Large deviation estimates are by now a standard tool in the Asymptotic Convex Geometry, contrary to small deviation results. In this note we present a novel application of a small deviations inequality to a problem related to the diameters of random sections of high dimensional convex bodies. Our results imply an unexpected distinction between the lower and the upper inclusions in Dvoretzky Theorem.

preprint1998arXiv

On constructions of strong and uniformly minimal M-bases in Banach spaces

We find a natural class of transformations ("flattened perturbations") of a norming M-basis in a Banach space X, which give a strong norming M-basis in X. This simplifies and generalizes the positive answer to the "strong M-basis problem" solved by P. Terenzi. We also show that in general one cannot achieve uniform minimality by applying standard transformations to a given norming M-basis, despite the existence in X of uniformly minimal strong M-bases.
