Source author record

Raghu Meka

Raghu Meka appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Computational Complexity; Data Structures and Algorithms; Machine Learning; math.CO; math.PR; Discrete Mathematics; Computational Geometry; math.FA; math.ST; Statistics Theory; Artificial Intelligence; Computation and Language; Cryptography and Security; Information Theory; math.IT; math.MG

Catalog footprint

What is connected

31 works

16 topics

4 close collaborators

preprint 2026 arXiv

Simple Mechanisms for Representing, Indexing and Manipulating Concepts

Supervised and unsupervised learning using deep neural networks typically aims to exploit the underlying structure in the training data; this structure is often explained using a latent generative process that produces the data, and the generative process is often hierarchical, involving latent concepts. Despite the significant work on understanding the learning of the latent structure and underlying concepts using theory and experiments, a framework that mathematically captures the definition of a concept and provides ways to operate on concepts is missing. In this work, we propose to characterize a simple primitive concept by the zero set of a collection of polynomials and use moment statistics of the data to uniquely represent the concepts; we show how this view can be used to obtain a signature of the concept. These signatures can be used to discover a common structure across the set of concepts and could recursively produce the signature of higher-level concepts from the signatures of lower-level concepts. To utilize such desired properties, we propose a method by keeping a dictionary of concepts and show that the proposed method can learn different types of hierarchical structures of the data.

preprint 2026 arXiv

Sparsifying Sums of Positive Semidefinite Matrices

In this paper, we revisit spectral sparsification for sums of arbitrary positive semidefinite (PSD) matrices. Concretely, for any collection of PSD matrices $\mathcal{A} = \{A_1, A_2, \ldots, A_r\} \subset \mathbb{R}^{n \times n}$, given any subset $T \subseteq [r]$, our goal is to find sparse weights $μ\in \mathbb{R}_{\geq 0}^r$ such that $(1 - ε) \sum_{i \in T} A_i \preceq \sum_{i \in T} μ_i A_i \preceq (1 + ε) \sum_{i \in T} A_i.$ This generalizes spectral sparsification of graphs which corresponds to $\mathcal{A}$ being the set of Laplacians of edges. It also captures sparsifying Cayley graphs by choosing a subset of generators. The former has been extensively studied with optimal sparsifiers known. The latter has received attention recently and was solved for a few special groups (e.g., $\mathbb{F}_2^n$). Prior work shows any sum of PSD matrices can be sparsified down to $O(n)$ elements. This bound however turns out to be too coarse and in particular yields no non-trivial bound for building Cayley sparsifiers for Cayley graphs. In this work, we develop a new, instance-specific (i.e., specific to a given collection $\mathcal{A}$) theory of PSD matrix sparsification based on a new parameter $N^*(\mathcal{A})$ which we call connectivity threshold that generalizes the threshold of the number of edges required to make a graph connected. Our main result gives a sparsifier that uses at most $O(ε^{-2}N^*(\mathcal{A}) (\log n)(\log r))$ matrices and is constructible in randomized polynomial time. We also show that we need $N^*(\mathcal{A})$ elements to sparsify for any $ε< 0.99$. As the main application of our framework, we prove that any Cayley graph can be sparsified to $O(ε^{-2}\log^4 N)$ generators. Previously, a non-trivial bound on Cayley sparsifiers was known only in the case when the group is $\mathbb{F}_2^n$.
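
As a worked instance of the guarantee above, consider the graph case the abstract mentions. The notation $e_u$ for the standard basis vector of $\mathbb{R}^n$ and $L_G$ for the graph Laplacian is an assumption of this note, not taken from the abstract. For a graph $G = (V, E)$ on $n$ vertices, assign to each edge $e = \{u, v\}$ the rank-one PSD matrix

$$A_e = (e_u - e_v)(e_u - e_v)^{\top}, \qquad \sum_{e \in E} A_e = L_G.$$

Taking $T$ to be all of $E$, the sparsification condition then reads

$$(1 - ε)\, L_G \preceq \sum_{e \in E} μ_e A_e \preceq (1 + ε)\, L_G,$$

so the support of $μ$ is the edge set of a reweighted spectral sparsifier of $G$.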

preprint 2024 arXiv

Explicit separations between randomized and deterministic Number-on-Forehead communication

We study the power of randomness in the Number-on-Forehead (NOF) model in communication complexity. We construct an explicit 3-player function $f:[N]^3 \to \{0,1\}$, such that: (i) there exists a randomized NOF protocol computing it that sends a constant number of bits; but (ii) any deterministic or nondeterministic NOF protocol computing it requires sending about $(\log N)^{1/3}$ many bits. This exponentially improves upon the previously best-known such separation. At the core of our proof is an extension of a recent result of the first and third authors on sets of integers without 3-term arithmetic progressions into a non-arithmetic setting.

preprint 2022 arXiv

Distributional Hardness Against Preconditioned Lasso via Erasure-Robust Designs

Sparse linear regression with ill-conditioned Gaussian random designs is widely believed to exhibit a statistical/computational gap, but there is surprisingly little formal evidence for this belief, even in the form of examples that are hard for restricted classes of algorithms. Recent work has shown that, for certain covariance matrices, the broad class of Preconditioned Lasso programs provably cannot succeed on polylogarithmically sparse signals with a sublinear number of samples. However, this lower bound only shows that for every preconditioner, there exists at least one signal that it fails to recover successfully. This leaves open the possibility that, for example, trying multiple different preconditioners solves every sparse linear regression problem. In this work, we prove a stronger lower bound that overcomes this issue. For an appropriate covariance matrix, we construct a single signal distribution on which any invertibly-preconditioned Lasso program fails with high probability, unless it receives a linear number of samples. Surprisingly, at the heart of our lower bound is a new positive result in compressed sensing. We show that standard sparse random designs are with high probability robust to adversarial measurement erasures, in the sense that if $b$ measurements are erased, then all but $O(b)$ of the coordinates of the signal are still information-theoretically identifiable. To our knowledge, this is the first time that partial recoverability of arbitrary sparse signals under erasures has been studied in compressed sensing.

preprint 2022 arXiv

Minimax Optimality (Probably) Doesn't Imply Distribution Learning for GANs

Arguably the most fundamental question in the theory of generative adversarial networks (GANs) is to understand to what extent GANs can actually learn the underlying distribution. Theoretical and empirical evidence suggests local optimality of the empirical training objective is insufficient. Yet, it does not rule out the possibility that achieving a true population minimax optimal solution might imply distribution learning. In this paper, we show that standard cryptographic assumptions imply that this stronger condition is still insufficient. Namely, we show that if local pseudorandom generators (PRGs) exist, then for a large family of natural continuous target distributions, there are ReLU network generators of constant depth and polynomial size which take Gaussian random seeds so that (i) the output is far in Wasserstein distance from the target distribution, but (ii) no polynomially large Lipschitz discriminator ReLU network can detect this. This implies that even achieving a population minimax optimal solution to the Wasserstein GAN objective is likely insufficient for distribution learning in the usual statistical sense. Our techniques reveal a deep connection between GANs and PRGs, which we believe will lead to further insights into the computational landscape of GANs.

preprint 2022 arXiv

Resolving Matrix Spencer Conjecture Up to Poly-logarithmic Rank

We give a simple proof of the matrix Spencer conjecture up to poly-logarithmic rank: given symmetric $d \times d$ matrices $A_1,\ldots,A_n$ each with $\|A_i\|_{\mathsf{op}} \leq 1$ and rank at most $n/\log^3 n$, one can efficiently find $\pm 1$ signs $x_1,\ldots,x_n$ such that their signed sum has spectral norm $\|\sum_{i=1}^n x_i A_i\|_{\mathsf{op}} = O(\sqrt{n})$. This result also implies a $\log n - Ω( \log \log n)$ qubit lower bound for quantum random access codes encoding $n$ classical bits with advantage $\gg 1/\sqrt{n}$. Our proof uses the recent refinement of the non-commutative Khintchine inequality in [Bandeira, Boedihardjo, van Handel, 2022] for random matrices with correlated Gaussian entries.

preprint 2022 arXiv

Smoothed Analysis of the Komlós Conjecture

The well-known Komlós conjecture states that given $n$ vectors in $\mathbb{R}^d$ with Euclidean norm at most one, there always exists a $\pm 1$ coloring such that the $\ell_{\infty}$ norm of the signed-sum vector is a constant independent of $n$ and $d$. We prove this conjecture in a smoothed analysis setting where the vectors are perturbed by adding a small Gaussian noise and when the number of vectors $n =ω(d\log d)$. The dependence of $n$ on $d$ is the best possible even in a completely random setting. Our proof relies on a weighted second moment method, where instead of considering uniformly random colorings we apply the second moment method on an implicit distribution on colorings obtained by applying the Gram-Schmidt walk algorithm to a suitable set of vectors. The main technical idea is to use various properties of these colorings, including subgaussianity, to control the second moment.

preprint 2020 arXiv

Efficient Algorithms for Outlier-Robust Regression

We give the first polynomial-time algorithm for performing linear or polynomial regression resilient to adversarial corruptions in both examples and labels. Given a sufficiently large (polynomial-size) training set drawn i.i.d. from distribution D and subsequently corrupted on some fraction of points, our algorithm outputs a linear function whose squared error is close to the squared error of the best-fitting linear function with respect to D, assuming that the marginal distribution of D over the input space is \emph{certifiably hypercontractive}. This natural property is satisfied by many well-studied distributions such as Gaussian distributions, strongly log-concave distributions, and the uniform distribution on the hypercube, among others. We also give a simple statistical lower bound showing that some distributional assumption is necessary to succeed in this setting. These results are the first of their kind and were not known to be even information-theoretically possible prior to our work. Our approach is based on the sum-of-squares (SoS) method and is inspired by the recent applications of the method for parameter recovery problems in unsupervised learning. Our algorithm can be seen as a natural convex relaxation of the following conceptually simple non-convex optimization problem: find a linear function and a large subset of the input corrupted sample such that the least squares loss of the function over the subset is minimized over all possible large subsets.

preprint 2020 arXiv

Learning Polynomials of Few Relevant Dimensions

Polynomial regression is a basic primitive in learning and statistics. In its most basic form the goal is to fit a degree $d$ polynomial to a response variable $y$ in terms of an $n$-dimensional input vector $x$. This is extremely well-studied with many applications and has sample and runtime complexity $Θ(n^d)$. Can one achieve better runtime if the intrinsic dimension of the data is much smaller than the ambient dimension $n$? Concretely, we are given samples $(x,y)$ where $y$ is a degree at most $d$ polynomial in an unknown $r$-dimensional projection (the relevant dimensions) of $x$. This can be seen both as a generalization of phase retrieval and as a special case of learning multi-index models where the link function is an unknown low-degree polynomial. Note that without distributional assumptions, this is at least as hard as junta learning. In this work we consider the important case where the covariates are Gaussian. We give an algorithm that learns the polynomial within accuracy $ε$ with sample complexity that is roughly $N = O_{r,d}(n \log^2(1/ε) (\log n)^d)$ and runtime $O_{r,d}(N n^2)$. Prior to our work, no such results were known even for the case of $r=1$. We introduce a new filtered PCA approach to get a warm start for the true subspace and use geodesic SGD to boost to arbitrary accuracy; our techniques may be of independent interest, especially for problems dealing with subspace recovery or analyzing SGD on manifolds.

preprint 2020 arXiv

Learning Some Popular Gaussian Graphical Models without Condition Number Bounds

Gaussian Graphical Models (GGMs) have wide-ranging applications in machine learning and the natural and social sciences. In most of the settings in which they are applied, the number of observed samples is much smaller than the dimension and they are assumed to be sparse. While there are a variety of algorithms (e.g. Graphical Lasso, CLIME) that provably recover the graph structure with a logarithmic number of samples, they assume various conditions that require the precision matrix to be in some sense well-conditioned. Here we give the first polynomial-time algorithms for learning attractive GGMs and walk-summable GGMs with a logarithmic number of samples without any such assumptions. In particular, our algorithms can tolerate strong dependencies among the variables. Our result for structure recovery in walk-summable GGMs is derived from a more general result for efficient sparse linear regression in walk-summable models without any norm dependencies. We complement our results with experiments showing that many existing algorithms fail even in some simple settings where there are long dependency chains, whereas ours do not.

preprint 2020 arXiv

Online Discrepancy Minimization for Stochastic Arrivals

In the stochastic online vector balancing problem, vectors $v_1,v_2,\ldots,v_T$ chosen independently from an arbitrary distribution in $\mathbb{R}^n$ arrive one-by-one and must be immediately given a $\pm$ sign. The goal is to keep the norm of the discrepancy vector, i.e., the signed prefix-sum, as small as possible for a given target norm. We consider some of the most well-known problems in discrepancy theory in the above online stochastic setting, and give algorithms that match the known offline bounds up to $\mathsf{polylog}(nT)$ factors. This substantially generalizes and improves upon the previous results of Bansal, Jiang, Singla, and Sinha (STOC '20). In particular, for the Komlós problem where $\|v_t\|_2\leq 1$ for each $t$, our algorithm achieves $\tilde{O}(1)$ discrepancy with high probability, improving upon the previous $\tilde{O}(n^{3/2})$ bound. For Tusnády's problem of minimizing the discrepancy of axis-aligned boxes, we obtain an $O(\log^{d+4} T)$ bound for an arbitrary distribution over points. Previous techniques only worked for product distributions and gave a weaker $O(\log^{2d+1} T)$ bound. We also consider the Banaszczyk setting, where given a symmetric convex body $K$ with Gaussian measure at least $1/2$, our algorithm achieves $\tilde{O}(1)$ discrepancy with respect to the norm given by $K$ for input distributions with sub-exponential tails. Our key idea is to introduce a potential that also enforces constraints on how the discrepancy vector evolves, allowing us to maintain certain anti-concentration properties. For the Banaszczyk setting, we further enhance this potential by combining it with ideas from generic chaining. Finally, we also extend these results to the setting of online multi-color discrepancy.
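
To make the online setting concrete, here is a minimal sketch of a naive greedy baseline. It is not the potential-based algorithm of the paper. Each arriving vector gets the sign that minimizes the Euclidean norm of the running signed sum, and the code tracks the largest $\ell_\infty$ norm seen over all prefixes. The function names `greedy_signs` and `random_unit_vector`, and the choice of random unit vectors as input, are assumptions made for illustration.

```python
import math
import random


def greedy_signs(vectors):
    """Sign each arriving vector greedily; return the signs and the worst prefix l_inf norm."""
    dim = len(vectors[0])
    prefix = [0.0] * dim  # running signed sum (the discrepancy vector)
    signs, worst = [], 0.0
    for v in vectors:
        # ||prefix + s*v||^2 = ||prefix||^2 + 2*s*<prefix, v> + ||v||^2,
        # so the minimizing sign is the one opposing the inner product.
        dot = sum(p * x for p, x in zip(prefix, v))
        sign = -1 if dot > 0 else 1
        prefix = [p + sign * x for p, x in zip(prefix, v)]
        signs.append(sign)
        worst = max(worst, max(abs(p) for p in prefix))
    return signs, worst


def random_unit_vector(rng, dim):
    """Hypothetical input model: a uniformly random direction on the unit sphere."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]


rng = random.Random(0)
T, dim = 200, 5
vectors = [random_unit_vector(rng, dim) for _ in range(T)]
signs, worst = greedy_signs(vectors)
print(f"worst prefix l_inf discrepancy over {T} steps: {worst:.3f}")
```

For vectors of norm at most one, this greedy rule only guarantees a prefix norm of at most $\sqrt{t}$ after $t$ steps in the worst case, which is the kind of gap the $\tilde{O}(1)$ bound in the abstract closes for stochastic arrivals.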

preprint 2016 arXiv

Explicit resilient functions matching Ajtai-Linial

A Boolean function on n variables is q-resilient if for any subset of at most q variables, the function is very likely to be determined by a uniformly random assignment to the remaining n-q variables; in other words, no coalition of at most q variables has significant influence on the function. Resilient functions have been extensively studied with a variety of applications in cryptography, distributed computing, and pseudorandomness. The best known balanced resilient function on n variables due to Ajtai and Linial ([AL93]) is Omega(n/(log^2 n))-resilient. However, the construction of Ajtai and Linial is by the probabilistic method and does not give an efficiently computable function. In this work we give an explicit monotone depth three almost-balanced Boolean function on n bits that is Omega(n/(log^2 n))-resilient matching the work of Ajtai and Linial. The best previous explicit constructions, due to Meka [Meka09] (which only gives a logarithmic depth function) and Chattopadhyay and Zuckerman [CZ15], were only n^{1-c}-resilient for any constant c < 1. Our construction and analysis are motivated by (and simplify parts of) the recent breakthrough of [CZ15] giving explicit two-source extractors for polylogarithmic min-entropy; a key ingredient in their result was the construction of explicit constant-depth resilient functions. An important ingredient in our construction is a new randomness optimal oblivious sampler which preserves moment generating functions of sums of variables and could be useful elsewhere.
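
The definition of resilience above can be checked by brute force on tiny functions. The sketch below computes, for a given coalition, the probability over a uniformly random assignment to the remaining variables that the function is still undetermined. The names `coalition_influence` and `majority` are illustrative, and the enumeration is exponential in n, so this is only a way to see the definition at work, not the construction from the paper.

```python
from fractions import Fraction
from itertools import product


def coalition_influence(f, n, coalition):
    """Probability, over uniform bits outside `coalition`, that f is not yet determined."""
    rest = [i for i in range(n) if i not in coalition]
    undetermined = 0
    for rest_bits in product((0, 1), repeat=len(rest)):
        x = [0] * n
        for i, b in zip(rest, rest_bits):
            x[i] = b
        # Try every setting of the coalition's bits and collect the outputs.
        values = set()
        for q_bits in product((0, 1), repeat=len(coalition)):
            for i, b in zip(coalition, q_bits):
                x[i] = b
            values.add(f(x))
        if len(values) > 1:
            undetermined += 1
    return Fraction(undetermined, 2 ** len(rest))


def majority(x):
    """Majority of the bits in x (1 if strictly more than half are 1)."""
    return int(sum(x) > len(x) / 2)


one_voter = coalition_influence(majority, 3, [0])
two_voters = coalition_influence(majority, 3, [0, 1])
print(one_voter, two_voters)
```

On three-bit majority, a single variable matters exactly when the other two disagree, and any two variables always control the outcome.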

preprint 2015 arXiv

A polynomial time approximation scheme for computing the supremum of Gaussian processes

We give a polynomial time approximation scheme (PTAS) for computing the supremum of a Gaussian process. That is, given a finite set of vectors $V\subseteq\mathbb{R}^d$, we compute a $(1+\varepsilon)$-factor approximation to $\mathop {\mathbb{E}}_{X\leftarrow\mathcal{N}^d}[\sup_{v\in V}|\langle v,X\rangle|]$ deterministically in time $\operatorname {poly}(d)\cdot|V|^{O_{\varepsilon}(1)}$. Previously, only a constant factor deterministic polynomial time approximation algorithm was known due to the work of Ding, Lee and Peres [Ann. of Math. (2) 175 (2012) 1409-1471]. This answers an open question of Lee (2010) and Ding [Ann. Probab. 42 (2014) 464-496]. The study of supremum of Gaussian processes is of considerable importance in probability with applications in functional analysis, convex geometry, and in light of the recent breakthrough work of Ding, Lee and Peres [Ann. of Math. (2) 175 (2012) 1409-1471], to random walks on finite graphs. As such our result could be of use elsewhere. In particular, combining with the work of Ding [Ann. Probab. 42 (2014) 464-496], our result yields a PTAS for computing the cover time of bounded-degree graphs. Previously, such algorithms were known only for trees. Along the way, we also give an explicit oblivious estimator for semi-norms in Gaussian space with optimal query complexity. Our algorithm and its analysis are elementary in nature, using two classical comparison inequalities, Slepian's lemma and Kanter's lemma.

preprint 2015 arXiv

Almost Optimal Pseudorandom Generators for Spherical Caps

Halfspaces or linear threshold functions are widely studied in complexity theory, learning theory and algorithm design. In this work we study the natural problem of constructing pseudorandom generators (PRGs) for halfspaces over the sphere, aka spherical caps, which besides being interesting and basic geometric objects, also arise frequently in the analysis of various randomized algorithms (e.g., randomized rounding). We give an explicit PRG which fools spherical caps within error $ε$ and has an almost optimal seed-length of $O(\log n + \log(1/ε) \cdot \log\log(1/ε))$. For an inverse-polynomially growing error $ε$, our generator has a seed-length optimal up to a factor of $O( \log \log {(n)})$. The most efficient PRG previously known (due to Kane, 2012) requires a seed-length of $Ω(\log^{3/2}{(n)})$ in this setting. We also obtain similar constructions to fool halfspaces with respect to the Gaussian distribution. Our construction and analysis are significantly different from previous works on PRGs for halfspaces and build on the iterative dimension reduction ideas of Kane et al. (2011) and Celis et al. (2013), the \emph{classical moment problem} from probability theory and explicit constructions of \emph{orthogonal designs} based on the seminal work of Bourgain and Gamburd (2011) on expansion in Lie groups.

preprint 2015 arXiv

Anti-concentration for polynomials of independent random variables

We prove anti-concentration results for polynomials of independent random variables with arbitrary degree. Our results extend the classical Littlewood-Offord result for linear polynomials, and improve several earlier estimates. We discuss applications in two different areas. In complexity theory, we prove near optimal lower bounds for computing the Parity, addressing a challenge in complexity theory posed by Razborov and Viola, and also address a problem concerning OR functions. In random graph theory, we derive a general anti-concentration result on the number of copies of a fixed graph in a random graph.
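
For orientation, the classical linear case that these results extend can be verified exhaustively for small n. The sketch below computes the largest probability that a random signed sum of fixed nonzero coefficients hits any single value, and compares it with the classical Littlewood-Offord bound $\binom{n}{\lfloor n/2 \rfloor}/2^n$. It covers only the linear case, not the polynomial results of the paper, and the name `max_point_probability` is an assumption of this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb


def max_point_probability(coeffs):
    """Max over v of Pr[sum_i s_i * a_i = v] for independent uniform signs s_i."""
    counts = Counter(
        sum(s * a for s, a in zip(signs, coeffs))
        for signs in product((-1, 1), repeat=len(coeffs))
    )
    return Fraction(max(counts.values()), 2 ** len(coeffs))


n = 6
bound = Fraction(comb(n, n // 2), 2 ** n)  # classical Littlewood-Offord bound

p_equal = max_point_probability([1] * n)                # all coefficients equal
p_spread = max_point_probability([1, 2, 4, 8, 16, 32])  # every signed sum distinct
print(p_equal, p_spread, bound)
```

Equal coefficients attain the bound, while coefficients with more arithmetic spread concentrate far less.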

preprint 2015 arXiv

Pseudorandomness via the discrete Fourier transform

We present a new approach to constructing unconditional pseudorandom generators against classes of functions that involve computing a linear function of the inputs. We give an explicit construction of a pseudorandom generator that fools the discrete Fourier transforms of linear functions with seed-length that is nearly logarithmic (up to polyloglog factors) in the input size and the desired error parameter. Our result gives a single pseudorandom generator that fools several important classes of tests computable in logspace that have been considered in the literature, including halfspaces (over general domains), modular tests and combinatorial shapes. For all these classes, our generator is the first that achieves near logarithmic seed-length in both the input length and the error parameter. Getting such a seed-length is a natural challenge in its own right, which needs to be overcome in order to derandomize RL - a central question in complexity theory. Our construction combines ideas from a large body of prior work, ranging from a classical construction of [NN93] to the recent gradually increasing independence paradigm of [KMN11, CRSW13, GMRTV12], while also introducing some novel analytic machinery which might find other applications.

preprint 2015 arXiv

Sum-of-squares lower bounds for planted clique

Finding cliques in random graphs and the closely related "planted" clique variant, where a clique of size k is planted in a random G(n, 1/2) graph, have been the focus of substantial study in algorithm design. Despite much effort, the best known polynomial-time algorithms only solve the problem for k ~ sqrt(n). In this paper we study the complexity of the planted clique problem under algorithms from the Sum-of-squares hierarchy. We prove the first average case lower bound for this model: for almost all graphs in G(n,1/2), r rounds of the SOS hierarchy cannot find a planted k-clique unless k > n^{1/2r} (up to logarithmic factors). Thus, for any constant number of rounds planted cliques of size n^{o(1)} cannot be found by this powerful class of algorithms. This is shown via an integrality gap for the natural formulation of the maximum clique problem on random graphs for SOS and Lasserre hierarchies, which in turn follows from degree lower bounds for the Positivstellensatz proof system. We follow the usual recipe for such proofs. First, we introduce a natural "dual certificate" (also known as a "vector-solution" or "pseudo-expectation") for the given system of polynomial equations representing the problem for every fixed input graph. Then we show that the matrix associated with this dual certificate is PSD (positive semi-definite) with high probability over the choice of the input graph. This requires the use of certain tools. One is the theory of association schemes, and in particular the eigenspaces and eigenvalues of the Johnson scheme. Another is a combinatorial method we develop to compute (via traces) norm bounds for certain random matrices whose entries are highly dependent; we hope this method will be useful elsewhere.
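
The input distribution described above is easy to sample. The following sketch draws a G(n, 1/2) graph and plants a clique on k randomly chosen vertices. The function name `planted_clique`, the seed parameter, and the edge-set representation are assumptions of this example. It only generates instances and says nothing about the lower-bound argument.

```python
import itertools
import random


def planted_clique(n, k, seed=0):
    """Sample G(n, 1/2) and force all edges inside a random k-subset of vertices."""
    rng = random.Random(seed)
    clique = set(rng.sample(range(n), k))
    edges = set()
    for u, v in itertools.combinations(range(n), 2):  # pairs with u < v
        inside_clique = u in clique and v in clique
        if inside_clique or rng.random() < 0.5:
            edges.add((u, v))
    return edges, clique


edges, clique = planted_clique(n=30, k=6)
print(f"{len(edges)} edges, planted clique on vertices {sorted(clique)}")
```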

preprint 2014 arXiv

Computational Limits for Matrix Completion

Matrix Completion is the problem of recovering an unknown real-valued low-rank matrix from a subsample of its entries. Important recent results show that the problem can be solved efficiently under the assumption that the unknown matrix is incoherent and the subsample is drawn uniformly at random. Are these assumptions necessary? It is well known that Matrix Completion in its full generality is NP-hard. However, little is known if we make additional assumptions such as incoherence and permit the algorithm to output a matrix of slightly higher rank. In this paper we prove that Matrix Completion remains computationally intractable even if the unknown matrix has rank $4$ but we are allowed to output any constant rank matrix, and even if additionally we assume that the unknown matrix is incoherent and are shown $90\%$ of the entries. This result relies on the conjectured hardness of the $4$-Coloring problem. We also consider the positive semidefinite Matrix Completion problem. Here we show a similar hardness result under the standard assumption that $\mathrm{P}\ne \mathrm{NP}.$ Our results greatly narrow the gap between existing feasibility results and computational lower bounds. In particular, we believe that our results give the first complexity-theoretic justification for why distributional assumptions are needed beyond the incoherence assumption in order to obtain positive results. On the technical side, we contribute several new ideas on how to encode hard combinatorial problems in low-rank optimization problems. We hope that these techniques will be helpful in further understanding the computational limits of Matrix Completion and related problems.

preprint 2014 arXiv

Pseudorandomness for concentration bounds and signed majorities

The problem of constructing pseudorandom generators that fool halfspaces has been studied intensively in recent times. For fooling halfspaces over the hypercube with polynomially small error, the best construction known requires seed-length O(log^2 n) (MekaZ13). Getting the seed-length down to O(log(n)) is a natural challenge in its own right, which needs to be overcome in order to derandomize RL. In this work we make progress towards this goal by obtaining near-optimal generators for two important special cases: 1) We give a near optimal derandomization of the Chernoff bound for independent, uniformly random bits. Specifically, we show how to generate an x in {1,-1}^n using $\tilde{O}(\log (n/ε))$ random bits such that for any unit vector u, <u,x> matches the sub-Gaussian tail behaviour predicted by the Chernoff bound up to error eps. 2) We construct a generator which fools halfspaces with {0,1,-1} coefficients with error eps with a seed-length of $\tilde{O}(\log(n/ε))$. This includes the important special case of majorities. In both cases, the best previous results required seed-length of $O(\log n + \log^2(1/ε))$. Technically, our work combines new Fourier-analytic tools with the iterative dimension reduction techniques and the gradually increasing independence paradigm of previous works (KaneMN11, CelisRSW13, GopalanMRTV12).

preprint 2013 arXiv

Association schemes, non-commutative polynomial concentration, and sum-of-squares lower bounds for planted clique

Finding cliques in random graphs and the closely related "planted" clique variant, where a clique of size t is planted in a random G(n,1/2) graph, have been the focus of substantial study in algorithm design. Despite much effort, the best known polynomial-time algorithms only solve the problem for t = Theta(sqrt(n)). Here we show that beating sqrt(n) would require substantially new algorithmic ideas, by proving a lower bound for the problem in the sum-of-squares (or Lasserre) hierarchy, the most powerful class of semi-definite programming algorithms we know of: r rounds of the sum-of-squares hierarchy can only solve the planted clique for t > sqrt(n)/(C log n)^(r^2). Previously, no nontrivial lower bounds were known. Our proof is formulated as a degree lower bound in the Positivstellensatz algebraic proof system, which is equivalent to the sum-of-squares hierarchy. The heart of our (average-case) lower bound is a proof that a certain random matrix derived from the input graph is (with high probability) positive semidefinite. Two ingredients play an important role in this proof. The first is the classical theory of association schemes, applied to the average and variance of that random matrix. The second is a new large deviation inequality for matrix-valued polynomials. Our new tail estimate seems to be of independent interest and may find other applications, as it generalizes both the estimates on real-valued polynomials and on sums of independent random matrices.

preprint 2013 arXiv

Moment-Matching Polynomials

We give a new framework for proving the existence of low-degree, polynomial approximators for Boolean functions with respect to broad classes of non-product distributions. Our proofs use techniques related to the classical moment problem and deviate significantly from known Fourier-based methods, which require the underlying distribution to have some product structure. Our main application is the first polynomial-time algorithm for agnostically learning any function of a constant number of halfspaces with respect to any log-concave distribution (for any constant accuracy parameter). This result was not known even for the case of learning the intersection of two halfspaces without noise. Additionally, we show that in the "smoothed-analysis" setting, the above results hold with respect to distributions that have sub-exponential tails, a property satisfied by many natural and well-studied distributions in machine learning. Given that our algorithms can be implemented using Support Vector Machines (SVMs) with a polynomial kernel, these results give a rigorous theoretical explanation as to why many kernel methods work so well in practice.

preprint 2012 arXiv

A PRG for Lipschitz Functions of Polynomials with Applications to Sparsest Cut

We give improved pseudorandom generators (PRGs) for Lipschitz functions of low-degree polynomials over the hypercube. These are functions of the form psi(P(x)), where P is a low-degree polynomial and psi is a function with small Lipschitz constant. PRGs for smooth functions of low-degree polynomials have received a lot of attention recently and play an important role in constructing PRGs for the natural class of polynomial threshold functions. In spite of the recent progress, no nontrivial PRGs were known for fooling Lipschitz functions of degree O(log n) polynomials even for constant error rate. In this work, we give the first such generator obtaining a seed-length of (log n)\tilde{O}(d^2/eps^2) for fooling degree d polynomials with error eps. Previous generators had an exponential dependence on the degree. We use our PRG to get better integrality gap instances for sparsest cut, a fundamental problem in graph theory with many applications in graph optimization. We give an instance of uniform sparsest cut for which a powerful semi-definite relaxation (SDP) first introduced by Goemans and Linial and studied in the seminal work of Arora, Rao and Vazirani has an integrality gap of exp(Ω((log log n)^{1/2})). Understanding the performance of the Goemans-Linial SDP for uniform sparsest cut is an important open problem in approximation algorithms and metric embeddings and our work gives a near-exponential improvement over previous lower bounds which achieved a gap of Ω(log log n).

preprint 2012 arXiv

An Invariance Principle for Polytopes

Let X be randomly chosen from {-1,1}^n, and let Y be randomly chosen from the standard spherical Gaussian on R^n. For any (possibly unbounded) polytope P formed by the intersection of k halfspaces, we prove that |Pr [X belongs to P] - Pr [Y belongs to P]| < log^{8/5}k * Delta, where Delta is a parameter that is small for polytopes formed by the intersection of "regular" halfspaces (i.e., halfspaces with low influence). The novelty of our invariance principle is the polylogarithmic dependence on k. Previously, only bounds that were at least linear in k were known. We give two important applications of our main result: (1) A polylogarithmic in k bound on the Boolean noise sensitivity of intersections of k "regular" halfspaces (previous work gave bounds linear in k). (2) A pseudorandom generator (PRG) with seed length O((log n)*poly(log k,1/delta)) that delta-fools all polytopes with k faces with respect to the Gaussian distribution. We also obtain PRGs with similar parameters that fool polytopes formed by intersection of regular halfspaces over the hypercube. Using our PRG constructions, we obtain the first deterministic quasi-polynomial time algorithms for approximately counting the number of solutions to a broad class of integer programs, including dense covering problems and contingency tables.

preprint2012arXiv

Better Pseudorandom Generators from Milder Pseudorandom Restrictions

We present an iterative approach to constructing pseudorandom generators, based on the repeated application of mild pseudorandom restrictions. We use this template to construct pseudorandom generators for combinatorial rectangles and read-once CNFs and a hitting set generator for width-3 branching programs, all of which achieve near-optimal seed-length even in the low-error regime: We get seed-length O(log (n/epsilon)) for error epsilon. Previously, only constructions with seed-length O(\log^{3/2} n) or O(\log^2 n) were known for these classes with polynomially small error. The (pseudo)random restrictions we use are milder than those typically used for proving circuit lower bounds in that we only set a constant fraction of the bits at a time. While such restrictions do not simplify the functions drastically, we show that they can be derandomized using small-bias spaces.

preprint2012arXiv

Constructive Discrepancy Minimization by Walking on The Edges

Minimizing the discrepancy of a set system is a fundamental problem in combinatorics. One of the cornerstones in this area is the celebrated six standard deviations result of Spencer (AMS 1985): In any system of n sets in a universe of size n, there always exists a coloring which achieves discrepancy 6\sqrt{n}. The original proof of Spencer was existential in nature, and did not give an efficient algorithm to find such a coloring. Recently, a breakthrough work of Bansal (FOCS 2010) gave an efficient algorithm which finds such a coloring. His algorithm was based on an SDP relaxation of the discrepancy problem and a clever rounding procedure. In this work we give a new randomized algorithm to find a coloring as in Spencer's result based on a restricted random walk we call "Edge-Walk". Our algorithm and its analysis use only basic linear algebra and are "truly" constructive in that they do not appeal to the existential arguments, giving a new proof of Spencer's theorem and the partial coloring lemma.
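To make the quantity in Spencer's bound concrete, the following minimal sketch computes the discrepancy of a ±1 coloring on a small set system and finds the best coloring by exhaustive search. The set system and the helper names `discrepancy` and `best_coloring` are hypothetical. The exhaustive search takes exponential time and is not the Edge-Walk algorithm.

```python
from itertools import product
from math import sqrt

def discrepancy(sets, coloring):
    """Maximum over all sets of |sum of the colors of its elements|."""
    return max(abs(sum(coloring[i] for i in s)) for s in sets)

def best_coloring(sets, n):
    """Exhaustively search all 2^n colorings (feasible only for tiny n)."""
    return min(product((-1, 1), repeat=n), key=lambda c: discrepancy(sets, c))

# Hypothetical set system: 4 sets over the universe {0, 1, 2, 3}.
sets = [{0, 1}, {1, 2}, {2, 3}, {0, 1, 2, 3}]
n = 4

coloring = best_coloring(sets, n)
disc = discrepancy(sets, coloring)

# Spencer's theorem guarantees that some coloring achieves at most 6*sqrt(n).
spencer_bound = 6 * sqrt(n)
```

Here the alternating coloring balances every set, so the optimum is far below the worst-case bound. The bound matters for set systems where no such perfectly balanced coloring exists.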

preprint2012arXiv

DNF Sparsification and a Faster Deterministic Counting Algorithm

Given a DNF formula on n variables, the two natural size measures are the number of terms or size s(f), and the maximum width of a term w(f). It is folklore that short DNF formulas can be made narrow. We prove a converse, showing that narrow formulas can be sparsified. More precisely, any width $w$ DNF, irrespective of its size, can be $ε$-approximated by a width $w$ DNF with at most $(w\log(1/ε))^{O(w)}$ terms. We combine our sparsification result with the work of Luby and Velickovic to give a faster deterministic algorithm for approximately counting the number of satisfying solutions to a DNF. Given a formula on n variables with poly(n) terms, we give a deterministic $n^{\tilde{O}(\log \log(n))}$ time algorithm that computes an additive $ε$ approximation to the fraction of satisfying assignments of f for $ε = 1/\poly(\log n)$. The previous best result, due to Luby and Velickovic from nearly two decades ago, had a run-time of $n^{\exp(O(\sqrt{\log \log n}))}$.
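The two size measures and the counted quantity can be illustrated on a toy formula. The sketch below assumes a DNF is represented as a list of terms, each term a dictionary mapping a variable index to its required truth value. This representation and the function names are illustrative choices. The counting is exhaustive over all 2^n assignments, so it shows only what is being approximated, not the deterministic algorithm described above.

```python
from itertools import product

def size(dnf):
    """s(f): the number of terms."""
    return len(dnf)

def width(dnf):
    """w(f): the largest number of literals in any term."""
    return max(len(term) for term in dnf)

def satisfies(dnf, x):
    """True if assignment x satisfies at least one term."""
    return any(all(x[v] == val for v, val in term.items()) for term in dnf)

def sat_fraction(dnf, n):
    """Exact fraction of satisfying assignments, by brute force."""
    count = sum(satisfies(dnf, x) for x in product((False, True), repeat=n))
    return count / 2 ** n

# f = (x0 AND x1) OR (NOT x2), over n = 3 variables.
dnf = [{0: True, 1: True}, {2: False}]
fraction = sat_fraction(dnf, 3)
```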

preprint2011arXiv

Making the long code shorter, with applications to the Unique Games Conjecture

The long code is a central tool in hardness of approximation, especially in questions related to the Unique Games Conjecture. We construct a new code that is exponentially more efficient, but can still be used in many of these applications. Using the new code we obtain exponential improvements over several known results, including the following: 1. For any eps > 0, we show the existence of an n vertex graph G where every set of o(n) vertices has expansion 1 - eps, but G's adjacency matrix has more than exp(log^delta n) eigenvalues larger than 1 - eps, where delta depends only on eps. This answers an open question of Arora, Barak and Steurer (FOCS 2010) who asked whether one can improve over the noise graph on the Boolean hypercube that has poly(log n) such eigenvalues. 2. A gadget that reduces unique games instances with linear constraints modulo K into instances with alphabet k with a blowup of K^polylog(K), improving over the previously known gadget with blowup of 2^K. 3. An n variable integrality gap for Unique Games that survives exp(poly(log log n)) rounds of the SDP + Sherali Adams hierarchy, improving on the previously known bound of poly(log log n). We show a connection between the local testability of linear codes and small set expansion in certain related Cayley graphs, and use this connection to derandomize the noise graph on the Boolean hypercube.

preprint2011arXiv

Pseudorandom Generators for Polynomial Threshold Functions

We study the natural question of constructing pseudorandom generators (PRGs) for low-degree polynomial threshold functions (PTFs). We give a PRG with seed-length log n/eps^{O(d)} fooling degree d PTFs with error at most eps. Previously, no nontrivial constructions were known even for quadratic threshold functions and constant error eps. For the class of degree 1 threshold functions or halfspaces, we construct PRGs with much better dependence on the error parameter eps and obtain a PRG with seed-length O(log n + log^2(1/eps)). Previously, only PRGs with seed length O(log n log^2(1/eps)/eps^2) were known for halfspaces. We also obtain PRGs with similar seed lengths for fooling halfspaces over the n-dimensional unit sphere. The main theme of our constructions and analysis is the use of invariance principles to construct pseudorandom generators. We also introduce the notion of monotone read-once branching programs, which is key to improving the dependence on the error rate eps for halfspaces. These techniques may be of independent interest.

preprint2010arXiv

Almost Optimal Explicit Johnson-Lindenstrauss Transformations

The Johnson-Lindenstrauss lemma is a fundamental result in probability with several applications in the design and analysis of algorithms in high dimensional geometry. Most known constructions of linear embeddings that satisfy the Johnson-Lindenstrauss property involve randomness. We address the question of explicitly constructing such embedding families and provide a construction with an almost optimal use of randomness: we use O(log(n/delta)log(log(n/delta)/epsilon)) random bits for embedding n dimensions to O(log(1/delta)/epsilon^2) dimensions with error probability at most delta, and distortion at most epsilon. In particular, for delta = 1/poly(n) and fixed epsilon, we use O(log n loglog n) random bits. Previous constructions required at least O(log^2 n) random bits to get polynomially small error.
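For context, the sketch below shows the standard fully random baseline that explicit constructions aim to replace: a k × n matrix of independent ±1/sqrt(k) entries, which spends n·k random bits. The dimensions, the seed, and the helper names are assumptions chosen for illustration. This is not the low-randomness construction described above.

```python
import random
from math import sqrt

def random_sign_projection(n, k, rng):
    """k x n matrix with independent entries +-1/sqrt(k) (uses n*k random bits)."""
    scale = 1 / sqrt(k)
    return [[rng.choice((-scale, scale)) for _ in range(n)] for _ in range(k)]

def project(matrix, x):
    """Matrix-vector product, mapping R^n to R^k."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]

def norm(x):
    return sqrt(sum(v * v for v in x))

rng = random.Random(0)  # fixed seed so the run is repeatable
n, k = 1000, 300        # illustrative dimensions only

x = [rng.gauss(0, 1) for _ in range(n)]
y = project(random_sign_projection(n, k, rng), x)

# The ratio of norms should be close to 1 with high probability.
ratio = norm(y) / norm(x)
```

The squared norm of the projection equals the squared norm of the input in expectation, and it concentrates more tightly as k grows.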

preprint2010arXiv

Polynomial-Time Approximation Schemes for Knapsack and Related Counting Problems using Branching Programs

We give a deterministic, polynomial-time algorithm for approximately counting the number of {0,1}-solutions to any instance of the knapsack problem. On an instance of length n with total weight W and accuracy parameter eps, our algorithm produces a (1 + eps)-multiplicative approximation in time poly(n,log W,1/eps). We also give algorithms with identical guarantees for general integer knapsack, the multidimensional knapsack problem (with a constant number of constraints) and for contingency tables (with a constant number of rows). Previously, only randomized approximation schemes were known for these problems due to work by Morris and Sinclair and work by Dyer. Our algorithms work by constructing small-width, read-once branching programs for approximating the underlying solution space under a carefully chosen distribution. As a byproduct of this approach, we obtain new query algorithms for learning functions of k halfspaces with respect to the uniform distribution on {0,1}^n. The running time of our algorithm is polynomial in the accuracy parameter eps. Previously even for the case of k=2, only algorithms with an exponential dependence on eps were known.
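The branching-program view can be sketched with the textbook exact dynamic program, in which each layer reads one item and the state is the running weight. The function name and the assumption of non-negative integer weights are illustrative. This exact program has width capacity + 1 and so runs in time proportional to n·W, which is pseudo-polynomial. The algorithm described above instead builds a small-width program that approximates the count in time poly(n, log W, 1/eps).

```python
def count_knapsack_solutions(weights, capacity):
    """Exactly count x in {0,1}^n with sum(w_i * x_i) <= capacity.

    Assumes non-negative integer weights and a non-negative integer capacity.
    """
    # counts[s] = number of partial assignments whose running weight is s.
    counts = [0] * (capacity + 1)
    counts[0] = 1
    for w in weights:
        nxt = counts[:]  # branch x_i = 0: the state is unchanged
        for s in range(capacity - w + 1):
            nxt[s + w] += counts[s]  # branch x_i = 1: the state moves to s + w
        counts = nxt
    return sum(counts)

# Feasible subsets of weights [1, 2, 3] under capacity 3:
# {}, {1}, {2}, {3}, {1, 2}
total = count_knapsack_solutions([1, 2, 3], 3)
```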

preprint2009arXiv

Bounding the Sensitivity of Polynomial Threshold Functions

We give the first non-trivial upper bounds on the average sensitivity and noise sensitivity of polynomial threshold functions. More specifically, for a Boolean function f on n variables equal to the sign of a real, multivariate polynomial of total degree d we prove 1) The average sensitivity of f is at most O(n^{1-1/(4d+6)}) (we also give a combinatorial proof of the bound O(n^{1-1/2^d})). 2) The noise sensitivity of f with noise rate δ is at most O(δ^{1/(4d+6)}). Previously, only bounds for the linear case were known. Along the way we show new structural theorems about random restrictions of polynomial threshold functions obtained via hypercontractivity. These structural results may be of independent interest as they provide a generic template for transforming problems related to polynomial threshold functions defined on the Boolean hypercube to polynomial threshold functions defined in Gaussian space.
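Average sensitivity can be computed directly for small n, which helps ground the definition. The sketch below enumerates all inputs in {-1,1}^n and counts, for each input, the coordinates whose flip changes the function value. The example functions and the convention sign(0) = 1 are assumptions made for illustration. The enumeration takes exponential time and plays no role in the proofs.

```python
from itertools import product

def average_sensitivity(f, n):
    """Expected number of coordinates whose flip changes f(x), x uniform in {-1,1}^n."""
    total = 0
    for x in product((-1, 1), repeat=n):
        fx = f(x)
        for i in range(n):
            flipped = x[:i] + (-x[i],) + x[i + 1:]
            total += f(flipped) != fx
    return total / 2 ** n

def sign(t):
    """Sign with the (assumed) convention sign(0) = 1."""
    return 1 if t >= 0 else -1

def majority3(x):
    """Degree-1 threshold function (a halfspace) on 3 variables."""
    return sign(sum(x))

def quadratic(x):
    """Degree-2 polynomial threshold function on 4 variables."""
    return sign(x[0] * x[1] + x[2] * x[3])

as_majority = average_sensitivity(majority3, 3)
as_quadratic = average_sensitivity(quadratic, 4)
```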

31 published item(s)

Simple Mechanisms for Representing, Indexing and Manipulating Concepts

Sparsifying Sums of Positive Semidefinite Matrices

Explicit separations between randomized and deterministic Number-on-Forehead communication

Distributional Hardness Against Preconditioned Lasso via Erasure-Robust Designs

Minimax Optimality (Probably) Doesn't Imply Distribution Learning for GANs

Resolving Matrix Spencer Conjecture Up to Poly-logarithmic Rank

Smoothed Analysis of the Komlós Conjecture

Efficient Algorithms for Outlier-Robust Regression

Learning Polynomials of Few Relevant Dimensions

Learning Some Popular Gaussian Graphical Models without Condition Number Bounds

Online Discrepancy Minimization for Stochastic Arrivals

Explicit resilient functions matching Ajtai-Linial

A polynomial time approximation scheme for computing the supremum of Gaussian processes

Almost Optimal Pseudorandom Generators for Spherical Caps

Anti-concentration for polynomials of independent random variables

Pseudorandomness via the discrete Fourier transform

Sum-of-squares lower bounds for planted clique

Computational Limits for Matrix Completion

Pseudorandomness for concentration bounds and signed majorities

Association schemes, non-commutative polynomial concentration, and sum-of-squares lower bounds for planted clique

Moment-Matching Polynomials

A PRG for Lipschitz Functions of Polynomials with Applications to Sparsest Cut

An Invariance Principle for Polytopes

Better Pseudorandom Generators from Milder Pseudorandom Restrictions

Constructive Discrepancy Minimization by Walking on The Edges

DNF Sparsification and a Faster Deterministic Counting Algorithm

Making the long code shorter, with applications to the Unique Games Conjecture

Pseudorandom Generators for Polynomial Threshold Functions

Almost Optimal Explicit Johnson-Lindenstrauss Transformations

Polynomial-Time Approximation Schemes for Knapsack and Related Counting Problems using Branching Programs

Bounding the Sensitivity of Polynomial Threshold Functions