Researcher profile

Pravesh K. Kothari

Pravesh K. Kothari contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
12works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

12 published item(s)

preprint2026arXiv

Learning Mixture Models via Efficient High-dimensional Sparse Fourier Transforms

In this work, we give a ${\rm poly}(d,k)$ time and sample algorithm for efficiently learning the parameters of a mixture of $k$ spherical distributions in $d$ dimensions. Unlike all previous methods, our techniques apply to heavy-tailed distributions and include examples that do not even have finite covariances. Our method succeeds whenever the cluster distributions have a characteristic function with sufficiently heavy tails. Such distributions include the Laplace distribution but crucially exclude Gaussians. All previous methods for learning mixture models relied implicitly or explicitly on the low-degree moments. Even for the case of Laplace distributions, we prove that any such algorithm must use super-polynomially many samples. Our method thus adds to the short list of techniques that bypass the limitations of the method of moments. Somewhat surprisingly, our algorithm does not require any minimum separation between the cluster means. This is in stark contrast to spherical Gaussian mixtures where a minimum $\ell_2$-separation is provably necessary even information-theoretically [Regev and Vijayaraghavan '17]. Our methods compose well with existing techniques and allow obtaining ''best of both worlds" guarantees for mixtures where every component either has a heavy-tailed characteristic function or has a sub-Gaussian tail with a light-tailed characteristic function. Our algorithm is based on a new approach to learning mixture models via efficient high-dimensional sparse Fourier transforms. We believe that this method will find more applications to statistical estimation. As an example, we give an algorithm for consistent robust mean estimation against noise-oblivious adversaries, a model practically motivated by the literature on multiple hypothesis testing. It was formally proposed in a recent Master's thesis by one of the authors, and has already inspired follow-up works.

preprint2026arXiv

Rigorous Implications of the Low-Degree Heuristic

Over the past decade, the low-degree heuristic has been used to estimate the algorithmic thresholds for a wide range of average-case planted vs null distinguishing problems. Such results rely on the hypothesis that if the low-degree moments of the planted and null distributions are sufficiently close, then no efficient (noise-tolerant) algorithm can distinguish between them. This hypothesis is appealing due to the simplicity of calculating the low-degree likelihood ratio (LDLR) -- a quantity that measures the similarity between low-degree moments. However, despite sustained interest in the area, it remains unclear whether low-degree indistinguishability actually rules out any interesting class of algorithms. In this work, we initiate the study and develop technical tools for translating LDLR upper bounds to rigorous lower bounds against concrete algorithms. As a consequence, we prove: for any permutation-invariant distribution $\mathsf{P}$, 1. If $\mathsf{P}$ is over $\{0,1\}^n$ and is low-degree indistinguishable from $U = \mathrm{Unif}(\{0,1\}^n)$, then a noisy version of $\mathsf{P}$ is statistically indistinguishable from $U$. 2. If $\mathsf{P}$ is over $\mathbb{R}^n$ and is low-degree indistinguishable from the standard Gaussian ${N}(0, 1)^n$, then no statistic based on symmetric polynomials of degree at most $O(\log n/\log \log n)$ can distinguish between a noisy version of $\mathsf{P}$ from ${N}(0, 1)^n$. 3. If $\mathsf{P}$ is over $\mathbb{R}^{n\times n}$ and is low-degree indistinguishable from ${N}(0,1)^{n\times n}$, then no constant-sized subgraph statistic can distinguish between a noisy version of $\mathsf{P}$ and ${N}(0, 1)^{n\times n}$.

preprint2026arXiv

Sparsifying Sums of Positive Semidefinite Matrices

In this paper, we revisit spectral sparsification for sums of arbitrary positive semidefinite (PSD) matrices. Concretely, for any collection of PSD matrices $\mathcal{A} = \{A_1, A_2, \ldots, A_r\} \subset \mathbb{R}^{n \times n}$, given any subset $T \subseteq [r]$, our goal is to find sparse weights $μ\in \mathbb{R}_{\geq 0}^r$ such that $(1 - ε) \sum_{i \in T} A_i \preceq \sum_{i \in T} μ_i A_i \preceq (1 + ε) \sum_{i \in T} A_i.$ This generalizes spectral sparsification of graphs which corresponds to $\mathcal{A}$ being the set of Laplacians of edges. It also captures sparsifying Cayley graphs by choosing a subset of generators. The former has been extensively studied with optimal sparsifiers known. The latter has received attention recently and was solved for a few special groups (e.g., $\mathbb{F}_2^n$). Prior work shows any sum of PSD matrices can be sparsified down to $O(n)$ elements. This bound however turns out to be too coarse and in particular yields no non-trivial bound for building Cayley sparsifiers for Cayley graphs. In this work, we develop a new, instance-specific (i.e., specific to a given collection $\mathcal{A}$) theory of PSD matrix sparsification based on a new parameter $N^*(\mathcal{A})$ which we call connectivity threshold that generalizes the threshold of the number of edges required to make a graph connected. Our main result gives a sparsifier that uses at most $O(ε^{-2}N^*(\mathcal{A}) (\log n)(\log r))$ matrices and is constructible in randomized polynomial time. We also show that we need $N^*(\mathcal{A})$ elements to sparsify for any $ε< 0.99$. As the main application of our framework, we prove that any Cayley graph can be sparsified to $O(ε^{-2}\log^4 N)$ generators. Previously, a non-trivial bound on Cayley sparsifiers was known only in the case when the group is $\mathbb{F}_2^n$.

preprint2022arXiv

A simple and sharper proof of the hypergraph Moore bound

The hypergraph Moore bound is an elegant statement that characterizes the extremal trade-off between the girth - the number of hyperedges in the smallest cycle or even cover (a subhypergraph with all degrees even) and size - the number of hyperedges in a hypergraph. For graphs (i.e., $2$-uniform hypergraphs), a bound tight up to the leading constant was proven in a classical work of Alon, Hoory and Linial [AHL02]. For hypergraphs of uniformity $k>2$, an appropriate generalization was conjectured by Feige [Fei08]. The conjecture was settled up to an additional $\log^{4k+1} n$ factor in the size in a recent work of Guruswami, Kothari and Manohar [GKM21]. Their argument relies on a connection between the existence of short even covers and the spectrum of a certain randomly signed Kikuchi matrix. Their analysis, especially for the case of odd $k$, is significantly complicated. In this work, we present a substantially simpler and shorter proof of the hypergraph Moore bound. Our key idea is the use of a new reweighted Kikuchi matrix and an edge deletion step that allows us to drop several involved steps in [GKM21]&#39;s analysis such as combinatorial bucketing of rows of the Kikuchi matrix and the use of the Schudy-Sviridenko polynomial concentration. Our simpler proof also obtains tighter parameters: in particular, the argument gives a new proof of the classical Moore bound of [AHL02] with no loss (the proof in [GKM21] loses a $\log^3 n$ factor), and loses only a single logarithmic factor for all $k>2$-uniform hypergraphs. As in [GKM21], our ideas naturally extend to yield a simpler proof of the full trade-off for strongly refuting smoothed instances of constraint satisfaction problems with similarly improved parameters.

preprint2022arXiv

Approximating Max-Cut on Bounded Degree Graphs: Tighter Analysis of the FKL Algorithm

In this note, we describe a $α_{GW} + \tildeΩ(1/d^2)$-factor approximation algorithm for Max-Cut on weighted graphs of degree $\leq d$. Here, $α_{GW}\approx 0.878$ is the worst-case approximation ratio of the Goemans-Williamson rounding for Max-Cut. This improves on previous results for unweighted graphs by Feige, Karpinski, and Langberg and Florén. Our guarantee is obtained by a tighter analysis of the solution obtained by applying a natural local improvement procedure to the Goemans-Williamson rounding of the basic SDP strengthened with triangle inequalities.

preprint2022arXiv

Bypassing the XOR Trick: Stronger Certificates for Hypergraph Clique Number

Let $\mathcal{H}(k,n,p)$ be the distribution on $k$-uniform hypergraphs where every subset of $[n]$ of size $k$ is included as an hyperedge with probability $p$ independently. In this work, we design and analyze a simple spectral algorithm that certifies a bound on the size of the largest clique, $ω(H)$, in hypergraphs $H \sim \mathcal{H}(k,n,p)$. For example, for any constant $p$, with high probability over the choice of the hypergraph, our spectral algorithm certifies a bound of $\tilde{O}(\sqrt{n})$ on the clique number in polynomial time. This matches, up to $\textrm{polylog}(n)$ factors, the best known certificate for the clique number in random graphs, which is the special case of $k = 2$. Prior to our work, the best known refutation algorithms [CGL04, AOW15] rely on a reduction to the problem of refuting random $k$-XOR via Feige&#39;s XOR trick [Fei02], and yield a polynomially worse bound of $\tilde{O}(n^{3/4})$ on the clique number when $p = O(1)$. Our algorithm bypasses the XOR trick and relies instead on a natural generalization of the Lovasz theta semidefinite programming relaxation for cliques in hypergraphs.

preprint2022arXiv

List-Decodable Covariance Estimation

We give the first polynomial time algorithm for \emph{list-decodable covariance estimation}. For any $α> 0$, our algorithm takes input a sample $Y \subseteq \mathbb{R}^d$ of size $n\geq d^{\mathsf{poly}(1/α)}$ obtained by adversarially corrupting an $(1-α)n$ points in an i.i.d. sample $X$ of size $n$ from the Gaussian distribution with unknown mean $μ_*$ and covariance $Σ_*$. In $n^{\mathsf{poly}(1/α)}$ time, it outputs a constant-size list of $k = k(α)= (1/α)^{\mathsf{poly}(1/α)}$ candidate parameters that, with high probability, contains a $(\hatμ,\hatΣ)$ such that the total variation distance $TV(\mathcal{N}(μ_*,Σ_*),\mathcal{N}(\hatμ,\hatΣ))<1-O_α(1)$. This is the statistically strongest notion of distance and implies multiplicative spectral and relative Frobenius distance approximation for parameters with dimension independent error. Our algorithm works more generally for $(1-α)$-corruptions of any distribution $D$ that possesses low-degree sum-of-squares certificates of two natural analytic properties: 1) anti-concentration of one-dimensional marginals and 2) hypercontractivity of degree 2 polynomials. Prior to our work, the only known results for estimating covariance in the list-decodable setting were for the special cases of list-decodable linear regression and subspace recovery due to Karmarkar, Klivans, and Kothari (2019), Raghavendra and Yau (2019 and 2020) and Bakshi and Kothari (2020). These results need superpolynomial time for obtaining any subconstant error in the underlying dimension. Our result implies the first polynomial-time \emph{exact} algorithm for list-decodable linear regression and subspace recovery that allows, in particular, to obtain $2^{-\mathsf{poly}(d)}$ error in polynomial-time. Our result also implies an improved algorithm for clustering non-spherical mixtures.

preprint2022arXiv

Polynomial-Time Power-Sum Decomposition of Polynomials

We give efficient algorithms for finding power-sum decomposition of an input polynomial $P(x)= \sum_{i\leq m} p_i(x)^d$ with component $p_i$s. The case of linear $p_i$s is equivalent to the well-studied tensor decomposition problem while the quadratic case occurs naturally in studying identifiability of non-spherical Gaussian mixtures from low-order moments. Unlike tensor decomposition, both the unique identifiability and algorithms for this problem are not well-understood. For the simplest setting of quadratic $p_i$s and $d=3$, prior work of Ge, Huang and Kakade yields an algorithm only when $m \leq \tilde{O}(\sqrt{n})$. On the other hand, the more general recent result of Garg, Kayal and Saha builds an algebraic approach to handle any $m=n^{O(1)}$ components but only when $d$ is large enough (while yielding no bounds for $d=3$ or even $d=100$) and only handles an inverse exponential noise. Our results obtain a substantial quantitative improvement on both the prior works above even in the base case of $d=3$ and quadratic $p_i$s. Specifically, our algorithm succeeds in decomposing a sum of $m \sim \tilde{O}(n)$ generic quadratic $p_i$s for $d=3$ and more generally the $d$th power-sum of $m \sim n^{2d/15}$ generic degree-$K$ polynomials for any $K \geq 2$. Our algorithm relies only on basic numerical linear algebraic primitives, is exact (i.e., obtain arbitrarily tiny error up to numerical precision), and handles an inverse polynomial noise when the $p_i$s have random Gaussian coefficients. Our main tool is a new method for extracting the linear span of $p_i$s by studying the linear subspace of low-order partial derivatives of the input $P$. For establishing polynomial stability of our algorithm in average-case, we prove inverse polynomial bounds on the smallest singular value of certain correlated random matrices with low-degree polynomial entries that arise in our analyses.

preprint2021arXiv

List-Decodable Subspace Recovery: Dimension Independent Error in Polynomial Time

In list-decodable subspace recovery, the input is a collection of $n$ points $αn$ (for some $α\ll 1/2$) of which are drawn i.i.d. from a distribution $\mathcal{D}$ with a isotropic rank $r$ covariance $Π_*$ (the \emph{inliers}) and the rest are arbitrary, potential adversarial outliers. The goal is to recover a $O(1/α)$ size list of candidate covariances that contains a $\hatΠ$ close to $Π_*$. Two recent independent works (Raghavendra-Yau, Bakshi-Kothari 2020) gave the first efficient algorithm for this problem. These results, however, obtain an error that grows with the dimension (linearly in [RY] and logarithmically in BK) at the cost of quasi-polynomial running time) and rely on \emph{certifiable anti-concentration} - a relatively strict condition satisfied essentially only by the Gaussian distribution. In this work, we improve on these results on all three fronts: \emph{dimension-independent} error via a faster fixed-polynomial running time under less restrictive distributional assumptions. Specifically, we give a $poly(1/α) d^{O(1)}$ time algorithm that outputs a list containing a $\hatΠ$ satisfying $\|\hatΠ -Π_*\|_F \leq O(1/α)$. Our result only needs $\mathcal{D}$ to have \emph{certifiably hypercontractive} degree 2 polynomials. As a result, in addition to Gaussians, our algorithm applies to the uniform distribution on the hypercube and $q$-ary cubes and arbitrary product distributions with subgaussian marginals. Prior work (Raghavendra and Yau, 2020) had identified such distributions as potential hard examples as such distributions do not exhibit strong enough anti-concentration. When $\mathcal{D}$ satisfies certifiable anti-concentration, we obtain a stronger error guarantee of $\|\hatΠ-Π_*\|_F \leq η$ for any arbitrary $η> 0$ in $d^{O(poly(1/α) + \log (1/η))}$ time.

preprint2020arXiv

Efficient Algorithms for Outlier-Robust Regression

We give the first polynomial-time algorithm for performing linear or polynomial regression resilient to adversarial corruptions in both examples and labels. Given a sufficiently large (polynomial-size) training set drawn i.i.d. from distribution D and subsequently corrupted on some fraction of points, our algorithm outputs a linear function whose squared error is close to the squared error of the best-fitting linear function with respect to D, assuming that the marginal distribution of D over the input space is \emph{certifiably hypercontractive}. This natural property is satisfied by many well-studied distributions such as Gaussian, strongly log-concave distributions and, uniform distribution on the hypercube among others. We also give a simple statistical lower bound showing that some distributional assumption is necessary to succeed in this setting. These results are the first of their kind and were not known to be even information-theoretically possible prior to our work. Our approach is based on the sum-of-squares (SoS) method and is inspired by the recent applications of the method for parameter recovery problems in unsupervised learning. Our algorithm can be seen as a natural convex relaxation of the following conceptually simple non-convex optimization problem: find a linear function and a large subset of the input corrupted sample such that the least squares loss of the function over the subset is minimized over all possible large subsets.

preprint2020arXiv

Strongly refuting all semi-random Boolean CSPs

We give an efficient algorithm to strongly refute \emph{semi-random} instances of all Boolean constraint satisfaction problems. The number of constraints required by our algorithm matches (up to polylogarithmic factors) the best-known bounds for efficient refutation of fully random instances. Our main technical contribution is an algorithm to strongly refute semi-random instances of the Boolean $k$-XOR problem on $n$ variables that have $\widetilde{O}(n^{k/2})$ constraints. (In a semi-random $k$-XOR instance, the equations can be arbitrary and only the right-hand sides are random.) One of our key insights is to identify a simple combinatorial property of random XOR instances that makes spectral refutation work. Our approach involves taking an instance that does not satisfy this property (i.e., is \emph{not} pseudorandom) and reducing it to a partitioned collection of $2$-XOR instances. We analyze these subinstances using a carefully chosen quadratic form as a proxy, which in turn is bounded via a combination of spectral methods and semidefinite programming. The analysis of our spectral bounds relies only on an off-the-shelf matrix Bernstein inequality. Even for the purely random case, this leads to a shorter proof compared to the ones in the literature that rely on problem-specific trace-moment computations.

preprint2020arXiv

Time-Space Tradeoffs for Distinguishing Distributions and Applications to Security of Goldreich&#39;s PRG

In this work, we establish lower-bounds against memory bounded algorithms for distinguishing between natural pairs of related distributions from samples that arrive in a streaming setting. In our first result, we show that any algorithm that distinguishes between uniform distribution on $\{0,1\}^n$ and uniform distribution on an $n/2$-dimensional linear subspace of $\{0,1\}^n$ with non-negligible advantage needs $2^{Ω(n)}$ samples or $Ω(n^2)$ memory. Our second result applies to distinguishing outputs of Goldreich&#39;s local pseudorandom generator from the uniform distribution on the output domain. Specifically, Goldreich&#39;s pseudorandom generator $G$ fixes a predicate $P:\{0,1\}^k \rightarrow \{0,1\}$ and a collection of subsets $S_1, S_2, \ldots, S_m \subseteq [n]$ of size $k$. For any seed $x \in \{0,1\}^n$, it outputs $P(x_{S_1}), P(x_{S_2}), \ldots, P(x_{S_m})$ where $x_{S_i}$ is the projection of $x$ to the coordinates in $S_i$. We prove that whenever $P$ is $t$-resilient (all non-zero Fourier coefficients of $(-1)^P$ are of degree $t$ or higher), then no algorithm, with $<n^ε$ memory, can distinguish the output of $G$ from the uniform distribution on $\{0,1\}^m$ with a large inverse polynomial advantage, for stretch $m \le \left(\frac{n}{t}\right)^{\frac{(1-ε)}{36}\cdot t}$ (barring some restrictions on $k$). The lower bound holds in the streaming model where at each time step $i$, $S_i\subseteq [n]$ is a randomly chosen (ordered) subset of size $k$ and the distinguisher sees either $P(x_{S_i})$ or a uniformly random bit along with $S_i$. Our proof builds on the recently developed machinery for proving time-space trade-offs (Raz 2016 and follow-ups) for search/learning problems.