Researcher profile

David Gamarnik

David Gamarnik contributes to research discovery and scholarly infrastructure.

Researcher; affiliation not imported; open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging); Verification L1; Unclaimed author
15 works
0 followers
13 topics
4 close collaborators

Published work

15 published items

preprint, 2026, arXiv

Price of Quality: Sufficient Conditions for Sparse Recovery using Mixed-Quality Data

We study sparse recovery when observations come from mixed-quality sources: a small collection of high-quality measurements with small noise variance and a larger collection of lower-quality measurements with higher variance. For this heterogeneous-noise setting, we establish sample-size conditions for information-theoretic and algorithmic recovery. On the information-theoretic side, we show that it is sufficient for $(n_1, n_2)$ to satisfy a linear trade-off defining the Price of Quality: the number of low-quality samples needed to replace one high-quality sample. In the agnostic setting, where the decoder is completely agnostic to the quality of the data, it is uniformly bounded, and in particular one high-quality sample is never worth more than two low-quality samples for this sufficient condition to hold. In the informed setting, where the decoder is informed of per-sample variances, the price of quality can grow arbitrarily large. On the algorithmic side, we analyze the LASSO in the agnostic setting and show that the recovery threshold matches the homogeneous-noise case and only depends on the average noise level, revealing a striking robustness of computational recovery to data heterogeneity. Together, these results give the first conditions for sparse recovery with mixed-quality data and expose a fundamental difference between how the information-theoretic and algorithmic thresholds adapt to changes in data quality.

preprint, 2022, arXiv

Algorithms and Barriers in the Symmetric Binary Perceptron Model

The symmetric binary perceptron ($\texttt{SBP}$) exhibits a dramatic statistical-to-computational gap: the densities at which known efficient algorithms find solutions are far below the threshold for the existence of solutions. Furthermore, the $\texttt{SBP}$ exhibits a striking structural property: at all positive constraint densities almost all of its solutions are 'totally frozen' singletons separated by large Hamming distance \cite{perkins2021frozen,abbe2021proof}. This suggests that finding a solution to the $\texttt{SBP}$ may be computationally intractable. At the same time, the $\texttt{SBP}$ does admit polynomial-time search algorithms at low enough densities. A conjectural explanation for this conundrum was put forth in \cite{baldassi2020clustering}: efficient algorithms succeed in the face of freezing by finding exponentially rare clusters of large size. However, it was discovered recently that such rare large clusters exist at all subcritical densities, even at those well above the limits of known efficient algorithms \cite{abbe2021binary}. Thus the driver of the statistical-to-computational gap exhibited by this model remains a mystery. In this paper, we conduct a different landscape analysis to explain the algorithmic tractability of this problem. We show that at high enough densities the $\texttt{SBP}$ exhibits the multi Overlap Gap Property ($m$-OGP), an intricate geometrical property known to be a rigorous barrier for large classes of algorithms. Our analysis shows that the $m$-OGP threshold (a) is well below the satisfiability threshold; and (b) matches the best known algorithmic threshold up to logarithmic factors as $m\to\infty$. We then prove that the $m$-OGP rules out the class of stable algorithms for the $\texttt{SBP}$ above this threshold. We conjecture that the $m \to \infty$ limit of the $m$-OGP threshold marks the algorithmic threshold for the problem.

preprint, 2022, arXiv

Circuit Lower Bounds for the p-Spin Optimization Problem

We consider the problem of finding a near ground state of a $p$-spin model with Rademacher couplings by means of a low-depth circuit. As a direct extension of the authors' recent work [Gamarnik, Jagannath, Wein 2020], we establish that any poly-size $n$-output circuit that produces a spin assignment with objective value within a certain constant factor of optimality, must have depth at least $\log n/(2\log\log n)$ as $n$ grows. This is stronger than the known state of the art bounds of the form $Ω(\log n/(k(n)\log\log n))$ for similar combinatorial optimization problems, where $k(n)$ depends on the optimality value. For example, for the largest clique problem $k(n)$ corresponds to the square of the size of the clique [Rossman 2010]. At the same time our results are not quite comparable since in our case the circuits are required to produce a solution itself rather than solving the associated decision problem. As in our earlier work, the approach is based on the overlap gap property (OGP) exhibited by random $p$-spin models, but the derivation of the circuit lower bound relies further on standard facts from Fourier analysis on the Boolean cube, in particular the Linial-Mansour-Nisan Theorem. To the best of our knowledge, this is the first instance when methods from spin glass theory have ramifications for circuit complexity.

preprint, 2022, arXiv

Hardness of Random Optimization Problems for Boolean Circuits, Low-Degree Polynomials, and Langevin Dynamics

We consider the problem of finding nearly optimal solutions of optimization problems with random objective functions. Two concrete problems we consider are (a) optimizing the Hamiltonian of a spherical or Ising $p$-spin glass model, and (b) finding a large independent set in a sparse Erdős-Rényi graph. The following families of algorithms are considered: (a) low-degree polynomials of the input; (b) low-depth Boolean circuits; (c) the Langevin dynamics algorithm. We show that these families of algorithms fail to produce nearly optimal solutions with high probability. For the case of Boolean circuits, our results improve the state-of-the-art bounds known in circuit complexity theory (although we consider the search problem as opposed to the decision problem). Our proof uses the fact that these models are known to exhibit a variant of the overlap gap property (OGP) of near-optimal solutions. Specifically, for both models, every two solutions whose objectives are above a certain threshold are either close or far from each other. The crux of our proof is that the classes of algorithms we consider exhibit a form of stability. We show by an interpolation argument that stable algorithms cannot overcome the OGP barrier. The stability of Langevin dynamics is an immediate consequence of the well-posedness of stochastic differential equations. The stability of low-degree polynomials and Boolean circuits is established using tools from Gaussian and Boolean analysis -- namely hypercontractivity and total influence, as well as a novel lower bound for random walks avoiding certain subsets. In the case of Boolean circuits, the result also makes use of Linial-Mansour-Nisan's classical theorem. Our techniques apply more broadly to low influence functions and may apply more generally.

preprint, 2021, arXiv

Algorithmic Obstructions in the Random Number Partitioning Problem

We consider the algorithmic problem of finding a near-optimal solution for the number partitioning problem (NPP). The NPP appears in many applications, including the design of randomized controlled trials, multiprocessor scheduling, and cryptography; and is also of theoretical significance. It possesses a so-called statistical-to-computational gap: when its input $X$ has distribution $\mathcal{N}(0,I_n)$, its optimal value is $Θ(\sqrt{n}2^{-n})$ w.h.p.; whereas the best polynomial-time algorithm achieves an objective value of only $2^{-Θ(\log^2 n)}$, w.h.p. In this paper, we initiate the study of the nature of this gap. Inspired by insights from statistical physics, we study the landscape of NPP and establish the presence of the Overlap Gap Property (OGP), an intricate geometric property which is known to be rigorous evidence of algorithmic hardness for large classes of algorithms. By leveraging the OGP, we establish that (a) any sufficiently stable algorithm, appropriately defined, fails to find a near-optimal solution with energy below $2^{-ω(n \log^{-1/5} n)}$; and (b) a very natural MCMC dynamics fails to find near-optimal solutions. Our simulations suggest that the state-of-the-art algorithm achieving $2^{-Θ(\log^2 n)}$ is indeed stable, but formally verifying this is left as an open problem. OGP regards the overlap structure of $m$-tuples of solutions achieving a certain objective value. When $m$ is constant we prove the presence of OGP in the regime $2^{-Θ(n)}$, and the absence of it in the regime $2^{-o(n)}$. Interestingly, though, by considering overlaps with growing values of $m$ we prove the presence of the OGP up to the level $2^{-ω(\sqrt{n\log n})}$. Our proof of the failure of stable algorithms at values $2^{-ω(n \log^{-1/5} n)}$ employs methods from Ramsey theory in extremal combinatorics, and is of independent interest.
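To make the NPP objective concrete, here is a minimal Python sketch. It assumes the objective is $|\langle σ, X\rangle|$ over sign vectors $σ\in\{\pm 1\}^n$, and it uses the Karmarkar-Karp largest differencing heuristic as a stand-in for "the best polynomial-time algorithm"; the abstract does not name that algorithm, so this choice is an assumption. The function names are illustrative and do not come from the paper.

```python
import heapq
from itertools import product


def npp_objective(x, signs):
    """Discrepancy |sum_i signs[i] * x[i]| of a signed partition."""
    return abs(sum(s * v for s, v in zip(signs, x)))


def brute_force_optimum(x):
    """Exact optimum by enumerating sign vectors (exponential time, small n only).

    Assumes x is non-empty. The first sign is fixed to +1 because the
    objective is unchanged when all signs are flipped.
    """
    return min(
        npp_objective(x, (1,) + rest)
        for rest in product((1, -1), repeat=len(x) - 1)
    )


def largest_differencing(x):
    """Karmarkar-Karp heuristic: repeatedly replace the two largest
    magnitudes by their difference; the last remaining value is the
    discrepancy of the partition the heuristic commits to."""
    heap = [-abs(v) for v in x]  # negate to get a max-heap from heapq
    heapq.heapify(heap)
    while len(heap) > 1:
        a = -heapq.heappop(heap)
        b = -heapq.heappop(heap)
        heapq.heappush(heap, -(a - b))
    return -heap[0] if heap else 0


example = [8, 7, 6, 5, 4]
heuristic_value = largest_differencing(example)
optimal_value = brute_force_optimum(example)
```

On this toy input the heuristic ends with a discrepancy of 2, while the exact optimum is 0 (8 + 7 = 6 + 5 + 4), a small-scale picture of the gap between what is achievable and what an efficient method finds.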

preprint, 2021, arXiv

Self-Regularity of Non-Negative Output Weights for Overparameterized Two-Layer Neural Networks

We consider the problem of finding a two-layer neural network with sigmoid, rectified linear unit (ReLU), or binary step activation functions that "fits" a training data set as accurately as possible as quantified by the training error; and study the following question: does a low training error guarantee that the norm of the output layer (outer norm) itself is small? We answer this question affirmatively for the case of non-negative output weights. Using a simple covering number argument, we establish that under quite mild distributional assumptions on the input/label pairs, any such network achieving a small training error on polynomially many data necessarily has a well-controlled outer norm. Notably, our results (a) have a polynomial (in $d$) sample complexity, (b) are independent of the number of hidden units (which can potentially be very high), (c) are oblivious to the training algorithm; and (d) require quite mild assumptions on the data (in particular the input vector $X\in\mathbb{R}^d$ need not have independent coordinates). We then leverage our bounds to establish generalization guarantees for such networks through fat-shattering dimension, a scale-sensitive measure of the complexity class that the network architectures we investigate belong to. Notably, our generalization bounds also have good sample complexity (polynomials in $d$ with a low degree), and are in fact near-linear for some important cases of interest.

preprint, 2020, arXiv

Estimation of Monotone Multi-Index Models

In a multi-index model with $k$ index vectors, the input variables are transformed by taking inner products with the index vectors. A transfer function $f: \mathbb{R}^k \to \mathbb{R}$ is applied to these inner products to generate the output. Thus, multi-index models are a generalization of linear models. In this paper, we consider monotone multi-index models. Namely, the transfer function is assumed to be coordinate-wise monotone. The monotone multi-index model therefore generalizes both linear regression and isotonic regression, which is the estimation of a coordinate-wise monotone function. We consider the case of nonnegative index vectors. We provide an algorithm based on integer programming for the estimation of monotone multi-index models, and provide guarantees on the $L_2$ loss of the estimated function relative to the ground truth.

preprint, 2020, arXiv

Neural Networks and Polynomial Regression. Demystifying the Overparametrization Phenomena

In the context of neural network models, overparametrization refers to the phenomenon whereby these models appear to generalize well on the unseen data, even though the number of parameters significantly exceeds the sample sizes, and the model perfectly fits the in-training data. A conventional explanation of this phenomenon is based on self-regularization properties of algorithms used to train the data. In this paper we prove a series of results which provide a somewhat diverging explanation. Adopting a teacher/student model where the teacher network is used to generate the predictions and student network is trained on the observed labeled data, and then tested on out-of-sample data, we show that any student network interpolating the data generated by a teacher network generalizes well, provided that the sample size is at least an explicit quantity controlled by data dimension and approximation guarantee alone, regardless of the number of internal nodes of either teacher or student network. Our claim is based on approximating both teacher and student networks by polynomial (tensor) regression models with degree depending on the desired accuracy and network depth only. Such a parametrization notably does not depend on the number of internal nodes. Thus a message implied by our results is that parametrizing wide neural networks by the number of hidden nodes is misleading, and a more fitting measure of parametrization complexity is the number of regression coefficients associated with tensorized data. In particular, this somewhat reconciles the generalization ability of neural networks with more classical statistical notions of data complexity and generalization bounds. Our empirical results on MNIST and Fashion-MNIST datasets indeed confirm that tensorized regression achieves good out-of-sample performance, even when the degree of the tensor is at most two.

preprint, 2020, arXiv

Stability, memory, and messaging tradeoffs in heterogeneous service systems

We consider a heterogeneous distributed service system, consisting of $n$ servers with unknown and possibly different processing rates. Jobs with unit mean and independent processing times arrive as a renewal process of rate $λn$, with $0<λ<1$, to the system. Incoming jobs are immediately dispatched to one of several queues associated with the $n$ servers. We assume that the dispatching decisions are made by a central dispatcher endowed with a finite memory, and with the ability to exchange messages with the servers. We study the fundamental resource requirements (memory bits and message exchange rate) in order for a dispatching policy to be maximally stable, i.e., stable whenever the processing rates are such that the arrival rate is less than the total available processing rate. First, for the case of Poisson arrivals and exponential service times, we present a policy that is maximally stable while using a positive (but arbitrarily small) message rate, and $\log_2(n)$ bits of memory. Second, we show that within a certain broad class of policies, a dispatching policy that exchanges $o\big(n^2\big)$ messages per unit of time, and with $o(\log(n))$ bits of memory, cannot be maximally stable. Thus, as long as the message rate is not too excessive, a logarithmic memory is necessary and sufficient for maximal stability.

preprint, 2020, arXiv

Stationary Points of Shallow Neural Networks with Quadratic Activation Function

We consider the teacher-student setting of learning shallow neural networks with quadratic activations and planted weight matrix $W^*\in\mathbb{R}^{m\times d}$, where $m$ is the width of the hidden layer and $d\le m$ is the data dimension. We study the optimization landscape associated with the empirical and the population squared risk of the problem. Under the assumption that the planted weights are full-rank we obtain the following results. First, we establish that the landscape of the empirical risk admits an "energy barrier" separating rank-deficient $W$ from $W^*$: if $W$ is rank deficient, then its risk is bounded away from zero by an amount we quantify. We then couple this result by showing that, assuming the number $N$ of samples grows at least like a polynomial function of $d$, all full-rank approximate stationary points of the empirical risk are nearly globally optimal. These two results allow us to prove that gradient descent, when initialized below the energy barrier, approximately minimizes the empirical risk and recovers the planted weights in polynomial-time. Next, we show that initializing below this barrier is in fact easily achieved when the weights are randomly generated under relatively weak assumptions. We show that provided the network is sufficiently overparametrized, initializing with an appropriate multiple of the identity suffices to obtain a risk below the energy barrier. At a technical level, the last result is a consequence of the semicircle law for the Wishart ensemble and could be of independent interest. Finally, we study the minimizers of the empirical risk and identify a simple necessary and sufficient geometric condition on the training data under which any minimizer necessarily has zero generalization error. We show that as soon as $N\ge N^*=d(d+1)/2$, randomly generated data enjoys this geometric condition almost surely, while that ceases to be true if $N<N^*$.
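The threshold $N^*=d(d+1)/2$ can be motivated by a short calculation. The sketch below assumes unit output weights, so that the network computes a sum of squared inner products; the abstract does not spell out the model or the geometric condition, so both the model form and the spanning condition used here are illustrative assumptions, not the paper's statement.

```latex
% Assumption (not stated in the abstract): unit output weights, with w_j the rows of W.
\begin{align*}
f(W; x) &= \sum_{j=1}^{m} \langle w_j, x \rangle^2
         = x^\top W^\top W x
         = \langle W^\top W,\; x x^\top \rangle, \\
f(W; x_i) = f(W^*; x_i) \ \text{for all } i \le N
&\iff \langle W^\top W - (W^*)^\top W^*,\; x_i x_i^\top \rangle = 0 \ \text{for all } i \le N .
\end{align*}
```

The difference $W^\top W - (W^*)^\top W^*$ is a symmetric $d\times d$ matrix, and symmetric $d\times d$ matrices form a space of dimension $d(d+1)/2$. If the matrices $x_i x_i^\top$ span that space, the difference must vanish and the two networks agree on every input; spanning requires at least $d(d+1)/2$ samples, which matches the value of $N^*$ in the abstract.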

preprint, 2020, arXiv

The Quantum Approximate Optimization Algorithm Needs to See the Whole Graph: A Typical Case

The Quantum Approximate Optimization Algorithm can naturally be applied to combinatorial search problems on graphs. The quantum circuit has p applications of a unitary operator that respects the locality of the graph. On a graph with bounded degree, with p small enough, measurements of distant qubits in the state output by the QAOA give uncorrelated results. We focus on finding big independent sets in random graphs with dn/2 edges keeping d fixed and n large. Using the Overlap Gap Property of almost optimal independent sets in random graphs, and the locality of the QAOA, we are able to show that if p is less than a d-dependent constant times log n, the QAOA cannot do better than finding an independent set of size .854 times the optimal for d large. Because the logarithm is slowly growing, even at one million qubits we can only show that the algorithm is blocked if p is in single digits. At higher p the algorithm "sees" the whole graph and we have no indication that performance is limited.

preprint, 2020, arXiv

The Quantum Approximate Optimization Algorithm Needs to See the Whole Graph: Worst Case Examples

The Quantum Approximate Optimization Algorithm can be applied to search problems on graphs with a cost function that is a sum of terms corresponding to the edges. When conjugating an edge term, the QAOA unitary at depth p produces an operator that depends only on the subgraph consisting of edges that are at most p away from the edge in question. On random d-regular graphs, with d fixed and with p a small constant times log n, these neighborhoods are almost all trees and so the performance of the QAOA is determined only by how it acts on an edge in the middle of a tree. Both bipartite random d-regular graphs and general random d-regular graphs locally are trees so the QAOA's performance is the same on these two ensembles. Using this we can show that the QAOA with $(d-1)^{2p} < n^A$ for any $A<1$, can only achieve an approximation ratio of 1/2 for Max-Cut on bipartite random d-regular graphs for d large. For Maximum Independent Set, in the same setting, the best approximation ratio is a d-dependent constant that goes to 0 as d gets big.

preprint, 2019, arXiv

The Landscape of the Planted Clique Problem: Dense subgraphs and the Overlap Gap Property

In this paper we study the computational-statistical gap of the planted clique problem, where a clique of size $k$ is planted in an Erdős-Rényi graph $G(n,\frac{1}{2})$ resulting in a graph $G\left(n,\frac{1}{2},k\right)$. The goal is to recover the planted clique vertices by observing $G\left(n,\frac{1}{2},k\right)$. It is known that the clique can be recovered as long as $k \geq \left(2+ε\right)\log n$ for any $ε>0$, but no polynomial-time algorithm is known for this task unless $k=Ω\left(\sqrt{n}\right)$. Following a statistical-physics inspired point of view as an attempt to understand this computational-statistical gap, we study the landscape of the "sufficiently dense" subgraphs of $G$ and their overlap with the planted clique. Using the first moment method, we study the densest subgraph problems for subgraphs with fixed, but arbitrary, overlap size with the planted clique, and provide evidence of a phase transition for the presence of Overlap Gap Property (OGP) at $k=Θ\left(\sqrt{n}\right)$. OGP is a concept introduced originally in spin glass theory and known to suggest algorithmic hardness when it appears. We establish the presence of OGP when $k$ is a small positive power of $n$ by using a conditional second moment method. As our main technical tool, we establish the first, to the best of our knowledge, concentration results for the $K$-densest subgraph problem for the Erdős-Rényi model $G\left(n,\frac{1}{2}\right)$ when $K=n^{0.5-ε}$ for arbitrary $ε>0$. Finally, to study the OGP we employ a certain form of overparametrization, which is conceptually aligned with a large body of recent work in learning theory and optimization.
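The two quantities the abstract tracks, the edge count of a candidate subgraph and its overlap with the planted clique, can be sketched in a few lines of Python. This is only an illustration of the model $G\left(n,\frac{1}{2},k\right)$ as described above; the function names and the `seed` parameter are hypothetical and not taken from the paper.

```python
import random
from itertools import combinations


def planted_clique_graph(n, k, seed=0):
    """Sample G(n, 1/2, k): pick k clique vertices at random, connect every
    pair of them, and include every other pair independently with
    probability 1/2. Edges are stored as pairs (u, v) with u < v."""
    rng = random.Random(seed)
    clique = set(rng.sample(range(n), k))
    edges = set()
    for u, v in combinations(range(n), 2):
        if (u in clique and v in clique) or rng.random() < 0.5:
            edges.add((u, v))
    return edges, clique


def induced_edge_count(edges, subset):
    """Number of edges of the graph with both endpoints in `subset`."""
    return sum(1 for u, v in combinations(sorted(subset), 2) if (u, v) in edges)


def overlap(subset, clique):
    """Number of vertices that `subset` shares with the planted clique."""
    return len(set(subset) & clique)


edges, clique = planted_clique_graph(n=30, k=6, seed=1)
clique_edges = induced_edge_count(edges, clique)  # k*(k-1)/2 by construction
```

A landscape study in the spirit of the abstract would ask how large `induced_edge_count` can be among vertex subsets of a fixed size and a fixed `overlap` with the clique; the sketch only provides the bookkeeping, not that optimization.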

preprint, 2010, arXiv

Performance Analysis of Queueing Networks via Robust Optimization

Performance analysis of queueing networks is one of the most challenging areas of queueing theory. Barring very specialized models such as product-form type queueing networks, there exist very few results which provide provable non-asymptotic upper and lower bounds on key performance measures. In this paper we propose a new performance analysis method, which is based on robust optimization. The basic premise of our approach is as follows: rather than assuming that the stochastic primitives of a queueing model satisfy certain probability laws, such as, for example, i.i.d. interarrival and service time distributions, we assume that the underlying primitives are deterministic and satisfy the implications of such probability laws. These implications take the form of simple linear constraints, namely, those motivated by the Law of the Iterated Logarithm (LIL). Using this approach we are able to obtain performance bounds on some key performance measures. Furthermore, these performance bounds imply similar bounds in the underlying stochastic queueing models. We demonstrate our approach on two types of queueing networks: a) Tandem Single Class queueing network and b) Multiclass Single Server queueing network. In both cases, using the proposed robust optimization approach, we are able to obtain explicit upper bounds on some steady-state performance measures. For example, for the case of the TSC system we obtain a bound of the form $\frac{C}{1-ρ} \ln \ln(1/(1-ρ))$ on the expected steady-state sojourn time, where $C$ is an explicit constant and $ρ$ is the bottleneck traffic intensity. This qualitatively agrees with the correct heavy traffic scaling of this performance measure up to the $\ln \ln(1/(1-ρ))$ correction factor.

preprint, 2010, arXiv

Stability of Skorokhod problem is undecidable

The Skorokhod problem arises in studying Reflected Brownian Motion (RBM) on a non-negative orthant, specifically in the context of queueing networks in the heavy traffic regime. One of the key problems is identifying conditions for stability of a Skorokhod problem, defined as the property that trajectories are attracted to the origin. The stability conditions are known in dimension up to three, but not for general dimensions. In this paper we explain the fundamental difficulties encountered in trying to establish stability conditions for general dimensions. We prove that stability of the Skorokhod problem is an undecidable property when the starting state is a part of the input. Namely, there does not exist an algorithm (a constructive procedure) for identifying stable Skorokhod problems in general dimensions.