Source author record

David Gamarnik

David Gamarnik appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Catalog footprint

What is connected

37 works

24 topics

4 close collaborators


preprint, 2026, arXiv

Price of Quality: Sufficient Conditions for Sparse Recovery using Mixed-Quality Data

We study sparse recovery when observations come from mixed-quality sources: a small collection of high-quality measurements with small noise variance and a larger collection of lower-quality measurements with higher variance. For this heterogeneous-noise setting, we establish sample-size conditions for information-theoretic and algorithmic recovery. On the information-theoretic side, we show that it is sufficient for $(n_1, n_2)$ to satisfy a linear trade-off defining the Price of Quality: the number of low-quality samples needed to replace one high-quality sample. In the agnostic setting, where the decoder is completely agnostic to the quality of the data, it is uniformly bounded, and in particular one high-quality sample is never worth more than two low-quality samples for this sufficient condition to hold. In the informed setting, where the decoder is informed of per-sample variances, the price of quality can grow arbitrarily large. On the algorithmic side, we analyze the LASSO in the agnostic setting and show that the recovery threshold matches the homogeneous-noise case and only depends on the average noise level, revealing a striking robustness of computational recovery to data heterogeneity. Together, these results give the first conditions for sparse recovery with mixed-quality data and expose a fundamental difference between how the information-theoretic and algorithmic thresholds adapt to changes in data quality.

preprint, 2022, arXiv

Algorithms and Barriers in the Symmetric Binary Perceptron Model

The symmetric binary perceptron ($\texttt{SBP}$) exhibits a dramatic statistical-to-computational gap: the densities at which known efficient algorithms find solutions are far below the threshold for the existence of solutions. Furthermore, the $\texttt{SBP}$ exhibits a striking structural property: at all positive constraint densities almost all of its solutions are 'totally frozen' singletons separated by large Hamming distance \cite{perkins2021frozen,abbe2021proof}. This suggests that finding a solution to the $\texttt{SBP}$ may be computationally intractable. At the same time, the $\texttt{SBP}$ does admit polynomial-time search algorithms at low enough densities. A conjectural explanation for this conundrum was put forth in \cite{baldassi2020clustering}: efficient algorithms succeed in the face of freezing by finding exponentially rare clusters of large size. However, it was discovered recently that such rare large clusters exist at all subcritical densities, even at those well above the limits of known efficient algorithms \cite{abbe2021binary}. Thus the driver of the statistical-to-computational gap exhibited by this model remains a mystery. In this paper, we conduct a different landscape analysis to explain the algorithmic tractability of this problem. We show that at high enough densities the $\texttt{SBP}$ exhibits the multi Overlap Gap Property ($m$-OGP), an intricate geometrical property known to be a rigorous barrier for large classes of algorithms. Our analysis shows that the $m$-OGP threshold (a) is well below the satisfiability threshold; and (b) matches the best known algorithmic threshold up to logarithmic factors as $m\to\infty$. We then prove that the $m$-OGP rules out the class of stable algorithms for the $\texttt{SBP}$ above this threshold. We conjecture that the $m \to \infty$ limit of the $m$-OGP threshold marks the algorithmic threshold for the problem.

preprint, 2022, arXiv

Circuit Lower Bounds for the p-Spin Optimization Problem

We consider the problem of finding a near ground state of a $p$-spin model with Rademacher couplings by means of a low-depth circuit. As a direct extension of the authors' recent work [Gamarnik, Jagannath, Wein 2020], we establish that any poly-size $n$-output circuit that produces a spin assignment with objective value within a certain constant factor of optimality, must have depth at least $\log n/(2\log\log n)$ as $n$ grows. This is stronger than the known state of the art bounds of the form $Ω(\log n/(k(n)\log\log n))$ for similar combinatorial optimization problems, where $k(n)$ depends on the optimality value. For example, for the largest clique problem $k(n)$ corresponds to the square of the size of the clique [Rossman 2010]. At the same time our results are not quite comparable since in our case the circuits are required to produce a solution itself rather than solving the associated decision problem. As in our earlier work, the approach is based on the overlap gap property (OGP) exhibited by random $p$-spin models, but the derivation of the circuit lower bound relies further on standard facts from Fourier analysis on the Boolean cube, in particular the Linial-Mansour-Nisan Theorem. To the best of our knowledge, this is the first instance when methods from spin glass theory have ramifications for circuit complexity.

preprint, 2022, arXiv

Hardness of Random Optimization Problems for Boolean Circuits, Low-Degree Polynomials, and Langevin Dynamics

We consider the problem of finding nearly optimal solutions of optimization problems with random objective functions. Two concrete problems we consider are (a) optimizing the Hamiltonian of a spherical or Ising $p$-spin glass model, and (b) finding a large independent set in a sparse Erdős-Rényi graph. The following families of algorithms are considered: (a) low-degree polynomials of the input; (b) low-depth Boolean circuits; (c) the Langevin dynamics algorithm. We show that these families of algorithms fail to produce nearly optimal solutions with high probability. For the case of Boolean circuits, our results improve the state-of-the-art bounds known in circuit complexity theory (although we consider the search problem as opposed to the decision problem). Our proof uses the fact that these models are known to exhibit a variant of the overlap gap property (OGP) of near-optimal solutions. Specifically, for both models, every two solutions whose objectives are above a certain threshold are either close or far from each other. The crux of our proof is that the classes of algorithms we consider exhibit a form of stability. We show by an interpolation argument that stable algorithms cannot overcome the OGP barrier. The stability of Langevin dynamics is an immediate consequence of the well-posedness of stochastic differential equations. The stability of low-degree polynomials and Boolean circuits is established using tools from Gaussian and Boolean analysis, namely hypercontractivity and total influence, as well as a novel lower bound for random walks avoiding certain subsets. In the case of Boolean circuits, the result also makes use of the classical Linial-Mansour-Nisan theorem. Our techniques apply more broadly to low influence functions and may apply more generally.
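The "either close or far" statement can be made concrete with a small check on pairs of Ising configurations. The sketch below is only an illustration: the function names and the band endpoints `nu1` and `nu2` are hypothetical, and the normalized inner product is assumed as the notion of overlap (the papers may use a variant, for example its absolute value).

```python
def normalized_overlap(s, t):
    """Normalized inner product of two +/-1 configurations of equal length."""
    if len(s) != len(t):
        raise ValueError("configurations must have the same length")
    return sum(a * b for a, b in zip(s, t)) / len(s)


def in_forbidden_band(s, t, nu1, nu2):
    """True if the overlap falls strictly inside the (hypothetical) gap (nu1, nu2).

    Under an overlap gap property, no pair of near-optimal solutions
    would be expected to land in this band.
    """
    return nu1 < normalized_overlap(s, t) < nu2


s = [1, 1, 1, 1]
t = [1, 1, 1, -1]
print(normalized_overlap(s, t))            # 0.5
print(in_forbidden_band(s, t, 0.25, 0.75))  # True
```

A stable algorithm, run along an interpolation path between two instances, would have to pass through such a band, which is the intuition behind the obstruction described in the abstract.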

preprint, 2021, arXiv

Algorithmic Obstructions in the Random Number Partitioning Problem

We consider the algorithmic problem of finding a near-optimal solution for the number partitioning problem (NPP). The NPP appears in many applications, including the design of randomized controlled trials, multiprocessor scheduling, and cryptography; and is also of theoretical significance. It possesses a so-called statistical-to-computational gap: when its input $X$ has distribution $\mathcal{N}(0,I_n)$, its optimal value is $Θ(\sqrt{n}2^{-n})$ w.h.p.; whereas the best polynomial-time algorithm achieves an objective value of only $2^{-Θ(\log^2 n)}$, w.h.p. In this paper, we initiate the study of the nature of this gap. Inspired by insights from statistical physics, we study the landscape of NPP and establish the presence of the Overlap Gap Property (OGP), an intricate geometric property which is known to be rigorous evidence of algorithmic hardness for large classes of algorithms. By leveraging the OGP, we establish that (a) any sufficiently stable algorithm, appropriately defined, fails to find a near-optimal solution with energy below $2^{-ω(n \log^{-1/5} n)}$; and (b) a very natural MCMC dynamics fails to find near-optimal solutions. Our simulations suggest that the state-of-the-art algorithm achieving $2^{-Θ(\log^2 n)}$ is indeed stable, but formally verifying this is left as an open problem. OGP regards the overlap structure of $m$-tuples of solutions achieving a certain objective value. When $m$ is constant we prove the presence of OGP in the regime $2^{-Θ(n)}$, and the absence of it in the regime $2^{-o(n)}$. Interestingly, though, by considering overlaps with growing values of $m$ we prove the presence of the OGP up to the level $2^{-ω(\sqrt{n\log n})}$. Our proof of the failure of stable algorithms at values $2^{-ω(n \log^{-1/5} n)}$ employs methods from Ramsey theory in extremal combinatorics, and is of independent interest.
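For readers unfamiliar with the NPP objective, here is a minimal sketch of one classical polynomial-time heuristic, the Karmarkar-Karp largest differencing method. The abstract does not name its benchmark algorithm, so using this heuristic here is an assumption made for illustration. The function returns the discrepancy $|\sum_i σ_i x_i|$ of the partition it implicitly builds; it expects non-negative inputs, which loses nothing because flipping the sign of an entry can be absorbed into $σ_i$.

```python
import heapq


def karmarkar_karp(values):
    """Discrepancy reached by largest differencing on non-negative numbers."""
    heap = [-v for v in values]  # heapq is a min-heap, so store negated values
    heapq.heapify(heap)
    while len(heap) > 1:
        largest = -heapq.heappop(heap)
        second = -heapq.heappop(heap)
        # Commit the two largest numbers to opposite sides: only their
        # difference matters from now on.
        heapq.heappush(heap, -(largest - second))
    return -heap[0] if heap else 0


print(karmarkar_karp([8, 7, 6, 5, 4]))  # prints 2
```

On this input the heuristic stops at discrepancy 2 even though a perfect split exists (8 + 7 = 6 + 5 + 4), a toy-scale picture of the gap between the optimal value and what a fast heuristic reaches.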

preprint, 2021, arXiv

Self-Regularity of Non-Negative Output Weights for Overparameterized Two-Layer Neural Networks

We consider the problem of finding a two-layer neural network with sigmoid, rectified linear unit (ReLU), or binary step activation functions that "fits" a training data set as accurately as possible as quantified by the training error; and study the following question: does a low training error guarantee that the norm of the output layer (outer norm) itself is small? We answer this question affirmatively for the case of non-negative output weights. Using a simple covering number argument, we establish that under quite mild distributional assumptions on the input/label pairs, any such network achieving a small training error on polynomially many data points necessarily has a well-controlled outer norm. Notably, our results (a) have a polynomial (in $d$) sample complexity, (b) are independent of the number of hidden units (which can potentially be very high), (c) are oblivious to the training algorithm; and (d) require quite mild assumptions on the data (in particular the input vector $X\in\mathbb{R}^d$ need not have independent coordinates). We then leverage our bounds to establish generalization guarantees for such networks through the fat-shattering dimension, a scale-sensitive measure of the complexity class that the network architectures we investigate belong to. Notably, our generalization bounds also have good sample complexity (polynomials in $d$ with a low degree), and are in fact near-linear for some important cases of interest.

preprint, 2020, arXiv

Estimation of Monotone Multi-Index Models

In a multi-index model with $k$ index vectors, the input variables are transformed by taking inner products with the index vectors. A transfer function $f: \mathbb{R}^k \to \mathbb{R}$ is applied to these inner products to generate the output. Thus, multi-index models are a generalization of linear models. In this paper, we consider monotone multi-index models. Namely, the transfer function is assumed to be coordinate-wise monotone. The monotone multi-index model therefore generalizes both linear regression and isotonic regression, which is the estimation of a coordinate-wise monotone function. We consider the case of nonnegative index vectors. We provide an algorithm based on integer programming for the estimation of monotone multi-index models, and provide guarantees on the $L_2$ loss of the estimated function relative to the ground truth.

preprint, 2020, arXiv

Neural Networks and Polynomial Regression. Demystifying the Overparametrization Phenomena

In the context of neural network models, overparametrization refers to the phenomenon whereby these models appear to generalize well on the unseen data, even though the number of parameters significantly exceeds the sample sizes, and the model perfectly fits the in-training data. A conventional explanation of this phenomenon is based on self-regularization properties of algorithms used to train the data. In this paper we prove a series of results which provide a somewhat diverging explanation. Adopting a teacher/student model where the teacher network is used to generate the predictions and the student network is trained on the observed labeled data, and then tested on out-of-sample data, we show that any student network interpolating the data generated by a teacher network generalizes well, provided that the sample size is at least an explicit quantity controlled by data dimension and approximation guarantee alone, regardless of the number of internal nodes of either teacher or student network. Our claim is based on approximating both teacher and student networks by polynomial (tensor) regression models with degree depending on the desired accuracy and network depth only. Such a parametrization notably does not depend on the number of internal nodes. Thus a message implied by our results is that parametrizing wide neural networks by the number of hidden nodes is misleading, and a more fitting measure of parametrization complexity is the number of regression coefficients associated with tensorized data. In particular, this somewhat reconciles the generalization ability of neural networks with more classical statistical notions of data complexity and generalization bounds. Our empirical results on MNIST and Fashion-MNIST datasets indeed confirm that tensorized regression achieves a good out-of-sample performance, even when the degree of the tensor is at most two.

preprint, 2020, arXiv

Stability, memory, and messaging tradeoffs in heterogeneous service systems

We consider a heterogeneous distributed service system, consisting of $n$ servers with unknown and possibly different processing rates. Jobs with unit mean and independent processing times arrive as a renewal process of rate $λn$, with $0<λ<1$, to the system. Incoming jobs are immediately dispatched to one of several queues associated with the $n$ servers. We assume that the dispatching decisions are made by a central dispatcher endowed with a finite memory, and with the ability to exchange messages with the servers. We study the fundamental resource requirements (memory bits and message exchange rate) in order for a dispatching policy to be maximally stable, i.e., stable whenever the processing rates are such that the arrival rate is less than the total available processing rate. First, for the case of Poisson arrivals and exponential service times, we present a policy that is maximally stable while using a positive (but arbitrarily small) message rate, and $\log_2(n)$ bits of memory. Second, we show that within a certain broad class of policies, a dispatching policy that exchanges $o\big(n^2\big)$ messages per unit of time, and with $o(\log(n))$ bits of memory, cannot be maximally stable. Thus, as long as the message rate is not too excessive, a logarithmic memory is necessary and sufficient for maximal stability.

preprint, 2020, arXiv

Stationary Points of Shallow Neural Networks with Quadratic Activation Function

We consider the teacher-student setting of learning shallow neural networks with quadratic activations and planted weight matrix $W^*\in\mathbb{R}^{m\times d}$, where $m$ is the width of the hidden layer and $d\le m$ is the data dimension. We study the optimization landscape associated with the empirical and the population squared risk of the problem. Under the assumption that the planted weights are full-rank we obtain the following results. First, we establish that the landscape of the empirical risk admits an "energy barrier" separating rank-deficient $W$ from $W^*$: if $W$ is rank deficient, then its risk is bounded away from zero by an amount we quantify. We then complement this result by showing that, assuming the number $N$ of samples grows at least like a polynomial function of $d$, all full-rank approximate stationary points of the empirical risk are nearly globally optimal. These two results allow us to prove that gradient descent, when initialized below the energy barrier, approximately minimizes the empirical risk and recovers the planted weights in polynomial time. Next, we show that initializing below this barrier is in fact easily achieved when the weights are randomly generated under relatively weak assumptions. We show that provided the network is sufficiently overparametrized, initializing with an appropriate multiple of the identity suffices to obtain a risk below the energy barrier. At a technical level, the last result is a consequence of the semicircle law for the Wishart ensemble and could be of independent interest. Finally, we study the minimizers of the empirical risk and identify a simple necessary and sufficient geometric condition on the training data under which any minimizer necessarily has zero generalization error. We show that as soon as $N\ge N^*=d(d+1)/2$, randomly generated data enjoys this geometric condition almost surely, whereas this ceases to hold if $N<N^*$.
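A short identity may help explain where the count $d(d+1)/2$ can come from. It assumes the network output is the plain sum of squared hidden-unit pre-activations, with all output weights fixed to one; the abstract does not spell out the architecture, so this form is an assumption. Writing $W_j$ for the $j$-th row of $W\in\mathbb{R}^{m\times d}$:

$$
f(W;x) \;=\; \sum_{j=1}^{m} \langle W_j, x\rangle^2 \;=\; \|Wx\|_2^2 \;=\; x^\top \left(W^\top W\right) x .
$$

Under this assumption the output depends on $W$ only through the symmetric $d\times d$ matrix $W^\top W$, which has $d(d+1)/2$ free entries. Each sample $(x, f(W^*;x))$ gives one linear equation in those entries, which is consistent with a threshold of $N^*=d(d+1)/2$ samples.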

preprint, 2020, arXiv

The Quantum Approximate Optimization Algorithm Needs to See the Whole Graph: A Typical Case

The Quantum Approximate Optimization Algorithm can naturally be applied to combinatorial search problems on graphs. The quantum circuit has p applications of a unitary operator that respects the locality of the graph. On a graph with bounded degree, with p small enough, measurements of distant qubits in the state output by the QAOA give uncorrelated results. We focus on finding big independent sets in random graphs with dn/2 edges keeping d fixed and n large. Using the Overlap Gap Property of almost optimal independent sets in random graphs, and the locality of the QAOA, we are able to show that if p is less than a d-dependent constant times log n, the QAOA cannot do better than finding an independent set of size .854 times the optimal for d large. Because the logarithm is slowly growing, even at one million qubits we can only show that the algorithm is blocked if p is in single digits. At higher p the algorithm "sees" the whole graph and we have no indication that performance is limited.

preprint, 2020, arXiv

The Quantum Approximate Optimization Algorithm Needs to See the Whole Graph: Worst Case Examples

The Quantum Approximate Optimization Algorithm can be applied to search problems on graphs with a cost function that is a sum of terms corresponding to the edges. When conjugating an edge term, the QAOA unitary at depth p produces an operator that depends only on the subgraph consisting of edges that are at most p away from the edge in question. On random d-regular graphs, with d fixed and with p a small constant times log n, these neighborhoods are almost all trees and so the performance of the QAOA is determined only by how it acts on an edge in the middle of a tree. Both bipartite random d-regular graphs and general random d-regular graphs locally are trees so the QAOA's performance is the same on these two ensembles. Using this we can show that the QAOA with $(d-1)^{2p} < n^A$ for any $A<1$, can only achieve an approximation ratio of 1/2 for Max-Cut on bipartite random d-regular graphs for d large. For Maximum Independent Set, in the same setting, the best approximation ratio is a d-dependent constant that goes to 0 as d gets big.
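To get a feel for the condition $(d-1)^{2p} < n^A$, the following sketch computes the largest depth $p$ that still satisfies it. The function name and the sample values of $d$, $n$ and $A$ are illustrative choices, not taken from the paper; the comparison is done in logarithms to avoid huge intermediate numbers.

```python
import math


def max_blocked_depth(d, n, A):
    """Largest integer p >= 0 with (d-1)**(2*p) < n**A, or -1 if none exists."""
    if d < 3:
        raise ValueError("need d >= 3 so that log(d - 1) is positive")
    bound = A * math.log(n)
    step = 2 * math.log(d - 1)
    p = -1
    # Increase p while the next depth still satisfies the strict inequality.
    while (p + 1) * step < bound:
        p += 1
    return p


print(max_blocked_depth(3, 10**6, 0.99))   # 9
print(max_blocked_depth(10, 10**6, 0.99))  # 3
```

Because the bound grows only logarithmically in $n$, even a million vertices leaves room for just a handful of QAOA layers under this condition.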

preprint, 2019, arXiv

The Landscape of the Planted Clique Problem: Dense subgraphs and the Overlap Gap Property

In this paper we study the computational-statistical gap of the planted clique problem, where a clique of size $k$ is planted in an Erdős-Rényi graph $G(n,\frac{1}{2})$ resulting in a graph $G\left(n,\frac{1}{2},k\right)$. The goal is to recover the planted clique vertices by observing $G\left(n,\frac{1}{2},k\right)$. It is known that the clique can be recovered as long as $k \geq \left(2+ε\right)\log n$ for any $ε>0$, but no polynomial-time algorithm is known for this task unless $k=Ω\left(\sqrt{n}\right)$. Following a statistical-physics inspired point of view as an attempt to understand this computational-statistical gap, we study the landscape of the "sufficiently dense" subgraphs of $G$ and their overlap with the planted clique. Using the first moment method, we study the densest subgraph problems for subgraphs with fixed, but arbitrary, overlap size with the planted clique, and provide evidence of a phase transition for the presence of Overlap Gap Property (OGP) at $k=Θ\left(\sqrt{n}\right)$. OGP is a concept introduced originally in spin glass theory and known to suggest algorithmic hardness when it appears. We establish the presence of OGP when $k$ is a small positive power of $n$ by using a conditional second moment method. As our main technical tool, we establish the first, to the best of our knowledge, concentration results for the $K$-densest subgraph problem for the Erdős-Rényi model $G\left(n,\frac{1}{2}\right)$ when $K=n^{0.5-ε}$ for arbitrary $ε>0$. Finally, to study the OGP we employ a certain form of overparametrization, which is conceptually aligned with a large body of recent work in learning theory and optimization.

preprint, 2016, arXiv

A Message Passing Algorithm for the Problem of Path Packing in Graphs

We consider the problem of packing node-disjoint directed paths in a directed graph. We consider a variant of this problem where each path starts within a fixed subset of root nodes, subject to a given bound on the length of paths. This problem is motivated by the so-called kidney exchange problem, but has potential other applications and is interesting in its own right. We propose a new algorithm for this problem based on the message passing/belief propagation technique. A priori this problem does not have an associated graphical model, so in order to apply a belief propagation algorithm we provide a novel representation of the problem as a graphical model. Standard belief propagation on this model has poor scaling behavior, so we provide an efficient implementation that significantly decreases the complexity. We provide numerical results comparing the performance of our algorithm on both artificially created graphs and real world networks to several alternative algorithms, including algorithms based on integer programming (IP) techniques. These comparisons show that our algorithm scales better to large instances than IP-based algorithms and often finds better solutions than a simple algorithm that greedily selects the longest path from each root node. In some cases it also finds better solutions than the ones found by IP-based algorithms even when the latter are allowed to run significantly longer than our algorithm.

preprint, 2016, arXiv

A Note on Alternating Minimization Algorithm for the Matrix Completion Problem

We consider the problem of reconstructing a low rank matrix from a subset of its entries and analyze two variants of the so-called Alternating Minimization algorithm, which has been proposed in the past. We establish that when the underlying matrix has rank $r=1$, has positive bounded entries, and the graph $\mathcal{G}$ underlying the revealed entries has bounded degree and diameter which is at most logarithmic in the size of the matrix, both algorithms succeed in reconstructing the matrix approximately in polynomial time starting from an arbitrary initialization. We further provide simulation results which suggest that the second algorithm, which is based on message-passing-type updates, performs significantly better.
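As a rough illustration of the rank-one setting, the sketch below runs a textbook alternating least-squares update on the revealed entries. It is a generic version written for this page, not either of the two variants analyzed in the paper; the function name, the all-ones initialization and the iteration count are assumptions, and every row and column is assumed to contain at least one revealed entry.

```python
def alt_min_rank1(M, mask, iters=200):
    """Fit x[i] * y[j] to the revealed entries M[i][j] (where mask[i][j] == 1)."""
    m, n = len(M), len(M[0])
    x, y = [1.0] * m, [1.0] * n
    for _ in range(iters):
        # Least-squares update of each x[i] with y held fixed.
        for i in range(m):
            cols = [j for j in range(n) if mask[i][j]]
            x[i] = sum(M[i][j] * y[j] for j in cols) / sum(y[j] ** 2 for j in cols)
        # Least-squares update of each y[j] with x held fixed.
        for j in range(n):
            rows = [i for i in range(m) if mask[i][j]]
            y[j] = sum(M[i][j] * x[i] for i in rows) / sum(x[i] ** 2 for i in rows)
    return x, y


u, v = [1.0, 2.0, 3.0], [2.0, 1.0, 4.0]
M = [[a * b for b in v] for a in u]          # positive rank-one matrix
mask = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]     # revealed entries form a 6-cycle
x, y = alt_min_rank1(M, mask)
print(round(x[0] * y[2], 6))                 # estimate of the hidden entry M[0][2]
```

In this toy example the bipartite graph of revealed entries is connected, so the hidden entries are determined by the revealed ones.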

preprint, 2016, arXiv

Finding a Large Submatrix of a Gaussian Random Matrix

We consider the problem of finding a $k\times k$ submatrix of an $n\times n$ matrix with i.i.d. standard Gaussian entries, which has a large average entry. It was shown earlier by Bhamidi et al. that the largest average value of such a matrix is $2\sqrt{\log n/k}$ with high probability. In the same paper, evidence was provided that a natural greedy algorithm called Largest Average Submatrix (LAS) should produce a matrix with average entry smaller by a factor of approximately $\sqrt{2}$. In this paper we show that the average entry of the matrix produced by the LAS algorithm is indeed $\sqrt{2\log n/k}$ w.h.p. Then by drawing an analogy with the problem of finding cliques in random graphs, we propose a simple greedy algorithm which produces a $k\times k$ matrix with asymptotically the same average value. Since the greedy algorithm is the best known algorithm for finding cliques in random graphs, it is tempting to believe that beating the factor $\sqrt{2}$ performance gap suffered by both algorithms might be very challenging. Surprisingly, we show the existence of a very simple algorithm which produces a matrix with average value $(4/3)\sqrt{2\log n/k}$. To get an insight into the algorithmic hardness of this problem, and motivated by methods originating in the theory of spin glasses, we conduct the so-called expected overlap analysis of matrices with average value asymptotically $α\sqrt{2\log n/k}$. The overlap corresponds to the number of common rows and common columns for pairs of matrices achieving this value. We discover numerically an intriguing phase transition at $α^*\approx 1.3608\ldots$: when $α<α^*$ the space of overlaps is a continuous subset of $[0,1]^2$, whereas $α=α^*$ marks the onset of discontinuity, and the model exhibits the Overlap Gap Property when $α>α^*$. We conjecture that $α>α^*$ marks the onset of the algorithmic hardness.
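The abstract describes LAS only as a natural greedy algorithm. A common reading, assumed here rather than taken from the abstract, is an alternating search: fix $k$ columns, keep the $k$ rows with the largest sums over them, then re-pick the $k$ best columns for those rows, and repeat until nothing changes. The sketch below implements that reading; the function name, the tie-breaking by index and the iteration cap are illustrative choices.

```python
def las_search(A, k, cols, max_iters=100):
    """Alternate between best k rows and best k columns until a fixed point."""
    n_rows, n_cols = len(A), len(A[0])
    cols = sorted(cols)
    rows = None
    for _ in range(max_iters):
        # Best k rows for the current columns (ties broken by smaller index).
        new_rows = sorted(sorted(range(n_rows),
                                 key=lambda i: -sum(A[i][j] for j in cols))[:k])
        # Best k columns for those rows.
        new_cols = sorted(sorted(range(n_cols),
                                 key=lambda j: -sum(A[i][j] for i in new_rows))[:k])
        if new_rows == rows and new_cols == cols:
            break  # locally optimal: neither rows nor columns can be improved
        rows, cols = new_rows, new_cols
    average = sum(A[i][j] for i in rows for j in cols) / (k * k)
    return rows, cols, average


A = [[0, 0, 0, 0],
     [0, 5, 5, 0],
     [0, 5, 5, 0],
     [1, 0, 0, 0]]
print(las_search(A, 2, [0, 1]))  # ([1, 2], [1, 2], 5.0)
```

The fixed point is only a local optimum with respect to row and column swaps of this kind, which is why its average can fall short of the global maximum.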

preprint, 2015, arXiv

Join the Shortest Queue with Many Servers. The Heavy Traffic Asymptotics

We consider queueing systems with n parallel queues under a Join the Shortest Queue (JSQ) policy in the Halfin-Whitt heavy traffic regime. We use the martingale method to prove that a scaled process counting the number of idle servers and queues of length exactly 2 weakly converges to a two-dimensional reflected Ornstein-Uhlenbeck process, while processes counting longer queues converge to a deterministic system decaying to zero in constant time. This limiting system is comparable to that of the traditional Halfin-Whitt model, but there are key differences in the queueing behavior of the JSQ model. In particular, only a vanishing fraction of customers will have to wait, but those who do will incur a constant order waiting time.

preprint, 2014, arXiv

Giant Component in Random Multipartite Graphs with Given Degree Sequences

We study the problem of the existence of a giant component in a random multipartite graph. We consider a random multipartite graph with $p$ parts generated according to a given degree sequence $n_i^{\mathbf{d}}(n)$ which denotes the number of vertices in part $i$ of the multipartite graph with degree given by the vector $\mathbf{d}$. We assume that the empirical distribution of the degree sequence converges to a limiting probability distribution. Under certain mild regularity assumptions, we characterize the conditions under which, with high probability, there exists a component of linear size. The characterization involves checking whether the Perron-Frobenius norm of the matrix of means of a certain associated edge-biased distribution is greater than unity. We also specify the size of the giant component when it exists. We use the exploration process of Molloy and Reed combined with techniques from the theory of multidimensional Galton-Watson processes to establish this result.

preprint, 2014, arXiv

Hardness of parameter estimation in graphical models

We consider the problem of learning the canonical parameters specifying an undirected graphical model (Markov random field) from the mean parameters. For graphical models representing a minimal exponential family, the canonical parameters are uniquely determined by the mean parameters, so the problem is feasible in principle. The goal of this paper is to investigate the computational feasibility of this statistical task. Our main result shows that parameter estimation is in general intractable: no algorithm can learn the canonical parameters of a generic pair-wise binary graphical model from the mean parameters in time bounded by a polynomial in the number of variables (unless RP = NP). Indeed, such a result has been believed to be true (see the monograph by Wainwright and Jordan (2008)) but no proof was known. Our proof gives a polynomial time reduction from approximating the partition function of the hard-core model, known to be hard, to learning approximate parameters. Our reduction entails showing that the marginal polytope boundary has an inherent repulsive property, which validates an optimization procedure over the polytope that does not use any knowledge of its structure (as required by the ellipsoid method and others).
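To make the two parametrizations concrete, the sketch below computes mean parameters from canonical parameters by brute-force enumeration for a small pairwise model on $\{0,1\}^n$. This is the forward map and costs $2^n$ operations; the paper's hardness result concerns the inverse direction. The function name and the input layout (a list of node parameters and a dictionary of edge parameters) are assumptions made for this example.

```python
import math
from itertools import product


def mean_parameters(theta_node, theta_edge):
    """Return (E[x_i] for each node, E[x_i x_j] for each edge) by enumeration.

    theta_node: list of node parameters; theta_edge: dict {(i, j): parameter}.
    The model is p(x) proportional to exp(sum_i theta_i x_i + sum_ij theta_ij x_i x_j).
    """
    n = len(theta_node)
    Z = 0.0
    mu_node = [0.0] * n
    mu_edge = {e: 0.0 for e in theta_edge}
    for x in product((0, 1), repeat=n):
        energy = sum(theta_node[i] * x[i] for i in range(n))
        energy += sum(t * x[i] * x[j] for (i, j), t in theta_edge.items())
        w = math.exp(energy)
        Z += w
        for i in range(n):
            mu_node[i] += w * x[i]
        for (i, j) in theta_edge:
            mu_edge[(i, j)] += w * x[i] * x[j]
    return [m / Z for m in mu_node], {e: m / Z for e, m in mu_edge.items()}


print(mean_parameters([0.0, 0.0], {(0, 1): 0.0}))  # ([0.5, 0.5], {(0, 1): 0.25})
```

With all parameters zero the distribution is uniform, so each node mean is 1/2 and the edge mean is 1/4.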

preprint, 2014, arXiv

Learning graphical models from the Glauber dynamics

In this paper we consider the problem of learning undirected graphical models from data generated according to the Glauber dynamics. The Glauber dynamics is a Markov chain that sequentially updates individual nodes (variables) in a graphical model and it is frequently used to sample from the stationary distribution (to which it converges given sufficient time). Additionally, the Glauber dynamics is a natural dynamical model in a variety of settings. This work deviates from the standard formulation of graphical model learning in the literature, where one assumes access to i.i.d. samples from the distribution. Much of the research on graphical model learning has been directed towards finding algorithms with low computational cost. As the main result of this work, we establish that the problem of reconstructing binary pairwise graphical models is computationally tractable when we observe the Glauber dynamics. Specifically, we show that a binary pairwise graphical model on $p$ nodes with maximum degree $d$ can be learned in time $f(d)p^2\log p$, for a function $f(d)$, using nearly the information-theoretic minimum number of samples.
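A single Glauber update can be sketched as follows for an Ising model with spins in $\{-1,+1\}$, distribution proportional to $\exp(\sum_{i<j} J_{ij} σ_i σ_j)$ and no external field. That model form, the function name and the explicit choice of the node to update are assumptions for illustration; in the dynamics described in the abstract the updated node would be chosen by the chain itself.

```python
import math
import random


def glauber_step(spins, J, i, rng):
    """Resample spin i from its conditional law given all other spins."""
    # Local field felt by node i from its neighbours.
    field = sum(J[i][j] * spins[j] for j in range(len(spins)) if j != i)
    # P(spin_i = +1 | rest) = e^{field} / (e^{field} + e^{-field}).
    p_plus = 1.0 / (1.0 + math.exp(-2.0 * field))
    updated = list(spins)
    updated[i] = 1 if rng.random() < p_plus else -1
    return updated


rng = random.Random(0)
J = [[0.0, 50.0], [50.0, 0.0]]          # one strongly ferromagnetic edge
print(glauber_step([-1, 1], J, 0, rng))  # [1, 1]
```

A learner observing such a trajectory sees which node was updated and what value it took, and each update carries information about the local field and hence about the neighbourhood of that node.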

preprint, 2014, arXiv

Local Algorithms for Graphs

We analyze local algorithms over sparse random graphs. These algorithms are based on local information, where "local" refers to a decision made by exploring a small neighbourhood of a given vertex, together with a belief about the structure of the whole graph and possibly some added randomness. Algorithms of this kind can be a natural response to the given problem or an efficient approximation, such as the Belief Propagation algorithm.

preprint, 2014, arXiv

Performance of the Survey Propagation-guided decimation algorithm for the random NAE-K-SAT problem

We show that the Survey Propagation-guided decimation algorithm fails to find satisfying assignments on random instances of the "Not-All-Equal-$K$-SAT" problem if the number of message passing iterations is bounded by a constant independent of the size of the instance and the clause-to-variable ratio is above $(1+o_K(1)){2^{K-1}\over K}\log^2 K$ for sufficiently large $K$. Our analysis in fact applies to a broad class of algorithms described as "sequential local algorithms". Such algorithms iteratively set variables based on some local information and then recurse on the reduced instance. Survey Propagation-guided as well as Belief Propagation-guided decimation algorithms, two widely studied message-passing-based algorithms, fall under this category provided the number of message passing iterations is bounded by a constant. Another well-known algorithm falling into this category is the Unit Clause algorithm. Our work constitutes the first rigorous analysis of the performance of the SP-guided decimation algorithm. The approach underlying our paper is based on an intricate geometry of the solution space of the random NAE-$K$-SAT problem. We show that above the $(1+o_K(1)){2^{K-1}\over K}\log^2 K$ threshold, the overlap structure of $m$-tuples of satisfying assignments exhibits a certain clustering behavior expressed in the form of constraints on distances between the $m$ assignments, for appropriately chosen $m$. We further show that if a sequential local algorithm succeeds in finding a satisfying assignment with probability bounded away from zero, then one can construct an $m$-tuple of solutions violating these constraints, thus leading to a contradiction. Along with (citation), this result is the first work which directly links the clustering property of random constraint satisfaction problems to the computational hardness of finding satisfying assignments.
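For reference, the Not-All-Equal constraint itself is easy to state in code: a clause is satisfied when its literals take at least one true and at least one false value. The sketch below checks a whole formula; the signed-integer encoding of literals (positive for a variable, negative for its negation) and the function name are conventions assumed for this example.

```python
def nae_satisfied(clauses, assignment):
    """True if every clause has at least one true and one false literal.

    clauses: iterable of tuples of nonzero ints (v means variable v, -v its negation).
    assignment: dict mapping each variable index to a bool.
    """
    for clause in clauses:
        literal_values = {assignment[abs(lit)] == (lit > 0) for lit in clause}
        if len(literal_values) < 2:  # all literals true, or all false
            return False
    return True


clauses = [(1, 2, 3), (-1, 2, -3)]
sigma = {1: True, 2: True, 3: False}
flipped = {v: not b for v, b in sigma.items()}
print(nae_satisfied(clauses, sigma), nae_satisfied(clauses, flipped))  # True True
```

Flipping every variable maps satisfying assignments to satisfying assignments, a symmetry of NAE-$K$-SAT that plain $K$-SAT does not have.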

preprint, 2014, arXiv

Structure learning of antiferromagnetic Ising models

In this paper we investigate the computational complexity of learning the graph structure underlying a discrete undirected graphical model from i.i.d. samples. We first observe that the notoriously difficult problem of learning parities with noise can be captured as a special case of learning graphical models. This leads to an unconditional computational lower bound of $Ω(p^{d/2})$ for learning general graphical models on $p$ nodes of maximum degree $d$, for the class of so-called statistical algorithms recently introduced by Feldman et al (2013). The lower bound suggests that the $O(p^d)$ runtime required to exhaustively search over neighborhoods cannot be significantly improved without restricting the class of models. Aside from structural assumptions on the graph such as it being a tree, hypertree, tree-like, etc., many recent papers on structure learning assume that the model has the correlation decay property. Indeed, focusing on ferromagnetic Ising models, Bento and Montanari (2009) showed that all known low-complexity algorithms fail to learn simple graphs when the interaction strength exceeds a number related to the correlation decay threshold. Our second set of results gives a class of repelling (antiferromagnetic) models that have the opposite behavior: very strong interaction allows efficient learning in time $O(p^2)$. We provide an algorithm whose performance interpolates between $O(p^2)$ and $O(p^{d+2})$ depending on the strength of the repulsion.

preprint, 2013, arXiv

Combinatorial approach to the interpolation method and scaling limits in sparse random graphs

We establish the existence of free energy limits for several combinatorial models on the Erdős-Rényi graph $\mathbb {G}(N,\lfloor cN\rfloor)$ and the random $r$-regular graph $\mathbb {G}(N,r)$. For a variety of models, including independent sets, MAX-CUT, coloring and K-SAT, we prove that the free energy both at a positive and zero temperature, appropriately rescaled, converges to a limit as the size of the underlying graph diverges to infinity. In the zero temperature case, this is interpreted as the existence of the scaling limit for the corresponding combinatorial optimization problem. For example, as a special case we prove that the size of a largest independent set in these graphs, normalized by the number of nodes, converges to a limit w.h.p. This resolves an open problem which was proposed by Aldous (Some open problems) as one of his six favorite open problems. It was also mentioned as an open problem in several other places: Conjecture 2.20 in Wormald [In Surveys in Combinatorics, 1999 (Canterbury) (1999) 239-298 Cambridge Univ. Press]; Bollobás and Riordan [Random Structures Algorithms 39 (2011) 1-38]; Janson and Thomason [Combin. Probab. Comput. 17 (2008) 259-264] and Aldous and Steele [In Probability on Discrete Structures (2004) 1-72 Springer].

preprint, 2013, arXiv

Convergent sequences of sparse graphs: A large deviations approach

In this paper we introduce a new notion of convergence of sparse graphs which we call Large Deviations or LD-convergence and which is based on the theory of large deviations. The notion is introduced by "decorating" the nodes of the graph with random uniform i.i.d. weights and constructing random measures on $[0,1]$ and $[0,1]^2$ based on the decoration of nodes and edges. A graph sequence is defined to be converging if the corresponding sequence of random measures satisfies the Large Deviations Principle with respect to the topology of weak convergence on bounded measures on $[0,1]^d, d=1,2$. We then establish that LD-convergence implies several previous notions of convergence, namely so-called right-convergence, left-convergence, and partition-convergence. The corresponding large deviation rate function can be interpreted as the limit object of the sparse graph sequence. In particular, we can express the limiting free energies in terms of this limit object.

preprint, 2013, arXiv

Limits of local algorithms over sparse random graphs

Local algorithms on graphs are algorithms that run in parallel on the nodes of a graph to compute some global structural feature of the graph. Such algorithms use only local information available at nodes to determine local aspects of the global structure, while also potentially using some randomness. Recent research has shown that such algorithms show significant promise in computing structures like large independent sets in graphs locally. Indeed the promise led to a conjecture by Hatami, Lovász and Szegedy that local algorithms may be able to compute maximum independent sets in (sparse) random $d$-regular graphs. In this paper we refute this conjecture and show that every independent set produced by local algorithms is a multiplicative factor $1/2+1/(2\sqrt{2})$ smaller than the largest, asymptotically as $d\rightarrow\infty$. Our result is based on an important clustering phenomenon predicted first in the literature on spin glasses, and recently proved rigorously for a variety of constraint satisfaction problems on random graphs. Such properties suggest that the geometry of the solution space can be quite intricate. The specific clustering property that we prove and apply in this paper shows that typically every two large independent sets in a random graph either have a significant intersection, or have a nearly empty intersection. As a result, large independent sets are clustered according to their proximity to each other. While the clustering property was postulated earlier as an obstruction for the success of local algorithms, such as the Belief Propagation algorithm, our result is the first one where the clustering property is used to formally prove limits on local algorithms.
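To make the notion of a local algorithm concrete, here is a minimal Python sketch of one textbook-style example, not the construction analyzed in the paper: every node draws an independent uniform weight and joins the set exactly when its weight exceeds those of all its neighbors. Each decision depends only on a radius-1 neighborhood. The function names and the ring-graph test instance are illustrative assumptions.

```python
import random

def local_independent_set(adj, seed=0):
    """One-round local rule: a node joins iff its random weight beats all neighbors.

    adj maps each node to a list of its neighbors (hypothetical input format).
    """
    rng = random.Random(seed)
    # Each node draws its own weight; this is the only randomness used.
    weight = {v: rng.random() for v in adj}
    # A node's decision reads only its own weight and its neighbors' weights.
    return {v for v in adj if all(weight[v] > weight[u] for u in adj[v])}

def ring_graph(n):
    """A 2-regular cycle on n nodes, used here only as a small test instance."""
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}

graph = ring_graph(12)
chosen = local_independent_set(graph)
# Two adjacent nodes cannot both hold the strictly larger weight,
# so the chosen set is always independent.
is_independent = all(u not in chosen for v in chosen for u in graph[v])
print(sorted(chosen), is_independent)
```

The rule is deliberately simple; it illustrates why locality is attractive (every node can decide in parallel), while the abstract above concerns how far any such rule can get on random $d$-regular graphs.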

preprint, 2013, arXiv

On the rate of convergence to stationarity of the M/M/N queue in the Halfin-Whitt regime

We prove several results about the rate of convergence to stationarity, that is, the spectral gap, for the M/M/n queue in the Halfin-Whitt regime. We identify the limiting rate of convergence to steady-state, and discover an asymptotic phase transition that occurs w.r.t. this rate. In particular, we demonstrate the existence of a constant $B^*\approx1.85772$ s.t. when a certain excess parameter $B\in(0,B^*]$, the error in the steady-state approximation converges exponentially fast to zero at rate $\frac{B^2}{4}$. For $B>B^*$, the error in the steady-state approximation converges exponentially fast to zero at a different rate, which is the solution to an explicit equation given in terms of special functions. This result may be interpreted as an asymptotic version of a phase transition proven to occur for any fixed n by van Doorn [Stochastic Monotonicity and Queueing Applications of Birth-death Processes (1981) Springer]. We also prove explicit bounds on the distance to stationarity for the M/M/n queue in the Halfin-Whitt regime, when $B<B^*$. Our bounds scale independently of $n$ in the Halfin-Whitt regime, and do not follow from the weak-convergence theory.

preprint, 2013, arXiv

Steady-state $\mathit{GI}/\mathit{GI}/\mathit{n}$ queue in the Halfin-Whitt regime

We consider the FCFS $\mathit{GI}/\mathit{GI}/n$ queue in the so-called Halfin-Whitt heavy traffic regime. We prove that under minor technical conditions the associated sequence of steady-state queue length distributions, normalized by $n^{1/2}$, is tight. We derive an upper bound on the large deviation exponent of the limiting steady-state queue length matching that conjectured by Gamarnik and Momcilovic [Adv. in Appl. Probab. 40 (2008) 548-577]. We also prove a matching lower bound when the arrival process is Poisson. Our main proof technique is the derivation of new and simple bounds for the FCFS $\mathit{GI}/\mathit{GI}/n$ queue. Our bounds are of a structural nature, hold for all $n$ and all times $t\geq0$, and have intuitive closed-form representations as the suprema of certain natural processes which converge weakly to Gaussian processes. We further illustrate the utility of this methodology by deriving the first nontrivial bounds for the weak limit process studied in [Ann. Appl. Probab. 19 (2009) 2211-2269].

preprint, 2012, arXiv

Belief Propagation for Min-cost Network Flow: Convergence and Correctness

Message passing type algorithms such as the so-called Belief Propagation algorithm have recently gained a lot of attention in the statistics, signal processing and machine learning communities as attractive algorithms for solving a variety of optimization and inference problems. As a decentralized, easy to implement and empirically successful algorithm, BP deserves attention from the theoretical standpoint, yet not much is known at the present stage. In order to fill this gap we consider the performance of the BP algorithm in the context of the capacitated minimum-cost network flow problem, a classical problem in operations research. We prove that BP converges to the optimal solution in pseudo-polynomial time, provided that the optimal solution of the underlying problem is unique and the problem input is integral. Moreover, we present a simple modification of the BP algorithm which gives a fully polynomial-time randomized approximation scheme (FPRAS) for the same problem, which no longer requires the uniqueness of the optimal solution. This is the first instance where BP is proved to have fully-polynomial running time. Our results thus provide a theoretical justification for the viability of BP as an attractive method to solve an important class of optimization problems.

preprint, 2012, arXiv

Right-convergence of sparse random graphs

The paper is devoted to the problem of establishing right-convergence of sparse random graphs. This concerns the convergence of the logarithm of the number of homomorphisms from graphs or hyper-graphs $\mathbb{G}_N, N\ge 1$ to some target graph $W$. The theory of dense graph convergence, including random dense graphs, is now well understood, but its counterpart for sparse random graphs presents some fundamental difficulties. Phrased in the statistical physics terminology, the issue is the existence of the log-partition function limits, also known as free energy limits, appropriately normalized for the Gibbs distribution associated with $W$. In this paper we prove that the sequence of sparse Erdős-Rényi graphs is right-converging when the tensor product associated with the target graph $W$ satisfies a certain convexity property. We treat the case of discrete and continuous target graphs $W$. The latter case allows us to prove a special case of Talagrand's recent conjecture (more accurately stated as level III Research Problem 6.7.2 in his recent book), concerning the existence of the limit of the measure of a set obtained from $\mathbb{R}^N$ by intersecting it with linearly in $N$ many subsets, generated according to some common probability law. Our proof is based on the interpolation technique, introduced first by Guerra and Toninelli and developed further in a series of papers. Specifically, Bayati et al. establish the right-convergence property for Erdős-Rényi graphs for some special cases of $W$. Most of the results of that paper follow as a special case of our main theorem.

preprint, 2012, arXiv

Strong spatial mixing for list coloring of graphs

The properties of spatial mixing and strong spatial mixing in spin systems have been of interest because of their implications for uniqueness of Gibbs measures on infinite graphs and efficient approximation of counting problems that are otherwise known to be #P-hard. In the context of coloring, strong spatial mixing has been established for regular trees when $q \geq α^{*} Δ+ 1$ where $q$ is the number of colors, $Δ$ is the degree and $α^* = 1.763\ldots$ is the unique solution to $xe^{-1/x} = 1$. It has also been established for bounded degree lattice graphs whenever $q \geq α^* Δ- β$ for some constant $β$, where $Δ$ is the maximum vertex degree of the graph. The latter uses a technique based on recursively constructed coupling of Markov chains whereas the former is based on establishing decay of correlations on the tree. We establish strong spatial mixing of list colorings on arbitrary bounded degree triangle-free graphs whenever the size of the list of each vertex $v$ is at least $αΔ(v) + β$ where $Δ(v)$ is the degree of vertex $v$, $α> α^*$, and $β$ is a constant that only depends on $α$. We do this by proving the decay of correlations via recursive contraction of the distance between the marginals measured with respect to a suitably chosen error function.
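The constant $α^*$ above is defined only implicitly, so a short numerical check may help. The following Python sketch solves $xe^{-1/x} = 1$ by bisection and evaluates the tree condition $q \geq α^* Δ + 1$ for a given degree. The function names, the bracketing interval $[1, 2]$ and the tolerance are assumptions made for illustration.

```python
import math

def alpha_star(tol=1e-12):
    """Bisection for the root of f(x) = x * exp(-1/x) - 1 on [1, 2]."""
    def f(x):
        return x * math.exp(-1.0 / x) - 1.0
    # f is increasing for x > 0, with f(1) < 0 and f(2) > 0, so the root is unique.
    lo, hi = 1.0, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def min_colors_regular_tree(delta):
    """Smallest integer q satisfying q >= alpha* * delta + 1."""
    return math.ceil(alpha_star() * delta + 1)

alpha = alpha_star()
print(round(alpha, 3))             # agrees with the value 1.763... quoted above
print(min_colors_regular_tree(3))  # integer threshold for degree 3
```

Equivalently, $α^*$ solves $x \ln x = 1$, which gives a second way to sanity-check the computed value.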

preprint, 2011, arXiv

Multiclass multiserver queueing system in the Halfin-Whitt heavy traffic regime. Asymptotics of the stationary distribution

We consider a heterogeneous queueing system consisting of one large pool of $O(r)$ identical servers, where $r\to\infty$ is the scaling parameter. The arriving customers belong to one of several classes which determines the service times in the distributional sense. The system is heavily loaded in the Halfin-Whitt sense, namely the nominal utilization is $1-a/\sqrt{r}$ where $a>0$ is the spare capacity parameter. Our goal is to obtain bounds on the steady state performance metrics such as the number of customers waiting in the queue $Q^r(\infty)$. While there is a rich literature on deriving process level (transient) scaling limits for such systems, the results for steady state are primarily limited to the single class case. This paper is the first one to address the case of heterogeneity in the steady state regime. Moreover, our results hold for any service policy which does not admit server idling when there are customers waiting in the queue. We assume that the interarrival and service times have exponential distribution, and that customers of each class may abandon while waiting in the queue at a certain rate (which may be zero). We obtain upper bounds of the form $O(\sqrt{r})$ on both $Q^r(\infty)$ and the number of idle servers. The bounds are uniform w.r.t. parameter $r$ and the service policy. In particular, we show that $\limsup_r E \exp(θr^{-1/2}Q^r(\infty))<\infty$. Therefore, the sequence $r^{-1/2}Q^r(\infty)$ is tight and has a uniform exponential tail bound. We further consider the system with strictly positive abandonment rates, and show that in this case every weak limit $\hat{Q}(\infty)$ of $r^{-1/2}Q^r(\infty)$ has a sub-Gaussian tail. Namely $E[\exp(θ(\hat{Q}(\infty))^2)]<\infty$, for some $θ>0$.

preprint, 2010, arXiv

Performance Analysis of Queueing Networks via Robust Optimization

Performance analysis of queueing networks is one of the most challenging areas of queueing theory. Barring very specialized models such as product-form type queueing networks, there exist very few results which provide provable non-asymptotic upper and lower bounds on key performance measures. In this paper we propose a new performance analysis method, which is based on robust optimization. The basic premise of our approach is as follows: rather than assuming that the stochastic primitives of a queueing model satisfy certain probability laws, such as, for example, i.i.d. interarrival and service time distributions, we assume that the underlying primitives are deterministic and satisfy the implications of such probability laws. These implications take the form of simple linear constraints, namely, those motivated by the Law of the Iterated Logarithm (LIL). Using this approach we are able to obtain performance bounds on some key performance measures. Furthermore, these performance bounds imply similar bounds in the underlying stochastic queueing models. We demonstrate our approach on two types of queueing networks: a) Tandem Single Class (TSC) queueing network and b) Multiclass Single Server queueing network. In both cases, using the proposed robust optimization approach, we are able to obtain explicit upper bounds on some steady-state performance measures. For example, for the case of the TSC system we obtain a bound of the form $\frac{C}{1-ρ} \ln \ln(1/(1-ρ))$ on the expected steady-state sojourn time, where $C$ is an explicit constant and $ρ$ is the bottleneck traffic intensity. This qualitatively agrees with the correct heavy traffic scaling of this performance measure up to the $\ln \ln(1/(1-ρ))$ correction factor.

preprint, 2010, arXiv

Stability of Skorokhod problem is undecidable

The Skorokhod problem arises in studying Reflected Brownian Motion (RBM) on a non-negative orthant, specifically in the context of queueing networks in the heavy traffic regime. One of the key problems is identifying conditions for stability of a Skorokhod problem, defined as the property that trajectories are attracted to the origin. The stability conditions are known in dimension up to three, but not for general dimensions. In this paper we explain the fundamental difficulties encountered in trying to establish stability conditions for general dimensions. We prove that stability of the Skorokhod problem is an undecidable property when the starting state is a part of the input. Namely, there does not exist an algorithm (a constructive procedure) for identifying stable Skorokhod problems in general dimensions.

preprint, 2008, arXiv

Sequential cavity method for computing free energy and surface pressure

We propose a new method for the problems of computing free energy and surface pressure for various statistical mechanics models on a lattice $\mathbb{Z}^d$. Our method is based on representing the free energy and surface pressure in terms of certain marginal probabilities in a suitably modified sublattice of $\mathbb{Z}^d$. Then recent deterministic algorithms for computing marginal probabilities are used to obtain numerical estimates of the quantities of interest. The method works under the assumption of Strong Spatial Mixing (SSM), which is a form of correlation decay. We illustrate our method for the hard-core and monomer-dimer models, and improve several earlier estimates. For example we show that the exponent of the monomer-dimer coverings of $\mathbb{Z}^3$ belongs to the interval $[0.78595,0.78599]$, improving the best previously known estimate of (approximately) $[0.7850,0.7862]$ obtained by Friedland and Peled and by Friedland, Krop, Lundow and Markström. Moreover, we show that given a target additive error $ε>0$, the computational effort of our method for these two models is $(1/ε)^{O(1)}$ both for free energy and surface pressure. In contrast, prior methods, such as the transfer matrix method, require $\exp\big((1/ε)^{O(1)}\big)$ computational effort.

preprint, 2004, arXiv

Dynamics of exponential linear map in functional space

We consider the question of existence of a unique invariant probability distribution which satisfies some evolutionary property. The problem arises from random graph theory, but to answer it we treat it as a dynamical system in the functional space, where we look for a global attractor. We consider the following bifurcation problem: given a probability measure $μ$, which corresponds to the weight distribution of a link of a random graph, we form a positive linear operator $Φ$ (convolution) on distribution functions and then we analyze a family of its exponents with a parameter $λ$ which corresponds to the connectivity of a sparse random graph. We prove that for every measure $μ$ (i.e., convolution $Φ$) and every $λ< e$ there exists a unique globally attracting fixed point of the operator, which yields the existence and uniqueness of the limit probability distribution on the random graph. This estimate was established earlier by Karp and Sipser for deterministic weight distributions (Dirac measures $μ$) and is known as the $e$-cutoff phenomenon, as for such distributions and $λ>e$ there is no fixed point attractor. We thus establish this phenomenon in a much more general sense.

preprint, 2003, arXiv

Random MAX SAT, Random MAX CUT, and Their Phase Transitions

Given a 2-SAT formula $F$ consisting of $n$ variables and $cn$ random clauses, what is the largest number of clauses $\max F$ satisfiable by a single assignment of the variables? We bound the answer away from the trivial bounds of $(3/4)cn$ and $cn$. We prove that for $c<1$, the expected number of clauses satisfiable is $cn-Θ(1/n)$; for large $c$, it is $((3/4)c + Θ(\sqrt{c}))n$; for $c = 1+\varepsilon$, it is at least $(1+\varepsilon-O(\varepsilon^3))n$ and at most $(1+\varepsilon-Ω(\varepsilon^3/\ln \varepsilon))n$; and in the "scaling window" $c= 1+Θ(n^{-1/3})$, it is $cn-Θ(1)$. In particular, just as the decision problem undergoes a phase transition, our optimization problem also undergoes a phase transition at the same critical value $c=1$. Nearly all of our results are established without reference to the analogous propositions for decision 2-SAT, and as a byproduct we reproduce many of those results, including much of what is known about the 2-SAT scaling window. We consider "online" versions of MAX-2-SAT, and show that for one version, the obvious greedy algorithm is optimal. We can extend only our simplest MAX-2-SAT results to MAX-k-SAT, but we conjecture a "MAX-k-SAT limiting function conjecture" analogous to the folklore satisfiability threshold conjecture, but open even for $k=2$. Neither conjecture immediately implies the other, but it is natural to further conjecture a connection between them. Finally, for random MAXCUT (the size of a maximum cut in a sparse random graph) we prove analogous results.
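The trivial bounds $(3/4)cn$ and $cn$ can be checked by brute force on a small instance. The Python sketch below assumes a simple random model in which each clause picks two distinct variables and independent random signs; the paper's exact model may differ. Under that assumption a uniformly random assignment satisfies each clause with probability $3/4$, so the maximum is at least $(3/4)cn$. All names and parameter values are illustrative.

```python
import itertools
import random

def random_2sat(n, m, seed=0):
    """m random 2-clauses over n variables; a literal is (variable, required_value)."""
    rng = random.Random(seed)
    clauses = []
    for _ in range(m):
        i, j = rng.sample(range(n), 2)  # two distinct variables per clause
        clauses.append(((i, rng.random() < 0.5), (j, rng.random() < 0.5)))
    return clauses

def num_satisfied(clauses, assignment):
    """Count clauses with at least one literal matching the assignment."""
    return sum(1 for (i, si), (j, sj) in clauses
               if assignment[i] == si or assignment[j] == sj)

def max_sat(clauses, n):
    """Exhaustive search over all 2^n assignments (small n only)."""
    return max(num_satisfied(clauses, a)
               for a in itertools.product([False, True], repeat=n))

n, c = 10, 2.0
m = int(c * n)
clauses = random_2sat(n, m)
best = max_sat(clauses, n)
print(best, 0.75 * m, m)  # max F lies between the two trivial bounds
```

The exhaustive search costs $2^n$ evaluations, so it is only a sanity check for toy sizes and says nothing about the asymptotic regimes described in the abstract.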

37 published item(s)

Price of Quality: Sufficient Conditions for Sparse Recovery using Mixed-Quality Data

Algorithms and Barriers in the Symmetric Binary Perceptron Model

Circuit Lower Bounds for the p-Spin Optimization Problem

Hardness of Random Optimization Problems for Boolean Circuits, Low-Degree Polynomials, and Langevin Dynamics

Algorithmic Obstructions in the Random Number Partitioning Problem

Self-Regularity of Non-Negative Output Weights for Overparameterized Two-Layer Neural Networks

Estimation of Monotone Multi-Index Models

Neural Networks and Polynomial Regression. Demystifying the Overparametrization Phenomena

Stability, memory, and messaging tradeoffs in heterogeneous service systems

Stationary Points of Shallow Neural Networks with Quadratic Activation Function

The Quantum Approximate Optimization Algorithm Needs to See the Whole Graph: A Typical Case

The Quantum Approximate Optimization Algorithm Needs to See the Whole Graph: Worst Case Examples

The Landscape of the Planted Clique Problem: Dense subgraphs and the Overlap Gap Property

A Message Passing Algorithm for the Problem of Path Packing in Graphs

A Note on Alternating Minimization Algorithm for the Matrix Completion Problem

Finding a Large Submatrix of a Gaussian Random Matrix

Join the Shortest Queue with Many Servers. The Heavy Traffic Asymptotics

Giant Component in Random Multipartite Graphs with Given Degree Sequences

Hardness of parameter estimation in graphical models

Learning graphical models from the Glauber dynamics

Local Algorithms for Graphs

Performance of the Survey Propagation-guided decimation algorithm for the random NAE-K-SAT problem

Structure learning of antiferromagnetic Ising models

Combinatorial approach to the interpolation method and scaling limits in sparse random graphs

Convergent sequences of sparse graphs: A large deviations approach

Limits of local algorithms over sparse random graphs

On the rate of convergence to stationarity of the M/M/N queue in the Halfin-Whitt regime

Steady-state $\mathit{GI}/\mathit{GI}/\mathit{n}$ queue in the Halfin-Whitt regime

Belief Propagation for Min-cost Network Flow: Convergence and Correctness

Right-convergence of sparse random graphs

Strong spatial mixing for list coloring of graphs

Multiclass multiserver queueing system in the Halfin-Whitt heavy traffic regime. Asymptotics of the stationary distribution

Performance Analysis of Queueing Networks via Robust Optimization

Stability of Skorokhod problem is undecidable

Sequential cavity method for computing free energy and surface pressure

Dynamics of exponential linear map in functional space

Random MAX SAT, Random MAX CUT, and Their Phase Transitions