Source author record

Emmanuel Abbe

Emmanuel Abbe appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Information Theory, math.IT, math.PR, Machine Learning, Social and Information Networks, Computational Complexity, Discrete Mathematics, math.CO, math.ST, Statistics Theory, cond-mat.stat-mech, Applications, Data Structures and Algorithms, Genomics, Logic in Computer Science, Multiagent Systems, nlin.AO, physics.soc-ph

Catalog footprint

What is connected

38 works

18 topics

4 close collaborators

preprint · 2022 · arXiv

An $\ell_p$ theory of PCA and spectral clustering

Principal Component Analysis (PCA) is a powerful tool in statistics and machine learning. While existing study of PCA focuses on the recovery of principal components and their associated eigenvalues, there are few precise characterizations of individual principal component scores that yield low-dimensional embedding of samples. That hinders the analysis of various spectral methods. In this paper, we first develop an $\ell_p$ perturbation theory for a hollowed version of PCA in Hilbert spaces which provably improves upon the vanilla PCA in the presence of heteroscedastic noises. Through a novel $\ell_p$ analysis of eigenvectors, we investigate entrywise behaviors of principal component score vectors and show that they can be approximated by linear functionals of the Gram matrix in $\ell_p$ norm, which includes $\ell_2$ and $\ell_\infty$ as special examples. For sub-Gaussian mixture models, the choice of $p$ giving optimal bounds depends on the signal-to-noise ratio, which further yields optimality guarantees for spectral clustering. For contextual community detection, the $\ell_p$ theory leads to a simple spectral algorithm that achieves the information threshold for exact recovery. These also provide optimal recovery results for Gaussian mixture and stochastic block models as special cases.

preprint · 2022 · arXiv

An initial alignment between neural network and target is needed for gradient descent to learn

This paper introduces the notion of ``Initial Alignment'' (INAL) between a neural network at initialization and a target function. It is proved that if a network and a Boolean target function do not have a noticeable INAL, then noisy gradient descent on a fully connected network with normalized i.i.d. initialization will not learn in polynomial time. Thus a certain amount of knowledge about the target (measured by the INAL) is needed in the architecture design. This also provides an answer to an open problem posed in [AS20]. The results are based on deriving lower-bounds for descent algorithms on symmetric neural networks without explicit knowledge of the target function beyond its INAL.

preprint · 2022 · arXiv

On the Power of Differentiable Learning versus PAC and SQ Learning

We study the power of learning via mini-batch stochastic gradient descent (SGD) on the population loss, and batch Gradient Descent (GD) on the empirical loss, of a differentiable model or neural network, and ask what learning problems can be learnt using these paradigms. We show that SGD and GD can always simulate learning with statistical queries (SQ), but their ability to go beyond that depends on the precision $ρ$ of the gradient calculations relative to the minibatch size $b$ (for SGD) and sample size $m$ (for GD). With fine enough precision relative to minibatch size, namely when $b ρ$ is small enough, SGD can go beyond SQ learning and simulate any sample-based learning algorithm and thus its learning power is equivalent to that of PAC learning; this extends prior work that achieved this result for $b=1$. Similarly, with fine enough precision relative to the sample size $m$, GD can also simulate any sample-based learning algorithm based on $m$ samples. In particular, with polynomially many bits of precision (i.e. when $ρ$ is exponentially small), SGD and GD can both simulate PAC learning regardless of the mini-batch size. On the other hand, when $b ρ^2$ is large enough, the power of SGD is equivalent to that of SQ learning.

preprint · 2021 · arXiv

Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels

We study the relative power of learning with gradient descent on differentiable models, such as neural networks, versus using the corresponding tangent kernels. We show that under certain conditions, gradient descent achieves small error only if a related tangent kernel method achieves a non-trivial advantage over random guessing (a.k.a. weak learning), though this advantage might be very small even when gradient descent can achieve arbitrarily high accuracy. Complementing this, we show that without these conditions, gradient descent can in fact learn with small error even when no kernel method, in particular using the tangent kernel, can achieve a non-trivial advantage over random guessing.

preprint · 2021 · arXiv

Stochastic block model entropy and broadcasting on trees with survey

The limit of the entropy in the stochastic block model (SBM) has been characterized in the sparse regime for the special case of disassortative communities [COKPZ17] and for the classical case of assortative communities but in the dense regime [DAM16]. The problem has not been closed in the classical sparse and assortative case. This paper establishes the result in this case for any SNR except in the interval (1, 3.513). It further gives an approximation to the limit in this window. The result is obtained by expressing the global SBM entropy as an integral of local tree entropies in a broadcasting on tree model with erasure side-information. The main technical advancement then relies on showing the irrelevance of the boundary in such a model, also studied with variants in [KMS16], [MNS16] and [MX15]. In particular, we establish the uniqueness of the BP fixed point in the survey model for any SNR above 3.513 or below 1. This only leaves a narrow region in the plane between SNR and survey strength where the uniqueness of BP conjectured in these papers remains unproved.

preprint · 2020 · arXiv

An Alon-Boppana theorem for powered graphs and generalized Ramanujan graphs

The r-th power of a graph modifies a graph by connecting every vertex pair within distance r. This paper gives a generalization of the Alon-Boppana Theorem for the r-th power of graphs, including irregular graphs. This leads to a generalized notion of Ramanujan graphs, those for which the powered graph has a spectral gap matching the derived Alon-Boppana bound. In particular, we show that certain graphs that are not good expanders due to local irregularities, such as Erdos-Renyi random graphs, become almost Ramanujan once powered. A different generalization of Ramanujan graphs can also be obtained from the nonbacktracking operator. We next argue that the powering operator gives a more robust notion than the latter: Sparse Erdos-Renyi random graphs with an adversary modifying a subgraph of $\log(n)^c$ vertices are still almost Ramanujan in the powered sense, but not in the nonbacktracking sense. As an application, this gives robust community testing for different block models.
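The powering operation described in the first sentence of this abstract can be sketched in a few lines. This is only an illustration of "connect every vertex pair within distance r"; the function name `graph_power` and the adjacency-dictionary representation are assumptions made for the example, not taken from the paper.

```python
from collections import deque


def graph_power(adj, r):
    """Return the r-th power of an undirected graph.

    `adj` maps each vertex to the set of its neighbours. In the result,
    two distinct vertices are adjacent iff their graph distance in the
    input is at most r.
    """
    powered = {}
    for source in adj:
        # Breadth-first search from `source`, truncated at depth r.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == r:
                continue  # do not explore beyond distance r
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        powered[source] = set(dist) - {source}
    return powered


# Path graph 0 - 1 - 2 - 3 (hypothetical toy input).
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
squared = graph_power(path, 2)
```

On the toy path, vertex 0 gains vertex 2 as a neighbour in the squared graph, while taking `r = 1` returns the original graph unchanged.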

preprint · 2020 · arXiv

Community Detection on Euclidean Random Graphs

We study the problem of community detection (CD) on Euclidean random geometric graphs where each vertex has two latent variables: a binary community label and an $\mathbb{R}^d$-valued location label which forms the support of a Poisson point process of intensity $λ$. A random graph is then drawn with edge probabilities dependent on both the community and location labels. In contrast to the stochastic block model (SBM) that has no location labels, the resulting random graph contains many more short loops due to the geometric embedding. We consider the recovery of the community labels, partial and exact, using the random graph and the location labels. We establish phase transitions for both sparse and logarithmic degree regimes, and provide bounds on the location of the thresholds, conjectured to be tight in the case of exact recovery. We also show that the threshold of the distinguishability problem, i.e., the testing between our model and the null model without community labels, exhibits no phase transition and in particular, does not match the weak recovery threshold (in contrast to the SBM).

preprint · 2020 · arXiv

Learning Sparse Graphons and the Generalized Kesten-Stigum Threshold

The problem of learning graphons has attracted considerable attention across several scientific communities, with significant progress over the recent years in sparser regimes. Yet, the current techniques still require diverging degrees in order to succeed with efficient algorithms in the challenging cases where the local structure of the graph is homogeneous. This paper provides an efficient algorithm to learn graphons in the constant expected degree regime. The algorithm is shown to succeed in estimating the rank-$k$ projection of a graphon in the $L_2$ metric if the top $k$ eigenvalues of the graphon satisfy a generalized Kesten-Stigum condition.

preprint · 2020 · arXiv

Maximum Multiscale Entropy and Neural Network Regularization

A well-known result across information theory, machine learning, and statistical physics shows that the maximum entropy distribution under a mean constraint has an exponential form called the Gibbs-Boltzmann distribution. This is used for instance in density estimation or to achieve excess risk bounds derived from single-scale entropy regularizers (Xu-Raginsky '17). This paper investigates a generalization of these results to a multiscale setting. We present different ways of generalizing the maximum entropy result by incorporating the notion of scale. For different entropies and arbitrary scale transformations, it is shown that the distribution maximizing a multiscale entropy is characterized by a procedure which has an analogy to the renormalization group procedure in statistical physics. For the case of decimation transformation, it is further shown that this distribution is Gaussian whenever the optimal single-scale distribution is Gaussian. This is then applied to neural networks, and it is shown that in a teacher-student scenario, the multiscale Gibbs posterior can achieve a smaller excess risk than the single-scale Gibbs posterior.

preprint · 2020 · arXiv

Polarization in Attraction-Repulsion Models

This paper introduces a model for opinion dynamics, where at each time step, randomly selected agents see their opinions - modeled as scalars in [0,1] - evolve depending on a local interaction function. In the classical Bounded Confidence Model, agents' opinions get attracted when they are close enough. The proposed model extends this by adding a repulsion component, which models the effect of opinions getting further pushed away when dissimilar enough. With this repulsion component added, and under a repulsion-attraction cleavage assumption, it is shown that a new stable configuration emerges beyond the classical consensus configuration, namely the polarization configuration. More specifically, it is shown that total consensus and total polarization are the only two possible limiting configurations. The paper further provides an analysis of the infinite population regime in dimension 1 and higher, with a phase transition phenomenon conjectured and backed heuristically.

preprint · 2020 · arXiv

Poly-time universality and limitations of deep learning

The goal of this paper is to characterize function distributions that deep learning can or cannot learn in poly-time. A universality result is proved for SGD-based deep learning and a non-universality result is proved for GD-based deep learning; this also gives a separation between SGD-based deep learning and statistical query algorithms: (1) {\it Deep learning with SGD is efficiently universal.} Any function distribution that can be learned from samples in poly-time can also be learned by a poly-size neural net trained with SGD on a poly-time initialization with poly-steps, poly-rate and possibly poly-noise. Therefore deep learning provides a universal learning paradigm: it was known that the approximation and estimation errors could be controlled with poly-size neural nets, using ERM that is NP-hard; this new result shows that the optimization error can also be controlled with SGD in poly-time. The picture changes for GD with large enough batches: (2) {\it Result (1) does not hold for GD:} Neural nets of poly-size trained with GD (full gradients or large enough batches) on any initialization with poly-steps, poly-range and at least poly-noise cannot learn any function distribution that has super-polynomial {\it cross-predictability,} where the cross-predictability gives a measure of ``average'' function correlation -- relations and distinctions to the statistical dimension are discussed. In particular, GD with these constraints can learn efficiently monomials of degree $k$ if and only if $k$ is constant. Thus (1) and (2) point to an interesting contrast: SGD is universal even with some poly-noise while full GD or SQ algorithms are not (e.g., parities).

preprint · 2020 · arXiv

Recursive projection-aggregation decoding of Reed-Muller codes

We propose a new class of efficient decoding algorithms for Reed-Muller (RM) codes over binary-input memoryless channels. The algorithms are based on projecting the code on its cosets, recursively decoding the projected codes (which are lower-order RM codes), and aggregating the reconstructions (e.g., using majority votes). We further provide extensions of the algorithms using list-decoding. We run our algorithm for AWGN channels and Binary Symmetric Channels at the short code length ($\le 1024$) regime for a wide range of code rates. Simulation results show that in both low code rate and high code rate regimes, the new algorithm outperforms the widely used decoder for polar codes (SCL+CRC) with the same parameters. The performance of the new algorithm for RM codes in those regimes is in fact close to that of the maximal likelihood decoder. Finally, the new decoder naturally allows for parallel implementations.

preprint · 2020 · arXiv

Reed-Muller Codes: Theory and Algorithms

Reed-Muller (RM) codes are among the oldest, simplest and perhaps most ubiquitous family of codes. They are used in many areas of coding theory in both electrical engineering and computer science. Yet, many of their important properties are still under investigation. This paper covers some of the recent developments regarding the weight enumerator and the capacity-achieving properties of RM codes, as well as some of the algorithmic developments. In particular, the paper discusses the recent connections established between RM codes, thresholds of Boolean functions, polarization theory, hypercontractivity, and the techniques of approximating low weight codewords using lower degree polynomials. It then overviews some of the algorithms with performance guarantees, as well as some of the algorithms with state-of-the-art performances in practical regimes. Finally, the paper concludes with a few open problems.

preprint · 2016 · arXiv

Detection in the stochastic block model with multiple clusters: proof of the achievability conjectures, acyclic BP, and the information-computation gap

In a paper that initiated the modern study of the stochastic block model, Decelle et al., backed by Mossel et al., made the following conjecture: Denote by $k$ the number of balanced communities, $a/n$ the probability of connecting inside communities and $b/n$ across, and set $\mathrm{SNR}=(a-b)^2/(k(a+(k-1)b))$; for any $k \geq 2$, it is possible to detect communities efficiently whenever $\mathrm{SNR}>1$ (the KS threshold), whereas for $k\geq 4$, it is possible to detect communities information-theoretically for some $\mathrm{SNR}<1$. Massoulié, Mossel et al.\ and Bordenave et al.\ succeeded in proving that the KS threshold is efficiently achievable for $k=2$, while Mossel et al.\ proved that it cannot be crossed information-theoretically for $k=2$. The above conjecture remained open for $k \geq 3$. This paper proves this conjecture, further extending the efficient detection to non-symmetrical SBMs with a generalized notion of detection and KS threshold. For the efficient part, a linearized acyclic belief propagation (ABP) algorithm is developed and proved to detect communities for any $k$ down to the KS threshold in time $O(n \log n)$. Achieving this requires showing optimality of ABP in the presence of cycles, a challenge for message passing algorithms. The paper further connects ABP to a power iteration method with a nonbacktracking operator of generalized order, formalizing the interplay between message passing and spectral methods. For the information-theoretic (IT) part, a non-efficient algorithm sampling a typical clustering is shown to break down the KS threshold at $k=4$. The emerging gap is shown to be large in some cases; if $a=0$, the KS threshold reads $b \gtrsim k^2$ whereas the IT bound reads $b \gtrsim k \ln(k)$, making the SBM a good study-case for information-computation gaps.
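As a quick numerical reading of the Kesten-Stigum (KS) condition quoted in this abstract, the sketch below evaluates $\mathrm{SNR}=(a-b)^2/(k(a+(k-1)b))$ and compares it with 1. The function names `sbm_snr` and `above_ks_threshold` are hypothetical helpers introduced for illustration; they only evaluate the formula and do not implement the detection algorithm.

```python
def sbm_snr(a, b, k):
    """SNR of the symmetric SBM with k balanced communities.

    a / n is the within-community edge probability and b / n the
    across-community edge probability, as in the abstract.
    """
    return (a - b) ** 2 / (k * (a + (k - 1) * b))


def above_ks_threshold(a, b, k):
    """True when SNR > 1, the regime where efficient detection is possible."""
    return sbm_snr(a, b, k) > 1


# Two communities: SNR reduces to (a - b)^2 / (2 (a + b)).
snr_strong = sbm_snr(5, 1, 2)   # 16 / 12
snr_weak = sbm_snr(3, 1, 2)     # 4 / 8
```

For `k = 2` the expression reduces to the familiar $(a-b)^2/(2(a+b))$, which is the two-community threshold mentioned in the abstract.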

preprint · 2015 · arXiv

Asymptotic Mutual Information for the Two-Groups Stochastic Block Model

We develop an information-theoretic view of the stochastic block model, a popular statistical model for the large-scale structure of complex networks. A graph $G$ from such a model is generated by first assigning vertex labels at random from a finite alphabet, and then connecting vertices with edge probabilities depending on the labels of the endpoints. In the case of the symmetric two-group model, we establish an explicit `single-letter' characterization of the per-vertex mutual information between the vertex labels and the graph. The explicit expression of the mutual information is intimately related to estimation-theoretic quantities, and --in particular-- reveals a phase transition at the critical point for community detection. Below the critical point the per-vertex mutual information is asymptotically the same as if edges were independent. Correspondingly, no algorithm can estimate the partition better than random guessing. Conversely, above the threshold, the per-vertex mutual information is strictly smaller than the independent-edges upper bound. In this regime there exists a procedure that estimates the vertex labels better than random guessing.

preprint · 2015 · arXiv

Community detection in general stochastic block models: fundamental limits and efficient recovery algorithms

New phase transition phenomena have recently been discovered for the stochastic block model, for the special case of two non-overlapping symmetric communities. This gives rise in particular to new algorithmic challenges driven by the thresholds. This paper investigates whether a general phenomenon takes place for multiple communities, without imposing symmetry. In the general stochastic block model $\text{SBM}(n,p,Q)$, $n$ vertices are split into $k$ communities of relative size $\{p_i\}_{i \in [k]}$, and vertices in community $i$ and $j$ connect independently with probability $\{Q_{i,j}\}_{i,j \in [k]}$. This paper investigates the partial and exact recovery of communities in the general SBM (in the constant and logarithmic degree regimes), and uses the generality of the results to tackle overlapping communities. The contributions of the paper are: (i) an explicit characterization of the recovery threshold in the general SBM in terms of a new divergence function $D_+$, which generalizes the Hellinger and Chernoff divergences, and which provides an operational meaning to a divergence function analogous to the KL-divergence in the channel coding theorem, (ii) the development of an algorithm that recovers the communities all the way down to the optimal threshold and runs in quasi-linear time, showing that exact recovery has no information-theoretic to computational gap for multiple communities, in contrast to the conjectures made for detection with more than 4 communities; note that the algorithm is optimal both in terms of achieving the threshold and in having quasi-linear complexity, (iii) the development of an efficient algorithm that detects communities in the constant degree regime with an explicit accuracy bound that can be made arbitrarily close to 1 when a prescribed signal-to-noise ratio (defined in terms of the spectrum of $\operatorname{diag}(p)Q$) tends to infinity.

preprint · 2015 · arXiv

Concentration of the number of solutions of random planted CSPs and Goldreich's one-way candidates

This paper shows that the logarithm of the number of solutions of a random planted $k$-SAT formula concentrates around a deterministic $n$-independent threshold. Specifically, if $F^*_{k}(α,n)$ is a random $k$-SAT formula on $n$ variables, with clause density $α$ and with a uniformly drawn planted solution, there exists a function $ϕ_k(\cdot)$ such that, except for $α$ in a set of Lebesgue measure zero, we have $ \frac{1}{n}\log Z(F^*_{k}(α,n)) \to ϕ_k(α)$ in probability, where $Z(F)$ is the number of solutions of the formula $F$. This settles a problem left open in Abbe-Montanari RANDOM 2013, where the concentration is obtained only for the expected logarithm over the clause distribution. The result is also extended to a more general class of random planted CSPs; in particular, it is shown that the number of pre-images for the Goldreich one-way function model concentrates for some choices of the predicates.

preprint · 2015 · arXiv

Detecting Community Structures in Hi-C Genomic Data

Community detection (CD) algorithms are applied to Hi-C data to discover new communities of loci in the 3D conformation of human and mouse DNA. We find that CD has some distinct advantages over pre-existing methods: (1) it is capable of finding a variable number of communities, (2) it can detect communities of DNA loci either adjacent or distant in the 1D sequence, and (3) it allows us to obtain a principled value of k, the number of communities present. Forcing k = 2, our method recovers earlier findings of Lieberman-Aiden, et al. (2009), but letting k be a parameter, our method obtains as optimal value k = 6, discovering new candidate communities. In addition to discovering large communities that partition entire chromosomes, we also show that CD can detect small-scale topologically associating domains (TADs) such as those found in Dixon, et al. (2012). CD thus provides a natural and flexible statistical framework for understanding the folding structure of DNA at multiple scales in Hi-C data.

preprint · 2015 · arXiv

High-Girth Matrices and Polarization

The girth of a matrix is the least number of linearly dependent columns, in contrast to the rank which is the largest number of linearly independent columns. This paper considers the construction of {\it high-girth} matrices, whose probabilistic girth is close to its rank. Random matrices can be used to show the existence of high-girth matrices with constant relative rank, but the construction is non-explicit. This paper uses a polar-like construction to obtain a deterministic and efficient construction of high-girth matrices for arbitrary fields and relative ranks. Applications to coding and sparse recovery are discussed.

preprint · 2015 · arXiv

Linear Boolean classification, coding and "the critical problem"

The problem of constructing a minimal rank matrix over GF(2) whose kernel does not intersect a given set S is considered. In the case where S is a Hamming ball centered at 0, this is equivalent to finding linear codes of largest dimension. For a general set, this is an instance of "the critical problem" posed by Crapo and Rota in 1970. This work focuses on the case where S is an annulus. As opposed to balls, it is shown that an optimal kernel is composed not only of dense but also of sparse vectors, and the optimal mixture is identified in various cases. These findings corroborate a proposed conjecture that for an annulus of inner and outer radius nq and np respectively, the optimal relative rank is given by (1-q)H(p/(1-q)), an extension of the Gilbert-Varshamov bound H(p) conjectured for Hamming balls of radius np.

preprint · 2015 · arXiv

Polar Coding for Secret-Key Generation

Practical implementations of secret-key generation are often based on sequential strategies, which handle reliability and secrecy in two successive steps, called reconciliation and privacy amplification. In this paper, we propose an alternative approach based on polar codes that jointly deals with reliability and secrecy. Specifically, we propose secret-key capacity-achieving polar coding schemes for the following models: (i) the degraded binary memoryless source (DBMS) model with rate-unlimited public communication, (ii) the DBMS model with one-way rate-limited public communication, (iii) the 1-to-m broadcast model and (iv) the Markov tree model with uniform marginals. For models (i) and (ii) our coding schemes remain valid for non-degraded sources, although they may not achieve the secret-key capacity. For models (i), (ii) and (iii), our schemes rely on a pre-shared secret seed of negligible rate; however, we provide special cases of these models for which no seed is required. Finally, we show an application of our results to secrecy and privacy for biometric systems. We thus provide the first examples of low-complexity secret-key capacity-achieving schemes that are able to handle vector quantization for model (ii), or multiterminal communication for models (iii) and (iv).

preprint · 2015 · arXiv

Recovering communities in the general stochastic block model without knowing the parameters

Most recent developments on the stochastic block model (SBM) rely on the knowledge of the model parameters, or at least on the number of communities. This paper introduces efficient algorithms that do not require such knowledge and yet achieve the optimal information-theoretic tradeoffs identified in [AS15] for linear size communities. The results are three-fold: (i) in the constant degree regime, an algorithm is developed that requires only a lower-bound on the relative sizes of the communities and detects communities with an optimal accuracy scaling for large degrees; (ii) in the regime where degrees are scaled by $ω(1)$ (diverging degrees), this is enhanced into a fully agnostic algorithm that only takes the graph in question and simultaneously learns the model parameters (including the number of communities) and detects communities with accuracy $1-o(1)$, with an overall quasi-linear complexity; (iii) in the logarithmic degree regime, an agnostic algorithm is developed that learns the parameters and achieves the optimal CH-limit for exact recovery, in quasi-linear time. These provide the first algorithms affording efficiency, universality and information-theoretic optimality for strong and weak consistency in the general SBM with linear size communities.

preprint · 2014 · arXiv

Decoding binary node labels from censored edge measurements: Phase transition and efficient recovery

We consider the problem of clustering a graph $G$ into two communities by observing a subset of the vertex correlations. Specifically, we consider the inverse problem with observed variables $Y=B_G x \oplus Z$, where $B_G$ is the incidence matrix of a graph $G$, $x$ is the vector of unknown vertex variables (with a uniform prior) and $Z$ is a noise vector with Bernoulli$(\varepsilon)$ i.i.d. entries. All variables and operations are Boolean. This model is motivated by coding, synchronization, and community detection problems. In particular, it corresponds to a stochastic block model or a correlation clustering problem with two communities and censored edges. Without noise, exact recovery (up to global flip) of $x$ is possible if and only if the graph $G$ is connected, with a sharp threshold at the edge probability $\log(n)/n$ for Erdős-Rényi random graphs. The first goal of this paper is to determine how the edge probability $p$ needs to scale to allow exact recovery in the presence of noise. Defining the degree (oversampling) rate of the graph by $α=np/\log(n)$, it is shown that exact recovery is possible if and only if $α>2/(1-2\varepsilon)^2+ o(1/(1-2\varepsilon)^2)$. In other words, $2/(1-2\varepsilon)^2$ is the information theoretic threshold for exact recovery at low-SNR. In addition, an efficient recovery algorithm based on semidefinite programming is proposed and shown to succeed in the threshold regime up to twice the optimal rate. For a deterministic graph $G$, defining the degree rate as $α=d/\log(n)$, where $d$ is the minimum degree of the graph, it is shown that the proposed method achieves the rate $α> 4((1+λ)/(1-λ)^2)/(1-2\varepsilon)^2+ o(1/(1-2\varepsilon)^2)$, where $1-λ$ is the spectral gap of the graph $G$.

preprint · 2014 · arXiv

Exact Recovery in the Stochastic Block Model

The stochastic block model (SBM) with two communities, or equivalently the planted bisection model, is a popular model of random graph exhibiting a cluster behaviour. In the symmetric case, the graph has two equally sized clusters and vertices connect with probability $p$ within clusters and $q$ across clusters. In the past two decades, a large body of literature in statistics and computer science has focused on providing lower-bounds on the scaling of $|p-q|$ to ensure exact recovery. In this paper, we identify a sharp threshold phenomenon for exact recovery: if $α=pn/\log(n)$ and $β=qn/\log(n)$ are constant (with $α>β$), recovering the communities with high probability is possible if $\frac{α+β}{2} - \sqrt{αβ}>1$ and impossible if $\frac{α+β}{2} - \sqrt{αβ}<1$. In particular, this improves the existing bounds. This also sets a new line of sight for efficient clustering algorithms. While maximum likelihood (ML) achieves the optimal threshold (by definition), it is in the worst-case NP-hard. This paper proposes an efficient algorithm based on a semidefinite programming relaxation of ML, which is proved to succeed in recovering the communities close to the threshold, while numerical experiments suggest it may achieve the threshold. An efficient algorithm which succeeds all the way down to the threshold is also obtained using a partial recovery algorithm combined with a local improvement procedure.
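The sharp threshold stated in this abstract is easy to evaluate numerically. The sketch below computes the quantity $\frac{α+β}{2} - \sqrt{αβ}$, which equals $(\sqrt{α}-\sqrt{β})^2/2$, from the scaled rates $α=pn/\log(n)$ and $β=qn/\log(n)$. The helper names `scaled_rate` and `exact_recovery_margin` are assumptions made for this example; the code only evaluates the condition and is not a recovery algorithm.

```python
import math


def scaled_rate(edge_probability, n):
    """Return edge_probability * n / log(n), i.e. alpha or beta in the abstract."""
    return edge_probability * n / math.log(n)


def exact_recovery_margin(alpha, beta):
    """Return (alpha + beta) / 2 - sqrt(alpha * beta).

    Per the abstract, exact recovery is possible with high probability
    when this value exceeds 1 and impossible when it is below 1.
    """
    return (alpha + beta) / 2 - math.sqrt(alpha * beta)


margin_easy = exact_recovery_margin(9.0, 1.0)   # 5 - 3
margin_hard = exact_recovery_margin(4.0, 1.0)   # 2.5 - 2
```

With $α=9$, $β=1$ the margin exceeds 1, so exact recovery is possible; with $α=4$, $β=1$ it falls below 1, so exact recovery is impossible.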

preprint · 2014 · arXiv

Polynomial complexity of polar codes for non-binary alphabets, key agreement and Slepian-Wolf coding

We consider polar codes for memoryless sources with side information and show that the blocklength, construction, encoding and decoding complexities are bounded by a polynomial of the reciprocal of the gap between the compression rate and the conditional entropy. This extends the recent results of Guruswami and Xia to a slightly more general setting, which in turn can be applied to (1) sources with non-binary alphabets, (2) key generation for discrete and Gaussian sources, and (3) Slepian-Wolf coding and multiple accessing. In each of these cases, the complexity scaling with respect to the number of users is also controlled. In particular, we construct coding schemes for these multi-user information theory problems which achieve optimal rates with an overall polynomial complexity.

preprint · 2014 · arXiv

Randomness and dependencies extraction via polarization, with applications to Slepian-Wolf coding and secrecy

The polarization phenomenon for a single source is extended to a framework with multiple correlated sources. It is shown that, in addition to extracting the randomness of the source, the polar transform takes the original arbitrary dependencies to extremal dependencies. Polar coding schemes for the Slepian-Wolf problem and for secret-key generation are then proposed based on this phenomenon. In particular, constructions of secret keys achieving the secrecy capacity and compression schemes achieving the Slepian-Wolf capacity region are obtained with a complexity of $O(n \log (n))$.

preprint2014arXiv

Reed-Muller codes for random erasures and errors

This paper studies the parameters for which Reed-Muller (RM) codes over $GF(2)$ can correct random erasures and random errors with high probability, and in particular when they can achieve capacity for these two classical channels. Necessarily, the paper also studies properties of evaluations of multi-variate $GF(2)$ polynomials on random sets of inputs. For erasures, we prove that RM codes achieve capacity both for very high rate and very low rate regimes. For errors, we prove that RM codes achieve capacity for very low rate regimes, and for very high rates, we show that they can uniquely decode at about square root of the number of errors at capacity. The proofs of these four results are based on different techniques, which we find interesting in their own right. In particular, we study the following questions about $E(m,r)$, the matrix whose rows are truth tables of all monomials of degree $\leq r$ in $m$ variables. What is the most (resp. least) number of random columns in $E(m,r)$ that define a submatrix having full column rank (resp. full row rank) with high probability? We obtain tight bounds for very small (resp. very large) degrees $r$, which we use to show that RM codes achieve capacity for erasures in these regimes. Our decoding from random errors follows from the following novel reduction. For every linear code $C$ of sufficiently high rate we construct a new code $C'$, also of very high rate, such that for every subset $S$ of coordinates, if $C$ can recover from erasures in $S$, then $C'$ can recover from errors in $S$. Specializing this to RM codes and using our results for erasures implies our result on unique decoding of RM codes at high rate. Finally, two of our capacity achieving results require tight bounds on the weight distribution of RM codes. We obtain such bounds extending the recent \cite{KLP} bounds from constant degree to linear degree polynomials.
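
The matrix $E(m,r)$ described in this abstract is small enough to build explicitly for toy parameters. The sketch below is an illustration only, not the paper's method: it constructs $E(m,r)$, computes ranks over $GF(2)$ by Gaussian elimination, and estimates empirically how often $k$ random columns have full column rank. All function names, the trial count and the seed are assumptions made for this example.

```python
import itertools
import random


def monomial_matrix(m, r):
    """Rows are truth tables of all monomials of degree <= r in m variables over GF(2)."""
    points = list(itertools.product([0, 1], repeat=m))
    rows = []
    for degree in range(r + 1):
        for subset in itertools.combinations(range(m), degree):
            # The empty subset is the constant monomial 1.
            rows.append([int(all(x[i] for i in subset)) for x in points])
    return rows


def gf2_rank(rows):
    """Rank over GF(2), eliminating on the highest set bit of each row bitmask."""
    pivots = {}  # highest set bit -> reduced row stored as an integer bitmask
    for row in rows:
        v = 0
        for bit in row:
            v = (v << 1) | bit
        while v:
            high = v.bit_length() - 1
            if high not in pivots:
                pivots[high] = v
                break
            v ^= pivots[high]
    return len(pivots)


def random_column_submatrix(rows, k, rng):
    """Keep k distinct columns chosen uniformly at random."""
    cols = rng.sample(range(len(rows[0])), k)
    return [[row[c] for c in cols] for row in rows]


def full_column_rank_frequency(m, r, k, trials=200, seed=0):
    """Fraction of trials in which k random columns of E(m, r) are linearly independent."""
    rng = random.Random(seed)
    rows = monomial_matrix(m, r)
    hits = 0
    for _ in range(trials):
        if gf2_rank(random_column_submatrix(rows, k, rng)) == k:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for k in (4, 8, 11, 14):
        print(k, full_column_rank_frequency(m=4, r=2, k=k))
```

Brute-force enumeration of all $2^m$ points only scales to small $m$; it is meant to make the definition concrete, not to reproduce the asymptotic bounds.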

preprint2013arXiv

A new entropy power inequality for integer-valued random variables

The entropy power inequality (EPI) provides lower bounds on the differential entropy of the sum of two independent real-valued random variables in terms of the individual entropies. Versions of the EPI for discrete random variables have been obtained for special families of distributions with the differential entropy replaced by the discrete entropy, but no universal inequality is known (beyond trivial ones). More recently, the sumset theory for the entropy function provides a sharp inequality $H(X+X')-H(X)\geq 1/2 -o(1)$ when $X,X'$ are i.i.d. with high entropy. This paper provides the inequality $H(X+X')-H(X) \geq g(H(X))$, where $X,X'$ are arbitrary i.i.d. integer-valued random variables and where $g$ is a universal strictly positive function on $\mathbb{R}_+$ satisfying $g(0)=0$. Extensions to non identically distributed random variables and to conditional entropies are also obtained.
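
The quantity $H(X+X')-H(X)$ can be evaluated directly for any finitely supported integer-valued distribution. The sketch below is a numerical illustration under the assumption that entropies are measured in bits; it does not compute the function $g$ from the paper, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def entropy_bits(pmf):
    """Shannon entropy in bits of a pmf given as {integer value: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def self_convolution(pmf):
    """Distribution of X + X' for X, X' i.i.d. with the given pmf."""
    out = defaultdict(float)
    for a, pa in pmf.items():
        for b, pb in pmf.items():
            out[a + b] += pa * pb
    return dict(out)


def entropy_gap(pmf):
    """H(X + X') - H(X) for i.i.d. copies X, X'."""
    return entropy_bits(self_convolution(pmf)) - entropy_bits(pmf)


if __name__ == "__main__":
    fair_bit = {0: 0.5, 1: 0.5}
    uniform8 = {k: 1 / 8 for k in range(8)}
    print(entropy_gap(fair_bit), entropy_gap(uniform8))
```

For a fair bit, $X+X'$ takes the values 0, 1, 2 with probabilities 1/4, 1/2, 1/4, so the gap is $1.5 - 1 = 0.5$ bits; for a point mass the gap is 0, consistent with $g(0)=0$.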

preprint2013arXiv

Conditional Random Fields, Planted Constraint Satisfaction, and Entropy Concentration

This paper studies a class of probabilistic models on graphs, where edge variables depend on incident node variables through a fixed probability kernel. The class includes planted constraint satisfaction problems (CSPs), as well as more general structures motivated by coding and community clustering problems. It is shown that under mild assumptions on the kernel and for sparse random graphs, the conditional entropy of the node variables given the edge variables concentrates around a deterministic threshold. This implies in particular the concentration of the number of solutions in a broad class of planted CSPs, the existence of a threshold function for the disassortative stochastic block model, and the proof of a conjecture on parity check codes. It also establishes new connections among coding, clustering and satisfiability.

preprint2013arXiv

Polar Codes For Broadcast Channels

Polar codes are introduced for discrete memoryless broadcast channels. For $m$-user deterministic broadcast channels, polarization is applied to map uniformly random message bits from $m$ independent messages to one codeword while satisfying broadcast constraints. The polarization-based codes achieve rates on the boundary of the private-message capacity region. For two-user noisy broadcast channels, polar implementations are presented for two information-theoretic schemes: i) Cover's superposition codes; ii) Marton's codes. Due to the structure of polarization, constraints on the auxiliary and channel-input distributions are identified to ensure proper alignment of polarization indices in the multi-user setting. The codes achieve rates on the capacity boundary of a few classes of broadcast channels (e.g., binary-input stochastically degraded). The complexity of encoding and decoding is $O(n \log n)$ where $n$ is the block length. In addition, polar code sequences obtain a stretched-exponential decay of $O(2^{-n^β})$ of the average block error probability where $0 < β< 0.5$.

preprint2012arXiv

Adaptive sensing using deterministic partial Hadamard matrices

This paper investigates the construction of deterministic matrices preserving the entropy of random vectors with a given probability distribution. In particular, it is shown that for random vectors having i.i.d. discrete components, this is achieved by selecting a subset of rows of a Hadamard matrix such that (i) the selection is deterministic and (ii) the fraction of selected rows is vanishing. In contrast, it is shown that for random vectors with i.i.d. continuous components, no partial Hadamard matrix of reduced dimension can preserve the entropy. These results are in agreement with the results of Wu-Verdu on almost lossless analog compression. This paper is however motivated by the complexity attribute of Hadamard matrices, which allows the use of efficient and stable reconstruction algorithms. The proof technique is based on a polar code martingale argument and on a new entropy power inequality for integer-valued random variables.

preprint2011arXiv

Proof of the outage probability conjecture for MISO channels

In Telatar 1999, it is conjectured that the covariance matrices minimizing the outage probability for MIMO channels with Gaussian fading are diagonal with either zeros or constant values on the diagonal. In the MISO setting, this is equivalent to conjecturing that the Gaussian quadratic forms having the largest tail probability correspond to such diagonal matrices. We prove here the conjecture in the MISO setting.

preprint2010arXiv

A Coordinate System for Gaussian Networks

This paper studies network information theory problems where the external noise is Gaussian distributed. In particular, the Gaussian broadcast channel with coherent fading and the Gaussian interference channel are investigated. It is shown that in these problems, non-Gaussian code ensembles can achieve higher rates than the Gaussian ones. It is also shown that the strong Shamai-Laroia conjecture on the Gaussian ISI channel does not hold. In order to analyze non-Gaussian code ensembles over Gaussian networks, a geometrical tool using the Hermite polynomials is proposed. This tool provides a coordinate system to analyze a class of non-Gaussian input distributions that are invariant over Gaussian networks.

preprint2010arXiv

Mutual information, matroids and extremal dependencies

In this paper, it is shown that the rank function of a matroid can be represented by a "mutual information function" if and only if the matroid is binary. The mutual information function considered is the one measuring the amount of information between the inputs (binary uniform) and the output of a multiple access channel (MAC). Moreover, it is shown that a MAC whose mutual information function is integer valued is "equivalent" to a linear deterministic MAC, in the sense that it essentially contains at the output no more information than some linear forms of the inputs. These notes put emphasis on the connection between mutual information functionals and rank functions in matroid theory, without assuming prior knowledge on these two subjects. The first section introduces mutual information functionals, the second section introduces basic notions of matroid theory, and the third section connects these two subjects. It is also shown that entropic matroids studied in the literature correspond to specific cases of MAC matroids.

preprint2010arXiv

On the concentration of the number of solutions of random satisfiability formulas

Let $Z(F)$ be the number of solutions of a random $k$-satisfiability formula $F$ with $n$ variables and clause density $α$. Assume that the probability that $F$ is unsatisfiable is $O(1/\log(n)^{1+\epsilon})$ for $\epsilon>0$. We show that (possibly excluding a countable set of `exceptional' $α$'s) the number of solutions concentrates in the logarithmic scale, i.e., there exists a non-random function $ϕ(α)$ such that, for any $δ>0$, $(1/n)\log Z(F)\in [ϕ-δ,ϕ+δ]$ with high probability. In particular, the assumption holds for all $α<1$, which proves the above concentration claim in the whole satisfiability regime of random $2$-SAT. We also extend these results to a broad class of constraint satisfaction problems. The proof is based on an interpolation technique from spin-glass theory, and on an application of Friedgut's theorem on sharp thresholds for graph properties.
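
The normalized quantity $(1/n)\log Z(F)$ can be computed by brute force for very small instances, which makes the definition concrete. The sketch below is an illustration only, not the paper's proof technique. It assumes a formula with `round(alpha * n)` clauses, each on $k$ distinct variables with uniformly random signs, and uses the natural logarithm; the abstract does not fix these details, and all names are hypothetical.

```python
import itertools
import math
import random


def random_ksat(n, alpha, k, rng):
    """Random k-SAT formula: each clause is a list of (variable index, required value)."""
    formula = []
    for _ in range(round(alpha * n)):
        variables = rng.sample(range(n), k)
        formula.append([(v, rng.random() < 0.5) for v in variables])
    return formula


def count_solutions(formula, n):
    """Z(F): number of assignments satisfying every clause, by full enumeration."""
    count = 0
    for assignment in itertools.product([False, True], repeat=n):
        if all(any(assignment[v] == want for v, want in clause) for clause in formula):
            count += 1
    return count


def normalized_log_count(formula, n):
    """(1/n) log Z(F), or -inf when the formula is unsatisfiable."""
    z = count_solutions(formula, n)
    return math.log(z) / n if z > 0 else float("-inf")


if __name__ == "__main__":
    rng = random.Random(0)
    n = 12
    for trial in range(5):
        formula = random_ksat(n, alpha=0.5, k=2, rng=rng)
        print(trial, normalized_log_count(formula, n))
```

Enumeration costs $2^n$ evaluations, so this only shows the spread of the statistic at toy sizes and says nothing about the limit $ϕ(α)$.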

preprint2010arXiv

Polar Codes for the m-User MAC

In this paper, polar codes for the $m$-user multiple access channel (MAC) with binary inputs are constructed. It is shown that Arıkan's polarization technique applied individually to each user transforms independent uses of an $m$-user binary input MAC into successive uses of extremal MACs. This transformation has a number of desirable properties: (i) the `uniform sum rate' of the original MAC is preserved, (ii) the extremal MACs have uniform rate regions that are not only polymatroids but matroids and thus (iii) their uniform sum rate can be reached by each user transmitting either uncoded or fixed bits; in this sense they are easy to communicate over. A polar code can then be constructed with an encoding and decoding complexity of $O(n \log n)$ (where $n$ is the block length), a block error probability of $o(\exp(- n^{1/2 - \epsilon}))$, and capable of achieving the uniform sum rate of any binary input MAC with arbitrarily many users. An application of this polar code construction to communicating on the AWGN channel is also discussed.
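
The $O(n \log n)$ encoding complexity comes from the butterfly structure of the polarization transform. The sketch below shows only the standard single-user transform $x = u F^{\otimes \log_2 n}$ over $GF(2)$ with $F = \begin{pmatrix}1&0\\1&1\end{pmatrix}$, without bit-reversal; in the setting of the abstract it would be applied to each user's input separately. The choice of frozen bits and the MAC decoder are not shown, and the function name is hypothetical.

```python
def polar_transform(u):
    """Compute x = u * F^{(tensor power)} over GF(2) in place on a copy of u.

    Uses log2(n) butterfly stages of n/2 XORs each, so (n/2) * log2(n) XORs in total.
    """
    n = len(u)
    if n == 0 or n & (n - 1):
        raise ValueError("block length must be a power of two")
    x = list(u)
    step = 1
    while step < n:
        for start in range(0, n, 2 * step):
            for i in range(start, start + step):
                # Upper branch receives the XOR of both branches; lower branch is unchanged.
                x[i] ^= x[i + step]
        step *= 2
    return x


if __name__ == "__main__":
    u = [1, 0, 1, 1, 0, 0, 1, 0]
    x = polar_transform(u)
    print(x, polar_transform(x) == u)
```

Because $F^2 = I$ over $GF(2)$, the transform is its own inverse, which gives a quick sanity check on any implementation.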

preprint2010arXiv

Universal A Posteriori Metrics Game

Over binary input channels, the uniform distribution is a universal prior, in the sense that it maximizes the worst case mutual information over all binary input channels, ensuring at least 94.2% of the capacity. In this paper, we address a similar question, but with respect to a universal generalized linear decoder. We look for the best collection of finitely many a posteriori metrics, to maximize the worst case mismatched mutual information achieved by decoding with these metrics (instead of an optimal decoder such as the Maximum Likelihood (ML) tuned to the true channel). It is shown that for binary input and output channels, two metrics suffice to actually achieve the same performance as an optimal decoder. In particular, this implies that there exists a decoder which is generalized linear and achieves at least 94.2% of the compound capacity on any compound set, without the knowledge of the underlying set.

preprint2010arXiv

Universal polar coding and sparse recovery

This paper investigates universal polar coding schemes. In particular, a notion of ordering (called convolutional path) is introduced between probability distributions to determine when a polar compression (or communication) scheme designed for one distribution can also succeed for another one. The original polar decoding algorithm is also generalized to an algorithm that learns information about the source distribution using the idea of checkers. These tools are used to construct a universal compression algorithm for binary sources, operating at the lowest achievable rate (entropy), with low complexity and with guaranteed small error probability. In a second part of the paper, the problem of sketching high dimensional discrete signals which are sparse is approached via the polarization technique. It is shown that the number of measurements required for perfect recovery is competitive with the $O(k \log (n/k))$ bound (with optimal constant for binary signals), while affording a deterministic low complexity measurement matrix.

38 published item(s)

An $\ell_p$ theory of PCA and spectral clustering

An initial alignment between neural network and target is needed for gradient descent to learn

On the Power of Differentiable Learning versus PAC and SQ Learning

Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels

Stochastic block model entropy and broadcasting on trees with survey

An Alon-Boppana theorem for powered graphs and generalized Ramanujan graphs

Community Detection on Euclidean Random Graphs

Learning Sparse Graphons and the Generalized Kesten-Stigum Threshold

Maximum Multiscale Entropy and Neural Network Regularization

Polarization in Attraction-Repulsion Models

Poly-time universality and limitations of deep learning

Recursive projection-aggregation decoding of Reed-Muller codes

Reed-Muller Codes: Theory and Algorithms

Detection in the stochastic block model with multiple clusters: proof of the achievability conjectures, acyclic BP, and the information-computation gap

Asymptotic Mutual Information for the Two-Groups Stochastic Block Model

Community detection in general stochastic block models: fundamental limits and efficient recovery algorithms

Concentration of the number of solutions of random planted CSPs and Goldreich's one-way candidates

Detecting Community Structures in Hi-C Genomic Data

High-Girth Matrices and Polarization

Linear Boolean classification, coding and "the critical problem"

Polar Coding for Secret-Key Generation

Recovering communities in the general stochastic block model without knowing the parameters

Decoding binary node labels from censored edge measurements: Phase transition and efficient recovery

Exact Recovery in the Stochastic Block Model

Polynomial complexity of polar codes for non-binary alphabets, key agreement and Slepian-Wolf coding

Randomness and dependencies extraction via polarization, with applications to Slepian-Wolf coding and secrecy

Reed-Muller codes for random erasures and errors

A new entropy power inequality for integer-valued random variables

Conditional Random Fields, Planted Constraint Satisfaction, and Entropy Concentration

Polar Codes For Broadcast Channels

Adaptive sensing using deterministic partial Hadamard matrices

Proof of the outage probability conjecture for MISO channels

A Coordinate System for Gaussian Networks

Mutual information, matroids and extremal dependencies

On the concentration of the number of solutions of random satisfiability formulas

Polar Codes for the m-User MAC

Universal A Posteriori Metrics Game

Universal polar coding and sparse recovery