Researcher profile

Guy Bresler

Guy Bresler contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2022arXiv

Threshold for Detecting High Dimensional Geometry in Anisotropic Random Geometric Graphs

In the anisotropic random geometric graph model, vertices correspond to points drawn from a high-dimensional Gaussian distribution and two vertices are connected if their distance is smaller than a specified threshold. We study when it is possible to hypothesis test between such a graph and an Erdős-Rényi graph with the same edge probability. If $n$ is the number of vertices and $α$ is the vector of eigenvalues, Eldan and Mikulincer show that detection is possible when $n^3 \gg (\|α\|_2/\|α\|_3)^6$ and impossible when $n^3 \ll (\|α\|_2/\|α\|_4)^4$. We show detection is impossible when $n^3 \ll (\|α\|_2/\|α\|_3)^6$, closing this gap and affirmatively resolving the conjecture of Eldan and Mikulincer.

preprint2021arXiv

Sharp Representation Theorems for ReLU Networks with Precise Dependence on Depth

We prove sharp dimension-free representation results for neural networks with $D$ ReLU layers under square loss for a class of functions $\mathcal{G}_D$ defined in the paper. These results capture the precise benefits of depth in the following sense: 1. The rates for representing the class of functions $\mathcal{G}_D$ via $D$ ReLU layers is sharp up to constants, as shown by matching lower bounds. 2. For each $D$, $\mathcal{G}_{D} \subseteq \mathcal{G}_{D+1}$ and as $D$ grows the class of functions $\mathcal{G}_{D}$ contains progressively less smooth functions. 3. If $D^{\prime} < D$, then the approximation rate for the class $\mathcal{G}_D$ achieved by depth $D^{\prime}$ networks is strictly worse than that achieved by depth $D$ networks. This constitutes a fine-grained characterization of the representation power of feedforward networks of arbitrary depth $D$ and number of neurons $N$, in contrast to existing representation results which either require $D$ growing quickly with $N$ or assume that the function being represented is highly smooth. In the latter case similar rates can be obtained with a single nonlinear layer. Our results confirm the prevailing hypothesis that deeper networks are better at representing less smooth functions, and indeed, the main technical novelty is to fully exploit the fact that deep networks can produce highly oscillatory functions with few activation functions.

preprint2020arXiv

A Corrective View of Neural Networks: Representation, Memorization and Learning

We develop a corrective mechanism for neural network approximation: the total available non-linear units are divided into multiple groups and the first group approximates the function under consideration, the second group approximates the error in approximation produced by the first group and corrects it, the third group approximates the error produced by the first and second groups together and so on. This technique yields several new representation and learning results for neural networks. First, we show that two-layer neural networks in the random features regime (RF) can memorize arbitrary labels for arbitrary points under under Euclidean distance separation condition using $\tilde{O}(n)$ ReLUs which is optimal in $n$ up to logarithmic factors. Next, we give a powerful representation result for two-layer neural networks with ReLUs and smoothed ReLUs which can achieve a squared error of at most $ε$ with $O(C(a,d)ε^{-1/(a+1)})$ for $a \in \mathbb{N}\cup\{0\}$ when the function is smooth enough (roughly when it has $Θ(ad)$ bounded derivatives). In certain cases $d$ can be replaced with effective dimension $q \ll d$. Previous results of this type implement Taylor series approximation using deep architectures. We also consider three-layer neural networks and show that the corrective mechanism yields faster representation rates for smooth radial functions. Lastly, we obtain the first $O(\mathrm{subpoly}(1/ε))$ upper bound on the number of neurons required for a two layer network to learn low degree polynomials up to squared error $ε$ via gradient descent. Even though deep networks can express these polynomials with $O(\mathrm{polylog}(1/ε))$ neurons, the best learning bounds on this problem require $\mathrm{poly}(1/ε)$ neurons.

preprint2020arXiv

Average-Case Lower Bounds for Learning Sparse Mixtures, Robust Estimation and Semirandom Adversaries

This paper develops several average-case reduction techniques to show new hardness results for three central high-dimensional statistics problems, implying a statistical-computational gap induced by robustness, a detection-recovery gap and a universality principle for these gaps. A main feature of our approach is to map to these problems via a common intermediate problem that we introduce, which we call Imbalanced Sparse Gaussian Mixtures. We assume the planted clique conjecture for a version of the planted clique problem where the position of the planted clique is mildly constrained, and from this obtain the following computational lower bounds: (1) a $k$-to-$k^2$ statistical-computational gap for robust sparse mean estimation, providing the first average-case evidence for a conjecture of Li (2017) and Balakrishnan et al. (2017); (2) a tight lower bound for semirandom planted dense subgraph, which shows that a semirandom adversary shifts the detection threshold in planted dense subgraph to the conjectured recovery threshold; and (3) a universality principle for $k$-to-$k^2$ gaps in a broad class of sparse mixture problems that includes many natural formulations such as the spiked covariance model. Our main approach is to introduce several average-case techniques to produce structured and Gaussianized versions of an input graph problem, and then to rotate these high-dimensional Gaussians by matrices carefully constructed from hyperplanes in $\mathbb{F}_r^t$. For our universality result, we introduce a new method to perform an algorithmic change of measure tailored to sparse mixtures. We also provide evidence that the mild promise in our variant of planted clique does not change the complexity of the problem.

preprint2020arXiv

Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms

We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain. We establish sharp information theoretic minimax lower bounds for this problem in terms of $τ_{\mathsf{mix}}$, the mixing time of the underlying Markov chain, under different noise settings. Our results establish that in general, optimization with Markovian data is strictly harder than optimization with independent data and a trivial algorithm (SGD-DD) that works with only one in every $\tildeΘ(τ_{\mathsf{mix}})$ samples, which are approximately independent, is minimax optimal. In fact, it is strictly better than the popular Stochastic Gradient Descent (SGD) method with constant step-size which is otherwise minimax optimal in the regression with independent data setting. Beyond a worst case analysis, we investigate whether structured datasets seen in practice such as Gaussian auto-regressive dynamics can admit more efficient optimization schemes. Surprisingly, even in this specific and natural setting, Stochastic Gradient Descent (SGD) with constant step-size is still no better than SGD-DD. Instead, we propose an algorithm based on experience replay--a popular reinforcement learning technique--that achieves a significantly better error rate. Our improved rate serves as one of the first results where an algorithm outperforms SGD-DD on an interesting Markov chain and also provides one of the first theoretical analyses to support the use of experience replay in practice.

preprint2020arXiv

Phase Transitions for Detecting Latent Geometry in Random Graphs

Random graphs with latent geometric structure are popular models of social and biological networks, with applications ranging from network user profiling to circuit design. These graphs are also of purely theoretical interest within computer science, probability and statistics. A fundamental initial question regarding these models is: when are these random graphs affected by their latent geometry and when are they indistinguishable from simpler models without latent structure, such as the Erdős-Rényi graph $\mathcal{G}(n, p)$? We address this question for two of the most well-studied models of random graphs with latent geometry -- the random intersection and random geometric graph. Our results are as follows: (1) we prove that the random intersection graph converges in total variation to $\mathcal{G}(n, p)$ when $d = \tildeω(n^3)$, and does not if $d = o(n^3)$, resolving an open problem in Fill et al. (2000), Rybarczyk (2011) and Kim et al. (2018); (2) we provide conditions under which the matrix of intersection sizes of random family of sets converges in total variation to a symmetric matrix with independent Poisson entries, yielding the first total variation convergence result for $τ$-random intersection graphs to $\mathcal{G}(n, p)$; and (3) we show that the random geometric graph on $\mathbb{S}^{d - 1}$ with edge density $p$ converges in total variation to $\mathcal{G}(n, p)$ when $d = \tildeω\left(\min\{ pn^3, p^2 n^{7/2} \} \right)$, yielding the first progress towards a conjecture of Bubeck et al. (2016). The first of these three results was obtained simultaneously and independently by Bubeck, Racz and Richey.

preprint2020arXiv

Reducibility and Statistical-Computational Gaps from Secret Leakage

Inference problems with conjectured statistical-computational gaps are ubiquitous throughout modern statistics, computer science and statistical physics. While there has been success evidencing these gaps from the failure of restricted classes of algorithms, progress towards a more traditional reduction-based approach to computational complexity in statistical inference has been limited. Existing reductions have largely been limited to inference problems with similar structure -- primarily mapping among problems representable as a sparse submatrix signal plus a noise matrix, which are similar to the common hardness assumption of planted clique. The insight in this work is that a slight generalization of the planted clique conjecture -- secret leakage planted clique -- gives rise to a variety of new average-case reduction techniques, yielding a web of reductions among problems with very different structure. Using variants of the planted clique conjecture for specific forms of secret leakage planted clique, we deduce tight statistical-computational tradeoffs for a diverse range of problems including robust sparse mean estimation, mixtures of sparse linear regressions, robust sparse linear regression, tensor PCA, variants of dense $k$-block stochastic block models, negatively correlated sparse PCA, semirandom planted dense subgraph, detection in hidden partition models and a universality principle for learning sparse mixtures. In particular, a $k$-partite hypergraph variant of the planted clique conjecture is sufficient to establish all of our computational lower bounds. Our techniques also reveal novel connections to combinatorial designs and to random matrix theory. This work gives the first evidence that an expanded set of hardness assumptions, such as for secret leakage planted clique, may be a key first step towards a more complete theory of reductions among statistical problems.