Source author record

Gérard Ben Arous

Gérard Ben Arous appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher

Unclaimed source record

math.PR, math-ph, math.MP, Machine Learning, cond-mat.dis-nn, Data Structures and Algorithms, math.OC, math.ST, nlin.AO, Statistics Theory

Catalog footprint

What is connected

16 works

10 topics

4 close collaborators

preprint, 2025, arXiv

Learning quadratic neural networks in high dimensions: SGD dynamics and scaling laws

We study the optimization and sample complexity of gradient-based training of a two-layer neural network with quadratic activation function in the high-dimensional regime, where the data is generated as $f_*(\boldsymbol{x}) \propto \sum_{j=1}^{r} λ_j σ\left(\langle \boldsymbol{θ}_j, \boldsymbol{x}\rangle\right)$ with $\boldsymbol{x} \sim N(0,\boldsymbol{I}_d)$, $σ$ is the second Hermite polynomial, and $\lbrace\boldsymbol{θ}_j\rbrace_{j=1}^{r} \subset \mathbb{R}^d$ are orthonormal signal directions. We consider the extensive-width regime $r \asymp d^β$ for $β\in [0, 1)$, and assume a power-law decay on the (non-negative) second-layer coefficients $λ_j\asymp j^{-α}$ for $α\geq 0$. We present a sharp analysis of the SGD dynamics in the feature learning regime, for both the population limit and the finite-sample (online) discretization, and derive scaling laws for the prediction risk that highlight the power-law dependencies on the optimization time, sample size, and model width. Our analysis combines a precise characterization of the associated matrix Riccati differential equation with novel matrix monotonicity arguments to establish convergence guarantees for the infinite-dimensional effective dynamics.
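As a rough illustration of the data model described in this abstract, here is a minimal sketch that samples Gaussian inputs and evaluates the teacher function. Several choices are assumptions made only for illustration and are not taken from the paper: the normalization $σ(z) = z^2 - 1$ for the second Hermite polynomial, a proportionality constant of 1, coefficients set exactly to $j^{-α}$, orthonormal directions obtained from a QR factorization, and the function and parameter names.

```python
import numpy as np


def sample_teacher(d, r, alpha, n, seed=0):
    """Draw n samples (x, y) from a quadratic teacher (illustrative names only)."""
    if r > d:
        raise ValueError("need r <= d to have r orthonormal directions in R^d")
    rng = np.random.default_rng(seed)
    # Assumption: orthonormal directions taken from the reduced QR factor
    # of a d x r Gaussian matrix (q has orthonormal columns).
    q, _ = np.linalg.qr(rng.standard_normal((d, r)))
    theta = q.T  # shape (r, d), rows are the signal directions
    # Assumption: power-law coefficients set exactly to j^(-alpha).
    lam = np.arange(1, r + 1, dtype=float) ** (-alpha)
    x = rng.standard_normal((n, d))  # x ~ N(0, I_d)
    z = x @ theta.T  # inner products <theta_j, x>, shape (n, r)
    # Assumption: second Hermite polynomial normalized as z^2 - 1.
    y = (z ** 2 - 1.0) @ lam
    return x, y, theta, lam
```

For example, `sample_teacher(d=64, r=8, alpha=1.0, n=1000)` returns inputs of shape `(1000, 64)` and targets of shape `(1000,)`.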

preprint, 2023, arXiv

Landscape Complexity for the Empirical Risk of Generalized Linear Models

We present a method to obtain the average and the typical value of the number of critical points of the empirical risk landscape for generalized linear estimation problems and variants. This represents a substantial extension of previous applications of the Kac-Rice method since it allows one to analyze the critical points of high-dimensional non-Gaussian random functions. Under a technical hypothesis, we obtain a rigorous explicit variational formula for the annealed complexity, which is the logarithm of the average number of critical points at a fixed value of the empirical risk. This result is simplified, and extended, using the non-rigorous Kac-Rice replicated method from theoretical physics. In this way we find an explicit variational formula for the quenched complexity, which is generally different from its annealed counterpart, and allows one to obtain the number of critical points for typical instances up to exponential accuracy.

preprint, 2021, arXiv

Counting equilibria of large complex systems by instability index

We consider a nonlinear autonomous system of $N\gg 1$ degrees of freedom randomly coupled by both relaxational ('gradient') and non-relaxational ('solenoidal') random interactions. We show that with increased interaction strength such systems generically undergo an abrupt transition from a trivial phase portrait with a single stable equilibrium into a topologically non-trivial regime of 'absolute instability' where equilibria are on average exponentially abundant, but typically all of them are unstable, unless the dynamics is purely gradient. When interactions increase even further the stable equilibria eventually become on average exponentially abundant unless the interaction is purely solenoidal. We further calculate the mean proportion of equilibria which have a fixed fraction of unstable directions.

preprint, 2020, arXiv

Free Energy Wells and Overlap Gap Property in Sparse PCA

We study a variant of the sparse PCA (principal component analysis) problem in the "hard" regime, where the inference task is possible yet no polynomial-time algorithm is known to exist. Prior work, based on the low-degree likelihood ratio, has conjectured a precise expression for the best possible (sub-exponential) runtime throughout the hard regime. Following instead a statistical physics inspired point of view, we show bounds on the depth of free energy wells for various Gibbs measures naturally associated to the problem. These free energy wells imply hitting time lower bounds that corroborate the low-degree conjecture: we show that a class of natural MCMC (Markov chain Monte Carlo) methods (with worst-case initialization) cannot solve sparse PCA with less than the conjectured runtime. These lower bounds apply to a wide range of values for two tuning parameters: temperature and sparsity misparametrization. Finally, we prove that the Overlap Gap Property (OGP), a structural property that implies failure of certain local search algorithms, holds in a significant part of the hard regime.

preprint, 2018, arXiv

Spectral gap estimates in mean field spin glasses

We show that mixing for local, reversible dynamics of mean field spin glasses is exponentially slow in the low temperature regime. We introduce a notion of free energy barriers for the overlap, and prove that their existence implies that the spectral gap is exponentially small, and thus that mixing is exponentially slow. We then exhibit sufficient conditions on the equilibrium Gibbs measure which guarantee the existence of these barriers, using the notion of replicon eigenvalue and 2D Guerra-Talagrand bounds. We show how these sufficient conditions cover large classes of Ising spin models for reversible nearest-neighbor dynamics and spherical models for Langevin dynamics. Finally, in the case of Ising spins, Panchenko's recent rigorous calculation [79] of the free energy for a system of "two real replica" enables us to prove a quenched LDP for the overlap distribution, which gives us a wider criterion for slow mixing directly related to the Franz-Parisi-Virasoro approach [43,60]. This condition holds in a wider range of temperatures.

preprint, 2016, arXiv

Scaling limit for the ant in a simple labyrinth

We prove that, after suitable rescaling, the simple random walk on the trace of a large critical branching random walk converges to the Brownian motion on the integrated super-Brownian excursion.

preprint, 2016, arXiv

Scaling limit for the ant in high-dimensional labyrinths

We study here a detailed conjecture regarding one of the most important cases of anomalous diffusion, i.e., the behavior of the "ant in the labyrinth". It is natural to conjecture (see [16] and [8]) that the scaling limit for random walks on large critical random graphs exists in high dimensions, and is universal. This scaling limit is simply the natural Brownian Motion on the Integrated Super-Brownian Excursion. We give here a set of four natural sufficient conditions on the critical graphs and prove that this set of assumptions ensures the validity of this conjecture. The remaining future task is to prove that these sufficient conditions hold for the various classical cases of critical random structures, such as the usual Bernoulli bond percolation, oriented percolation, and spread-out percolation in high enough dimension. In the companion paper [10], we do precisely that in a first case, the random walk on the trace of a large critical branching random walk. We verify the validity of these sufficient conditions and thus obtain the scaling limit mentioned above, in dimensions larger than 14.

preprint, 2015, arXiv

Randomly trapped random walks

We introduce a general model of trapping for random walks on graphs. We give the possible scaling limits of these Randomly Trapped Random Walks on $\mathbb {Z}$. These scaling limits include the well-known fractional kinetics process, the Fontes-Isopi-Newman singular diffusion as well as a new broad class we call spatially subordinated Brownian motions. We give sufficient conditions for convergence and illustrate these on two important examples.

preprint, 2015, arXiv

The Loss Surfaces of Multilayer Networks

We study the connection between the highly non-convex loss function of a simple model of the fully-connected feed-forward neural network and the Hamiltonian of the spherical spin-glass model under the assumptions of: i) variable independence, ii) redundancy in network parametrization, and iii) uniformity. These assumptions enable us to explain the complexity of the fully decoupled neural network through the prism of the results from random matrix theory. We show that for large-size decoupled networks the lowest critical values of the random loss function form a layered structure and they are located in a well-defined band lower-bounded by the global minimum. The number of local minima outside that band diminishes exponentially with the size of the network. We empirically verify that the mathematical model exhibits behavior similar to that of the computer simulations, despite the presence of high dependencies in real networks. We conjecture that both simulated annealing and SGD converge to the band of low critical points, and that all critical points found there are local minima of high quality as measured by the test error. This emphasizes a major difference between large- and small-size networks, where for the latter poor-quality local minima have non-zero probability of being recovered. Finally, we prove that recovering the global minimum becomes harder as the network size increases and that it is in practice irrelevant, as the global minimum often leads to overfitting.

preprint, 2014, arXiv

Smallest Singular Value for Perturbations of Random Permutation Matrices

We take a first small step to extend the validity of Rudelson-Vershynin type estimates to some sparse random matrices, here random permutation matrices. We give lower (and upper) bounds on the smallest singular value of a large random matrix D+M where M is a random permutation matrix, sampled uniformly, and D is diagonal. When D is itself random with i.i.d. terms on the diagonal, we obtain a Rudelson-Vershynin type estimate, using the classical theory of random walks with negative drift.
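To make the object in this abstract concrete, the following is a small numerical sketch that builds D+M and computes its smallest singular value. It only illustrates the setup and does not reproduce the paper's estimates. The standard Gaussian law for the diagonal of D and all function names are assumptions chosen for illustration; the abstract does not specify the distribution.

```python
import numpy as np


def random_permutation_matrix(n, rng):
    """Return an n x n permutation matrix for a uniformly sampled permutation."""
    perm = rng.permutation(n)
    # Row i of the result has its single 1 in column perm[i].
    return np.eye(n)[perm]


def smallest_singular_value(diag_entries, m):
    """Smallest singular value of D + M, where D = diag(diag_entries)."""
    s = np.linalg.svd(np.diag(diag_entries) + m, compute_uv=False)
    return s.min()


rng = np.random.default_rng(0)
n = 200
m = random_permutation_matrix(n, rng)
# Assumption: i.i.d. standard Gaussian diagonal entries, for illustration only.
diag_entries = rng.standard_normal(n)
s_min = smallest_singular_value(diag_entries, m)
```

As a sanity check, with D = 0 the matrix is a pure permutation matrix, which is orthogonal, so every singular value equals 1.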

preprint, 2013, arXiv

Extreme gaps between eigenvalues of random matrices

This paper studies the extreme gaps between eigenvalues of random matrices. We give the joint limiting law of the smallest gaps for Haar-distributed unitary matrices and matrices from the Gaussian unitary ensemble. In particular, the kth smallest gap, normalized by a factor $n^{-4/3}$, has a limiting density proportional to $x^{3k-1}e^{-x^3}$. Concerning the largest gaps, normalized by $n/\sqrt{\log n}$, they converge in ${\mathrm{L}}^p$ to a constant for all $p>0$. These results are compared with the extreme gaps between zeros of the Riemann zeta function.
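The normalizing constant implied by the stated proportionality can be worked out with the substitution $u = x^3$. This is a routine calculation added for illustration, not a formula quoted from the paper, and the symbol $p_k$ is introduced here only as a label for the normalized density:

```latex
\int_0^\infty x^{3k-1} e^{-x^3}\,dx
  = \frac{1}{3}\int_0^\infty u^{k-1} e^{-u}\,du
  = \frac{\Gamma(k)}{3}
  = \frac{(k-1)!}{3},
\qquad\text{so}\qquad
p_k(x) = \frac{3}{(k-1)!}\, x^{3k-1} e^{-x^3}, \quad x > 0 .
```

For $k=1$ this gives $p_1(x) = 3x^2 e^{-x^3}$.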

preprint, 2011, arXiv

On fluctuations of eigenvalues of random permutation matrices

Smooth linear statistics of random permutation matrices, sampled under a general Ewens distribution, exhibit an interesting non-universality phenomenon. Though they have bounded variance, their fluctuations are asymptotically non-Gaussian but infinitely divisible. The fluctuations are asymptotically Gaussian for less smooth linear statistics for which the variance diverges. The degree of smoothness is measured in terms of the quality of the trapezoidal approximations of the integral of the observable.

preprint, 2010, arXiv

Biased random walks on a Galton-Watson tree with leaves

We consider a biased random walk $X_n$ on a Galton-Watson tree with leaves in the sub-ballistic regime. We prove that there exists an explicit constant $γ= γ(β) \in (0,1)$, depending on the bias $β$, such that $X_n$ is of order $n^γ$. Denoting by $Δ_n$ the hitting time of level $n$, we prove that $Δ_n/n^{1/γ}$ is tight. Moreover, we show that $Δ_n/n^{1/γ}$ does not converge in law (at least for large values of $β$). We prove that along the sequences $n_λ(k)=\lfloor λβ^{γk}\rfloor$, $Δ_n/n^{1/γ}$ converges to certain infinitely divisible laws. Key tools for the proof are the classical Harris decomposition for Galton-Watson trees, a new variant of regeneration times and the careful analysis of triangular arrays of i.i.d. heavy-tailed random variables.

preprint, 2010, arXiv

Current fluctuations for TASEP: A proof of the Prähofer--Spohn conjecture

We consider the family of two-sided Bernoulli initial conditions for TASEP which, as the left and right densities ($ρ_-,ρ_+$) are varied, give rise to shock waves and rarefaction fans, the two phenomena which are typical of TASEP. We provide a proof of Conjecture 7.1 of [Progr. Probab. 51 (2002) 185--204] which characterizes the order of and scaling functions for the fluctuations of the height function of two-sided TASEP in terms of the two densities $ρ_-,ρ_+$ and the speed $y$ around which the height is observed. In proving this theorem for TASEP, we also prove a fluctuation theorem for a class of corner growth processes with external sources, or equivalently for the last passage time in a directed last passage percolation model with two-sided boundary conditions: $ρ_-$ and $1-ρ_+$. We provide a complete characterization of the order of and the scaling functions for the fluctuations of this model's last passage time $L(N,M)$ as a function of three parameters: the two boundary/source rates $ρ_-$ and $1-ρ_+$, and the scaling ratio $γ^2=M/N$. The proof of this theorem draws on the results of [Comm. Math. Phys. 265 (2006) 1--44] and extensively on the work of [Ann. Probab. 33 (2005) 1643--1697] on finite rank perturbations of Wishart ensembles in random matrix theory.

preprint, 2007, arXiv

Scaling limit for trap models on $\mathbb{Z}^d$

We give the "quenched" scaling limit of Bouchaud's trap model in $d\ge 2$. This scaling limit is the fractional-kinetics process, that is, the time change of a $d$-dimensional Brownian motion by the inverse of an independent $α$-stable subordinator.

preprint, 2006, arXiv

Transition from the annealed to the quenched asymptotics for a random walk on random obstacles

In this work we study a natural transition mechanism describing the passage from a quenched (almost sure) regime to an annealed (on average) one, for a symmetric simple random walk on random obstacles on sites having an identical and independent law. The transition mechanism we study was first proposed in the context of sums of identical independent random exponents by Ben Arous, Bogachev and Molchanov in [Probab. Theory Related Fields 132 (2005) 579--612]. Let $p(x,t)$ be the survival probability at time $t$ of the random walk, starting from site $x$, and let $L(t)$ be some increasing function of time. We show that the empirical average of $p(x,t)$ over a box of side $L(t)$ has different asymptotic behaviors depending on $L(t)$. There are constants $0<γ_1<γ_2$ such that if $L(t)\ge e^{γt^{d/(d+2)}}$, with $γ>γ_1$, a law of large numbers is satisfied and the empirical survival probability decreases like the annealed one; if $L(t)\ge e^{γt^{d/(d+2)}}$, with $γ>γ_2$, a central limit theorem is also satisfied. If $L(t)\ll t$, the averaged survival probability decreases like the quenched survival probability. If $t\ll L(t)$ and $\log L(t)\ll t^{d/(d+2)}$ we obtain an intermediate regime. Furthermore, when the dimension $d=1$ it is possible to describe the fluctuations of the averaged survival probability when $L(t)=e^{γt^{d/(d+2)}}$ with $γ<γ_2$: it is shown that they are infinitely divisible laws with a Lévy spectral function which explodes when $x\to0$ as stable laws of characteristic exponent $α<2$. These results show that the quenched and annealed survival probabilities correspond to a low- and high-temperature behavior of a mean-field type phase transition mechanism.
