Researcher profile

Gérard Ben Arous

Gérard Ben Arous contributes to research discovery and scholarly infrastructure.

Published work

5 published items

Preprint, 2025, arXiv

Learning quadratic neural networks in high dimensions: SGD dynamics and scaling laws

We study the optimization and sample complexity of gradient-based training of a two-layer neural network with quadratic activation function in the high-dimensional regime, where the data is generated as $f_*(\boldsymbol{x}) \propto \sum_{j=1}^{r} \lambda_j \sigma\left(\langle \boldsymbol{\theta}_j, \boldsymbol{x}\rangle\right)$ with $\boldsymbol{x} \sim N(0,\boldsymbol{I}_d)$, $\sigma$ is the 2nd Hermite polynomial, and $\lbrace\boldsymbol{\theta}_j\rbrace_{j=1}^{r} \subset \mathbb{R}^d$ are orthonormal signal directions. We consider the extensive-width regime $r \asymp d^{\beta}$ for $\beta \in [0, 1)$, and assume a power-law decay on the (non-negative) second-layer coefficients $\lambda_j \asymp j^{-\alpha}$ for $\alpha \geq 0$. We present a sharp analysis of the SGD dynamics in the feature learning regime, for both the population limit and the finite-sample (online) discretization, and derive scaling laws for the prediction risk that highlight the power-law dependencies on the optimization time, sample size, and model width. Our analysis combines a precise characterization of the associated matrix Riccati differential equation with novel matrix monotonicity arguments to establish convergence guarantees for the infinite-dimensional effective dynamics.
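The data model in this abstract can be sketched numerically. The snippet below is only an illustration under stated assumptions, not the authors' code: the function names (`he2`, `make_teacher`, `teacher_output`) are hypothetical, the unnormalized Hermite polynomial $He_2(z) = z^2 - 1$ is used for the activation, the width is set to $r = \mathrm{round}(d^{\beta})$, the coefficients are taken as exactly $j^{-\alpha}$, and the proportionality constant in $f_*$ is set to 1.

```python
import numpy as np


def he2(z):
    """Second probabilists' Hermite polynomial, He_2(z) = z^2 - 1 (unnormalized; an assumption)."""
    return z**2 - 1.0


def make_teacher(d, beta, alpha, rng):
    """Draw r orthonormal directions in R^d and power-law second-layer coefficients."""
    r = max(1, int(round(d**beta)))  # assumed reading of r ~ d^beta
    # Reduced QR of a d x r Gaussian matrix yields r orthonormal columns.
    q, _ = np.linalg.qr(rng.standard_normal((d, r)))
    theta = q.T  # shape (r, d); rows are orthonormal
    lam = np.arange(1, r + 1, dtype=float) ** (-alpha)  # lambda_j = j^(-alpha)
    return theta, lam


def teacher_output(x, theta, lam):
    """Evaluate sum_j lambda_j * He_2(<theta_j, x>) for each row of x (shape (n, d))."""
    return he2(x @ theta.T) @ lam


rng = np.random.default_rng(0)
theta, lam = make_teacher(d=64, beta=0.5, alpha=1.0, rng=rng)
x = rng.standard_normal((1000, 64))  # x ~ N(0, I_d)
y = teacher_output(x, theta, lam)
```

Because the directions are orthonormal and the inputs are standard Gaussian, each projection $\langle \boldsymbol{\theta}_j, \boldsymbol{x}\rangle$ is itself standard normal, so every term of the sum has mean zero under this choice of $\sigma$.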

Preprint, 2023, arXiv

Landscape Complexity for the Empirical Risk of Generalized Linear Models

We present a method to obtain the average and the typical value of the number of critical points of the empirical risk landscape for generalized linear estimation problems and variants. This represents a substantial extension of previous applications of the Kac-Rice method, since it allows one to analyze the critical points of high-dimensional non-Gaussian random functions. Under a technical hypothesis, we obtain a rigorous explicit variational formula for the annealed complexity, which is the logarithm of the average number of critical points at a fixed value of the empirical risk. This result is simplified, and extended, using the non-rigorous Kac-Rice replicated method from theoretical physics. In this way we find an explicit variational formula for the quenched complexity, which is generally different from its annealed counterpart, and allows one to obtain the number of critical points for typical instances up to exponential accuracy.
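For orientation, the two notions of complexity mentioned in this abstract are commonly written as follows. The notation is an assumption and is not taken from the paper: $\mathcal{N}_N(u)$ stands for the number of critical points with empirical risk near $u$ in dimension $N$, and the $1/N$ normalization is the usual convention rather than something the abstract states.

```latex
% Annealed complexity: logarithm of the average count.
\Sigma_{\mathrm{a}}(u) = \lim_{N\to\infty} \frac{1}{N} \log \mathbb{E}\left[\mathcal{N}_N(u)\right]

% Quenched complexity: average of the logarithm (typical instances).
\Sigma_{\mathrm{q}}(u) = \lim_{N\to\infty} \frac{1}{N} \mathbb{E}\left[\log \mathcal{N}_N(u)\right]

% Jensen's inequality (concavity of the logarithm) gives
\Sigma_{\mathrm{q}}(u) \le \Sigma_{\mathrm{a}}(u)
```

The inequality is why the annealed quantity only bounds the typical count from above; the two can differ when rare instances with unusually many critical points dominate the average.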

Preprint, 2021, arXiv

Counting equilibria of large complex systems by instability index

We consider a nonlinear autonomous system of $N\gg 1$ degrees of freedom randomly coupled by both relaxational ('gradient') and non-relaxational ('solenoidal') random interactions. We show that with increased interaction strength such systems generically undergo an abrupt transition from a trivial phase portrait with a single stable equilibrium into a topologically non-trivial regime of 'absolute instability' where equilibria are on average exponentially abundant, but typically all of them are unstable, unless the dynamics is purely gradient. When interactions increase even further the stable equilibria eventually become on average exponentially abundant unless the interaction is purely solenoidal. We further calculate the mean proportion of equilibria which have a fixed fraction of unstable directions.

Preprint, 2020, arXiv

Free Energy Wells and Overlap Gap Property in Sparse PCA

We study a variant of the sparse PCA (principal component analysis) problem in the "hard" regime, where the inference task is possible yet no polynomial-time algorithm is known to exist. Prior work, based on the low-degree likelihood ratio, has conjectured a precise expression for the best possible (sub-exponential) runtime throughout the hard regime. Following instead a statistical physics inspired point of view, we show bounds on the depth of free energy wells for various Gibbs measures naturally associated to the problem. These free energy wells imply hitting time lower bounds that corroborate the low-degree conjecture: we show that a class of natural MCMC (Markov chain Monte Carlo) methods (with worst-case initialization) cannot solve sparse PCA with less than the conjectured runtime. These lower bounds apply to a wide range of values for two tuning parameters: temperature and sparsity misparametrization. Finally, we prove that the Overlap Gap Property (OGP), a structural property that implies failure of certain local search algorithms, holds in a significant part of the hard regime.

Preprint, 2018, arXiv

Spectral gap estimates in mean field spin glasses

We show that mixing for local, reversible dynamics of mean field spin glasses is exponentially slow in the low temperature regime. We introduce a notion of free energy barriers for the overlap, and prove that their existence implies that the spectral gap is exponentially small, and thus that mixing is exponentially slow. We then exhibit sufficient conditions on the equilibrium Gibbs measure which guarantee the existence of these barriers, using the notion of replicon eigenvalue and 2D Guerra-Talagrand bounds. We show how these sufficient conditions cover large classes of Ising spin models for reversible nearest-neighbor dynamics and spherical models for Langevin dynamics. Finally, in the case of Ising spins, Panchenko's recent rigorous calculation [79] of the free energy for a system of "two real replica" enables us to prove a quenched LDP for the overlap distribution, which gives us a wider criterion for slow mixing directly related to the Franz-Parisi-Virasoro approach [43,60]. This condition holds in a wider range of temperatures.