Researcher profile

Matthieu Wyart

Matthieu Wyart contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
12 works
0 followers
6 topics
4 close collaborators

Published work

12 published item(s)

preprint · 2022 · arXiv

Avalanches and deformation in glasses and disordered systems

In this chapter, we discuss avalanches in glasses and disordered systems, and the macroscopic dynamical behavior that they mediate. We briefly review three classes of systems where avalanches are observed: depinning transition of disordered interfaces, yielding of amorphous materials, and the jamming transition. Without extensive formalism, we discuss results gleaned from theoretical approaches -- mean-field theory, scaling and exponent relations, the renormalization group, and a few results from replica theory. We focus both on the remarkably sophisticated physics of avalanches and on relatively new approaches to the macroscopic flow behavior exhibited past the depinning/yielding transition.

preprint · 2022 · arXiv

Failure and success of the spectral bias prediction for Kernel Ridge Regression: the case of low-dimensional data

Recently, several theories including the replica method made predictions for the generalization error of Kernel Ridge Regression. In some regimes, they predict that the method has a `spectral bias': decomposing the true function $f^*$ on the eigenbasis of the kernel, it fits well the coefficients associated with the $O(P)$ largest eigenvalues, where $P$ is the size of the training set. This prediction works very well on benchmark data sets such as images, yet the assumptions these approaches make on the data are never satisfied in practice. To clarify when the spectral bias prediction holds, we first focus on a one-dimensional model where rigorous results are obtained and then use scaling arguments to generalize and test our findings in higher dimensions. Our predictions include the classification case $f(x)=\mathrm{sign}(x_1)$ with a data distribution that vanishes at the decision boundary $p(x)\sim x_1^χ$. For $χ>0$ and a Laplace kernel, we find that (i) there exists a cross-over ridge $λ^*_{d,χ}(P)\sim P^{-\frac{1}{d+χ}}$ such that for $λ\gg λ^*_{d,χ}(P)$, the replica method applies, but not for $λ\ll λ^*_{d,χ}(P)$, (ii) in the ridge-less case, spectral bias predicts the correct training curve exponent only in the limit $d\rightarrow\infty$.
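For readers unfamiliar with the setting of this abstract, the following is a minimal sketch of kernel ridge regression with a Laplace kernel on the one-dimensional target sign(x). It is not the authors' code: the uniform data distribution (the χ = 0 case), the kernel scale, the ridge convention (K + ridge · I) and all function names are assumptions chosen for illustration.

```python
import numpy as np

def laplace_kernel(a, b, scale=1.0):
    """Laplace kernel exp(-|a - b| / scale) between the rows of a and b."""
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.exp(-dists / scale)

def krr_predict(x_train, y_train, x_test, ridge):
    """Kernel ridge regression: solve (K + ridge * I) alpha = y, then predict."""
    k_train = laplace_kernel(x_train, x_train)
    alpha = np.linalg.solve(k_train + ridge * np.eye(len(x_train)), y_train)
    return laplace_kernel(x_test, x_train) @ alpha

# Assumed toy data: points uniform on [-1, 1], labels given by sign(x).
rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, size=(100, 1))
x_test = rng.uniform(-1.0, 1.0, size=(1000, 1))
y_train = np.sign(x_train[:, 0])
y_test = np.sign(x_test[:, 0])

# A very small ridge approximates the ridge-less (interpolating) case.
train_pred = krr_predict(x_train, y_train, x_train, ridge=1e-8)
test_pred = krr_predict(x_train, y_train, x_test, ridge=1e-8)
test_accuracy = np.mean(np.sign(test_pred) == y_test)
```

Sweeping `ridge` and the training-set size in such a sketch is one way to probe how the test error depends on the ridge, which is the kind of dependence the cross-over ridge in the abstract describes.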

preprint · 2022 · arXiv

Mean-field description for the architecture of low-energy excitations in glasses

In amorphous materials, groups of particles can rearrange locally into a new stable configuration. Such elementary excitations are key as they determine the response to external stresses, as well as to thermal and quantum fluctuations. Yet, understanding what controls their geometry remains a challenge. Here we build a scaling description of the geometry and energy of low-energy excitations in terms of the distance to an instability, as predicted for instance at the dynamical transition in mean field approaches of supercooled liquids. We successfully test our predictions in ultrastable computer glasses, with a gapped and ungapped (regular) spectrum. Overall, our approach explains why excitations become less extended, with a higher energy and displacement scale upon cooling.

preprint · 2020 · arXiv

Asymptotic learning curves of kernel methods: empirical data v.s. Teacher-Student paradigm

How many training data are needed to learn a supervised task? It is often observed that the generalization error decreases as $n^{-β}$ where $n$ is the number of training examples and $β$ an exponent that depends on both data and algorithm. In this work we measure $β$ when applying kernel methods to real datasets. For MNIST we find $β\approx 0.4$ and for CIFAR10 $β\approx 0.1$, for both regression and classification tasks, and for Gaussian or Laplace kernels. To rationalize the existence of non-trivial exponents that can be independent of the specific kernel used, we study the Teacher-Student framework for kernels. In this scheme, a Teacher generates data according to a Gaussian random field, and a Student learns them via kernel regression. With a simplifying assumption -- namely that the data are sampled from a regular lattice -- we derive analytically $β$ for translation invariant kernels, using previous results from the kriging literature. Provided that the Student is not too sensitive to high frequencies, $β$ depends only on the smoothness and dimension of the training data. We confirm numerically that these predictions hold when the training points are sampled at random on a hypersphere. Overall, the test error is found to be controlled by the magnitude of the projection of the true function on the kernel eigenvectors whose rank is larger than $n$. Using this idea we relate the exponent $β$ to an exponent $a$ describing how the coefficients of the true function in the eigenbasis of the kernel decay with rank. We extract $a$ from real data by performing kernel PCA, leading to $β\approx0.36$ for MNIST and $β\approx0.07$ for CIFAR10, in good agreement with observations. We argue that these rather large exponents are possible due to the small effective dimension of the data.
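As an illustration of what measuring the exponent $β$ involves, the sketch below fits a power law to a learning curve by least squares in log-log space. The function name and the synthetic, noise-free data are assumptions for illustration only; they are not the authors' procedure or their measurements.

```python
import numpy as np

def fit_learning_curve_exponent(n_values, errors):
    """Estimate beta and C in error ~ C * n**(-beta) via a log-log linear fit."""
    log_n = np.log(np.asarray(n_values, dtype=float))
    log_err = np.log(np.asarray(errors, dtype=float))
    slope, intercept = np.polyfit(log_n, log_err, 1)
    return -slope, np.exp(intercept)

# Synthetic learning curve with a known exponent (illustrative values only).
n_values = np.array([100.0, 200.0, 400.0, 800.0, 1600.0, 3200.0])
true_beta = 0.4
errors = 2.0 * n_values ** (-true_beta)

beta_hat, prefactor = fit_learning_curve_exponent(n_values, errors)
```

On real learning curves the fit would normally be restricted to the large-$n$ range, since the power law is an asymptotic statement.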

preprint · 2020 · arXiv

Direct Coupling Analysis of Epistasis in Allosteric Materials

In allosteric proteins, the binding of a ligand modifies function at a distant active site. Such allosteric pathways can be used as targets for drug design, generating considerable interest in inferring them from sequence alignment data. Currently, different methods lead to conflicting results, in particular on the existence of long-range evolutionary couplings between distant amino-acids mediating allostery. Here we propose a resolution of this conundrum, by studying epistasis and its inference in models where an allosteric material is evolved in silico to perform a mechanical task. We find in our model the four types of epistasis (Synergistic, Sign, Antagonistic, Saturation), which can be either short- or long-range and have a simple mechanical interpretation. We perform a Direct Coupling Analysis (DCA) and find that DCA predicts well the cost of point mutations but is a rather poor generative model. Strikingly, it can predict short-range epistasis but fails to capture long-range epistasis, consistent with empirical findings. We propose that such failure is generic when function requires subparts to work in concert. We illustrate this idea with a simple model, which suggests that other methods may be better suited to capture long-range effects.

preprint · 2020 · arXiv

Inferring the flow properties of epithelial tissues from their geometry

Amorphous materials exhibit complex material properties with strongly nonlinear behaviors. Below a yield stress they behave as plastic solids, while they start to yield above a critical stress $Σ_c$. A key quantity controlling plasticity which is, however, hard to measure is the density $P(x)$ of weak spots, where $x$ is the additional stress required for local plastic failure. In the thermodynamic limit $P(x)\sim x^θ$ is singular at $x=0$ in the solid phase below the yield stress $Σ_c$. This singularity is related to the presence of system-spanning avalanches of plastic events. Here we address the question of whether the density of weak spots and the flow properties of a material can be determined from the geometry of an amorphous structure alone. We show that a vertex model for cell packings in tissues exhibits the phenomenology of plastic amorphous systems. As the yield stress is approached from above, the strain rate vanishes and the avalanche size $S$ and duration $τ$ diverge. We then show that in general, in materials where the energy functional depends on topology, the value $x$ is proportional to the length $L$ of a bond that vanishes in a plastic event. For this class of models $P(x)$ is therefore readily measurable from geometry alone. Applying this approach to a quantification of the cell packing geometry in the developing wing epithelium of the fruit fly, we find that in this tissue $P(L)$ exhibits a power law with exponents similar to those found numerically for a vertex model in its solid phase. This suggests that this tissue exhibits plasticity and non-linear material properties that emerge from collective cell behaviors and that these material properties govern developmental processes. Our approach based on the relation between topology and energetics suggests a new route to outstanding questions associated with the yielding transition.

preprint · 2020 · arXiv

Jamming with tunable roughness

We introduce a new model to study the effect of surface roughness on the jamming transition. By performing numerical simulations, we show that for a smooth surface, the jamming transition density and the contact number at the transition point both increase upon increasing asphericity, as for ellipsoids and spherocylinders. Conversely, for a rough surface, both quantities decrease, in quantitative agreement with the behavior of frictional particles. Furthermore, in the limit corresponding to the Coulomb friction law, the model satisfies a generalized isostaticity criterion proposed in previous studies. We introduce a counting argument that justifies this criterion and interprets it geometrically. Finally, we propose a simple theory to predict the contact number at finite friction from the knowledge of the force distribution in the infinite friction limit.

preprint · 2020 · arXiv

Perspective: A Phase Diagram for Deep Learning unifying Jamming, Feature Learning and Lazy Training

Deep learning algorithms are responsible for a technological revolution in a variety of tasks including image recognition or Go playing. Yet, why they work is not understood. Ultimately, they manage to classify data lying in high dimension -- a feat generically impossible due to the geometry of high dimensional space and the associated curse of dimensionality. Understanding what kind of structure, symmetry or invariance makes data such as images learnable is a fundamental challenge. Other puzzles include that (i) learning corresponds to minimizing a loss in high dimension, which is in general not convex and could well get stuck in bad minima, and (ii) the predictive power of deep learning increases with the number of fitting parameters, even in a regime where data are perfectly fitted. In this manuscript, we review recent results elucidating (i,ii) and the perspective they offer on the (still unexplained) curse of dimensionality paradox. We base our theoretical discussion on the $(h,α)$ plane where $h$ is the network width and $α$ the scale of the output of the network at initialization, and provide new systematic measures of performance in that plane for MNIST and CIFAR 10. We argue that different learning regimes can be organized into a phase diagram. A line of critical points sharply delimits an under-parametrized phase from an over-parametrized one. In over-parametrized nets, learning can operate in two regimes separated by a smooth cross-over. At large initialization, it corresponds to a kernel method, whereas for small initializations features can be learnt, together with invariants in the data. We review the properties of these different phases, of the transition separating them and some open questions. Our treatment emphasizes analogies with physical systems, scaling arguments and the development of numerical observables to quantitatively test these results empirically.

preprint · 2020 · arXiv

Thermal origin of quasi-localised excitations in glasses

Key aspects of glasses are controlled by the presence of excitations in which a group of particles can rearrange. Surprisingly, recent observations indicate that their density is dramatically reduced and their size decreases as the temperature of the supercooled liquid is lowered. Some theories predict these excitations to cause a gap in the spectrum of quasi-localised modes of the Hessian that grows upon cooling, while others predict a pseudo-gap $D_L(ω) \sim ω^α$. To unify these views and observations, we generate glassy configurations of controlled gap magnitude $ω_c$ at temperature $T=0$, using so-called `breathing' particles, and study how such gapped states respond to thermal fluctuations. We find that (i) the gap always fills up at finite $T$ with $D_L(ω) \approx A_4(T) \, ω^4$ and $A_4 \sim \exp(-E_a / T)$ at low $T$, (ii) $E_a$ rapidly grows with $ω_c$, in reasonable agreement with a simple scaling prediction $E_a\sim ω_c^4$ and (iii) at larger $ω_c$ excitations involve fewer particles, as we rationalise, and eventually become string-like. We propose an interpretation of mean-field theories of the glass transition, in which the modes beyond the gap act as an excitation reservoir, from which a pseudo-gap distribution is populated with its magnitude rapidly decreasing at lower $T$. We discuss how this picture unifies the rarefaction as well as the decreasing size of excitations upon cooling, together with a string-like relaxation occurring near the glass transition.

preprint · 2019 · arXiv

A jamming transition from under- to over-parametrization affects loss landscape and generalization

We argue that in fully-connected networks a phase transition delimits the over- and under-parametrized regimes where fitting can or cannot be achieved. Under some general conditions, we show that this transition is sharp for the hinge loss. In the whole over-parametrized regime, poor minima of the loss are not encountered during training since the number of constraints to satisfy is too small to hamper minimization. Our findings support a link between this transition and the generalization properties of the network: as we increase the number of parameters of a given model, starting from an under-parametrized network, we observe that the generalization error displays three phases: (i) initial decay, (ii) increase until the transition point, where it displays a cusp, and (iii) slow decay toward a constant for the rest of the over-parametrized regime. Thereby we identify the region where the classical phenomenon of over-fitting takes place, and the region where the model keeps improving, in line with previous empirical observations for modern neural networks.

preprint · 2019 · arXiv

Infinitesimal asphericity changes the universality of the jamming transition

The jamming transition of non-spherical particles is fundamentally different from the spherical case. Non-spherical particles are hypostatic at their jamming points, while isostaticity is ensured in the case of the jamming of spherical particles. This structural difference implies that the presence of asphericity affects the critical exponents related to the contact number and the vibrational density of states. Moreover, while the force and gap distributions of isostatic jamming present power-law behaviors, even an infinitesimal asphericity is enough to smooth out these singularities. In a recent work [PNAS 115(46), 11736], we have used a combination of marginal stability arguments and the replica method to explain these observations. We argued that systems with internal degrees of freedom, like the rotations in ellipsoids, or the variation of the radii in the case of `breathing' particles, fall in the same universality class. In this paper, we comprehensively review the results on jamming with internal degrees of freedom in addition to the translational degrees of freedom. We use a variational argument to derive the critical exponents of the contact number, shear modulus, and the characteristic frequencies of the density of states. Moreover, we present additional numerical data supporting the theoretical results, which were not shown in the previous work.

preprint · 2019 · arXiv

Scaling description of generalization with number of parameters in deep learning

Supervised deep learning involves the training of neural networks with a large number $N$ of parameters. For large enough $N$, in the so-called over-parametrized regime, one can essentially fit the training data points. Sparsity-based arguments would suggest that the generalization error increases as $N$ grows past a certain threshold $N^{*}$. Instead, empirical studies have shown that in the over-parametrized regime, generalization error keeps decreasing with $N$. We resolve this paradox through a new framework. We rely on the so-called Neural Tangent Kernel, which connects large neural nets to kernel methods, to show that the initialization causes finite-size random fluctuations $\|f_{N}-\bar{f}_{N}\|\sim N^{-1/4}$ of the neural net output function $f_{N}$ around its expectation $\bar{f}_{N}$. These affect the generalization error $ε_{N}$ for classification: under natural assumptions, it decays to a plateau value $ε_{\infty}$ in a power-law fashion $\sim N^{-1/2}$. This description breaks down at a so-called jamming transition $N=N^{*}$. At this threshold, we argue that $\|f_{N}\|$ diverges. This result leads to a plausible explanation for the cusp in test error known to occur at $N^{*}$. Our results are confirmed by extensive empirical observations on the MNIST and CIFAR image datasets. Our analysis finally suggests that, given a computational envelope, the smallest generalization error is obtained using several networks of intermediate sizes, just beyond $N^{*}$, and averaging their outputs.
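The closing suggestion of this abstract, averaging the outputs of several networks, can be illustrated with a toy calculation. The sketch below is not a neural network: it simply assumes that each network's output fluctuates around its mean as independent Gaussian noise with standard deviation $N^{-1/4}$, which is a simplifying assumption, and checks how averaging shrinks that fluctuation. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def ensemble_fluctuation(n_params, n_networks, n_trials=20000):
    """Std of the averaged output of n_networks toy 'networks'.

    Assumption: each network's output deviates from its mean by independent
    Gaussian noise of standard deviation n_params ** -0.25.
    """
    sigma = n_params ** -0.25
    outputs = rng.normal(0.0, sigma, size=(n_trials, n_networks))
    return outputs.mean(axis=1).std()

single = ensemble_fluctuation(n_params=16, n_networks=1)    # close to 16**-0.25 = 0.5
averaged = ensemble_fluctuation(n_params=16, n_networks=4)  # close to 0.5 / sqrt(4)
```

Under this independence assumption, averaging K networks reduces the fluctuation by a factor of the square root of K, which is the intuition behind preferring several intermediate-size networks to a single very large one at fixed compute.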