Researcher profile

Sho Yaida

Sho Yaida contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 19 - Unverified
Verification L1
Unclaimed author
5 works
0 followers
8 topics
4 close collaborators

Published work

5 published items

Preprint, 2021, arXiv

The Principles of Deep Learning Theory

This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.

Preprint, 2020, arXiv

Non-Gaussian processes and neural networks at finite widths

Gaussian processes are ubiquitous in nature and engineering. A case in point is a class of neural networks in the infinite-width limit, whose priors correspond to Gaussian processes. Here we perturbatively extend this correspondence to finite-width neural networks, yielding non-Gaussian processes as priors. The methodology developed herein allows us to track the flow of preactivation distributions by progressively integrating out random variables from lower to higher layers, reminiscent of renormalization-group flow. We further develop a perturbative procedure to perform Bayesian inference with weakly non-Gaussian priors.

Preprint, 2010, arXiv

Holographic Lattices, Dimers, and Glasses

We holographically engineer a periodic lattice of localized fermionic impurities within a plasma medium by putting an array of probe D5-branes in the background produced by N D3-branes. Thermodynamic quantities are computed in the large N limit via the holographic dictionary. We then dope the lattice by replacing some of the D5-branes by anti-D5-branes. In the large N limit, we determine the critical temperature below which the system dimerizes with bond ordering. Finally, we argue that for the special case of a square lattice our system is glassy at large but finite N, with the low temperature physics dominated by a huge collection of metastable dimerized configurations without long-range order, connected only through tunneling events.

Preprint, 2008, arXiv

Viscosity Bound Violation in Higher Derivative Gravity

Motivated by the vast string landscape, we consider the shear viscosity to entropy density ratio in conformal field theories dual to Einstein gravity with curvature square corrections. After field redefinitions these theories reduce to Gauss-Bonnet gravity, which has special properties that allow us to compute the shear viscosity nonperturbatively in the Gauss-Bonnet coupling. By tuning of the coupling, the value of the shear viscosity to entropy density ratio can be adjusted to any positive value from infinity down to zero, thus violating the conjectured viscosity bound. At linear order in the coupling, we also check consistency of four different methods to calculate the shear viscosity, and we find that all of them agree. We search for possible pathologies associated with this class of theories violating the viscosity bound.

Preprint, 2005, arXiv

Energy Conditions and Junction Conditions

We consider the familiar junction conditions described by Israel for thin timelike walls in Einstein-Hilbert gravity. One such condition requires the induced metric to be continuous across the wall. Now, there are many spacetimes with sources confined to a thin wall for which this condition is violated and the Israel formalism does not apply. However, we explore the conjecture that the induced metric is in fact continuous for any thin wall which models spacetimes containing only positive energy matter. Thus, the usual junction conditions would hold for all positive energy spacetimes. This conjecture is proven in various special cases, including the case of static spacetimes with spherical or planar symmetry as well as settings without symmetry which may be sufficiently well approximated by smooth spacetimes with well-behaved null geodesic congruences.