Researcher profile

Yuhai Tu

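Yuhai Tu's listed work spans nonequilibrium thermodynamics of biochemical and active systems, bacterial flagellar motor dynamics, gene-regulation inference, and the statistical physics of learning in deep neural networks.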

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust: 21 (Emerging) · Verification: L1 · Unclaimed author
10 works
0 followers
12 topics
4 close collaborators

Published work

10 published items

Preprint · 2026 · arXiv

Transient learning dynamics drive escape from sharp valleys in Stochastic Gradient Descent

Stochastic gradient descent (SGD) is central to deep learning, yet the dynamical origin of its preference for flatter, more generalizable solutions remains unclear. Here, by analyzing SGD learning dynamics, we identify a nonequilibrium mechanism governing solution selection. Numerical experiments reveal a transient exploratory phase in which SGD trajectories repeatedly escape sharp valleys and transition toward flatter regions of the loss landscape. By using a tractable physical model, we show that the SGD noise reshapes the landscape into an effective potential that favors flat solutions. Crucially, we uncover a transient freezing mechanism: as training proceeds, growing energy barriers suppress inter-valley transitions and ultimately trap the dynamics within a single basin. Increasing the SGD noise strength delays this freezing, which enhances convergence to flatter minima. Together, these results provide a unified physical framework linking learning dynamics, loss-landscape geometry, and generalization, and suggest principles for the design of more effective optimization algorithms.
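
The freezing argument can be made concrete with a back-of-the-envelope sketch. It assumes an Arrhenius-type hopping rate $\nu e^{-\Delta E/T}$, with $T$ standing in for the SGD noise strength, and a barrier that grows linearly during training. Both the rate form and the linear schedule are hypothetical stand-ins, not the paper's model, and `escape_rate` and `freezing_time` are illustrative names. Under these assumptions, hopping drops below a cutoff rate at a time proportional to $T$, so stronger noise delays freezing.

```python
import math

def escape_rate(barrier, noise_temperature, attempt_rate=1.0):
    """Arrhenius/Kramers-type rate of hopping over an energy barrier (assumed form)."""
    return attempt_rate * math.exp(-barrier / noise_temperature)

def freezing_time(noise_temperature, barrier_growth, rate_cutoff, attempt_rate=1.0):
    """Time at which hopping becomes negligible if the barrier grows as barrier_growth * t.

    Solves attempt_rate * exp(-barrier_growth * t / T) = rate_cutoff for t.
    The linear barrier schedule is a hypothetical stand-in for training progress.
    """
    return noise_temperature / barrier_growth * math.log(attempt_rate / rate_cutoff)

for T in (0.5, 1.0, 2.0):
    t_star = freezing_time(T, barrier_growth=0.1, rate_cutoff=1e-3)
    print(f"noise T={T}: inter-valley hopping freezes near t = {t_star:.1f}")
```

Doubling the noise strength doubles the freezing time in this toy, which leaves more time for the exploratory phase described in the abstract.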

Preprint · 2022 · arXiv

Free energy dissipation enhances spatial accuracy and robustness of Turing pattern in small reaction-diffusion systems

Accurate and robust spatial orders are ubiquitous in living systems. In 1952, Alan Turing proposed an elegant mechanism for pattern formation based on spontaneous breaking of the spatial translational symmetry in the underlying reaction-diffusion system. Much is understood about the dynamics and structure of Turing patterns. However, little is known about the energetic cost of Turing patterns. Here, we study nonequilibrium thermodynamics of a small spatially extended biochemical reaction-diffusion system by using analytical and numerical methods. We find that the onset of Turing pattern requires a minimum energy dissipation to drive the nonequilibrium chemical reactions. Above onset, only a small fraction of the total energy expenditure is used to overcome diffusion for maintaining the spatial pattern. We show that the positioning error decreases as energy dissipation increases following the same tradeoff relationship between timing error and energy cost in biochemical oscillatory systems. In a finite system, we find that a specific Turing pattern exists only within a finite range of total molecule number, and energy dissipation broadens the range, which enhances the robustness of the Turing pattern against molecule number fluctuations in living cells. These results are verified in a realistic model of the Muk system underlying DNA segregation in E. coli, and testable predictions are made for the dependence of the accuracy and robustness of the spatial pattern on the ATP/ADP ratio. In general, the theoretical framework developed here can be applied to study nonequilibrium thermodynamics of spatially extended biochemical systems.
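
Turing's symmetry-breaking mechanism is usually diagnosed by textbook linear stability analysis: a uniform state that is stable without diffusion ($q = 0$) becomes unstable at some finite wavenumber $q$ once the eigenvalues of $J - q^2 D$ are checked. The sketch below performs that check for a two-species system. The Jacobian `J` and diffusivities are hypothetical numbers chosen to show the effect. This is the generic onset criterion only, not the paper's thermodynamic analysis or its Muk-system model.

```python
import numpy as np

def growth_rate(jacobian, diffusion, q):
    """Largest real part of the eigenvalues of J - q^2 D for a perturbation with wavenumber q."""
    m = np.asarray(jacobian, dtype=float) - q**2 * np.diag(diffusion)
    return float(np.linalg.eigvals(m).real.max())

def is_turing_unstable(jacobian, diffusion, q_max=3.0, num=601):
    """True if the uniform state is stable at q = 0 but some finite-q mode grows."""
    qs = np.linspace(0.0, q_max, num)
    rates = np.array([growth_rate(jacobian, diffusion, q) for q in qs])
    return bool(rates[0] < 0 and rates[1:].max() > 0)

# Hypothetical linearized kinetics: self-enhancing activator u, inhibitor v.
J = [[1.0, -1.0],
     [3.0, -2.0]]
print(is_turing_unstable(J, diffusion=(1.0, 10.0)))  # fast-diffusing inhibitor: prints True
print(is_turing_unstable(J, diffusion=(1.0, 1.0)))   # equal diffusivities: prints False
```

The pattern appears only when the inhibitor diffuses sufficiently faster than the activator, the classic short-range activation, long-range inhibition condition.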

Preprint · 2022 · arXiv

Modeling bacterial flagellar motor with new structure information: Rotational dynamics of two interacting protein nano-rings

In this article, we develop a mathematical model for the rotary bacterial flagellar motor (BFM) based on the recently discovered structure of the stator complex (MotA$_5$MotB$_2$). The structure suggested that the stator also rotates. The BFM is modeled as two rotating nano-rings that interact with each other. Specifically, translocation of protons through the stator complex drives rotation of the MotA pentamer ring, which in turn drives rotation of the FliG ring in the rotor via interactions between the MotA ring of the stator and the FliG ring of the rotor. Preliminary results from the structure-informed model are consistent with the observed torque-speed relation. More importantly, the model predicts distinctive rotor and stator dynamics and their load dependence, which may be tested by future experiments. Possible approaches to verify and improve the model to further understanding of the molecular mechanism for torque generation in BFM are also discussed.

Preprint · 2022 · arXiv

State-space renormalization group theory of nonequilibrium reaction networks: Exact solutions for hypercubic lattices in arbitrary dimensions

Nonequilibrium reaction networks (NRNs) underlie most biological functions. Despite their diverse dynamic properties, NRNs share the signature characteristics of persistent probability fluxes and continuous energy dissipation even in the steady state. Dynamics of NRNs can be described at different coarse-grained levels. Our previous work showed that the apparent energy dissipation rate at a coarse-grained level follows an inverse power law dependence on the scale of coarse-graining. The scaling exponent is determined by the network structure and correlation of stationary probability fluxes. However, it remains unclear whether and how the (renormalized) flux correlation varies with coarse-graining. Following Kadanoff's real space renormalization group (RG) approach for critical phenomena, we address this question by developing a State-Space Renormalization Group (SSRG) theory for NRNs, which leads to an iterative RG equation for the flux correlation function. In square and hypercubic lattices, we solve the RG equation exactly and find two types of fixed point solutions: a family of nontrivial fixed points where the correlation exhibits power-law decay and a trivial fixed point where the correlation vanishes beyond the nearest neighbors. The power-law fixed point is stable if and only if the power exponent is less than the lattice dimension $n$. Consequently, the correlation function converges to the power-law fixed point only when the correlation in the fine-grained network decays slower than $r^{-n}$ and to the trivial fixed point otherwise. If the flux correlation in the fine-grained network contains multiple stable solutions with different exponents, the RG iteration dynamics select the fixed point solution with the smallest exponent. We also discuss a possible connection between the RG flows of flux correlation and those of the Kosterlitz-Thouless transition.

Preprint · 2022 · arXiv

The energy cost for flocking of active spins: the cusped dissipation maximum at the flocking transition

We study the energy cost of flocking in the active Ising model (AIM) and show that besides the energy cost for self-propelled motion, an additional energy dissipation is required to power the alignment of spins. We find that this additional alignment dissipation reaches its maximum at the flocking transition point in the form of a cusp with a discontinuous first derivative with respect to the control parameter. To understand this singular behavior, we analytically solve the two- and three-site AIM models and obtain the exact dependence of the alignment dissipation on the flocking order parameter and control parameter, which explains the cusped dissipation maximum at the flocking transition. Our results reveal a trade-off between the energy cost of the system and its performance measured by the flocking speed and sensitivity to external perturbations. This tradeoff relationship provides a new perspective for understanding the dynamics of natural flocks and designing optimal artificial flocking systems.

Preprint · 2021 · arXiv

Phases of learning dynamics in artificial neural networks: with or without mislabeled data

Despite the tremendous success of deep neural networks in machine learning, the underlying reason for their superior learning capability remains unclear. Here, we present a framework based on statistical physics to study the dynamics of stochastic gradient descent (SGD) that drives learning in neural networks. By using the minibatch gradient ensemble, we construct order parameters to characterize the dynamics of weight updates in SGD. Without mislabeled data, we find that the SGD learning dynamics transitions from a fast learning phase to a slow exploration phase, which is associated with large changes in order parameters that characterize the alignment of SGD gradients and their mean amplitude. In the case with randomly mislabeled samples, SGD learning dynamics falls into four distinct phases. The system first finds solutions for the correctly labeled samples in phase I; it then wanders around these solutions in phase II until it finds a direction to learn the mislabeled samples during phase III, after which it finds solutions that satisfy all training samples during phase IV. Correspondingly, the test error decreases during phase I and remains low during phase II; however, it increases during phase III and reaches a high plateau during phase IV. The transitions between different phases can be understood by changes of order parameters that characterize the alignment of mean gradients for the correctly and incorrectly labeled samples and their (relative) strength during learning. We find that individual sample losses for the two datasets are most separated during phase II, which leads to a cleaning process to eliminate mislabeled samples for improving generalization.
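
Order parameters built from a minibatch gradient ensemble can be sketched as follows. The definitions are assumptions for illustration, not necessarily the paper's exact ones. "Alignment" is taken as the mean pairwise cosine similarity between different minibatch gradients. "Amplitude" is taken as their mean Euclidean norm. The function name `gradient_order_parameters` is hypothetical, and every gradient row is assumed to be nonzero.

```python
import numpy as np

def gradient_order_parameters(minibatch_grads):
    """Alignment and mean amplitude of an ensemble of minibatch gradients.

    minibatch_grads: array of shape (num_minibatches, num_weights); rows must be nonzero.
    alignment: mean pairwise cosine similarity between *different* minibatch gradients.
    amplitude: mean Euclidean norm of the minibatch gradients.
    """
    g = np.asarray(minibatch_grads, dtype=float)
    norms = np.linalg.norm(g, axis=1)
    unit = g / norms[:, None]               # normalize each gradient to unit length
    cosines = unit @ unit.T                 # all pairwise cosine similarities
    off_diagonal = ~np.eye(g.shape[0], dtype=bool)
    return float(cosines[off_diagonal].mean()), float(norms.mean())

# Fully aligned gradients (same direction) vs. mutually orthogonal ones.
print(gradient_order_parameters(np.tile([1.0, 2.0, 2.0], (4, 1))))
print(gradient_order_parameters(np.eye(3)))
```

In a fast-learning phase one would expect alignment near 1 (minibatches agree on the descent direction). In an exploration phase it would drop toward 0 as minibatch gradients decorrelate.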

Preprint · 2020 · arXiv

Deciphering gene regulation from gene expression dynamics using deep neural network

Complex biological functions are carried out by the interaction of genes and proteins. Uncovering the gene regulation network behind a function is one of the central themes in biology. Typically, it involves extensive experiments in genetics, biochemistry and molecular biology. In this paper, we show that much of the inference task can be accomplished by a deep neural network (DNN), a form of machine learning or artificial intelligence. Specifically, the DNN learns from the dynamics of the gene expression. The learnt DNN behaves like an accurate simulator of the system, on which one can perform in-silico experiments to reveal the underlying gene network. We demonstrate the method with two examples: biochemical adaptation and the gap-gene patterning in fruit fly embryogenesis. In the first example, the DNN can successfully find the two basic network motifs for adaptation: negative feedback and incoherent feed-forward. In the second and much more complex example, the DNN can accurately predict behaviors of essentially all the mutants. Furthermore, the regulation network it uncovers is strikingly similar to the one inferred from experiments. In doing so, we develop methods for deciphering the gene regulation network hidden in the DNN "black box". Our interpretable DNN approach should have broad applications in genotype-phenotype mapping.
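
The incoherent feed-forward adaptation motif can be illustrated with a textbook toy model. This is not the DNN-inferred network from the paper. The input $I$ activates both an intermediate $x$ and the output $y$, while $x$ represses $y$. After a step increase in input, the output rises transiently and then returns to its original level. The rate equations and the name `simulate_iffl` are hypothetical choices for illustration.

```python
def simulate_iffl(input_before=1.0, input_after=2.0, t_end=20.0, dt=0.01):
    """Explicit-Euler integration of a toy incoherent feed-forward loop (IFFL).

    Input I activates the intermediate x and the output y; x represses y:
        dx/dt = I - x
        dy/dt = I / x - y
    At steady state x = I, so y = I / x = 1 for any input level (perfect adaptation).
    """
    x, y = input_before, 1.0  # start at the steady state for the initial input
    trajectory = []
    for _ in range(int(round(t_end / dt))):
        # Input is stepped up to input_after at t = 0; update x and y simultaneously.
        x, y = x + dt * (input_after - x), y + dt * (input_after / x - y)
        trajectory.append(y)
    return trajectory

ys = simulate_iffl()
print(f"peak output {max(ys):.2f}, final output {ys[-1]:.3f}")
```

The output returns to 1 regardless of the step size. This transient-response-then-reset behavior is the signature of adaptation that the abstract attributes to this motif.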

Preprint · 2020 · arXiv

How neural networks find generalizable solutions: Self-tuned annealing in deep learning

Despite the tremendous success of the Stochastic Gradient Descent (SGD) algorithm in deep learning, little is known about how SGD finds generalizable solutions in the high-dimensional weight space. By analyzing the learning dynamics and loss function landscape, we discover a robust inverse relation between the weight variance and the landscape flatness (inverse of curvature) for all SGD-based learning algorithms. To explain the inverse variance-flatness relation, we develop a random landscape theory, which shows that the SGD noise strength (effective temperature) depends inversely on the landscape flatness. Our study indicates that SGD attains a self-tuned landscape-dependent annealing strategy to find generalizable solutions at the flat minima of the landscape. Finally, we demonstrate how these new theoretical insights lead to more efficient algorithms, e.g., for avoiding catastrophic forgetting.
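
One way to quantify a variance-flatness relation is to compute weight-fluctuation variances along principal directions of a training trajectory and fit a power law $\sigma^2 \propto F^{-\psi}$ in log-log space. In this fit, $\psi > 0$ signals an inverse relation. The sketch below shows only that analysis step. It assumes flatness values $F$ along the same directions have already been measured elsewhere, and the paper's precise flatness definition is not reproduced here. `pca_variances` and `variance_flatness_exponent` are hypothetical names.

```python
import numpy as np

def pca_variances(weight_trajectory):
    """Variances of weight fluctuations along principal directions, largest first.

    weight_trajectory: array of shape (num_snapshots, num_weights) sampled during training.
    """
    w = np.asarray(weight_trajectory, dtype=float)
    centered = w - w.mean(axis=0)
    cov = centered.T @ centered / (w.shape[0] - 1)   # sample covariance of the weights
    return np.linalg.eigvalsh(cov)[::-1]             # eigvalsh is ascending; reverse it

def variance_flatness_exponent(flatness, variance):
    """Fit variance ~ flatness**(-psi) by least squares in log-log space; return psi."""
    slope, _intercept = np.polyfit(np.log(flatness), np.log(variance), 1)
    return -slope

# Synthetic example obeying variance = flatness**-2 exactly.
print(variance_flatness_exponent([1, 2, 4, 8], [1, 0.25, 0.0625, 0.015625]))
```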

Preprint · 2020 · arXiv

Nonequilibrium thermodynamics of coupled molecular oscillators: The energy cost and optimal design for synchronization

A model of coupled molecular oscillators is proposed to study nonequilibrium thermodynamics of synchronization. We find that synchronization of nonequilibrium oscillators costs energy even when the oscillator-oscillator coupling is conservative. By solving the steady state of the many-body system analytically, we show that the system goes through a nonequilibrium phase transition driven by energy dissipation, and the critical energy dissipation depends on both the frequency and strength of the exchange reactions. Moreover, our study reveals the optimal design for achieving maximum synchronization with a fixed energy budget. We apply our general theory to the Kai system in the cyanobacterial circadian clock and predict a relationship between the KaiC ATPase activity and synchronization of the KaiC hexamers. The theoretical framework can be extended to study thermodynamics of collective behaviors in other extended nonequilibrium active systems.
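
A common way to measure the degree of synchronization among many oscillators is the Kuramoto order parameter $r = |\langle e^{i\theta} \rangle|$. It equals 1 when all phases coincide and is close to 0 when phases are spread uniformly. The sketch below computes $r$ for sampled phases. It is a standard diagnostic offered for illustration, not necessarily the synchronization measure used in the paper, and `phase_order_parameter` is a hypothetical name.

```python
import numpy as np

def phase_order_parameter(phases):
    """Kuramoto-style order parameter r = |mean(exp(i*theta))| of oscillator phases.

    r = 1 when all oscillators share the same phase; r is near 0 for incoherent phases.
    """
    theta = np.asarray(phases, dtype=float)
    return float(np.abs(np.exp(1j * theta).mean()))

rng = np.random.default_rng(0)
locked = 0.1 * rng.standard_normal(1000)         # phases clustered near 0
incoherent = rng.uniform(0.0, 2 * np.pi, 1000)   # phases spread over the circle
print(phase_order_parameter(locked), phase_order_parameter(incoherent))
```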

Preprint · 2020 · arXiv

Scaling of Energy Dissipation in Nonequilibrium Reaction Networks

The energy dissipation rate in a nonequilibrium reaction system can be determined by the reaction rates in the underlying reaction network. By developing a coarse-graining process in state space and a corresponding renormalization procedure for reaction rates, we find that the energy dissipation rate has an inverse power-law dependence on the number of microscopic states in a coarse-grained state. The dissipation scaling law requires self-similarity of the underlying network, and the scaling exponent depends on the network structure and the flux correlation. Implications of this inverse dissipation scaling law for active flow systems such as microtubule-kinesin mixtures are discussed.
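
The first sentence refers to the standard steady-state formula. In units of $k_B T$ per unit time, $\dot W = \sum_{i<j} (p_i k_{ij} - p_j k_{ji}) \ln \frac{p_i k_{ij}}{p_j k_{ji}}$, where $p$ is the stationary distribution and $k_{ij}$ is the rate from state $i$ to state $j$. The sketch below evaluates this for a small Markov network. It covers only the fine-grained starting point, not the paper's coarse-graining and renormalization procedure. It assumes every transition is reversible, and the function names are illustrative.

```python
import numpy as np

def stationary_distribution(rates):
    """Stationary distribution p of a continuous-time Markov chain.

    rates[i, j] is the transition rate from state i to state j (diagonal ignored).
    """
    k = np.asarray(rates, dtype=float)
    n = k.shape[0]
    generator = k - np.diag(k.sum(axis=1))        # rows sum to zero
    a = np.vstack([generator.T, np.ones(n)])      # p @ generator = 0 and sum(p) = 1
    b = np.zeros(n + 1)
    b[-1] = 1.0
    p, *_ = np.linalg.lstsq(a, b, rcond=None)
    return p

def dissipation_rate(rates):
    """Steady-state dissipation rate in units of k_B T per unit time."""
    k = np.asarray(rates, dtype=float)
    p = stationary_distribution(k)
    total = 0.0
    for i in range(k.shape[0]):
        for j in range(i + 1, k.shape[0]):
            if (k[i, j] > 0) != (k[j, i] > 0):
                raise ValueError("irreversible transition: dissipation diverges")
            if k[i, j] > 0:
                forward, backward = p[i] * k[i, j], p[j] * k[j, i]
                total += (forward - backward) * np.log(forward / backward)
    return total

# Three-state cycle driven clockwise (rate 2) vs. counterclockwise (rate 1).
cycle = np.array([[0, 2, 1], [1, 0, 2], [2, 1, 0]], dtype=float)
print(dissipation_rate(cycle))
```

For this uniform cycle the stationary distribution is flat, and the result reduces to $(k_+ - k_-)\ln(k_+/k_-) = \ln 2$. A two-state network always satisfies detailed balance and dissipates nothing.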