Source author record

Yuhai Tu

Yuhai Tu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

cond-mat.stat-mech · Biological Physics · nlin.AO · Subcellular Processes · adap-org · cond-mat · Machine Learning · nlin.PS · Molecular Networks · patt-sol · physics.data-an · Quantitative Methods · Biomolecules · Cell Behavior · cond-mat.dis-nn · cond-mat.soft

Catalog footprint

What is connected

20 works

16 topics

4 close collaborators


Preprint · 2026 · arXiv

Transient learning dynamics drive escape from sharp valleys in Stochastic Gradient Descent

Stochastic gradient descent (SGD) is central to deep learning, yet the dynamical origin of its preference for flatter, more generalizable solutions remains unclear. Here, by analyzing SGD learning dynamics, we identify a nonequilibrium mechanism governing solution selection. Numerical experiments reveal a transient exploratory phase in which SGD trajectories repeatedly escape sharp valleys and transition toward flatter regions of the loss landscape. By using a tractable physical model, we show that the SGD noise reshapes the landscape into an effective potential that favors flat solutions. Crucially, we uncover a transient freezing mechanism: as training proceeds, growing energy barriers suppress inter-valley transitions and ultimately trap the dynamics within a single basin. Increasing the SGD noise strength delays this freezing, which enhances convergence to flatter minima. Together, these results provide a unified physical framework linking learning dynamics, loss-landscape geometry, and generalization, and suggest principles for the design of more effective optimization algorithms.

Preprint · 2022 · arXiv

Free energy dissipation enhances spatial accuracy and robustness of Turing pattern in small reaction-diffusion systems

Accurate and robust spatial orders are ubiquitous in living systems. In 1952, Alan Turing proposed an elegant mechanism for pattern formation based on spontaneous breaking of the spatial translational symmetry in the underlying reaction-diffusion system. Much is understood about the dynamics and structure of Turing patterns. However, little is known about their energetic cost. Here, we study the nonequilibrium thermodynamics of a small, spatially extended biochemical reaction-diffusion system by using analytical and numerical methods. We find that the onset of a Turing pattern requires a minimum energy dissipation to drive the nonequilibrium chemical reactions. Above onset, only a small fraction of the total energy expenditure is used to overcome diffusion for maintaining the spatial pattern. We show that the positioning error decreases as energy dissipation increases, following the same tradeoff relationship as that between timing error and energy cost in biochemical oscillatory systems. In a finite system, we find that a specific Turing pattern exists only within a finite range of total molecule number, and energy dissipation broadens this range, which enhances the robustness of the Turing pattern against molecule number fluctuations in living cells. These results are verified in a realistic model of the Muk system underlying DNA segregation in E. coli, and testable predictions are made for the dependence of the accuracy and robustness of the spatial pattern on the ATP/ADP ratio. More generally, the theoretical framework developed here can be applied to study the nonequilibrium thermodynamics of spatially extended biochemical systems.
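The abstract starts from Turing's mechanism, in which a uniform state that is stable to uniform perturbations becomes unstable at a finite wavenumber once the two species diffuse at sufficiently different rates. The sketch below checks that onset condition by standard linear stability analysis for a generic two-species reaction-diffusion system. It is not the Muk model studied in the paper and says nothing about energy dissipation; the function names, the Jacobian entries, and the diffusion constants are hypothetical values chosen only to satisfy (or violate) the Turing conditions.

```python
import cmath


def growth_rate(jac, du, dv, q):
    """Largest real part of the eigenvalues of J - q^2 diag(du, dv).

    jac = [[a, b], [c, d]] is the reaction Jacobian at the uniform steady
    state; a perturbation ~ exp(i q x + lambda t) grows if this is positive.
    """
    (a, b), (c, d) = jac
    m11 = a - du * q * q
    m22 = d - dv * q * q
    tr = m11 + m22
    det = m11 * m22 - b * c
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return max(((tr + disc) / 2).real, ((tr - disc) / 2).real)


def is_turing_unstable(jac, du, dv, q_max=5.0, n=2000):
    """True if the uniform state is stable at q = 0 but some q in (0, q_max] grows."""
    if growth_rate(jac, du, dv, 0.0) >= 0.0:
        return False  # already unstable to uniform perturbations: not a Turing instability
    return any(growth_rate(jac, du, dv, q_max * k / n) > 0.0 for k in range(1, n + 1))


# Hypothetical Jacobian: self-enhancing activator u, inhibitor v.
J = [[1.0, -1.0], [3.0, -2.0]]
print(is_turing_unstable(J, du=1.0, dv=10.0))  # True: fast-diffusing inhibitor
print(is_turing_unstable(J, du=1.0, dv=1.0))   # False: equal diffusion
```

With this hypothetical Jacobian the uniform state is stable at $q = 0$ in both cases; only the unequal-diffusion case lets a band of finite wavenumbers grow, which is the symmetry-breaking onset the abstract refers to.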

Preprint · 2022 · arXiv

Modeling bacterial flagellar motor with new structure information: Rotational dynamics of two interacting protein nano-rings

In this article, we develop a mathematical model for the rotary bacterial flagellar motor (BFM) based on the recently discovered structure of the stator complex (MotA$_5$MotB$_2$). The structure suggests that the stator also rotates. The BFM is modeled as two rotating nano-rings that interact with each other. Specifically, translocation of protons through the stator complex drives rotation of the MotA pentamer ring, which in turn drives rotation of the FliG ring in the rotor via interactions between the MotA ring of the stator and the FliG ring of the rotor. Preliminary results from the structure-informed model are consistent with the observed torque-speed relation. More importantly, the model predicts distinctive rotor and stator dynamics and their load dependence, which may be tested by future experiments. Possible approaches to verify and improve the model, and to further our understanding of the molecular mechanism of torque generation in the BFM, are also discussed.

Preprint · 2022 · arXiv

State-space renormalization group theory of nonequilibrium reaction networks: Exact solutions for hypercubic lattices in arbitrary dimensions

Nonequilibrium reaction networks (NRNs) underlie most biological functions. Despite their diverse dynamic properties, NRNs share the signature characteristics of persistent probability fluxes and continuous energy dissipation even in the steady state. Dynamics of NRNs can be described at different coarse-grained levels. Our previous work showed that the apparent energy dissipation rate at a coarse-grained level follows an inverse power-law dependence on the scale of coarse-graining. The scaling exponent is determined by the network structure and correlation of stationary probability fluxes. However, it remains unclear whether and how the (renormalized) flux correlation varies with coarse-graining. Following Kadanoff's real-space renormalization group (RG) approach for critical phenomena, we address this question by developing a State-Space Renormalization Group (SSRG) theory for NRNs, which leads to an iterative RG equation for the flux correlation function. In square and hypercubic lattices, we solve the RG equation exactly and find two types of fixed point solutions: a family of nontrivial fixed points where the correlation exhibits power-law decay and a trivial fixed point where the correlation vanishes beyond the nearest neighbors. The power-law fixed point is stable if and only if the power exponent is less than the lattice dimension $n$. Consequently, the correlation function converges to the power-law fixed point only when the correlation in the fine-grained network decays slower than $r^{-n}$ and to the trivial fixed point otherwise. If the flux correlation in the fine-grained network contains multiple stable solutions with different exponents, the RG iteration dynamics select the fixed point solution with the smallest exponent. We also discuss a possible connection between the RG flows of the flux correlation and those of the Kosterlitz-Thouless transition.

Preprint · 2022 · arXiv

The energy cost for flocking of active spins: the cusped dissipation maximum at the flocking transition

We study the energy cost of flocking in the active Ising model (AIM) and show that, besides the energy cost of self-propelled motion, additional energy dissipation is required to power the alignment of spins. We find that this additional alignment dissipation reaches its maximum at the flocking transition point in the form of a cusp with a discontinuous first derivative with respect to the control parameter. To understand this singular behavior, we analytically solve the two- and three-site AIMs and obtain the exact dependence of the alignment dissipation on the flocking order parameter and the control parameter, which explains the cusped dissipation maximum at the flocking transition. Our results reveal a tradeoff between the energy cost of the system and its performance, measured by the flocking speed and sensitivity to external perturbations. This tradeoff relationship provides a new perspective for understanding the dynamics of natural flocks and designing optimal artificial flocking systems.

Preprint · 2021 · arXiv

Phases of learning dynamics in artificial neural networks: with or without mislabeled data

Despite the tremendous success of deep neural networks in machine learning, the underlying reason for their superior learning capability remains unclear. Here, we present a framework based on statistical physics to study the dynamics of stochastic gradient descent (SGD) that drives learning in neural networks. By using the minibatch gradient ensemble, we construct order parameters to characterize the dynamics of weight updates in SGD. Without mislabeled data, we find that the SGD learning dynamics transitions from a fast learning phase to a slow exploration phase, which is associated with large changes in order parameters that characterize the alignment of SGD gradients and their mean amplitude. In the case with randomly mislabeled samples, SGD learning dynamics falls into four distinct phases. The system first finds solutions for the correctly labeled samples in phase I; it then wanders around these solutions in phase II until it finds a direction to learn the mislabeled samples during phase III, after which it finds solutions that satisfy all training samples during phase IV. Correspondingly, the test error decreases during phase I and remains low during phase II; however, it increases during phase III and reaches a high plateau during phase IV. The transitions between different phases can be understood by changes of order parameters that characterize the alignment of mean gradients for the correctly and incorrectly labeled samples and their (relative) strength during learning. We find that individual sample losses for the two datasets are most separated during phase II, which leads to a cleaning process to eliminate mislabeled samples for improving generalization.
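The abstract characterizes SGD through order parameters built from the minibatch gradient ensemble, namely the alignment of minibatch gradients and their mean amplitude. The paper's exact definitions are not reproduced here, so the sketch below uses two assumed, illustrative quantities: the norm of the ensemble-mean gradient and the mean pairwise cosine similarity between minibatch gradients. The function name and both definitions are hypothetical.

```python
import math


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def gradient_order_parameters(minibatch_grads):
    """Illustrative order parameters for an ensemble of minibatch gradients.

    Returns (amplitude, alignment): the norm of the ensemble-mean gradient and
    the mean pairwise cosine similarity between minibatch gradients.
    These definitions are assumptions for illustration, not the paper's.
    """
    k = len(minibatch_grads)
    if k < 2:
        raise ValueError("need at least two minibatch gradients")
    dim = len(minibatch_grads[0])
    mean = [sum(g[i] for g in minibatch_grads) / k for i in range(dim)]
    amplitude = _norm(mean)
    total, pairs = 0.0, 0
    for a in range(k):
        for b in range(a + 1, k):
            ga, gb = minibatch_grads[a], minibatch_grads[b]
            dot = sum(x * y for x, y in zip(ga, gb))
            total += dot / (_norm(ga) * _norm(gb))  # cosine similarity of one pair
            pairs += 1
    return amplitude, total / pairs
```

Perfectly aligned gradients give an alignment of 1, while two opposing gradients give an alignment of -1 and a vanishing mean amplitude, so a fall in both quantities is one simple way such measures can register a change in the character of the weight updates.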

Preprint · 2020 · arXiv

Deciphering gene regulation from gene expression dynamics using deep neural network

Complex biological functions are carried out by the interactions of genes and proteins. Uncovering the gene regulation network behind a function is one of the central themes in biology. Typically, it involves extensive experiments in genetics, biochemistry and molecular biology. In this paper, we show that much of the inference task can be accomplished by a deep neural network (DNN), a form of machine learning or artificial intelligence. Specifically, the DNN learns from the dynamics of the gene expression. The learnt DNN behaves like an accurate simulator of the system, on which one can perform in-silico experiments to reveal the underlying gene network. We demonstrate the method with two examples: biochemical adaptation and the gap-gene patterning in fruit fly embryogenesis. In the first example, the DNN can successfully find the two basic network motifs for adaptation: the negative feedback and the incoherent feed-forward. In the second and much more complex example, the DNN can accurately predict behaviors of essentially all the mutants. Furthermore, the regulation network it uncovers is strikingly similar to the one inferred from experiments. In doing so, we develop methods for deciphering the gene regulation network hidden in the DNN "black box". Our interpretable DNN approach should have broad applications in genotype-phenotype mapping.

Preprint · 2020 · arXiv

How neural networks find generalizable solutions: Self-tuned annealing in deep learning

Despite the tremendous success of the stochastic gradient descent (SGD) algorithm in deep learning, little is known about how SGD finds generalizable solutions in the high-dimensional weight space. By analyzing the learning dynamics and loss function landscape, we discover a robust inverse relation between the weight variance and the landscape flatness (inverse of curvature) for all SGD-based learning algorithms. To explain the inverse variance-flatness relation, we develop a random landscape theory, which shows that the SGD noise strength (effective temperature) depends inversely on the landscape flatness. Our study indicates that SGD attains a self-tuned landscape-dependent annealing strategy to find generalizable solutions at the flat minima of the landscape. Finally, we demonstrate how these new theoretical insights lead to more efficient algorithms, e.g., for avoiding catastrophic forgetting.

Preprint · 2020 · arXiv

Nonequilibrium thermodynamics of coupled molecular oscillators: The energy cost and optimal design for synchronization

A model of coupled molecular oscillators is proposed to study the nonequilibrium thermodynamics of synchronization. We find that synchronization of nonequilibrium oscillators costs energy even when the oscillator-oscillator coupling is conservative. By solving the steady state of the many-body system analytically, we show that the system goes through a nonequilibrium phase transition driven by energy dissipation, and the critical energy dissipation depends on both the frequency and strength of the exchange reactions. Moreover, our study reveals the optimal design for achieving maximum synchronization with a fixed energy budget. We apply our general theory to the Kai system in the cyanobacterial circadian clock and predict a relationship between the KaiC ATPase activity and synchronization of the KaiC hexamers. The theoretical framework can be extended to study the thermodynamics of collective behaviors in other extended nonequilibrium active systems.

Preprint · 2020 · arXiv

Scaling of Energy Dissipation in Nonequilibrium Reaction Networks

The energy dissipation rate in a nonequilibrium reaction system can be determined by the reaction rates in the underlying reaction network. By developing a coarse-graining process in state space and a corresponding renormalization procedure for reaction rates, we find that the energy dissipation rate has an inverse power-law dependence on the number of microscopic states in a coarse-grained state. The dissipation scaling law requires self-similarity of the underlying network, and the scaling exponent depends on the network structure and the flux correlation. Implications of this inverse dissipation scaling law for active flow systems such as microtubule-kinesin mixtures are discussed.
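The first sentence of the abstract can be made concrete with the standard (Schnakenberg) steady-state entropy production formula for a Markov reaction network, in units where $k_B T = 1$: with stationary probabilities $p_j$ and fluxes $J_{ij} = k_{ij} p_j$ from state $j$ to state $i$, the dissipation rate is $\dot{W} = \sum_{i<j} (J_{ij} - J_{ji}) \ln (J_{ij}/J_{ji})$. The sketch below computes this for a small network given as a rate matrix. It assumes every link is reversible (a pair with a zero rate in either direction is treated as no link) and does not implement the paper's coarse-graining or rate renormalization; the helper names and the ring example are hypothetical.

```python
import math


def steady_state(rates):
    """Stationary distribution p of dp/dt = W p.

    rates[i][j] is the transition rate from state j to state i (diagonal ignored).
    Assumes the network is irreducible so the solution is unique.
    """
    n = len(rates)
    w = [[rates[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        w[j][j] = -sum(rates[i][j] for i in range(n) if i != j)  # columns sum to zero
    # The rows of W are linearly dependent; replace one with the normalization sum(p) = 1.
    w[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(w[r][col]))
        w[col], w[piv] = w[piv], w[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = w[r][col] / w[col][col]
            for c in range(col, n):
                w[r][c] -= f * w[col][c]
            b[r] -= f * b[col]
    p = [0.0] * n
    for r in range(n - 1, -1, -1):
        p[r] = (b[r] - sum(w[r][c] * p[c] for c in range(r + 1, n))) / w[r][r]
    return p


def dissipation_rate(rates):
    """Steady-state dissipation rate in units of k_B T per unit time."""
    p = steady_state(rates)
    n = len(rates)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if rates[i][j] > 0.0 and rates[j][i] > 0.0:  # reversible link i <-> j
                j_fwd = rates[i][j] * p[j]  # probability flux j -> i
                j_bwd = rates[j][i] * p[i]  # probability flux i -> j
                total += (j_fwd - j_bwd) * math.log(j_fwd / j_bwd)
    return total


def ring_rates(n, k_forward, k_backward):
    """Hypothetical driven ring of n >= 3 states: j -> j+1 at k_forward, j -> j-1 at k_backward."""
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[(j + 1) % n][j] = k_forward
        r[(j - 1) % n][j] = k_backward
    return r
```

For a three-state ring with forward rate 2 and backward rate 1, each of the three links carries a net flux of 1/3 with affinity $\ln 2$, so `dissipation_rate(ring_rates(3, 2.0, 1.0))` gives $\ln 2$; rates that satisfy detailed balance give zero.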

Preprint · 2016 · arXiv

A framework towards understanding mesoscopic phenomena: Emergent unpredictability, symmetry breaking and dynamics across scales

By integrating four lines of thought (symmetry breaking, originally advanced by Anderson; bifurcation from nonlinear dynamics; Landau's theory of phase transitions; and the mechanism of emergent rare events studied by Kramers), we introduce a possible framework for understanding mesoscopic dynamics that links (i) fast lower-level microscopic motions, (ii) movements within each basin at the mid-level, and (iii) higher-level rare transitions between neighboring basins, which have rates that decrease exponentially with the size of the system. In this mesoscopic framework, multiple attractors arise as emergent properties of the nonlinear systems. The interplay between stochasticity and nonlinearity leads to successive jump-like transitions among different basins. We argue that each transition is a dynamic symmetry breaking, with the potential of exhibiting a Thom-Zeeman catastrophe as well as a phase transition with the breakdown of ergodicity (e.g., cell differentiation). The slow-time dynamics of the nonlinear mesoscopic system is not deterministic; rather, it is a discrete stochastic jump process. The existence of these discrete states and the Markov transitions among them are both emergent phenomena. This emergent stochastic jump dynamics then serves as the stochastic element for the nonlinear dynamics of higher-level aggregates on even larger spatial and slower time scales (e.g., evolution). This description captures the hierarchical structure outlined by Anderson and illustrates two distinct limits of mesoscopic dynamics: a long-time ensemble thermodynamics, with time $t$ tending to infinity followed by the system size $N$ tending to infinity, and a short-time trajectory steady state, with $N$ tending to infinity followed by $t$ tending to infinity. With these limits, symmetry breaking and cusp catastrophe are two perspectives on the same mesoscopic system on different time scales.

Preprint · 2015 · arXiv

The free energy cost of accurate biochemical oscillations

Oscillation is an important cellular process that regulates the timing of various vital life cycles. However, in the noisy cellular environment, oscillations can be highly inaccurate due to phase fluctuations. It remains poorly understood how biochemical circuits suppress phase fluctuations and what thermodynamic cost this incurs. Here, we study four different types of biochemical oscillations representing three basic oscillation motifs shared by all known oscillatory systems. We find that the phase diffusion constant follows the same inverse dependence on the free energy dissipation per period for all systems studied. This relationship between the phase diffusion and energy dissipation is shown analytically in a model of noisy oscillation. Microscopically, we find that the oscillation is driven by multiple irreversible cycles that hydrolyze fuel molecules such as ATP; the number of phase-coherent periods is proportional to the free energy consumed per period. Experimental evidence in support of this universal relationship and testable predictions are also presented.

Preprint · 2015 · arXiv

The free energy cost of reducing noise while maintaining a high sensitivity

Living systems need to be highly responsive, and also to keep fluctuations low. These goals are incompatible in equilibrium systems due to the Fluctuation Dissipation Theorem (FDT). Here, we show that biological sensory systems, driven far from equilibrium by free energy consumption, can reduce their intrinsic fluctuations while maintaining high responsiveness. By developing a continuum theory of the E. coli chemotaxis pathway, we demonstrate that adaptation can be understood as a non-equilibrium phase transition controlled by free energy dissipation, and it is characterized by a breaking of the FDT. We show that the maximum response at short time is enhanced by free energy dissipation. At the same time, the low frequency fluctuations and the adaptation error decrease with the free energy dissipation algebraically and exponentially, respectively.

Preprint · 2011 · arXiv

Noise Filtering Strategies of Adaptive Signaling Networks: The Case of E. Coli Chemotaxis

Two distinct mechanisms for filtering noise in an input signal are identified in a class of adaptive sensory networks. We find that high-frequency noise is filtered by the output degradation process through time-averaging, while low-frequency noise is damped by adaptation through negative feedback. Both filtering processes themselves introduce intrinsic noise, which is found to be unfiltered and can thus amount to a significant internal noise floor even without signaling. These results are applied to E. coli chemotaxis. We show unambiguously that the molecular mechanism for the Berg-Purcell time-averaging scheme is the dephosphorylation of the response regulator CheY-P, not the receptor adaptation process as previously suggested. The high-frequency noise due to the stochastic ligand binding-unbinding events and the random ligand molecule diffusion is averaged by the CheY-P dephosphorylation process to a negligible level in E. coli. We identify a previously unstudied noise source caused by the random motion of the cell in a ligand gradient. We show that this random-walk-induced signal noise has a divergent low-frequency component, which is only rendered finite by the receptor adaptation process. For gradients within the E. coli sensing range, this dominant external noise can be comparable to the significant intrinsic noise in the system. The dependence of the response and its fluctuations on the key time scales of the system is studied systematically. We show that the chemotaxis pathway may have evolved to optimize gradient sensing, strong response, and noise control on different time scales.
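The first mechanism in the abstract, time-averaging by the output degradation process, behaves like a first-order low-pass filter. The sketch below assumes the simplest hypothetical version: an output $Y$ (standing in for CheY-P) produced at a rate proportional to a sinusoidal input and degraded with lifetime $\tau$, so $dY/dt = \sin(\omega t) - Y/\tau$. It compares a forward-Euler simulation with the textbook steady-state gain $\tau/\sqrt{1 + (\omega\tau)^2}$. It does not model ligand binding, receptor adaptation, or the negative-feedback damping of low-frequency noise, and the function names are hypothetical.

```python
import math


def analytic_gain(omega, tau):
    """Steady-state amplitude of Y for dY/dt = sin(omega*t) - Y/tau."""
    return tau / math.sqrt(1.0 + (omega * tau) ** 2)


def simulated_gain(omega, tau, dt=1e-3):
    """Forward-Euler integration; returns the peak |Y| over the last two periods."""
    period = 2.0 * math.pi / omega
    t_end = 10.0 * tau + 20.0 * period  # long enough for the exp(-t/tau) transient to die out
    t_measure = t_end - 2.0 * period
    y, t, peak = 0.0, 0.0, 0.0
    while t < t_end:
        y += dt * (math.sin(omega * t) - y / tau)  # production minus degradation
        t += dt
        if t >= t_measure:
            peak = max(peak, abs(y))
    return peak
```

With $\tau = 1$, an input at $\omega = 20$ is passed with a gain of about 0.05, while a slow input at $\omega = 0.05$ is passed almost unchanged; the longer the output lifetime, the lower the frequency above which input noise is averaged away.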

Preprint · 2010 · arXiv

The Effects of Stator Compliance, Backs Steps, Temperature, and Clockwise Rotation on the Torque-Speed Curve of Bacterial Flagellar Motor

Rotation of a single bacterial flagellar motor is powered by multiple stators tethered to the cell wall. In a "power-stroke" model, the observed independence of the speed at low load on the number of stators is explained by a torque-dependent stepping mechanism independent of the strength of the stator tethering spring. On the other hand, models that depend solely on the stator spring to explain the observed behavior require exceedingly small stator spring constants. To study the dynamics of the motor driven by external forces (such as those exerted by optical tweezers), back-stepping is introduced when stators are driven far out of equilibrium. Our model with back-stepping reproduces the observed absence of a barrier to backward rotation, as well as the behaviors in the high-speed negative-torque regime. The recently measured temperature dependence of the motor speed near zero load (Yuan & Berg 2010 Biophys J) is explained quantitatively by the thermally activated stepping rates in our model. Finally, we suggest that the general mechanical properties of all molecular motors (linear and rotary), characterized by their force(torque)-speed curve, can be determined by their power-stroke potentials and the dependence of the stepping rates on the mechanical state of the motor (force or speed). The torque-speed curve for the clockwise rotating flagellar motor was recently observed for the first time (Yuan et al. 2010 PNAS). Its quasi-linear behavior is quantitatively reproduced by our model. In particular, we show that concave and convex shapes of the torque-speed curve can be achieved by changing the interaction potential from linear to quadratic form. We also show that reversing the stepping rate dependence on force (torque) can lead to non-monotonicity in the speed-load dependency.

Preprint · 2004 · arXiv

Moving and staying together without a leader

A microscopic, stochastic, minimal model for collective and cohesive motion of identical self-propelled particles is introduced. Even though the particles interact strictly locally in a very noisy manner, we show that cohesion can be maintained, even in the zero-density limit of an arbitrarily large flock in an infinite space. The phase diagram spanned by the two main parameters of our model, which encode the tendencies for particles to align and to stay together, contains non-moving "gas", "liquid" and "solid" phases separated from their moving counterparts by the onset of collective motion. The "gas/liquid" and "liquid/solid" transitions are shown to be first order in all cases. In the cohesive phases, we also study the diffusive properties of individuals and their relation to the macroscopic motion and to the shape of the flock.

Preprint · 1997 · arXiv

Worm Structure in Modified Swift-Hohenberg Equation for Electroconvection

A theoretical model for studying pattern formation in electroconvection is proposed in the form of a modified Swift-Hohenberg equation. A localized state is found in two dimensions, in agreement with the experimentally observed "worm" state. The corresponding one-dimensional model is also studied, and a novel stationary localized state due to a nonadiabatic effect is found. The existence of the 1D localized state is shown to be responsible for the formation of the two-dimensional "worm" state in our model.

Preprint · 1996 · arXiv

Phase Structure of Systems with Multiplicative Noise

The phase diagrams and transitions of nonequilibrium systems with multiplicative noise are studied theoretically. We show the existence of both strong and weak-coupling critical behavior, of two distinct active phases, and of a nonzero range of parameter values over which the susceptibility is infinite in any dimension. A scaling theory of the strong-coupling transition is constructed.

Preprint · 1996 · arXiv

Systems with Multiplicative Noise: Critical Behavior from KPZ Equation and Numerics

We show that certain critical exponents of systems with multiplicative noise can be obtained from exponents of the KPZ equation. Numerical simulations in 1d confirm this prediction, and yield other exponents of the multiplicative noise problem. The numerics also verify an earlier prediction of the divergence of the susceptibility over an entire range of control parameter values, and show that the exponent governing the divergence in this range varies continuously with control parameter.

Preprint · 1995 · arXiv

How birds fly together: Long-range order in a two-dimensional dynamical XY model

We propose a non-equilibrium continuum dynamical model for the collective motion of large groups of biological organisms (e.g., flocks of birds, slime molds, etc.). Our model becomes highly non-trivial, and different from the equilibrium model, for $d<d_c=4$; nonetheless, we are able to determine its scaling exponents exactly in $d=2$, and show that, unlike equilibrium systems, our model exhibits a broken continuous symmetry even in $d=2$. Our model describes a large universality class of microscopic rules, including those recently simulated by Vicsek et al.
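The abstract places the microscopic rules simulated by Vicsek et al. in the same universality class as its continuum model. The sketch below implements one such microscopic rule in its commonly quoted form: each particle adopts the mean heading of all particles within a radius (itself included), plus uniform angular noise of width $\eta$, then moves at constant speed in a periodic box. It is not the continuum model of the paper, and the function names and parameter names (`box`, `radius`, `speed`, `eta`) are hypothetical.

```python
import math
import random


def vicsek_step(pos, theta, box, radius, speed, eta, rng):
    """One synchronous update of a Vicsek-type model in a periodic square box.

    Each particle adopts the mean heading of all particles within `radius`
    (itself included), perturbed by uniform noise in [-eta/2, eta/2], then moves
    a distance `speed` along its new heading.
    """
    n = len(pos)
    new_theta = []
    for i in range(n):
        sx = sy = 0.0
        for j in range(n):
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            dx -= box * round(dx / box)  # minimum-image convention
            dy -= box * round(dy / box)
            if dx * dx + dy * dy <= radius * radius:
                sx += math.cos(theta[j])
                sy += math.sin(theta[j])
        new_theta.append(math.atan2(sy, sx) + eta * (rng.random() - 0.5))
    new_pos = [((x + speed * math.cos(t)) % box, (y + speed * math.sin(t)) % box)
               for (x, y), t in zip(pos, new_theta)]
    return new_pos, new_theta


def polar_order(theta):
    """|mean heading vector|: 1 for perfect alignment, ~1/sqrt(N) for random headings."""
    n = len(theta)
    return math.hypot(sum(math.cos(t) for t in theta), sum(math.sin(t) for t in theta)) / n


rng = random.Random(0)
pos = [(rng.random() * 10.0, rng.random() * 10.0) for _ in range(100)]
theta = [rng.uniform(-math.pi, math.pi) for _ in range(100)]
for _ in range(50):
    pos, theta = vicsek_step(pos, theta, box=10.0, radius=1.0, speed=0.1, eta=0.5, rng=rng)
print(polar_order(theta))
```

Sweeping `eta` at fixed density and averaging `polar_order` over long runs is the usual way such simulations trace the onset of collective motion; the quadratic neighbor search keeps the sketch short but limits it to small systems.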
