Source author record

Michael R. DeWeese

Michael R. DeWeese appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

cond-mat.stat-mech, Machine Learning, Neural and Evolutionary Computing, Neurons and Cognition, Biological Physics, Computation, cond-mat.dis-nn, cond-mat.mes-hall, Emerging Technologies, Information Theory, math.PR, nlin.CD, physics.data-an, physics.optics, Populations and Evolution

Catalog footprint

What is connected

21 works

15 topics

4 close collaborators


preprint · 2026 · arXiv

A Theory of Saddle Escape in Deep Nonlinear Networks

In deep networks with small initialization, training exhibits long plateaus separated by sharp feature-acquisition transitions. Whereas shallow nonlinear networks and deep linear networks are well studied, extending these analyses to deep nonlinear networks remains challenging. We derive an exact identity for the imbalance of Frobenius norms of layer weight matrices that holds for any smooth activation and any differentiable loss and use this to classify activation functions into four universality classes. On the permutation-symmetric submanifold, the identity combines with an approximate balance law to reduce the full matrix flow to a scalar ODE, giving a critical-depth escape time law $\tau_\star = \Theta(\varepsilon^{-(r-2)})$ governed by the number $r$ of layers at the bottleneck scale rather than the total depth $L$. We find that this same $r-2$ exponent is recovered under He-normal initialization with $r$ bottleneck layers rescaled by $\varepsilon$, where the symmetry manifold is preserved by the flow but not attracting. We find close agreement between our theory and numerical simulations.
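To see how an exponent of the form $r-2$ can come out of a scalar ODE, consider a toy model. This is an assumption made for illustration, not the reduced equation derived in the paper: a single scale $a(t)$ starts at $a(0)=\varepsilon$ and grows as $\dot a = a^{r-1}$ with $r>2$. Taking the escape time to be the time needed to reach $a=1$ gives:

```latex
\dot a = a^{r-1}, \quad a(0) = \varepsilon
\;\Longrightarrow\;
\tau_\star = \int_{\varepsilon}^{1} \frac{da}{a^{r-1}}
= \frac{\varepsilon^{-(r-2)} - 1}{r-2}
= \Theta\!\left(\varepsilon^{-(r-2)}\right)
\quad (\varepsilon \to 0).
```

In this toy setting the plateau length is set by the initial scale $\varepsilon$ and the power $r$, which matches the form of the scaling quoted in the abstract.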

preprint · 2026 · arXiv

Higher-order response theory in optimal stochastic thermodynamics

Linear response theory has found many applications in statistical physics. One of these is to compute minimal-work protocols that drive nonequilibrium systems between different thermodynamic states, which are useful for designing engineered nanoscale systems and understanding biomolecular machines. We compare and explore the relationships between linear-response-based approximations used to study optimal protocols in different driving regimes by showing that they arise as controlled truncations of a general causal response (Volterra) expansion. We then construct higher-order response terms and discuss the drawbacks and utility of their inclusion. We illustrate our results for an overdamped particle in a harmonic trap, ultimately showing that the inclusion of higher-order response in calculating optimal protocols provides marginal improvement in effectiveness despite incurring a significant computational expense, while introducing the possibility of predicting arbitrarily low and unphysical negative excess work.

preprint · 2026 · arXiv

The Thermodynamic Costs of Simple Linear Regression

The construction of models from data is a significant contributor to the energetic costs of computation. Because of this, understanding how foundational thermodynamic bounds apply to modeling algorithms will be increasingly important. Here, we study the thermodynamic costs of a basic and fundamental modeling algorithm: simple linear regression. Following Landauer, we approximate the thermodynamic lower bound on irreversibly performing both exact linear regression and linear regression via stochastic gradient descent as implemented on floating-point numbers. From this, we derive energy-cost-aware scaling laws for the optimal dataset size for training a linear regression model given a generalization-error-dependent demand for inference. Additionally, we discuss a method to lower bound the entropy production from the mismatch cost for algorithms with continuous input variables.
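For a sense of scale, the Landauer bound referred to above sets a minimum dissipation of $k_B T \ln 2$ per irreversibly erased bit. The sketch below only evaluates that textbook bound. The function name, the 300 K default, and the 64-bit example are assumptions chosen for illustration, and nothing here reproduces the paper's analysis of regression.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def landauer_bound_joules(bits_erased, temperature_kelvin=300.0):
    """Minimum heat (in joules) dissipated when irreversibly erasing bits.

    Implements the textbook bound: bits * k_B * T * ln 2.
    """
    return bits_erased * K_B * temperature_kelvin * math.log(2)

# Hypothetical example: erasing one 64-bit floating-point word at 300 K.
per_bit = landauer_bound_joules(1)
per_float64 = landauer_bound_joules(64)
print(f"per bit:    {per_bit:.3e} J")
print(f"per 64-bit: {per_float64:.3e} J")
```

At room temperature the bound is on the order of $10^{-21}$ J per bit.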

preprint · 2022 · arXiv

Engineered swift equilibration for arbitrary geometries

Engineered swift equilibration (ESE) is a class of driving protocols that enforce an equilibrium distribution with respect to external control parameters at the beginning and end of rapid state transformations of open, classical non-equilibrium systems. ESE protocols have previously been derived and experimentally realized for Brownian particles in simple, one-dimensional, time-varying trapping potentials; one recent study considered ESE in two-dimensional Euclidean configuration space. Here we extend the ESE framework to generic, overdamped Brownian systems in arbitrary curved configuration space and illustrate our results with specific examples not amenable to previous techniques. Our approach may be used to impose the necessary dynamics to control the full temporal configurational distribution in a wide variety of experimentally realizable settings.

preprint · 2022 · arXiv

Reverse Engineering the Neural Tangent Kernel

The development of methods to guide the design of neural networks is an important open challenge for deep learning theory. As a paradigm for principled neural architecture design, we propose the translation of high-performing kernels, which are better-understood and amenable to first-principles design, into equivalent network architectures, which have superior efficiency, flexibility, and feature learning. To this end, we constructively prove that, with just an appropriate choice of activation function, any positive-semidefinite dot-product kernel can be realized as either the NNGP or neural tangent kernel of a fully-connected neural network with only one hidden layer. We verify our construction numerically and demonstrate its utility as a design tool for finite fully-connected networks in several experiments.

preprint · 2022 · arXiv

Solution to the Fokker-Planck equation for slowly driven Brownian motion: Emergent geometry and a formula for the corresponding thermodynamic metric

Considerable progress has recently been made with geometrical approaches to understanding and controlling small out-of-equilibrium systems, but a mathematically rigorous foundation for these methods has been lacking. Towards this end, we develop a perturbative solution to the Fokker-Planck equation for one-dimensional driven Brownian motion in the overdamped limit enabled by the spectral properties of the corresponding single-particle Schrödinger operator. The perturbation theory is in powers of the inverse characteristic timescale of variation of the fastest varying control parameter, measured in units of the system timescale, which is set by the smallest eigenvalue of the corresponding Schrödinger operator. It applies to any Brownian system for which the Schrödinger operator has a confining potential. We use the theory to rigorously derive an exact formula for a Riemannian "thermodynamic" metric in the space of control parameters of the system. We show that up to second-order terms in the perturbation theory, optimal dissipation-minimizing driving protocols minimize the length defined by this metric. We also show that a previously proposed metric is calculable from our exact formula with corrections that are exponentially suppressed in a characteristic length scale. We illustrate our formula using the two-dimensional example of a harmonic oscillator with time-dependent spring constant in a time-dependent electric field. Lastly, we demonstrate that the Riemannian geometric structure of the optimal control problem is emergent; it derives from the form of the perturbative expansion for the probability density and persists to all orders of the expansion.

preprint · 2020 · arXiv

A new method for parameter estimation in probabilistic models: Minimum probability flow

Fitting probabilistic models to data is often difficult, due to the general intractability of the partition function. We propose a new parameter fitting method, Minimum Probability Flow (MPF), which is applicable to any parametric model. We demonstrate parameter estimation using MPF in two cases: a continuous state space model, and an Ising spin glass. In the latter case it outperforms current techniques by at least an order of magnitude in convergence time with lower error in the recovered coupling parameters.

preprint · 2020 · arXiv

Critical Point-Finding Methods Reveal Gradient-Flat Regions of Deep Network Losses

Despite the fact that the loss functions of deep neural networks are highly non-convex, gradient-based optimization algorithms converge to approximately the same performance from many random initial points. One thread of work has focused on explaining this phenomenon by characterizing the local curvature near critical points of the loss function, where the gradients are near zero, and demonstrating that neural network losses enjoy a no-bad-local-minima property and an abundance of saddle points. We report here that the methods used to find these putative critical points suffer from a bad local minima problem of their own: they often converge to or pass through regions where the gradient norm has a stationary point. We call these gradient-flat regions, since they arise when the gradient is approximately in the kernel of the Hessian, such that the loss is locally approximately linear, or flat, in the direction of the gradient. We describe how the presence of these regions necessitates care in both interpreting past results that claimed to find critical points of neural network losses and in designing second-order methods for optimizing neural networks.
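The link between stationary points of the gradient norm and the kernel of the Hessian follows from a one-line calculation. The symbols $L(\theta)$ for the loss, $H(\theta)$ for its Hessian, and $g(\theta)$ for half the squared gradient norm are notation introduced here for illustration:

```latex
g(\theta) = \tfrac{1}{2}\,\lVert \nabla L(\theta) \rVert^{2}
\quad\Longrightarrow\quad
\nabla g(\theta) = H(\theta)\,\nabla L(\theta),
\qquad H(\theta) = \nabla^{2} L(\theta).

% Along the unit gradient direction \hat{u} = \nabla L / \lVert \nabla L \rVert:
L(\theta + t\,\hat{u}) \approx L(\theta) + t\,\lVert \nabla L(\theta) \rVert
  + \tfrac{1}{2}\,t^{2}\,\hat{u}^{\top} H(\theta)\,\hat{u}.
```

So $\nabla g = 0$ either at a true critical point ($\nabla L = 0$) or wherever a nonzero gradient lies in the kernel of $H$. In the second case the quadratic term in the expansion vanishes, which is why the loss is locally linear along the gradient direction.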

preprint · 2019 · arXiv

Design of optical neural networks with component imprecisions

For the benefit of designing scalable, fault-resistant optical neural networks (ONNs), we investigate the effects architectural designs have on the ONNs' robustness to imprecise components. We train two ONNs -- one with a more tunable design (GridNet) and one with better fault tolerance (FFTNet) -- to classify handwritten digits. When simulated without any imperfections, GridNet yields a better accuracy (~98%) than FFTNet (~95%). However, under a small amount of error in their photonic components, the more fault-tolerant FFTNet overtakes GridNet. We further provide thorough quantitative and qualitative analyses of ONNs' sensitivity to varying levels and types of imprecisions. Our results offer guidelines for the principled design of fault-tolerant ONNs as well as a foundation for further research.

preprint · 2016 · arXiv

Hamiltonian Monte Carlo Without Detailed Balance

We present a method for performing Hamiltonian Monte Carlo that largely eliminates sample rejection for typical hyperparameters. In situations that would normally lead to rejection, instead a longer trajectory is computed until a new state is reached that can be accepted. This is achieved using Markov chain transitions that satisfy the fixed point equation, but do not satisfy detailed balance. The resulting algorithm significantly suppresses the random walk behavior and wasted function evaluations that are typically the consequence of update rejection. We demonstrate a greater than factor of two improvement in mixing time on three test problems. We release the source code as Python and MATLAB packages.
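For context, the baseline that this abstract modifies is standard Hamiltonian Monte Carlo with a Metropolis accept/reject step. The sketch below shows only that standard baseline on a one-dimensional Gaussian target. It is not the rejection-free algorithm described above, and the function names, step size, and trajectory length are assumptions chosen for illustration.

```python
import math
import random

def leapfrog(q, p, grad_u, step, n_steps):
    """Integrate Hamiltonian dynamics with the leapfrog scheme (unit mass)."""
    p = p - 0.5 * step * grad_u(q)       # initial half step in momentum
    for i in range(n_steps):
        q = q + step * p                 # full step in position
        if i < n_steps - 1:
            p = p - step * grad_u(q)     # full step in momentum
    p = p - 0.5 * step * grad_u(q)       # final half step in momentum
    return q, p

def hmc_step(q, u, grad_u, step, n_steps, rng):
    """One standard HMC update: resample momentum, integrate, accept or reject."""
    p = rng.gauss(0.0, 1.0)
    h_old = u(q) + 0.5 * p * p
    q_new, p_new = leapfrog(q, p, grad_u, step, n_steps)
    h_new = u(q_new) + 0.5 * p_new * p_new
    # Metropolis test: on rejection the chain stays put, which is the
    # wasted work that the abstract above aims to avoid.
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return q_new, True
    return q, False

def u(q):
    return 0.5 * q * q                   # potential of a standard normal target

def grad_u(q):
    return q

rng = random.Random(0)
q, samples, accepted = 0.0, [], 0
for _ in range(5000):
    q, ok = hmc_step(q, u, grad_u, step=0.2, n_steps=8, rng=rng)
    samples.append(q)
    accepted += ok

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
accept_rate = accepted / len(samples)
print(mean, var, accept_rate)
```

The sample mean and variance should land near 0 and 1 for this target, and the acceptance rate shows how often a proposed trajectory is kept.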

preprint · 2016 · arXiv

Nonequilibrium work energy relation for non-Hamiltonian dynamics

Recent years have witnessed major advances in our understanding of nonequilibrium processes. The Jarzynski equality, for example, provides a link between equilibrium free energy differences and finite-time, nonequilibrium dynamics. We propose a generalization of this relation to non-Hamiltonian dynamics, relevant for active matter systems, continuous feedback, and computer simulation. Surprisingly, this relation allows us to calculate the free energy difference between the desired initial and final equilibrium states using arbitrary dynamics. As a practical matter, this dissociation between the dynamics and the initial and final states promises to facilitate a range of techniques for free energy estimation in a single, universal expression.
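The standard Jarzynski equality mentioned above, $\Delta F = -k_B T \ln \langle e^{-W/k_B T} \rangle$, can be turned into a simple estimator from repeated work measurements. The sketch below covers only that standard relation, not the non-Hamiltonian generalization proposed in the abstract. The Gaussian work distribution and the parameter values are assumptions chosen because the exact answer is then known: $\Delta F = \mu - \sigma^2 / (2 k_B T)$.

```python
import math
import random

def jarzynski_free_energy(work_samples, kT=1.0):
    """Estimate Delta F = -kT * ln < exp(-W / kT) > from work samples."""
    scaled = [-w / kT for w in work_samples]
    m = max(scaled)  # log-sum-exp shift for numerical stability
    log_mean = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return -kT * log_mean

# Hypothetical data: Gaussian work values with mean mu and std sigma (kT = 1).
rng = random.Random(0)
mu, sigma = 2.0, 1.0
works = [rng.gauss(mu, sigma) for _ in range(100_000)]

estimate = jarzynski_free_energy(works)
exact = mu - sigma ** 2 / 2.0  # closed form for Gaussian work at kT = 1
print(estimate, exact)
```

The exponential average is dominated by rare low-work trajectories, so the estimate converges slowly when the work distribution is broad compared with $k_B T$.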

preprint · 2015 · arXiv

A Markov Jump Process for More Efficient Hamiltonian Monte Carlo

In most sampling algorithms, including Hamiltonian Monte Carlo, transition rates between states correspond to the probability of making a transition in a single time step, and are constrained to be less than or equal to 1. We derive a Hamiltonian Monte Carlo algorithm using a continuous time Markov jump process, and are thus able to escape this constraint. Transition rates in a Markov jump process need only be non-negative. We demonstrate that the new algorithm leads to improved mixing for several example problems, both by evaluating the spectral gap of the Markov operator, and by computing autocorrelation as a function of compute time. We release the algorithm as an open source Python package.

preprint · 2015 · arXiv

Optimal Control of Overdamped Systems

Nonequilibrium physics encompasses a broad range of natural and synthetic small-scale systems. Optimizing transitions of such systems will be crucial for the development of nanoscale technologies and may reveal the physical principles underlying biological processes at the molecular level. Recent work has demonstrated that when a thermodynamic system is driven away from equilibrium then the space of controllable parameters has a Riemannian geometry induced by a generalized inverse diffusion tensor. We derive a simple, compact expression for the inverse diffusion tensor that depends solely on equilibrium information for a broad class of potentials. We use this formula to compute the minimal dissipation for two model systems relevant to small-scale information processing and biological molecular motors. In the first model, we optimally erase a single classical bit of information modelled by an overdamped particle in a smooth double-well potential. In the second model, we find the minimal dissipation of a simple molecular motor model coupled to an optical trap. In both models, we find that the minimal dissipation for the optimal protocol is inversely proportional to protocol duration, as expected, though the dissipation for the erasure model takes a different form than what we found previously for a similar system.

preprint · 2015 · arXiv

Optimal protocols for slowly-driven quantum processes

The design of efficient quantum information processing will rely on optimal nonequilibrium transitions of driven quantum systems. Building on a recently-developed geometric framework for computing optimal protocols for classical systems driven in finite-time, we construct a general framework for optimizing the average information entropy for driven quantum systems. Geodesics on the parameter manifold endowed with a positive semi-definite metric correspond to protocols that minimize the average information entropy production in finite-time. We use this framework to explicitly compute the optimal entropy production for a simple two-state quantum system coupled to a heat bath of bosonic oscillators, which has applications to quantum annealing.

preprint · 2015 · arXiv

Time Resolution Dependence of Information Measures for Spiking Neurons: Atoms, Scaling, and Universality

The mutual information between stimulus and spike-train response is commonly used to monitor neural coding efficiency, but neuronal computation broadly conceived requires more refined and targeted information measures of input-output joint processes. A first step towards that larger goal is to develop information measures for individual output processes, including information generation (entropy rate), stored information (statistical complexity), predictable information (excess entropy), and active information accumulation (bound information rate). We calculate these for spike trains generated by a variety of noise-driven integrate-and-fire neurons as a function of time resolution and for alternating renewal processes. We show that their time-resolution dependence reveals coarse-grained structural properties of interspike interval statistics; e.g., $\tau$-entropy rates that diverge less quickly than the firing rate indicate interspike interval correlations. We also find evidence that the excess entropy and regularized statistical complexity of different types of integrate-and-fire neurons are universal in the continuous-time limit in the sense that they do not depend on mechanism details. This suggests a surprising simplicity in the spike trains generated by these model neurons. Interestingly, neurons with gamma-distributed ISIs and neurons whose spike trains are alternating renewal processes do not fall into the same universality class. These results lead to two conclusions. First, the dependence of information measures on time resolution reveals mechanistic details about spike train generation. Second, information measures can be used as model selection tools for analyzing spike train processes.

preprint · 2013 · arXiv

Optimal control of transitions between nonequilibrium steady states

Biological systems fundamentally exist out of equilibrium in order to preserve organized structures and processes. Many changing cellular conditions can be represented as transitions between nonequilibrium steady states, and organisms have an interest in optimizing such transitions. Using the Hatano-Sasa Y-value, we extend a recently developed geometrical framework for determining optimal protocols so that it can be applied to systems driven from nonequilibrium steady states. We calculate and numerically verify optimal protocols for a colloidal particle dragged through solution by a translating optical trap with two controllable parameters. We offer experimental predictions, specifically that optimal protocols are significantly less costly than naive ones. Optimal protocols similar to these may ultimately point to design principles for biological energy transduction systems and guide the design of artificial molecular machines.

preprint · 2013 · arXiv

Optimal finite-time erasure of a classical bit

Information erasure inevitably leads to heat dissipation. Minimizing this dissipation will be crucial for developing small-scale information processing systems, but little is known about the optimal procedures required. We have obtained closed-form expressions for maximally efficient erasure cycles for deletion of a classical bit of information stored by the position of a particle diffusing in a double-well potential. We find that the extra dissipation beyond the Landauer bound is proportional to the square of the Hellinger distance between the initial and final states divided by the cycle duration, which quantifies how far out of equilibrium the system is driven. Finally, we demonstrate close agreement between the exact optimal cycle and the protocol found using a linear response framework.
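The squared Hellinger distance that appears in this scaling is easy to compute for discrete distributions. The sketch below assumes the normalization $H^2(p, q) = 1 - \sum_i \sqrt{p_i q_i}$ (other texts differ by a constant factor) and a coarse-grained two-state picture of the bit. Both are assumptions made for illustration, and the proportionality constant in the dissipation law is not given in the abstract, so it is not modeled here.

```python
import math

def hellinger_squared(p, q):
    """Squared Hellinger distance, H^2 = 1 - sum_i sqrt(p_i * q_i).

    Assumes p and q are discrete probability vectors of equal length.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    bhattacharyya = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return 1.0 - bhattacharyya

# Hypothetical coarse-grained bit: equal occupation of both wells before
# erasure, everything in one well afterwards.
initial = [0.5, 0.5]
final = [1.0, 0.0]
h2 = hellinger_squared(initial, final)
print(h2)  # equals 1 - sqrt(1/2)
```

Under the stated scaling, doubling the cycle duration for the same pair of end states would halve the extra dissipation beyond the Landauer bound.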

preprint · 2012 · arXiv

Sparse Codes for Speech Predict Spectrotemporal Receptive Fields in the Inferior Colliculus

We have developed a sparse mathematical representation of speech that minimizes the number of active model neurons needed to represent typical speech sounds. The model learns several well-known acoustic features of speech such as harmonic stacks, formants, onsets and terminations, but we also find more exotic structures in the spectrogram representation of sound such as localized checkerboard patterns and frequency-modulated excitatory subregions flanked by suppressive sidebands. Moreover, several of these novel features resemble neuronal receptive fields reported in the Inferior Colliculus (IC), as well as auditory thalamus and cortex, and our model neurons exhibit the same tradeoff in spectrotemporal resolution as has been observed in IC. To our knowledge, this is the first demonstration that receptive fields of neurons in the ascending mammalian auditory pathway beyond the auditory nerve can be predicted based on coding principles and the statistical properties of recorded sounds.

preprint · 2012 · arXiv

The geometry of thermodynamic control

A deeper understanding of nonequilibrium phenomena is needed to reveal the principles governing natural and synthetic molecular machines. Recent work has shown that when a thermodynamic system is driven from equilibrium then, in the linear response regime, the space of controllable parameters has a Riemannian geometry induced by a generalized friction tensor. We exploit this geometric insight to construct closed-form expressions for minimal-dissipation protocols for a particle diffusing in a one dimensional harmonic potential, where the spring constant, inverse temperature, and trap location are adjusted simultaneously. These optimal protocols are geodesics on the Riemannian manifold, and reveal that this simple model has a surprisingly rich geometry. We test these optimal protocols via a numerical implementation of the Fokker-Planck equation and demonstrate that the friction tensor arises naturally from a first order expansion in temporal derivatives of the control parameters, without appealing directly to linear response theory.

preprint · 2011 · arXiv

How should prey animals respond to uncertain threats?

A prey animal surveying its environment must decide whether there is a dangerous predator present or not. If there is, it may flee. Flight has an associated cost, so the animal should not flee if there is no danger. However, the prey animal cannot know the state of its environment with certainty, and is thus bound to make some errors. We formulate a probabilistic automaton model of a prey animal's life and use it to compute the optimal escape decision strategy, subject to the animal's uncertainty. The uncertainty is a major factor in determining the decision strategy: only in the presence of uncertainty do economic factors (like mating opportunities lost due to flight) influence the decision. We performed computer simulations and found that in silico populations of animals subject to predation evolve to display the strategies predicted by our model, confirming our choice of objective function for our analytic calculations. To the best of our knowledge, this is the first theoretical study of escape decisions to incorporate the effects of uncertainty, and to demonstrate the correctness of the objective function used in the model.

preprint · 2011 · arXiv

Minimum Probability Flow Learning

Fitting probabilistic models to data is often difficult, due to the general intractability of the partition function and its derivatives. Here we propose a new parameter estimation technique that does not require computing an intractable normalization factor or sampling from the equilibrium distribution of the model. This is achieved by establishing dynamics that would transform the observed data distribution into the model distribution, and then setting as the objective the minimization of the KL divergence between the data distribution and the distribution produced by running the dynamics for an infinitesimal time. Score matching, minimum velocity learning, and certain forms of contrastive divergence are shown to be special cases of this learning technique. We demonstrate parameter estimation in Ising models, deep belief networks and an independent component analysis model of natural scenes. In the Ising model case, current state of the art techniques are outperformed by at least an order of magnitude in learning time, with lower error in recovered coupling parameters.
