Researcher profile

Michael R. DeWeese

Michael R. DeWeese contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
9 works
0 followers
7 topics
4 close collaborators

Published work

9 published items

preprint | 2026 | arXiv

A Theory of Saddle Escape in Deep Nonlinear Networks

In deep networks with small initialization, training exhibits long plateaus separated by sharp feature-acquisition transitions. Whereas shallow nonlinear networks and deep linear networks are well studied, extending these analyses to deep nonlinear networks remains challenging. We derive an exact identity for the imbalance of Frobenius norms of layer weight matrices that holds for any smooth activation and any differentiable loss and use this to classify activation functions into four universality classes. On the permutation-symmetric submanifold, the identity combines with an approximate balance law to reduce the full matrix flow to a scalar ODE, giving a critical-depth escape time law $\tau_\star = \Theta(\varepsilon^{-(r-2)})$ governed by the number $r$ of layers at the bottleneck scale rather than the total depth $L$. We find that this same $r-2$ exponent is recovered under He-normal initialization with $r$ bottleneck layers rescaled by $\varepsilon$, where the symmetry manifold is preserved by the flow but not attracting. We find close agreement between our theory and numerical simulations.

preprint | 2026 | arXiv

Higher-order response theory in optimal stochastic thermodynamics

Linear response theory has found many applications in statistical physics. One of these is to compute minimal-work protocols that drive nonequilibrium systems between different thermodynamic states, which are useful for designing engineered nanoscale systems and understanding biomolecular machines. We compare and explore the relationships between linear-response-based approximations used to study optimal protocols in different driving regimes by showing that they arise as controlled truncations of a general causal response (Volterra) expansion. We then construct higher-order response terms and discuss the drawbacks and utility of their inclusion. We illustrate our results for an overdamped particle in a harmonic trap, ultimately showing that the inclusion of higher-order response in calculating optimal protocols provides marginal improvement in effectiveness despite incurring a significant computational expense, while introducing the possibility of predicting arbitrarily low and unphysical negative excess work.

preprint | 2026 | arXiv

The Thermodynamic Costs of Simple Linear Regression

The construction of models from data is a significant contributor to the energetic costs of computation. Because of this, understanding how foundational thermodynamic bounds apply to modeling algorithms will be increasingly important. Here, we study the thermodynamic costs of a basic and fundamental modeling algorithm: simple linear regression. Following Landauer, we approximate the thermodynamic lower bound on irreversibly performing both exact linear regression and linear regression via stochastic gradient descent as implemented on floating-point numbers. From this, we derive energy-cost-aware scaling laws for the optimal dataset size for training a linear regression model given a generalization-error-dependent demand for inference. Additionally, we discuss a method to lower bound the entropy production from the mismatch cost for algorithms with continuous input variables.
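As a rough illustration of the kind of bound the abstract refers to, Landauer's principle puts the minimum heat dissipated by erasing one bit at temperature T at k_B T ln 2. The sketch below only evaluates that textbook formula; the function name and the 64-bit, 300 K example are assumptions chosen for illustration and are not taken from the paper.

```python
import math

# Boltzmann constant in J/K (exact value in the 2019 SI definition).
BOLTZMANN_J_PER_K = 1.380649e-23


def landauer_bound_joules(bits_erased, temperature_kelvin):
    """Minimum heat (in joules) dissipated by irreversibly erasing
    `bits_erased` bits at `temperature_kelvin`, per Landauer's principle.
    Hypothetical helper name, used here for illustration only."""
    return bits_erased * BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)


if __name__ == "__main__":
    # Illustrative case: overwriting one 64-bit floating-point value at 300 K.
    print(landauer_bound_joules(64, 300.0))
```

The bound scales linearly in both the number of erased bits and the temperature, which is why the cost of an algorithm on floating-point numbers can be estimated by counting the bits it irreversibly overwrites.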

preprint | 2022 | arXiv

Engineered swift equilibration for arbitrary geometries

Engineered swift equilibration (ESE) is a class of driving protocols that enforce an equilibrium distribution with respect to external control parameters at the beginning and end of rapid state transformations of open, classical non-equilibrium systems. ESE protocols have previously been derived and experimentally realized for Brownian particles in simple, one-dimensional, time-varying trapping potentials; one recent study considered ESE in two-dimensional Euclidean configuration space. Here we extend the ESE framework to generic, overdamped Brownian systems in arbitrary curved configuration space and illustrate our results with specific examples not amenable to previous techniques. Our approach may be used to impose the necessary dynamics to control the full temporal configurational distribution in a wide variety of experimentally realizable settings.

preprint | 2022 | arXiv

Reverse Engineering the Neural Tangent Kernel

The development of methods to guide the design of neural networks is an important open challenge for deep learning theory. As a paradigm for principled neural architecture design, we propose the translation of high-performing kernels, which are better-understood and amenable to first-principles design, into equivalent network architectures, which have superior efficiency, flexibility, and feature learning. To this end, we constructively prove that, with just an appropriate choice of activation function, any positive-semidefinite dot-product kernel can be realized as either the NNGP or neural tangent kernel of a fully-connected neural network with only one hidden layer. We verify our construction numerically and demonstrate its utility as a design tool for finite fully-connected networks in several experiments.

preprint | 2022 | arXiv

Solution to the Fokker-Planck equation for slowly driven Brownian motion: Emergent geometry and a formula for the corresponding thermodynamic metric

Considerable progress has recently been made with geometrical approaches to understanding and controlling small out-of-equilibrium systems, but a mathematically rigorous foundation for these methods has been lacking. Towards this end, we develop a perturbative solution to the Fokker-Planck equation for one-dimensional driven Brownian motion in the overdamped limit enabled by the spectral properties of the corresponding single-particle Schrödinger operator. The perturbation theory is in powers of the inverse characteristic timescale of variation of the fastest varying control parameter, measured in units of the system timescale, which is set by the smallest eigenvalue of the corresponding Schrödinger operator. It applies to any Brownian system for which the Schrödinger operator has a confining potential. We use the theory to rigorously derive an exact formula for a Riemannian "thermodynamic" metric in the space of control parameters of the system. We show that up to second-order terms in the perturbation theory, optimal dissipation-minimizing driving protocols minimize the length defined by this metric. We also show that a previously proposed metric is calculable from our exact formula with corrections that are exponentially suppressed in a characteristic length scale. We illustrate our formula using the two-dimensional example of a harmonic oscillator with time-dependent spring constant in a time-dependent electric field. Lastly, we demonstrate that the Riemannian geometric structure of the optimal control problem is emergent; it derives from the form of the perturbative expansion for the probability density and persists to all orders of the expansion.

preprint | 2020 | arXiv

A new method for parameter estimation in probabilistic models: Minimum probability flow

Fitting probabilistic models to data is often difficult, due to the general intractability of the partition function. We propose a new parameter fitting method, Minimum Probability Flow (MPF), which is applicable to any parametric model. We demonstrate parameter estimation using MPF in two cases: a continuous state space model, and an Ising spin glass. In the latter case it outperforms current techniques by at least an order of magnitude in convergence time with lower error in the recovered coupling parameters.
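The abstract does not state the objective, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes the commonly cited form of the MPF objective, K = (ε/|D|) Σ_x Σ_x' exp((E(x) − E(x'))/2), with x ranging over data states and x' over their single-spin-flip neighbors, applied to an Ising energy E(x) = −½ xᵀJx. For simplicity it does not exclude neighbors that are themselves data states. All function names are hypothetical.

```python
import numpy as np


def ising_energy(x, J):
    """Ising energy E(x) = -1/2 x^T J x for spins x in {-1, +1}.
    J is assumed symmetric with zero diagonal."""
    return -0.5 * x @ J @ x


def mpf_objective(data, J, epsilon=1.0):
    """Assumed MPF objective with single-spin-flip connectivity.
    Simplification: neighbors that also appear in `data` are not excluded."""
    total = 0.0
    for x in data:
        e_x = ising_energy(x, J)
        for i in range(len(x)):
            x_flip = x.copy()
            x_flip[i] = -x_flip[i]  # neighbor state: flip one spin
            total += np.exp(0.5 * (e_x - ising_energy(x_flip, J)))
    return epsilon * total / len(data)


if __name__ == "__main__":
    J = np.array([[0.0, 1.0], [1.0, 0.0]])  # ferromagnetic coupling
    aligned = np.array([[1, 1]])
    anti_aligned = np.array([[1, -1]])
    # Data consistent with the coupling gives the smaller objective.
    print(mpf_objective(aligned, J), mpf_objective(anti_aligned, J))
```

The objective involves only energy differences between data states and their neighbors, so the partition function never appears, which is the tractability point the abstract makes.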

preprint | 2020 | arXiv

Critical Point-Finding Methods Reveal Gradient-Flat Regions of Deep Network Losses

Despite the fact that the loss functions of deep neural networks are highly non-convex, gradient-based optimization algorithms converge to approximately the same performance from many random initial points. One thread of work has focused on explaining this phenomenon by characterizing the local curvature near critical points of the loss function, where the gradients are near zero, and demonstrating that neural network losses enjoy a no-bad-local-minima property and an abundance of saddle points. We report here that the methods used to find these putative critical points suffer from a bad local minima problem of their own: they often converge to or pass through regions where the gradient norm has a stationary point. We call these gradient-flat regions, since they arise when the gradient is approximately in the kernel of the Hessian, such that the loss is locally approximately linear, or flat, in the direction of the gradient. We describe how the presence of these regions necessitates care in both interpreting past results that claimed to find critical points of neural network losses and in designing second-order methods for optimizing neural networks.
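The connection between stationary points of the gradient norm and the kernel of the Hessian can be written out in two lines. The symbols here are assumptions introduced for illustration and do not appear in the abstract: $L(\theta)$ is the loss, $g = \nabla_\theta L$ its gradient, $H = \nabla^2_\theta L$ its Hessian, and $\eta$ a small step size.

```latex
\nabla_\theta \left( \tfrac{1}{2} \lVert g(\theta) \rVert^2 \right) = H(\theta)\, g(\theta),
\qquad
L(\theta + \eta\, g) = L(\theta) + \eta\, \lVert g \rVert^2
  + \tfrac{\eta^2}{2}\, g^\top H\, g + O(\eta^3).
```

The first identity shows that the squared gradient norm is stationary either at a true critical point ($g = 0$) or wherever $H g = 0$ with $g \neq 0$. In the second case the quadratic term of the expansion vanishes, so the loss is linear along the gradient direction to second order, which is the sense of "gradient-flat" used above.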

preprint | 2019 | arXiv

Design of optical neural networks with component imprecisions

For the benefit of designing scalable, fault-resistant optical neural networks (ONNs), we investigate the effects architectural designs have on the ONNs' robustness to imprecise components. We train two ONNs -- one with a more tunable design (GridNet) and one with better fault tolerance (FFTNet) -- to classify handwritten digits. When simulated without any imperfections, GridNet yields a better accuracy (~98%) than FFTNet (~95%). However, under a small amount of error in their photonic components, the more fault-tolerant FFTNet overtakes GridNet. We further provide thorough quantitative and qualitative analyses of ONNs' sensitivity to varying levels and types of imprecisions. Our results offer guidelines for the principled design of fault-tolerant ONNs as well as a foundation for further research.