Researcher profile

Molei Tao

Molei Tao contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
9 works
0 followers
12 topics
4 close collaborators

Published work

9 published items

preprint - 2026 - arXiv

NoiseRater: Meta-Learned Noise Valuation for Diffusion Model Training

Diffusion models have achieved remarkable success across a wide range of generative tasks, yet their training paradigm largely treats injected noise as uniformly informative. In this work, we challenge this assumption and introduce NoiseRater, a meta-learning framework for instance-level noise valuation in diffusion model training. We propose a parametric noise rater that assigns importance scores to individual noise realizations conditioned on data and timestep, enabling adaptive reweighting of the training objective. The rater is trained via bilevel optimization to improve downstream validation performance after inner-loop diffusion updates. To enable efficient deployment, we further design a decoupled two-stage pipeline that transitions from soft weighting during meta-training to hard noise selection during standard training. Extensive experiments on FFHQ and ImageNet demonstrate that not all noise samples contribute equally, and that prioritizing informative noise improves both training efficiency and generation quality. Our results establish noise valuation as a complementary and previously underexplored axis for improving diffusion model training. Our code is available at: https://anonymous.4open.science/r/NoiseRater-DEB116.

preprint - 2024 - arXiv

Automated construction of effective potential via algorithmic implicit bias

We introduce a novel approach for decomposing and learning every scale of a given multiscale objective function in $\mathbb{R}^d$, where $d\ge 1$. This approach leverages an implicit bias of gradient descent recently demonstrated by Kong and Tao, which enables the automatic generation of data that nearly follow a Gibbs distribution with an effective potential at any desired scale. One application of this automated effective potential modeling is to construct reduced-order models. For instance, a deterministic surrogate Hamiltonian model can be developed to substantially soften the stiffness that bottlenecks the simulation, while maintaining the accuracy of phase portraits at the scale of interest. Similarly, a stochastic surrogate model can be constructed at a desired scale, such that both its equilibrium and out-of-equilibrium behaviors (characterized by the auto-correlation function and mean path) align with those of a damped mechanical system whose potential is the original multiscale function. The robustness and efficiency of our proposed approach in multi-dimensional scenarios have been demonstrated through a series of numerical experiments. A by-product of our development is a method for anisotropic noise estimation and calibration. More precisely, Langevin models of stochastic mechanical systems may not have isotropic noise in practice, and we provide a systematic algorithm to quantify the noise covariance matrix without directly measuring the noise. In this case, the system may not admit a closed-form expression for its invariant distribution either, but with this tool, we can design the friction matrix appropriately to calibrate the system so that its invariant distribution has a closed-form Gibbs expression.

preprint - 2022 - arXiv

Alternating Mirror Descent for Constrained Min-Max Games

In this paper we study two-player bilinear zero-sum games with constrained strategy spaces. Such constraints arise naturally when mixed strategies are used, which correspond to a probability simplex constraint. We propose and analyze the alternating mirror descent algorithm, in which the players take turns acting, each following the mirror descent algorithm for constrained optimization. We interpret alternating mirror descent as an alternating discretization of a skew-gradient flow in the dual space, and use tools from convex optimization and a modified energy function to establish an $O(K^{-2/3})$ bound on its average regret after $K$ iterations. This quantitatively verifies that the algorithm behaves better than the simultaneous version of the mirror descent algorithm, which is known to diverge and yields an $O(K^{-1/2})$ average regret bound. In the special case of an unconstrained setting, our results recover the behavior of the alternating gradient descent algorithm for zero-sum games, which was studied in (Bailey et al., COLT 2020).
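As an illustration of the alternating update order described in this abstract, here is a minimal sketch, not the authors' implementation. It assumes the negative-entropy mirror map on the probability simplex (which gives multiplicative-weights updates), a payoff matrix `A` for the bilinear game $x^\top A y$ with $x$ minimizing and $y$ maximizing, and a constant step size `eta`. All function names, the matching-pennies matrix, and the step size are hypothetical choices made for this example.

```python
import math

def entropic_md_step(p, grad, eta):
    """One mirror descent step on the simplex with the negative-entropy
    mirror map: multiply by exp(-eta * grad) and renormalize."""
    w = [pi * math.exp(-eta * gi) for pi, gi in zip(p, grad)]
    total = sum(w)
    return [wi / total for wi in w]

def alternating_mirror_descent(A, x0, y0, eta, num_iters):
    """Players take turns: x moves first, then y reacts to the NEW x."""
    n, m = len(A), len(A[0])
    x, y = list(x0), list(y0)
    x_sum, y_sum = [0.0] * n, [0.0] * m
    for _ in range(num_iters):
        # x minimizes x^T A y, so its gradient is A y.
        grad_x = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        x = entropic_md_step(x, grad_x, eta)
        # y maximizes x^T A y, so it descends on the negated gradient -A^T x,
        # evaluated at the freshly updated x (this is the alternation).
        grad_y = [sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
        y = entropic_md_step(y, [-g for g in grad_y], eta)
        x_sum = [s + xi for s, xi in zip(x_sum, x)]
        y_sum = [s + yj for s, yj in zip(y_sum, y)]
    x_avg = [s / num_iters for s in x_sum]
    y_avg = [s / num_iters for s in y_sum]
    return x, y, x_avg, y_avg

def duality_gap(A, x, y):
    """max_j (A^T x)_j - min_i (A y)_i; zero exactly at a Nash equilibrium."""
    n, m = len(A), len(A[0])
    best_y = max(sum(A[i][j] * x[i] for i in range(n)) for j in range(m))
    best_x = min(sum(A[i][j] * y[j] for j in range(m)) for i in range(n))
    return best_y - best_x

# Matching pennies (illustrative choice); the equilibrium is uniform play.
A = [[1.0, -1.0], [-1.0, 1.0]]
x, y, x_avg, y_avg = alternating_mirror_descent(
    A, [0.9, 0.1], [0.2, 0.8], eta=0.1, num_iters=2000)
print(duality_gap(A, x_avg, y_avg))
```

The simultaneous variant would compute `grad_y` from the old `x` instead; the only difference between the two schemes is which iterate the second player sees.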

preprint - 2022 - arXiv

Hessian-Free High-Resolution Nesterov Acceleration for Sampling

Nesterov's Accelerated Gradient (NAG) for optimization has better performance than its continuous time limit (noiseless kinetic Langevin) when a finite step-size is employed (Shi et al., 2021). This work explores the sampling counterpart of this phenomenon and proposes a diffusion process whose discretizations can yield accelerated gradient-based MCMC methods. More precisely, we reformulate the optimizer of NAG for strongly convex functions (NAG-SC) as a Hessian-Free High-Resolution ODE, change its high-resolution coefficient to a hyperparameter, inject appropriate noise, and discretize the resulting diffusion process. The acceleration effect of the new hyperparameter is quantified, and it is not an artificial one created by time-rescaling. Instead, acceleration beyond underdamped Langevin in $W_2$ distance is quantitatively established for log-strongly-concave-and-smooth targets, at both the continuous dynamics level and the discrete algorithm level. Empirical experiments in both log-strongly-concave and multi-modal cases also numerically demonstrate this acceleration.

preprint - 2022 - arXiv

Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect

Recent empirical advances show that training deep models with a large learning rate often improves generalization performance. However, theoretical justifications of the benefits of a large learning rate are highly limited, due to challenges in analysis. In this paper, we consider using Gradient Descent (GD) with a large learning rate on a homogeneous matrix factorization problem, i.e., $\min_{X, Y} \|A - XY^\top\|_{\sf F}^2$. We prove a convergence theory for constant large learning rates well beyond $2/L$, where $L$ is the largest eigenvalue of the Hessian at the initialization. Moreover, we rigorously establish an implicit bias of GD induced by such a large learning rate, termed 'balancing', meaning that the magnitudes of $X$ and $Y$ at the limit of GD iterations will be close even if their initialization is significantly unbalanced. Numerical experiments are provided to support our theory.
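The balancing effect described in this abstract can be seen in a toy setting. The sketch below is an assumption-laden illustration rather than the paper's experiment: it uses the scalar analogue $f(x, y) = \tfrac{1}{2}(1 - xy)^2$ of the matrix problem, a hypothetical unbalanced initialization $(3, 0.1)$, and two hand-picked learning rates, one small and one above $2/L$, with $L$ computed from the Hessian at the initialization.

```python
import math

def gd_scalar_factorization(x, y, lr, num_iters):
    """Gradient descent on the toy objective f(x, y) = 0.5 * (1 - x*y)**2."""
    for _ in range(num_iters):
        r = 1.0 - x * y  # residual
        # Simultaneous update: both partial gradients use the old (x, y).
        x, y = x + lr * r * y, y + lr * r * x
    return x, y

def top_hessian_eigenvalue(x, y):
    """Largest eigenvalue of the 2x2 Hessian [[y^2, 2xy-1], [2xy-1, x^2]]."""
    a, d, b = y * y, x * x, 2.0 * x * y - 1.0
    return 0.5 * (a + d) + math.sqrt((0.5 * (a - d)) ** 2 + b * b)

x0, y0 = 3.0, 0.1                      # deliberately unbalanced start
L = top_hessian_eigenvalue(x0, y0)     # sharpness at initialization
small_lr, large_lr = 0.01, 0.3         # large_lr exceeds 2 / L here

xs, ys = gd_scalar_factorization(x0, y0, small_lr, 5000)
xl, yl = gd_scalar_factorization(x0, y0, large_lr, 5000)

print("2/L =", 2.0 / L)
print("small lr limit:", xs, ys, "imbalance:", abs(abs(xs) - abs(ys)))
print("large lr limit:", xl, yl, "imbalance:", abs(abs(xl) - abs(yl)))
```

A minimizer with $xy = 1$ has Hessian eigenvalue $x^2 + y^2$, so GD with step size `lr` can only settle at minimizers where $x^2 + y^2 \le 2/\text{lr}$. A larger step size therefore rules out the most unbalanced solutions, which is one way to read the balancing effect in this toy case.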

preprint - 2022 - arXiv

Low Spin-Axis Variations of Circumbinary Planets

Having a massive moon has been considered a primary mechanism for stabilizing planetary obliquity, with the Earth as an example. This is, however, not always consistent with the exoplanetary cases. This article details the discovery of an alternative mechanism, namely that planets orbiting around binary stars tend to have low spin-axis variations. This is because the large quadrupole potential of the stellar binary could speed up the planetary orbital precession and detune the system out of secular spin-orbit resonances. Consequently, habitable zone planets around stellar binaries in low inclination orbits hold higher potential for regular seasonal changes compared to their single star analogues.

preprint - 2021 - arXiv

Accurate and Efficient Simulations of Hamiltonian Mechanical Systems with Discontinuous Potentials

This article considers Hamiltonian mechanical systems with potential functions admitting jump discontinuities. The focus is on accurate and efficient numerical approximations of their solutions, which will be defined via the laws of reflection and refraction. Despite the success of symplectic integrators for smooth mechanical systems, their construction for discontinuous ones is nontrivial, and the numerical convergence order can be impaired too. Several rather-usable numerical methods are proposed, including: a first-order symplectic integrator for general problems, a third-order symplectic integrator for problems with only one linear interface, arbitrarily high-order reversible integrators for general problems (no longer symplectic), and an adaptive time-stepping version of the previous high-order method. Interestingly, whether symplecticity leads to favorable long time performance is no longer clear due to discontinuity, as traditional Hamiltonian backward error analysis no longer applies. Therefore, at this stage, our recommended default method is the last one. Numerical evidence on the order of convergence, long time performance, momentum map conservation, and consistency with the computationally expensive penalty method is supplied. A complex problem, namely the Sauteed Mushroom, is also proposed and numerically investigated, for which multiple bifurcations between trapped and ergodic dynamics are observed.
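The abstract defines solutions via the laws of reflection and refraction. As a rough sketch of what such a rule looks like for a single interface crossing (not one of the paper's integrators), the function below updates the momentum of a unit-mass particle that hits a flat interface where the potential jumps by `delta_v`. The function name, the unit-mass assumption, the 2D setting, and the convention that `delta_v` is the potential after the interface minus the potential before it are all assumptions made for this example.

```python
import math

def cross_interface(p, normal, delta_v):
    """Apply reflection/refraction to momentum p at a potential jump.

    p       : (px, py) momentum of a unit-mass particle
    normal  : unit normal of the interface
    delta_v : potential jump seen along the direction of travel
    Returns (new_p, crossed).
    """
    pn = p[0] * normal[0] + p[1] * normal[1]           # normal component
    tang = (p[0] - pn * normal[0], p[1] - pn * normal[1])  # tangential part
    if 0.5 * pn * pn > delta_v:
        # Refraction: enough normal kinetic energy to pay for the jump.
        # The normal momentum is rescaled so total energy is conserved.
        new_pn = math.copysign(math.sqrt(pn * pn - 2.0 * delta_v), pn)
        crossed = True
    else:
        # Reflection: the normal component flips; the particle stays put.
        new_pn = -pn
        crossed = False
    new_p = (tang[0] + new_pn * normal[0], tang[1] + new_pn * normal[1])
    return new_p, crossed

# A particle hitting a vertical interface with a small and a large barrier.
refracted, crossed_low = cross_interface((3.0, 4.0), (1.0, 0.0), 2.0)
reflected, crossed_high = cross_interface((3.0, 4.0), (1.0, 0.0), 10.0)
print(refracted, crossed_low)
print(reflected, crossed_high)
```

In both branches the tangential momentum is untouched and the total energy (kinetic plus potential) is preserved, which is the property an integrator built around this rule would need to respect at each interface hit.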

preprint - 2020 - arXiv

Variational Optimization on Lie Groups, with Examples of Leading (Generalized) Eigenvalue Problems

The article considers smooth optimization of functions on Lie groups. By generalizing the NAG variational principle in vector space (Wibisono et al., 2016) to Lie groups, continuous Lie-NAG dynamics which are guaranteed to converge to a local optimum are obtained. They correspond to momentum versions of gradient flow on Lie groups. The particular case of $\mathsf{SO}(n)$ is then studied in detail, with objective functions corresponding to leading Generalized EigenValue problems: the Lie-NAG dynamics are first made explicit in coordinates, and then discretized in a structure-preserving fashion, resulting in optimization algorithms with faithful energy behavior (due to conformal symplecticity) that remain exactly on the Lie group. Stochastic gradient versions are also investigated. Numerical experiments on both synthetic data and a practical problem (LDA for MNIST) demonstrate the effectiveness of the proposed methods as optimization algorithms (not as a classification method).

preprint - 2019 - arXiv

Space-Time Phononic Crystals with Anomalous Topological Edge States

It is well known that an interface created by two topologically distinct structures could host nontrivial edge states that are immune to defects. In this letter, we introduce a one-dimensional space-time phononic crystal and study the associated anomalous topological edge states. A space-decoupled time modulation is assumed. While preserving the key topological feature of the system, such a modulation also duplicates the edge state mode across the spectrum, both inside and outside the band gap. It is shown that, in contrast to conventional topological edge states which are excited by frequencies in the Bragg regime, the time-modulation-induced frequency conversion can be leveraged to access topological edge states at a deep subwavelength scale where the entire phononic crystal size is merely 1/5.1 of the wavelength. This remarkable feature could open a new route for designing miniature devices that are based on topological physics.