Source author record

Sinho Chewi

Sinho Chewi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

math.ST, Machine Learning, Statistics Theory, math.PR, Data Structures and Algorithms, math.FA, Artificial Intelligence, Discrete Mathematics, Information Theory, math.AP, math.IT, math.OC

Catalog footprint

What is connected

12 works

12 topics

4 close collaborators

preprint, 2026, arXiv

A proximal gradient algorithm for composite log-concave sampling

We propose an algorithm to sample from composite log-concave distributions over $\mathbb{R}^d$, i.e., densities of the form $π\propto e^{-f-g}$, assuming access to gradient evaluations of $f$ and a restricted Gaussian oracle (RGO) for $g$. The latter requirement means that we can easily sample from the density $\text{RGO}_{g,h,y}(x) \propto \exp(-g(x) -\frac{1}{2h}||y-x||^2)$, which is the sampling analogue of the proximal operator for $g$. If $f + g$ is $α$-strongly convex and $f$ is $β$-smooth, our sampler achieves $\varepsilon$ error in total variation distance in $\widetilde{\mathcal O}(κ\sqrt d \log^4(1/\varepsilon))$ iterations where $κ:= β/α$, which matches prior state-of-the-art results for the case $g=0$. We further extend our results to cases where (1) $π$ is non-log-concave but satisfies a Poincaré or log-Sobolev inequality, and (2) $f$ is non-smooth but Lipschitz.
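As a worked illustration of the RGO definition in the abstract (not taken from the paper), assume the hypothetical choice $g(x) = \frac{c}{2}\|x\|^2$ with $c > 0$. Completing the square suggests that the RGO is then an explicit Gaussian whose mean coincides with the proximal operator of $g$:

```latex
\begin{align*}
\text{RGO}_{g,h,y}(x)
  &\propto \exp\Bigl(-\frac{c}{2}\|x\|^2 - \frac{1}{2h}\|y - x\|^2\Bigr)
   \propto \exp\Bigl(-\frac{1 + ch}{2h}\,\Bigl\|x - \frac{y}{1 + ch}\Bigr\|^2\Bigr), \\
\text{RGO}_{g,h,y}
  &= \mathcal{N}\Bigl(\frac{y}{1 + ch},\ \frac{h}{1 + ch}\, I_d\Bigr), \\
\operatorname{prox}_{hg}(y)
  &= \operatorname*{arg\,min}_{x}\Bigl\{ g(x) + \frac{1}{2h}\|y - x\|^2 \Bigr\}
   = \frac{y}{1 + ch}.
\end{align*}
```

In this special case the proximal operator returns the mode of the RGO, while the RGO adds Gaussian fluctuations of variance $h/(1+ch)$ around it.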

preprint, 2026, arXiv

A Rod Flow Model for Adam at the Edge of Stability

Cohen et al. (arXiv:2207.14484) observed that adaptive gradient methods such as Adam operate at the edge of stability. While there has been significant work on continuous-time modeling of gradient descent at the edge of stability, extending these models to momentum methods remains underdeveloped. In the gradient descent setting, Regis et al. (arXiv:2602.01480) introduced rod flow, which models consecutive iterates as an extended one-dimensional object -- a "rod." Here we extend rod flow to Adam by working in the joint phase space of parameters and first moment $(w, m)$ and treating the second moment $ν$ as a smooth auxiliary variable. We also develop rod flows for heavy ball momentum, Nesterov momentum, and scalar and per-component versions of RMSProp, Adam, and NAdam. For all eight optimizers, we empirically evaluate rod flow on representative machine learning architectures, where it tracks the discrete iterates through the edge-of-stability regime significantly more accurately than the corresponding stable flow.

preprint, 2026, arXiv

Complexity of Non-Log-Concave Sampling in Fisher Information

We study the query complexity of obtaining a relative Fisher information guarantee for sampling from a log-smooth non-log-concave distribution; this is a sampling analog of finding an approximate stationary point in optimization. Our algorithm is based on the proximal sampler, which is an implicit discretization of the Langevin diffusion, and requires an implementation of the backward step known as the restricted Gaussian oracle (RGO). We show that by leveraging the recent results for log-concave sampling with high-accuracy guarantees in Rényi divergence, we can obtain an approximate RGO implementation that -- when used with the proximal sampler -- yields a complexity guarantee in relative Fisher information that inherits the same dimension dependence as log-concave sampling, and improves upon prior work for non-log-concave sampling. We also show a converse reduction that any improvement in the dimension dependence in relative Fisher information for non-log-concave sampling will yield an improved dimension dependence for high-accuracy log-concave sampling.

preprint, 2023, arXiv

Shifted Composition II: Shift Harnack Inequalities and Curvature Upper Bounds

We apply the shifted composition rule -- an information-theoretic principle introduced in our earlier work [AC23] -- to establish shift Harnack inequalities for the Langevin diffusion. We obtain sharp constants for these inequalities for the first time, allowing us to investigate their relationship with other properties of the diffusion. Namely, we show that they are equivalent to a sharp "local gradient-entropy" bound, and that they imply curvature upper bounds in a compelling reflection of the Bakry-Émery theory of curvature lower bounds. Finally, we show that the local gradient-entropy inequality implies optimal concentration of the score, a.k.a. the logarithmic gradient of the density.

preprint, 2022, arXiv

An entropic generalization of Caffarelli's contraction theorem via covariance inequalities

The optimal transport map between the standard Gaussian measure and an $α$-strongly log-concave probability measure is $α^{-1/2}$-Lipschitz, as first observed in a celebrated theorem of Caffarelli. In this paper, we apply two classical covariance inequalities (the Brascamp-Lieb and Cramér-Rao inequalities) to prove a sharp bound on the Lipschitz constant of the map that arises from entropically regularized optimal transport. In the limit as the regularization tends to zero, we obtain an elegant and short proof of Caffarelli's original result. We also extend Caffarelli's theorem to the setting in which the Hessians of the log-densities of the measures are bounded by arbitrary positive definite commuting matrices.
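A quick sanity check of the $α^{-1/2}$ constant, sketched here for the Gaussian special case only (an illustration, not the paper's argument): take the target to be $\mathcal{N}(0, \Sigma)$, where $\Sigma$ is an assumed covariance matrix with $\Sigma^{-1} \succeq α I$, so that the target is $α$-strongly log-concave.

```latex
\begin{align*}
T(x) &= \Sigma^{1/2} x
  && \text{(optimal transport map from } \mathcal{N}(0, I) \text{ to } \mathcal{N}(0, \Sigma)\text{)}, \\
\operatorname{Lip}(T) &= \|\Sigma^{1/2}\|_{\mathrm{op}} = \lambda_{\max}(\Sigma)^{1/2}
  \le α^{-1/2}
  && \text{(since } \Sigma^{-1} \succeq α I \iff \Sigma \preceq α^{-1} I\text{)}.
\end{align*}
```

Equality holds for $\Sigma = α^{-1} I$, which shows that the constant cannot be improved in general.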

preprint, 2022, arXiv

Gaussian discrepancy: a probabilistic relaxation of vector balancing

We introduce a novel relaxation of combinatorial discrepancy called Gaussian discrepancy, whereby binary signings are replaced with correlated standard Gaussian random variables. This relaxation effectively reformulates an optimization problem over the Boolean hypercube into one over the space of correlation matrices. We show that Gaussian discrepancy is a tighter relaxation than the previously studied vector and spherical discrepancy problems, and we construct a fast online algorithm that achieves a version of the Banaszczyk bound for Gaussian discrepancy. This work also raises new questions such as the Komlós conjecture for Gaussian discrepancy, which may shed light on classical discrepancy problems.

preprint, 2022, arXiv

Improved analysis for a proximal algorithm for sampling

We study the proximal sampler of Lee, Shen, and Tian (2021) and obtain new convergence guarantees under weaker assumptions than strong log-concavity: namely, our results hold for (1) weakly log-concave targets, and (2) targets satisfying isoperimetric assumptions which allow for non-log-concavity. We demonstrate our results by obtaining new state-of-the-art sampling guarantees for several classes of target distributions. We also strengthen the connection between the proximal sampler and the proximal method in optimization by interpreting the proximal sampler as an entropically regularized Wasserstein proximal method, and the proximal point method as the limit of the proximal sampler with vanishing noise.
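The alternating structure of the proximal sampler can be sketched as follows. This is a minimal illustration under a strong simplifying assumption: the target is the isotropic Gaussian with potential $V(x) = \|x\|^2 / (2σ^2)$, for which the backward (RGO) step is itself an explicit Gaussian. The names `rgo_gaussian`, `proximal_sampler`, `sigma2`, and `h` are hypothetical and chosen for this example only.

```python
import numpy as np


def rgo_gaussian(y, sigma2, h, rng):
    """Exact RGO for the assumed potential V(x) = ||x||^2 / (2 * sigma2).

    Samples from the density proportional to exp(-V(x) - ||x - y||^2 / (2h)),
    which is Gaussian with mean sigma2 / (sigma2 + h) * y and
    variance sigma2 * h / (sigma2 + h) in each coordinate.
    """
    shrink = sigma2 / (sigma2 + h)
    var = sigma2 * h / (sigma2 + h)
    return shrink * y + np.sqrt(var) * rng.standard_normal(y.shape)


def proximal_sampler(x0, sigma2, h, n_iters, rng):
    """Alternate a Gaussian forward step with an RGO backward step."""
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        # Forward step: y ~ N(x, h I).
        y = x + np.sqrt(h) * rng.standard_normal(x.shape)
        # Backward step: x ~ RGO centered at y.
        x = rgo_gaussian(y, sigma2, h, rng)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Run many independent chains in parallel, all started far from the mode.
    samples = proximal_sampler(np.full((20000, 2), 5.0), sigma2=2.0, h=1.0,
                               n_iters=50, rng=rng)
    print("empirical mean:", samples.mean(), "empirical variance:", samples.var())
```

For targets without a closed-form RGO, the backward step has to be implemented approximately (for example by rejection sampling), which this sketch does not attempt.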

preprint, 2022, arXiv

Towards a Theory of Non-Log-Concave Sampling: First-Order Stationarity Guarantees for Langevin Monte Carlo

For the task of sampling from a density $π\propto \exp(-V)$ on $\mathbb{R}^d$, where $V$ is possibly non-convex but $L$-gradient Lipschitz, we prove that averaged Langevin Monte Carlo outputs a sample with $\varepsilon$-relative Fisher information after $O( L^2 d^2/\varepsilon^2)$ iterations. This is the sampling analogue of complexity bounds for finding $\varepsilon$-approximate first-order stationary points in non-convex optimization and therefore constitutes a first step towards the general theory of non-log-concave sampling. We discuss numerous extensions and applications of our result; in particular, it yields a new state-of-the-art guarantee for sampling from distributions which satisfy a Poincaré inequality.
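A minimal sketch of Langevin Monte Carlo with averaging, under one simplified reading of "averaged": the output is the iterate at a uniformly random stopping index, so its law is the average of the laws of the iterates. The function name `averaged_lmc`, the step size, and the non-convex test potential $V(x) = \sum_i (x_i^2/2 + 2\cos x_i)$ are assumptions made for this illustration and do not come from the paper.

```python
import numpy as np


def averaged_lmc(grad_V, x0, step, n_iters, rng):
    """Run LMC and return the iterate at a uniformly random index in {1, ..., n_iters}."""
    x = np.array(x0, dtype=float)
    stop = rng.integers(1, n_iters + 1)  # upper bound is exclusive
    for _ in range(stop):
        # Euler-Maruyama step for the Langevin diffusion dX = -grad V(X) dt + sqrt(2) dB.
        noise = rng.standard_normal(x.shape)
        x = x - step * grad_V(x) + np.sqrt(2.0 * step) * noise
    return x


def grad_V(x):
    """Gradient of the assumed potential V(x) = sum(x^2 / 2 + 2 cos(x)).

    Each coordinate has second derivative 1 - 2 cos(x) in [-1, 3], so V is
    non-convex with a 3-Lipschitz gradient.
    """
    return x - 2.0 * np.sin(x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(averaged_lmc(grad_V, np.zeros(3), step=0.05, n_iters=200, rng=rng))
```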

preprint, 2021, arXiv

Dimension-free log-Sobolev inequalities for mixture distributions

We prove that if ${(P_x)}_{x\in \mathscr X}$ is a family of probability measures which satisfy the log-Sobolev inequality and whose pairwise chi-squared divergences are uniformly bounded, and $μ$ is any mixing distribution on $\mathscr X$, then the mixture $\int P_x \, \mathrm{d} μ(x)$ satisfies a log-Sobolev inequality. In various settings of interest, the resulting log-Sobolev constant is dimension-free. In particular, our result implies a conjecture of Zimmermann and Bardet et al. that Gaussian convolutions of measures with bounded support enjoy dimension-free log-Sobolev inequalities.

preprint, 2020, arXiv

Exponential ergodicity of mirror-Langevin diffusions

Motivated by the problem of sampling from ill-conditioned log-concave distributions, we give a clean non-asymptotic convergence analysis of mirror-Langevin diffusions as introduced in Zhang et al. (2020). As a special case of this framework, we propose a class of diffusions called Newton-Langevin diffusions and prove that they converge to stationarity exponentially fast with a rate which not only is dimension-free, but also has no dependence on the target distribution. We give an application of this result to the problem of sampling from the uniform distribution on a convex body using a strategy inspired by interior-point methods. Our general approach follows the recent trend of linking sampling and optimization and highlights the role of the chi-squared divergence. In particular, it yields new results on the convergence of the vanilla Langevin diffusion in Wasserstein distance.

preprint, 2020, arXiv

Gradient descent algorithms for Bures-Wasserstein barycenters

We study first order methods to compute the barycenter of a probability distribution $P$ over the space of probability measures with finite second moment. We develop a framework to derive global rates of convergence for both gradient descent and stochastic gradient descent despite the fact that the barycenter functional is not geodesically convex. Our analysis overcomes this technical hurdle by employing a Polyak-Lojasiewicz (PL) inequality and relies on tools from optimal transport and metric geometry. In turn, we establish a PL inequality when $P$ is supported on the Bures-Wasserstein manifold of Gaussian probability measures. It leads to the first global rates of convergence for first order methods in this context.
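For centered Gaussians, gradient descent on the Bures-Wasserstein manifold can be sketched with the commonly used update $S \leftarrow G S G$, where $G = (1 - η) I + η \cdot \frac{1}{n}\sum_i T_i$ and $T_i$ is the optimal transport map from $\mathcal{N}(0, S)$ to $\mathcal{N}(0, \Sigma_i)$. The sketch below assumes a finite, uniformly weighted collection of covariance matrices; the names `sqrtm_psd`, `bw_barycenter_gd`, and the default step size are hypothetical choices for this example.

```python
import numpy as np


def sqrtm_psd(A):
    """Symmetric square root of a positive semidefinite matrix via eigendecomposition."""
    w, U = np.linalg.eigh(A)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.T


def bw_barycenter_gd(covs, S0, step=1.0, n_iters=50):
    """Gradient descent for the barycenter of centered Gaussians N(0, covs[i])."""
    S = np.array(S0, dtype=float)
    d = S.shape[0]
    for _ in range(n_iters):
        S_half = sqrtm_psd(S)
        S_half_inv = np.linalg.inv(S_half)
        # T_i = S^{-1/2} (S^{1/2} Sigma_i S^{1/2})^{1/2} S^{-1/2}, averaged over i.
        T_mean = np.mean(
            [S_half_inv @ sqrtm_psd(S_half @ C @ S_half) @ S_half_inv for C in covs],
            axis=0,
        )
        # Move along the geodesic in the direction of the negative gradient.
        G = (1.0 - step) * np.eye(d) + step * T_mean
        S = G @ S @ G
    return S


if __name__ == "__main__":
    covs = [np.diag([1.0, 4.0]), np.diag([9.0, 16.0])]
    print(bw_barycenter_gd(covs, np.eye(2)))
```

For commuting covariances such as the diagonal ones in the usage example, the barycenter has covariance equal to the square of the averaged square roots, which gives a simple check of the iteration.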

preprint, 2020, arXiv

SVGD as a kernelized Wasserstein gradient flow of the chi-squared divergence

Stein Variational Gradient Descent (SVGD), a popular sampling algorithm, is often described as the kernelized gradient flow for the Kullback-Leibler divergence in the geometry of optimal transport. We introduce a new perspective on SVGD that instead views SVGD as the (kernelized) gradient flow of the chi-squared divergence which, we show, exhibits a strong form of uniform exponential ergodicity under conditions as weak as a Poincaré inequality. This perspective leads us to propose an alternative to SVGD, called Laplacian Adjusted Wasserstein Gradient Descent (LAWGD), that can be implemented from the spectral decomposition of the Laplacian operator associated with the target density. We show that LAWGD exhibits strong convergence guarantees and good practical performance.
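For orientation, the standard SVGD particle update (the baseline discussed in the abstract, not the proposed LAWGD) can be sketched as below. The Gaussian RBF kernel, the fixed `bandwidth`, and the function name `svgd_step` are assumptions made for this illustration.

```python
import numpy as np


def svgd_step(X, grad_log_p, step, bandwidth):
    """One SVGD update for particles X of shape (n, d) with an RBF kernel.

    Each particle moves along
        phi(x_i) = (1/n) * sum_j [ k(x_j, x_i) * grad log p(x_j) + grad_{x_j} k(x_j, x_i) ],
    where k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)).
    """
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]  # diff[i, j] = x_i - x_j
    K = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * bandwidth ** 2))
    # Driving term: kernel-weighted average of the scores.
    drive = K @ grad_log_p(X)
    # Repulsive term: grad_{x_j} k(x_j, x_i) = k(x_i, x_j) * (x_i - x_j) / bandwidth^2.
    repulse = np.sum(K[:, :, None] * diff, axis=1) / bandwidth ** 2
    return X + step * (drive + repulse) / n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 2)) + 3.0
    for _ in range(200):
        # Standard Gaussian target: grad log p(x) = -x.
        X = svgd_step(X, lambda Z: -Z, step=0.1, bandwidth=1.0)
    print("particle mean:", X.mean(axis=0))
```

The repulsive term is what keeps the particles from collapsing onto the mode; with a single particle it vanishes and the update reduces to gradient ascent on $\log p$.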

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint