Source author record

Alain Durmus

Alain Durmus appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Computation math.PR math.ST Methodology Statistics Theory math.NA Artificial Intelligence math.OC Numerical Analysis

Catalog footprint

What is connected

24works

10topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Characterizing the Generalization Error of Random Feature Regression with Arbitrary Data-Augmentation

This paper aims at analyzing the regularization effect that data augmentation induces on supervised regression methods in the proportional regime, where the number of covariates grows proportionally to the number of samples. We provide a tight characterization of the test error, measured in mean squared error, in terms only of the population quantities of the true data, as well as first and second order statistics of the augmentation scheme. Our results are valid under misspecified feature maps, and for any network architecture where only the last readout layer is trained, and the rest of the network is either frozen or randomly initialized. We specify our results in the case of Gaussian data, and show that our asymptotic characterization is tight in this setting.

preprint2026arXiv

Discrete Flow Matching: Convergence Guarantees Under Minimal Assumptions

Flow Matching has recently emerged as a popular class of generative models for simulating a target distribution $μ_1$ from samples drawn from a source distribution $μ_0$. This framework relies on a fixed coupling between $μ_0$ and $μ_1$, and on a deterministic or stochastic bridge to define an interpolating process between the two distributions. The time marginals of this process can then be approximately sampled by estimating the transition rates, or more generally the generator, of its Markovian projection. This framework has recently been extended to the case of discrete source and target distributions, under the name Discrete Flow Matching (DFM). However, theoretical guarantees for such models remain scarce. In this paper, we study two DFM models on $\mathbb{Z}_m^d = \{0,\ldots,m-1\}^d$, sampled through time discretization, and derive non-asymptotic associated bounds for both of them. In contrast to previous work, we establish non-asymptotic bounds in Kullback--Leibler divergence for the early-stopped version of the target distribution. We also derive explicit convergence guarantees in total variation distance with respect to the true target distribution. Importantly, these bounds rely only on an approximation error assumption, relaxing standard score assumptions used in earlier works, while also yielding improved dependence on the vocabulary size $m$ and the dimension $d$.

preprint2022arXiv

Fast Approximation of the Sliced-Wasserstein Distance Using Concentration of Random Projections

The Sliced-Wasserstein distance (SW) is being increasingly used in machine learning applications as an alternative to the Wasserstein distance and offers significant computational and statistical benefits. Since it is defined as an expectation over random projections, SW is commonly approximated by Monte Carlo. We adopt a new perspective to approximate SW by making use of the concentration of measure phenomenon: under mild assumptions, one-dimensional projections of a high-dimensional random vector are approximately Gaussian. Based on this observation, we develop a simple deterministic approximation for SW. Our method does not require sampling a number of random projections, and is therefore both accurate and easy to use compared to the usual Monte Carlo approximation. We derive nonasymptotical guarantees for our approach, and show that the approximation error goes to zero as the dimension increases, under a weak dependence condition on the data distribution. We validate our theoretical findings on synthetic datasets, and illustrate the proposed approximation on a generative modeling problem.

preprint2022arXiv

On the geometric convergence for MALA under verifiable conditions

While the Metropolis Adjusted Langevin Algorithm (MALA) is a popular and widely used Markov chain Monte Carlo method, very few papers derive conditions that ensure its convergence. In particular, to the authors' knowledge, assumptions that are both easy to verify and guarantee geometric convergence, are still missing. In this work, we establish $V$-uniformly geometric convergence for MALA under mild assumptions about the target distribution. Unlike previous work, we only consider tail and smoothness conditions for the potential associated with the target distribution. These conditions are quite common in the MCMC literature and are easy to verify in practice. Finally, we pay special attention to the dependence of the bounds we derive on the step size of the Euler-Maruyama discretization, which corresponds to the proposal Markov kernel of MALA.

preprint2022arXiv

QLSD: Quantised Langevin stochastic dynamics for Bayesian federated learning

The objective of Federated Learning (FL) is to perform statistical inference for data which are decentralised and stored locally on networked clients. FL raises many constraints which include privacy and data ownership, communication overhead, statistical heterogeneity, and partial client participation. In this paper, we address these problems in the framework of the Bayesian paradigm. To this end, we propose a novel federated Markov Chain Monte Carlo algorithm, referred to as Quantised Langevin Stochastic Dynamics which may be seen as an extension to the FL setting of Stochastic Gradient Langevin Dynamics, which handles the communication bottleneck using gradient compression. To improve performance, we then introduce variance reduction techniques, which lead to two improved versions coined \texttt{QLSD}$^{\star}$ and \texttt{QLSD}$^{++}$. We give both non-asymptotic and asymptotic convergence guarantees for the proposed algorithms. We illustrate their performances using various Bayesian Federated Learning benchmarks.

preprint2022arXiv

Statistical and Topological Properties of Sliced Probability Divergences

The idea of slicing divergences has been proven to be successful when comparing two probability measures in various machine learning applications including generative modeling, and consists in computing the expected value of a `base divergence' between one-dimensional random projections of the two measures. However, the topological, statistical, and computational consequences of this technique have not yet been well-established. In this paper, we aim at bridging this gap and derive various theoretical properties of sliced probability divergences. First, we show that slicing preserves the metric axioms and the weak continuity of the divergence, implying that the sliced divergence will share similar topological properties. We then precise the results in the case where the base divergence belongs to the class of integral probability metrics. On the other hand, we establish that, under mild conditions, the sample complexity of a sliced divergence does not depend on the problem dimension. We finally apply our general results to several base divergences, and illustrate our theory on both synthetic and real data experiments.

preprint2022arXiv

Variational Inference of overparameterized Bayesian Neural Networks: a theoretical and empirical study

This paper studies the Variational Inference (VI) used for training Bayesian Neural Networks (BNN) in the overparameterized regime, i.e., when the number of neurons tends to infinity. More specifically, we consider overparameterized two-layer BNN and point out a critical issue in the mean-field VI training. This problem arises from the decomposition of the lower bound on the evidence (ELBO) into two terms: one corresponding to the likelihood function of the model and the second to the Kullback-Leibler (KL) divergence between the prior distribution and the variational posterior. In particular, we show both theoretically and empirically that there is a trade-off between these two terms in the overparameterized regime only when the KL is appropriately re-scaled with respect to the ratio between the the number of observations and neurons. We also illustrate our theoretical results with numerical experiments that highlight the critical choice of this ratio.

preprint2021arXiv

Convergence rates and approximation results for SGD and its continuous-time counterpart

This paper proposes a thorough theoretical analysis of Stochastic Gradient Descent (SGD) with non-increasing step sizes. First, we show that the recursion defining SGD can be provably approximated by solutions of a time inhomogeneous Stochastic Differential Equation (SDE) using an appropriate coupling. In the specific case of a batch noise we refine our results using recent advances in Stein's method. Then, motivated by recent analyses of deterministic and stochastic optimization methods by their continuous counterpart, we study the long-time behavior of the continuous processes at hand and establish non-asymptotic bounds. To that purpose, we develop new comparison techniques which are of independent interest. Adapting these techniques to the discrete setting, we show that the same results hold for the corresponding SGD sequences. In our analysis, we notably improve non-asymptotic bounds in the convex setting for SGD under weaker assumptions than the ones considered in previous works. Finally, we also establish finite-time convergence results under various conditions, including relaxations of the famous Łojasiewicz inequality, which can be applied to a class of non-convex functions.

preprint2021arXiv

On Riemannian Stochastic Approximation Schemes with Fixed Step-Size

This paper studies fixed step-size stochastic approximation (SA) schemes, including stochastic gradient schemes, in a Riemannian framework. It is motivated by several applications, where geodesics can be computed explicitly, and their use accelerates crude Euclidean methods. A fixed step-size scheme defines a family of time-homogeneous Markov chains, parametrized by the step-size. Here, using this formulation, non-asymptotic performance bounds are derived, under Lyapunov conditions. Then, for any step-size, the corresponding Markov chain is proved to admit a unique stationary distribution, and to be geometrically ergodic. This result gives rise to a family of stationary distributions indexed by the step-size, which is further shown to converge to a Dirac measure, concentrated at the solution of the problem at hand, as the step-size goes to 0. Finally, the asymptotic rate of this convergence is established, through an asymptotic expansion of the bias, and a central limit theorem.

preprint2021arXiv

On the Stability of Random Matrix Product with Markovian Noise: Application to Linear Stochastic Approximation and TD Learning

This paper studies the exponential stability of random matrix products driven by a general (possibly unbounded) state space Markov chain. It is a cornerstone in the analysis of stochastic algorithms in machine learning (e.g. for parameter tracking in online learning or reinforcement learning). The existing results impose strong conditions such as uniform boundedness of the matrix-valued functions and uniform ergodicity of the Markov chains. Our main contribution is an exponential stability result for the $p$-th moment of random matrix product, provided that (i) the underlying Markov chain satisfies a super-Lyapunov drift condition, (ii) the growth of the matrix-valued functions is controlled by an appropriately defined function (related to the drift condition). Using this result, we give finite-time $p$-th moment bounds for constant and decreasing stepsize linear stochastic approximation schemes with Markovian noise on general state space. We illustrate these findings for linear value-function estimation in reinforcement learning. We provide finite-time $p$-th moment bound for various members of temporal difference (TD) family of algorithms.

preprint2020arXiv

Approximate Bayesian Computation with the Sliced-Wasserstein Distance

Approximate Bayesian Computation (ABC) is a popular method for approximate inference in generative models with intractable but easy-to-sample likelihood. It constructs an approximate posterior distribution by finding parameters for which the simulated data are close to the observations in terms of summary statistics. These statistics are defined beforehand and might induce a loss of information, which has been shown to deteriorate the quality of the approximation. To overcome this problem, Wasserstein-ABC has been recently proposed, and compares the datasets via the Wasserstein distance between their empirical distributions, but does not scale well to the dimension or the number of samples. We propose a new ABC technique, called Sliced-Wasserstein ABC and based on the Sliced-Wasserstein distance, which has better computational and statistical properties. We derive two theoretical results showing the asymptotical consistency of our approach, and we illustrate its advantages on synthetic data and an image denoising task.

preprint2020arXiv

Asymptotic Guarantees for Learning Generative Models with the Sliced-Wasserstein Distance

Minimum expected distance estimation (MEDE) algorithms have been widely used for probabilistic models with intractable likelihood functions and they have become increasingly popular due to their use in implicit generative modeling (e.g. Wasserstein generative adversarial networks, Wasserstein autoencoders). Emerging from computational optimal transport, the Sliced-Wasserstein (SW) distance has become a popular choice in MEDE thanks to its simplicity and computational benefits. While several studies have reported empirical success on generative modeling with SW, the theoretical properties of such estimators have not yet been established. In this study, we investigate the asymptotic properties of estimators that are obtained by minimizing SW. We first show that convergence in SW implies weak convergence of probability measures in general Wasserstein spaces. Then we show that estimators obtained by minimizing SW (and also an approximate version of SW) are asymptotically consistent. We finally prove a central limit theorem, which characterizes the asymptotic distribution of the estimators and establish a convergence rate of $\sqrt{n}$, where $n$ denotes the number of observed data points. We illustrate the validity of our theory on both synthetic data and neural networks.

preprint2020arXiv

Convergence of diffusions and their discretizations: from continuous to discrete processes and back

In this paper, we establish new quantitative convergence bounds for a class of functional autoregressive models in weighted total variation metrics. To derive our results, we show that under mild assumptions, explicit minorization and Foster-Lyapunov drift conditions hold. The main applications and consequences of the bounds we obtain concern the geometric convergence of Euler-Maruyama discretizations of diffusions with identity covariance matrix. Second, as a corollary, we provide a new approach to establish quantitative convergence of these diffusion processes by applying our conclusions in the discrete-time setting to a well-suited sequence of discretizations whose associated stepsizes decrease towards zero.

preprint2020arXiv

Efficient stochastic optimisation by unadjusted Langevin Monte Carlo. Application to maximum marginal likelihood and empirical Bayesian estimation

Stochastic approximation methods play a central role in maximum likelihood estimation problems involving intractable likelihood functions, such as marginal likelihoods arising in problems with missing or incomplete data, and in parametric empirical Bayesian estimation. Combined with Markov chain Monte Carlo algorithms, these stochastic optimisation methods have been successfully applied to a wide range of problems in science and industry. However, this strategy scales poorly to large problems because of methodological and theoretical difficulties related to using high-dimensional Markov chain Monte Carlo algorithms within a stochastic approximation scheme. This paper proposes to address these difficulties by using unadjusted Langevin algorithms to construct the stochastic approximation. This leads to a highly efficient stochastic optimisation methodology with favourable convergence properties that can be quantified explicitly and easily checked. The proposed methodology is demonstrated with three experiments, including a challenging application to high-dimensional statistical audio analysis and a sparse Bayesian logistic regression with random effects problem.

preprint2020arXiv

Forward Event-Chain Monte Carlo: Fast sampling by randomness control in irreversible Markov chains

Irreversible and rejection-free Monte Carlo methods, recently developed in Physics under the name Event-Chain and known in Statistics as Piecewise Deterministic Monte Carlo (PDMC), have proven to produce clear acceleration over standard Monte Carlo methods, thanks to the reduction of their random-walk behavior. However, while applying such schemes to standard statistical models, one generally needs to introduce an additional randomization for sake of correctness. We propose here a new class of Event-Chain Monte Carlo methods that reduces this extra-randomization to a bare minimum. We compare the efficiency of this new methodology to standard PDMC and Monte Carlo methods. Accelerations up to several magnitudes and reduced dimensional scalings are exhibited.

preprint2020arXiv

Maximum likelihood estimation of regularisation parameters in high-dimensional inverse problems: an empirical Bayesian approach. Part I: Methodology and Experiments

Many imaging problems require solving an inverse problem that is ill-conditioned or ill-posed. Imaging methods typically address this difficulty by regularising the estimation problem to make it well-posed. This often requires setting the value of the so-called regularisation parameters that control the amount of regularisation enforced. These parameters are notoriously difficult to set a priori, and can have a dramatic impact on the recovered estimates. In this work, we propose a general empirical Bayesian method for setting regularisation parameters in imaging problems that are convex w.r.t. the unknown image. Our method calibrates regularisation parameters directly from the observed data by maximum marginal likelihood estimation, and can simultaneously estimate multiple regularisation parameters. Furthermore, the proposed algorithm uses the same basic operators as proximal optimisation algorithms, namely gradient and proximal operators, and it is therefore straightforward to apply to problems that are currently solved by using proximal optimisation techniques. Our methodology is demonstrated with a range of experiments and comparisons with alternative approaches from the literature. The considered experiments include image denoising, non-blind image deconvolution, and hyperspectral unmixing, using synthesis and analysis priors involving the L1, total-variation, total-variation and L1, and total-generalised-variation pseudo-norms. A detailed theoretical analysis of the proposed method is presented in the companion paper arXiv:2008.05793.

preprint2020arXiv

Maximum likelihood estimation of regularisation parameters in high-dimensional inverse problems: an empirical Bayesian approach. Part II: Theoretical Analysis

This paper presents a detailed theoretical analysis of the three stochastic approximation proximal gradient algorithms proposed in our companion paper [49] to set regularization parameters by marginal maximum likelihood estimation. We prove the convergence of a more general stochastic approximation scheme that includes the three algorithms of [49] as special cases. This includes asymptotic and non-asymptotic convergence results with natural and easily verifiable conditions, as well as explicit bounds on the convergence rates. Importantly, the theory is also general in that it can be applied to other intractable optimisation problems. A main novelty of the work is that the stochastic gradient estimates of our scheme are constructed from inexact proximal Markov chain Monte Carlo samplers. This allows the use of samplers that scale efficiently to large problems and for which we have precise theoretical guarantees.

preprint2020arXiv

MetFlow: A New Efficient Method for Bridging the Gap between Markov Chain Monte Carlo and Variational Inference

In this contribution, we propose a new computationally efficient method to combine Variational Inference (VI) with Markov Chain Monte Carlo (MCMC). This approach can be used with generic MCMC kernels, but is especially well suited to \textit{MetFlow}, a novel family of MCMC algorithms we introduce, in which proposals are obtained using Normalizing Flows. The marginal distribution produced by such MCMC algorithms is a mixture of flow-based distributions, thus drastically increasing the expressivity of the variational family. Unlike previous methods following this direction, our approach is amenable to the reparametrization trick and does not rely on computationally expensive reverse kernels. Extensive numerical experiments show clear computational and performance improvements over state-of-the-art methods.

preprint2020arXiv

Quantitative Propagation of Chaos for SGD in Wide Neural Networks

In this paper, we investigate the limiting behavior of a continuous-time counterpart of the Stochastic Gradient Descent (SGD) algorithm applied to two-layer overparameterized neural networks, as the number or neurons (ie, the size of the hidden layer) $N \to +\infty$. Following a probabilistic approach, we show 'propagation of chaos' for the particle system defined by this continuous-time dynamics under different scenarios, indicating that the statistical interaction between the particles asymptotically vanishes. In particular, we establish quantitative convergence with respect to $N$ of any particle to a solution of a mean-field McKean-Vlasov equation in the metric space endowed with the Wasserstein distance. In comparison to previous works on the subject, we consider settings in which the sequence of stepsizes in SGD can potentially depend on the number of neurons and the iterations. We then identify two regimes under which different mean-field limits are obtained, one of them corresponding to an implicitly regularized version of the minimization problem at hand. We perform various experiments on real datasets to validate our theoretical results, assessing the existence of these two regimes on classification problems and illustrating our convergence results.

preprint2016arXiv

Efficient Bayesian computation by proximal Markov chain Monte Carlo: when Langevin meets Moreau

Modern imaging methods rely strongly on Bayesian inference techniques to solve challenging imaging problems. Currently, the predominant Bayesian computation approach is convex optimisation, which scales very efficiently to high dimensional image models and delivers accurate point estimation results. However, in order to perform more complex analyses, for example image uncertainty quantification or model selection, it is necessary to use more computationally intensive Bayesian computation techniques such as Markov chain Monte Carlo methods. This paper presents a new and highly efficient Markov chain Monte Carlo methodology to perform Bayesian computation for high dimensional models that are log-concave and non-smooth, a class of models that is central in imaging sciences. The methodology is based on a regularised unadjusted Langevin algorithm that exploits tools from convex analysis, namely Moreau-Yoshida envelopes and proximal operators, to construct Markov chains with favourable convergence properties. In addition to scaling efficiently to high dimensions, the method is straightforward to apply to models that are currently solved by using proximal optimisation algorithms. We provide a detailed theoretical analysis of the proposed methodology, including asymptotic and non-asymptotic convergence results with easily verifiable conditions, and explicit bounds on the convergence rates. The proposed methodology is demonstrated with four experiments related to image deconvolution and tomographic reconstruction with total-variation and $\ell_1$ priors, where we conduct a range of challenging Bayesian analyses related to uncertainty quantification, hypothesis testing, and model selection in the absence of ground truth.

preprint2016arXiv

Fast Langevin based algorithm for MCMC in high dimensions

We introduce new Gaussian proposals to improve the efficiency of the standard Hastings-Metropolis algorithm in Markov chain Monte Carlo (MCMC) methods, used for the sampling from a target distribution in large dimension $d$. The improved complexity is $\mathcal{O}(d^{1/5})$ compared to the complexity $\mathcal{O}(d^{1/3})$ of the standard approach. We prove an asymptotic diffusion limit theorem and show that the relative efficiency of the algorithm can be characterised by its overall acceptance rate (with asymptotical value 0.704), independently of the target distribution. Numerical experiments confirm our theoretical findings.

preprint2016arXiv

Non-asymptotic convergence analysis for the Unadjusted Langevin Algorithm

In this paper, we study a method to sample from a target distribution $π$ over $\mathbb{R}^d$ having a positive density with respect to the Lebesgue measure, known up to a normalisation factor. This method is based on the Euler discretization of the overdamped Langevin stochastic differential equation associated with $π$. For both constant and decreasing step sizes in the Euler discretization, we obtain non-asymptotic bounds for the convergence to the target distribution $π$ in total variation distance. A particular attention is paid to the dependency on the dimension $d$, to demonstrate the applicability of this method in the high dimensional setting. These bounds improve and extend the results of (Dalalyan 2014).

preprint2016arXiv

Optimal scaling of the Random Walk Metropolis algorithm under Lp mean differentiability

This paper considers the optimal scaling problem for high-dimensional random walk Metropolis algorithms for densities which are differentiable in Lp mean but which may be irregular at some points (like the Laplace density for example) and/or are supported on an interval. Our main result is the weak convergence of the Markov chain (appropriately rescaled in time and space) to a Langevin diffusion process as the dimension d goes to infinity. Because the log-density might be non-differentiable, the limiting diffusion could be singular. The scaling limit is established under assumptions which are much weaker than the one used in the original derivation of [6]. This result has important practical implications for the use of random walk Metropolis algorithms in Bayesian frameworks based on sparsity inducing priors.

preprint2015arXiv

Subgeometric rates of convergence in Wasserstein distance for Markov chains

In this paper, we provide sufficient conditions for the existence of the invariant distribution and for subgeometric rates of convergence in Wasserstein distance for general state-space Markov chains which are (possibly) not irreducible. Compared to previous work, our approach is based on a purely probabilistic coupling construction which allows to retrieve rates of convergence matching those previously reported for convergence in total variation. Our results are applied to establish the subgeometric ergodicity in Wasserstein distance of non-linear autoregressive models and of the pre-conditioned Crank-Nicolson Markov chain Monte Carlo algorithm in Hilbert space.

Alain Durmus

What is connected

Connect this record

See the researcher in context

Building this map preview

24 published item(s)

Characterizing the Generalization Error of Random Feature Regression with Arbitrary Data-Augmentation

Discrete Flow Matching: Convergence Guarantees Under Minimal Assumptions

Fast Approximation of the Sliced-Wasserstein Distance Using Concentration of Random Projections

On the geometric convergence for MALA under verifiable conditions

QLSD: Quantised Langevin stochastic dynamics for Bayesian federated learning

Statistical and Topological Properties of Sliced Probability Divergences

Variational Inference of overparameterized Bayesian Neural Networks: a theoretical and empirical study

Convergence rates and approximation results for SGD and its continuous-time counterpart

On Riemannian Stochastic Approximation Schemes with Fixed Step-Size

On the Stability of Random Matrix Product with Markovian Noise: Application to Linear Stochastic Approximation and TD Learning

Approximate Bayesian Computation with the Sliced-Wasserstein Distance

Asymptotic Guarantees for Learning Generative Models with the Sliced-Wasserstein Distance

Convergence of diffusions and their discretizations: from continuous to discrete processes and back

Efficient stochastic optimisation by unadjusted Langevin Monte Carlo. Application to maximum marginal likelihood and empirical Bayesian estimation

Forward Event-Chain Monte Carlo: Fast sampling by randomness control in irreversible Markov chains

Maximum likelihood estimation of regularisation parameters in high-dimensional inverse problems: an empirical Bayesian approach. Part I: Methodology and Experiments

Maximum likelihood estimation of regularisation parameters in high-dimensional inverse problems: an empirical Bayesian approach. Part II: Theoretical Analysis

MetFlow: A New Efficient Method for Bridging the Gap between Markov Chain Monte Carlo and Variational Inference

Quantitative Propagation of Chaos for SGD in Wide Neural Networks

Efficient Bayesian computation by proximal Markov chain Monte Carlo: when Langevin meets Moreau

Fast Langevin based algorithm for MCMC in high dimensions

Non-asymptotic convergence analysis for the Unadjusted Langevin Algorithm

Optimal scaling of the Random Walk Metropolis algorithm under Lp mean differentiability

Subgeometric rates of convergence in Wasserstein distance for Markov chains