Source author record

Stephan Mandt

Stephan Mandt appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning · cond-mat.quant-gas · Artificial Intelligence · cond-mat.stat-mech · eess.IV · physics.chem-ph · quant-ph · Computer Vision · Information Theory · math.IT · Computation · cond-mat.mes-hall · hep-ex · hep-ph

Catalog footprint

What is connected

32 works

14 topics

4 close collaborators

preprint · 2026 · arXiv

Skipping the Zeros in Diffusion Models for Sparse Data Generation

Diffusion models (DMs) excel on dense continuous data, but are not designed for sparse continuous data. They do not model exact zeros that represent the deliberate absence of a signal. As a result, they erase sparsity patterns and perform unnecessary computation on mostly zero entries. With Sparsity-Exploiting Diffusion (SED), we model only non-zero values, preserving sparsity. SED delivers computational savings while maintaining or improving generation quality by skipping zeros during training and inference. Across physics and biology benchmarks, SED matches or surpasses conventional DMs and domain-specific baselines, while vision experiments provide intuitive insights into the limitations of dense DMs and the benefits of SED.
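
The idea of modeling only the non-zero values can be illustrated with a plain index/value representation. This is a minimal sketch of the data layout only, not the SED model itself; the function names `to_sparse` and `to_dense` and the example signal are hypothetical.

```python
from typing import List, Tuple


def to_sparse(dense: List[float]) -> Tuple[List[int], List[float], int]:
    """Keep only the non-zero entries as parallel index and value lists."""
    idx = [i for i, x in enumerate(dense) if x != 0.0]
    vals = [dense[i] for i in idx]
    return idx, vals, len(dense)


def to_dense(idx: List[int], vals: List[float], length: int) -> List[float]:
    """Rebuild the dense vector; untouched positions stay exactly zero."""
    out = [0.0] * length
    for i, v in zip(idx, vals):
        out[i] = v
    return out


# Hypothetical sparse signal: most entries are exact zeros.
signal = [0.0, 0.0, 3.2, 0.0, -1.5, 0.0, 0.0, 0.0]
idx, vals, n = to_sparse(signal)
restored = to_dense(idx, vals, n)
```

A generative model that operates on `vals` (and the positions in `idx`) touches two numbers here instead of eight, and the reconstruction keeps the zeros exact rather than approximately zero.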

preprint · 2024 · arXiv

Lossy Image Compression with Conditional Diffusion Models

This paper outlines an end-to-end optimized lossy image compression framework using diffusion generative models. The approach relies on the transform coding paradigm, where an image is mapped into a latent space for entropy coding and, from there, mapped back to the data space for reconstruction. In contrast to VAE-based neural compression, where the (mean) decoder is a deterministic neural network, our decoder is a conditional diffusion model. Our approach thus introduces an additional "content" latent variable on which the reverse diffusion process is conditioned and uses this variable to store information about the image. The remaining "texture" variables characterizing the diffusion process are synthesized at decoding time. We show that the model's performance can be tuned toward perceptual metrics of interest. Our extensive experiments involving multiple datasets and image quality assessment metrics show that our approach yields stronger reported FID scores than the GAN-based model, while also yielding competitive performance with VAE-based models in several distortion metrics. Furthermore, training the diffusion with $\mathcal{X}$-parameterization enables high-quality reconstructions in only a handful of decoding steps, greatly affecting the model's practicality. Our code is available at: https://github.com/buggyyang/CDC_compression

preprint · 2022 · arXiv

Hybridizing Physical and Data-driven Prediction Methods for Physicochemical Properties

We present a generic way to hybridize physical and data-driven methods for predicting physicochemical properties. The approach "distills" the physical method's predictions into a prior model and combines it with sparse experimental data using Bayesian inference. We apply the new approach to predict activity coefficients at infinite dilution and obtain significant improvements compared to the data-driven and physical baselines and established ensemble methods from the machine learning literature.

preprint · 2022 · arXiv

Improving Sequential Latent Variable Models with Autoregressive Flows

We propose an approach for improving sequence modeling based on autoregressive normalizing flows. Each autoregressive transform, acting across time, serves as a moving frame of reference, removing temporal correlations, and simplifying the modeling of higher-level dynamics. This technique provides a simple, general-purpose method for improving sequence modeling, with connections to existing and classical techniques. We demonstrate the proposed approach both with standalone flow-based models and as a component within sequential latent variable models. Results are presented on three benchmark video datasets, where autoregressive flow-based dynamics improve log-likelihood performance over baseline models. Finally, we illustrate the decorrelation and improved generalization properties of using flow-based dynamics.

preprint · 2022 · arXiv

Latent Outlier Exposure for Anomaly Detection with Contaminated Data

Anomaly detection aims at identifying data points that show systematic deviations from the majority of data in an unlabeled dataset. A common assumption is that clean training data (free of anomalies) is available, which is often violated in practice. We propose a strategy for training an anomaly detector in the presence of unlabeled anomalies that is compatible with a broad class of models. The idea is to jointly infer a binary label for each datum (normal vs. anomalous) while updating the model parameters. Inspired by outlier exposure (Hendrycks et al., 2018) that considers synthetically created, labeled anomalies, we thereby use a combination of two losses that share parameters: one for the normal and one for the anomalous data. We then iteratively proceed with block coordinate updates on the parameters and the most likely (latent) labels. Our experiments with several backbone models on three image datasets, 30 tabular data sets, and a video anomaly detection benchmark showed consistent and significant improvements over the baselines.

preprint · 2022 · arXiv

Learning to Simulate High Energy Particle Collisions from Unlabeled Data

In many scientific fields which rely on statistical inference, simulations are often used to map from theoretical models to experimental data, allowing scientists to test model predictions against experimental results. Experimental data is often reconstructed from indirect measurements, causing the aggregate transformation from theoretical models to experimental data to be poorly described analytically. Instead, numerical simulations are used at great computational cost. We introduce Optimal-Transport-based Unfolding and Simulation (OTUS), a fast simulator based on unsupervised machine-learning that is capable of predicting experimental data from theoretical models. Without the aid of current simulation information, OTUS trains a probabilistic autoencoder to transform directly between theoretical models and experimental data. Identifying the probabilistic autoencoder's latent space with the space of theoretical models causes the decoder network to become a fast, predictive simulator with the potential to replace current, computationally-costly simulators. Here, we provide proof-of-principle results on two particle physics examples, $Z$-boson and top-quark decays, but stress that OTUS can be widely applied to other fields.

preprint · 2022 · arXiv

Lossless Compression with Probabilistic Circuits

Despite extensive progress on image generation, common deep generative model architectures are not easily applied to lossless compression. For example, VAEs suffer from a compression cost overhead due to their latent variables. This overhead can only be partially eliminated with elaborate schemes such as bits-back coding, often resulting in poor single-sample compression rates. To overcome such problems, we establish a new class of tractable lossless compression models that permit efficient encoding and decoding: Probabilistic Circuits (PCs). These are a class of neural networks involving $|p|$ computational units that support efficient marginalization over arbitrary subsets of the $D$ feature dimensions, enabling efficient arithmetic coding. We derive efficient encoding and decoding schemes that both have time complexity $\mathcal{O} (\log(D) \cdot |p|)$, where a naive scheme would have linear costs in $D$ and $|p|$, making the approach highly scalable. Empirically, our PC-based (de)compression algorithm runs 5-40 times faster than neural compression algorithms that achieve similar bitrates. By scaling up the traditional PC structure learning pipeline, we achieve state-of-the-art results on image datasets such as MNIST. Furthermore, PCs can be naturally integrated with existing neural compression algorithms to improve the performance of these base models on natural image datasets. Our results highlight the potential impact that non-standard learning architectures may have on neural data compression.

preprint · 2022 · arXiv

Making Thermodynamic Models of Mixtures Predictive by Machine Learning: Matrix Completion of Pair Interactions

Predictive models of thermodynamic properties of mixtures are paramount in chemical engineering and chemistry. Classical thermodynamic models are successful in generalizing over (continuous) conditions like temperature and concentration. On the other hand, matrix completion methods (MCMs) from machine learning successfully generalize over (discrete) binary systems; these MCMs can make predictions without any data for a given binary system by implicitly learning commonalities across systems. In the present work, we combine the strengths of both worlds in a hybrid approach. The underlying idea is to predict the pair-interaction energies, as they are used in basically all physical models of liquid mixtures, by an MCM. As an example, we embed an MCM into UNIQUAC, a widely-used physical model for the Gibbs excess energy. We train the resulting hybrid model in a Bayesian machine-learning framework on experimental data for activity coefficients in binary systems of 1146 components from the Dortmund Data Bank. We thereby obtain, for the first time, a complete set of UNIQUAC parameters for all binary systems of these components, which allows us to predict, in principle, activity coefficients at arbitrary temperature and composition for any combination of these components, not only for binary but also for multicomponent systems. The hybrid model even outperforms the best available physical model for predicting activity coefficients, the modified UNIFAC (Dortmund) model.

preprint · 2022 · arXiv

Neural Transformation Learning for Deep Anomaly Detection Beyond Images

Data transformations (e.g. rotations, reflections, and cropping) play an important role in self-supervised learning. Typically, images are transformed into different views, and neural networks trained on tasks involving these views produce useful feature representations for downstream tasks, including anomaly detection. However, for anomaly detection beyond image data, it is often unclear which transformations to use. Here we present a simple end-to-end procedure for anomaly detection with learnable transformations. The key idea is to embed the transformed data into a semantic space such that the transformed data still resemble their untransformed form, while different transformations are easily distinguishable. Extensive experiments on time series demonstrate that our proposed method outperforms existing approaches in the one-vs.-rest setting and is competitive in the more challenging n-vs.-rest anomaly detection task. On tabular datasets from the medical and cyber-security domains, our method learns domain-specific transformations and detects anomalies more accurately than previous work.

preprint · 2022 · arXiv

Raising the Bar in Graph-level Anomaly Detection

Graph-level anomaly detection has become a critical topic in diverse areas, such as financial fraud detection and detecting anomalous activities in social networks. While most research has focused on anomaly detection for visual data such as images, where high detection accuracies have been obtained, existing deep learning approaches for graphs currently show considerably worse performance. This paper raises the bar on graph-level anomaly detection, i.e., the task of detecting abnormal graphs in a set of graphs. By drawing on ideas from self-supervised learning and transformation learning, we present a new deep learning approach that significantly improves existing deep one-class approaches by fixing some of their known problems, including hypersphere collapse and performance flip. Experiments on nine real-world data sets involving nine techniques reveal that our method achieves an average performance improvement of 11.8% AUC compared to the best existing approach.

preprint · 2022 · arXiv

Structured Stochastic Gradient MCMC

Stochastic gradient Markov Chain Monte Carlo (SGMCMC) is considered the gold standard for Bayesian inference in large-scale models, such as Bayesian neural networks. Since practitioners face speed versus accuracy tradeoffs in these models, variational inference (VI) is often the preferable option. Unfortunately, VI makes strong assumptions on both the factorization and functional form of the posterior. In this work, we propose a new non-parametric variational approximation that makes no assumptions about the approximate posterior's functional form and allows practitioners to specify the exact dependencies the algorithm should respect or break. The approach relies on a new Langevin-type algorithm that operates on a modified energy function, where parts of the latent variables are averaged over samples from earlier iterations of the Markov chain. This way, statistical dependencies can be broken in a controlled way, allowing the chain to mix faster. This scheme can be further modified in a "dropout" manner, leading to even more scalability. We test our scheme for ResNet-20 on CIFAR-10, SVHN, and FMNIST. In all cases, we find improvements in convergence speed and/or final accuracy compared to SG-MCMC and VI.

preprint · 2022 · arXiv

Towards Empirical Sandwich Bounds on the Rate-Distortion Function

The rate-distortion (R-D) function, a key quantity in information theory, characterizes the fundamental limit of how much a data source can be compressed subject to a fidelity criterion, by any compression algorithm. As researchers push for ever-improving compression performance, establishing the R-D function of a given data source is not only of scientific interest, but also sheds light on the possible room for improving compression algorithms. Previous work on this problem relied on distributional assumptions on the data source (Gibson, 2017) or only applied to discrete data (Blahut, 1972; Arimoto, 1972). By contrast, this paper makes the first attempt at an algorithm for sandwiching the R-D function of a general (not necessarily discrete) source requiring only i.i.d. data samples. We estimate R-D sandwich bounds for a variety of artificial and real-world data sources, in settings far beyond the feasibility of any known method, and shed light on the optimality of neural data compression (Ballé et al., 2021; Yang et al., 2022). Our R-D upper bound on natural images indicates theoretical room for improving state-of-the-art image compression methods by at least one dB in PSNR at various bitrates. Our data and code can be found at https://github.com/mandt-lab/empirical-RD-sandwich.
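
For orientation, the quantity being bounded can be written in its textbook form. The symbols below ($X$ for the source, $Y$ for the reconstruction, $\rho$ for the distortion measure, $Q_{Y\mid X}$ for the test channel, and $R_L$, $R_U$ for the lower and upper bounds) are notational assumptions and do not appear in the abstract.

```latex
R(D) \;=\; \inf_{Q_{Y\mid X}\,:\ \mathbb{E}[\rho(X,Y)] \le D} I(X;Y),
\qquad
R_L(D) \;\le\; R(D) \;\le\; R_U(D).
```

A "sandwich" in this sense means estimating both $R_L$ and $R_U$ from samples, so that the unknown $R(D)$ is confined to the gap between them.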

preprint · 2021 · arXiv

Improving Inference for Neural Image Compression

We consider the problem of lossy image compression with deep latent variable models. State-of-the-art methods build on hierarchical variational autoencoders (VAEs) and learn inference networks to predict a compressible latent representation of each data point. Drawing on the variational inference perspective on compression, we identify three approximation gaps which limit performance in the conventional approach: an amortization gap, a discretization gap, and a marginalization gap. We propose remedies for each of these three limitations based on ideas related to iterative inference, stochastic annealing for discrete optimization, and bits-back coding, resulting in the first application of bits-back coding to lossy compression. In our experiments, which include extensive baseline comparisons and ablation studies, we achieve new state-of-the-art performance on lossy image compression using an established VAE architecture, by changing only the inference method.

preprint · 2021 · arXiv

Scalable Gaussian Process Variational Autoencoders

Conventional variational autoencoders fail in modeling correlations between data points due to their use of factorized priors. Amortized Gaussian process inference through GP-VAEs has led to significant improvements in this regard, but is still inhibited by the intrinsic complexity of exact GP inference. We improve the scalability of these methods through principled sparse inference approaches. We propose a new scalable GP-VAE model that outperforms existing approaches in terms of runtime and memory footprint, is easy to implement, and allows for joint end-to-end optimization of all components.

preprint · 2020 · arXiv

Extreme Classification via Adversarial Softmax Approximation

Training a classifier over a large number of classes, known as 'extreme classification', has become a topic of major interest with applications in technology, science, and e-commerce. Traditional softmax regression induces a gradient cost proportional to the number of classes $C$, which often is prohibitively expensive. A popular scalable softmax approximation relies on uniform negative sampling, which suffers from slow convergence due to a poor signal-to-noise ratio. In this paper, we propose a simple training method for drastically enhancing the gradient signal by drawing negative samples from an adversarial model that mimics the data distribution. Our contributions are three-fold: (i) an adversarial sampling mechanism that produces negative samples at a cost only logarithmic in $C$, thus still resulting in cheap gradient updates; (ii) a mathematical proof that this adversarial sampling minimizes the gradient variance while any bias due to non-uniform sampling can be removed; (iii) experimental results on large scale data sets that show a reduction of the training time by an order of magnitude relative to several competitive baselines.

preprint · 2020 · arXiv

GP-VAE: Deep Probabilistic Time Series Imputation

Multivariate time series with missing values are common in areas such as healthcare and finance, and have grown in number and complexity over the years. This raises the question whether deep learning methodologies can outperform classical data imputation methods in this domain. However, naive applications of deep learning fall short in giving reliable confidence estimates and lack interpretability. We propose a new deep sequential latent variable model for dimensionality reduction and data imputation. Our modeling assumption is simple and interpretable: the high dimensional time series has a lower-dimensional representation which evolves smoothly in time according to a Gaussian process. The non-linear dimensionality reduction in the presence of missing data is achieved using a VAE approach with a novel structured variational approximation. We demonstrate that our approach outperforms several classical and deep learning-based data imputation methods on high-dimensional data from the domains of computer vision and healthcare, while additionally improving the smoothness of the imputations and providing interpretable uncertainty estimates.

preprint · 2020 · arXiv

How Good is the Bayes Posterior in Deep Neural Networks Really?

During the past five years the Bayesian deep learning community has developed increasingly accurate and efficient approximate inference procedures that allow for Bayesian inference in deep neural networks. However, despite this algorithmic progress and the promise of improved uncertainty quantification and sample efficiency there are, as of early 2020, no publicized deployments of Bayesian neural networks in industrial practice. In this work we cast doubt on the current understanding of Bayes posteriors in popular deep neural networks: we demonstrate through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions compared to simpler methods including point estimates obtained from SGD. Furthermore, we demonstrate that predictive performance is improved significantly through the use of a "cold posterior" that overcounts evidence. Such cold posteriors sharply deviate from the Bayesian paradigm but are commonly used as a heuristic in Bayesian deep learning papers. We put forward several hypotheses that could explain cold posteriors and evaluate the hypotheses through experiments. Our work questions the goal of accurate posterior approximations in Bayesian deep learning: If the true Bayes posterior is poor, what is the use of more accurate approximations? Instead, we argue that it is timely to focus on understanding the origin of the improved performance of cold posteriors.
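
The "cold posterior" can be stated compactly as a tempered distribution. The notation below (temperature $T$, energy $U$, parameters $\theta$, data $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}$) follows the usual tempering convention and is an assumption, since the abstract gives no formula.

```latex
p_T(\theta \mid \mathcal{D}) \;\propto\; \exp\!\big(-U(\theta)/T\big),
\qquad
U(\theta) \;=\; -\sum_{i=1}^{n} \log p(y_i \mid x_i, \theta) \;-\; \log p(\theta).
```

Setting $T = 1$ recovers the Bayes posterior. A cold posterior uses $T < 1$, which raises the likelihood to the power $1/T > 1$ and therefore behaves as if the data had been observed more often than it was, which is the sense in which it overcounts evidence.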

preprint · 2020 · arXiv

Machine Learning in Thermodynamics: Prediction of Activity Coefficients by Matrix Completion

Activity coefficients, which are a measure of the non-ideality of liquid mixtures, are a key property in chemical engineering with relevance to modeling chemical and phase equilibria as well as transport processes. Although experimental data on thousands of binary mixtures are available, prediction methods are needed to calculate the activity coefficients in many relevant mixtures that have not been explored to date. In this report, we propose a probabilistic matrix factorization model for predicting the activity coefficients in arbitrary binary mixtures. Although no physical descriptors for the considered components were used, our method outperforms the state-of-the-art method that has been refined over three decades while requiring much less training effort. This opens perspectives to novel methods for predicting physico-chemical properties of binary mixtures with the potential to revolutionize modeling and simulation in chemical engineering.
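
Matrix completion of this kind can be sketched with a small factorization fitted only on observed entries. The sketch below swaps the paper's Bayesian treatment for a simpler L2-regularized point estimate trained by stochastic gradient descent; the function names, hyperparameters, and the 3x3 toy matrix (standing in for solute/solvent property values) are all hypothetical.

```python
import random


def fit_mf(observed, n_rows, n_cols, rank=1, lr=0.01, reg=1e-3,
           epochs=500, seed=0):
    """Fit row factors U and column factors V so that U[i] . V[j] ~ y_ij.

    Only the entries present in `observed` contribute to the updates.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for (i, j), y in observed.items():
            pred = sum(U[i][k] * V[j][k] for k in range(rank))
            err = pred - y
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                # Gradient of squared error plus L2 penalty (Gaussian prior).
                U[i][k] -= lr * (err * v + reg * u)
                V[j][k] -= lr * (err * u + reg * v)
    return U, V


def mse(observed, U, V):
    """Mean squared error over the observed entries."""
    total = 0.0
    for (i, j), y in observed.items():
        pred = sum(a * b for a, b in zip(U[i], V[j]))
        total += (pred - y) ** 2
    return total / len(observed)


# Toy rank-1 matrix with entry (2, 2) held out as the "unmeasured" mixture.
observed = {
    (0, 0): 1.0, (0, 1): 0.5, (0, 2): 2.0,
    (1, 0): 2.0, (1, 1): 1.0, (1, 2): 4.0,
    (2, 0): 3.0, (2, 1): 1.5,
}
U0, V0 = fit_mf(observed, 3, 3, epochs=0)   # untrained factors (same seed)
U, V = fit_mf(observed, 3, 3)
initial_mse = mse(observed, U0, V0)
final_mse = mse(observed, U, V)
held_out_prediction = sum(a * b for a, b in zip(U[2], V[2]))
```

The held-out entry is predicted purely from the factors learned on the other entries, which mirrors how a prediction is made for a binary mixture with no measurement.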

preprint · 2020 · arXiv

The k-tied Normal Distribution: A Compact Parameterization of Gaussian Mean Field Posteriors in Bayesian Neural Networks

Variational Bayesian Inference is a popular methodology for approximating posterior distributions over Bayesian neural network weights. Recent work developing this class of methods has explored ever richer parameterizations of the approximate posterior in the hope of improving performance. In contrast, here we share a curious experimental finding that suggests instead restricting the variational distribution to a more compact parameterization. For a variety of deep Bayesian neural networks trained using Gaussian mean-field variational inference, we find that the posterior standard deviations consistently exhibit strong low-rank structure after convergence. This means that by decomposing these variational parameters into a low-rank factorization, we can make our variational approximation more compact without decreasing the models' performance. Furthermore, we find that such factorized parameterizations improve the signal-to-noise ratio of stochastic gradient estimates of the variational lower bound, resulting in faster convergence.
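
The saving from a low-rank parameterization of the posterior standard deviations is easy to quantify. This is a minimal sketch assuming the standard deviations of an m x n weight matrix are written as a sum of k outer products; the helper names and layer sizes are hypothetical, and the paper's exact construction may differ in detail.

```python
def k_tied_std(U, V):
    """Build the m x n matrix sigma[i][j] = sum_k U[i][k] * V[j][k]."""
    return [[sum(u * v for u, v in zip(row_u, row_v)) for row_v in V]
            for row_u in U]


def n_params_full(m, n):
    """One standard deviation per weight."""
    return m * n


def n_params_tied(m, n, k):
    """k factors for each of the m rows and n columns."""
    return k * (m + n)


# Rank-1 example: 2 x 3 matrix of standard deviations from 2 + 3 numbers.
sigma = k_tied_std([[1.0], [2.0]], [[0.5], [0.25], [0.1]])

# Hypothetical 512 x 256 layer with k = 2.
full = n_params_full(512, 256)
tied = n_params_tied(512, 256, 2)
```

For the hypothetical layer above, the variance parameters shrink from 131072 to 1536 values, while the mean parameters are left untouched.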

preprint · 2020 · arXiv

Variational Bayesian Quantization

We propose a novel algorithm for quantizing continuous latent representations in trained models. Our approach applies to deep probabilistic models, such as variational autoencoders (VAEs), and enables both data and model compression. Unlike current end-to-end neural compression methods that cater the model to a fixed quantization scheme, our algorithm separates model design and training from quantization. Consequently, our algorithm enables "plug-and-play" compression with variable rate-distortion trade-off, using a single trained model. Our algorithm can be seen as a novel extension of arithmetic coding to the continuous domain, and uses adaptive quantization accuracy based on estimates of posterior uncertainty. Our experimental results demonstrate the importance of taking into account posterior uncertainties, and show that image compression with the proposed algorithm outperforms JPEG over a wide range of bit rates using only a single standard VAE. Further experiments on Bayesian neural word embeddings demonstrate the versatility of the proposed method.

preprint · 2019 · arXiv

Tightening Bounds for Variational Inference by Revisiting Perturbation Theory

Variational inference has become one of the most widely used methods in latent variable modeling. In its basic form, variational inference employs a fully factorized variational distribution and minimizes its KL divergence to the posterior. As the minimization can only be carried out approximately, this approximation induces a bias. In this paper, we revisit perturbation theory as a powerful way of improving the variational approximation. Perturbation theory relies on a form of Taylor expansion of the log marginal likelihood, vaguely in terms of the log ratio of the true posterior and its variational approximation. While first order terms give the classical variational bound, higher-order terms yield corrections that tighten it. However, traditional perturbation theory does not provide a lower bound, making it inapt for stochastic optimization. In this paper, we present a similar yet alternative way of deriving corrections to the ELBO that resemble perturbation theory, but that result in a valid bound. We show in experiments on Gaussian Processes and Variational Autoencoders that the new bounds are more mass covering, and that the resulting posterior covariances are closer to the true posterior and lead to higher likelihoods on held-out data.

preprint · 2018 · arXiv

Improving Optimization for Models With Continuous Symmetry Breaking

Many loss functions in representation learning are invariant under a continuous symmetry transformation. For example, the loss function of word embeddings (Mikolov et al., 2013) remains unchanged if we simultaneously rotate all word and context embedding vectors. We show that representation learning models for time series possess an approximate continuous symmetry that leads to slow convergence of gradient descent. We propose a new optimization algorithm that speeds up convergence using ideas from gauge theory in physics. Our algorithm leads to orders of magnitude faster convergence and to more interpretable representations, as we show for dynamic extensions of matrix factorization and word embedding models. We further present an example application of our proposed algorithm that translates modern words into their historic equivalents.
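
The rotational symmetry mentioned for word embeddings can be checked numerically: any loss that depends on the embeddings only through word-context inner products is unchanged when all vectors are rotated together. The sketch below uses two-dimensional toy vectors and a hypothetical angle; it illustrates the symmetry, not the proposed optimizer.

```python
import math


def rotate(vec, theta):
    """Rotate a 2-D vector counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = vec
    return (c * x - s * y, s * x + c * y)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


# Hypothetical word and context embeddings.
words = [(1.0, 2.0), (-0.5, 0.3)]
contexts = [(0.7, -1.1), (2.0, 0.4)]
theta = 0.9

before = [dot(w, c) for w in words for c in contexts]
after = [dot(rotate(w, theta), rotate(c, theta))
         for w in words for c in contexts]
max_diff = max(abs(a - b) for a, b in zip(before, after))
```

Because every inner product is preserved up to floating-point error, the loss surface has a flat direction along the rotation, which is the kind of continuous symmetry the abstract refers to.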

preprint · 2016 · arXiv

Exponential Family Embeddings

Word embeddings are a powerful approach for capturing semantic similarity among terms in a vocabulary. In this paper, we develop exponential family embeddings, a class of methods that extends the idea of word embeddings to other types of high-dimensional data. As examples, we studied neural data with real-valued observations, count data from a market basket analysis, and ratings data from a movie recommendation system. The main idea is to model each observation conditioned on a set of other observations. This set is called the context, and the way the context is defined is a modeling choice that depends on the problem. In language the context is the surrounding words; in neuroscience the context is close-by neurons; in market basket data the context is other items in the shopping cart. Each type of embedding model defines the context, the exponential family of conditional distributions, and how the latent embedding vectors are shared across data. We infer the embeddings with a scalable algorithm based on stochastic gradient descent. On all three applications - neural activity of zebrafish, users' shopping behavior, and movie ratings - we found exponential family embedding models to be more effective than other types of dimension reduction. They better reconstruct held-out data and find interesting qualitative structure.

preprint · 2016 · arXiv

Variational Tempering

Variational inference (VI) combined with data subsampling enables approximate posterior inference over large data sets, but suffers from poor local optima. We first formulate a deterministic annealing approach for the generic class of conditionally conjugate exponential family models. This approach uses a decreasing temperature parameter which deterministically deforms the objective during the course of the optimization. A well-known drawback to this annealing approach is the choice of the cooling schedule. We therefore introduce variational tempering, a variational algorithm that introduces a temperature latent variable to the model. In contrast to related work in the Markov chain Monte Carlo literature, this algorithm results in adaptive annealing schedules. Lastly, we develop local variational tempering, which assigns a latent temperature to each data point; this allows for dynamic annealing that varies across data. Compared to the traditional VI, all proposed approaches find improved predictive likelihoods on held-out data.

preprint · 2014 · arXiv

Comment on "Consistent thermostatistics forbids negative absolute temperatures"

In this comment we argue that negative absolute temperatures are a well-established concept for systems with bounded spectra. They are not only consistent with thermodynamics, but are even unavoidable for a consistent description of the thermal equilibrium of inverted populations.
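
The simplest system with a bounded spectrum is a two-level system, where the textbook Boltzmann ratio n_upper / n_lower = exp(-ΔE / (k_B T)) can be inverted for T. The sketch below is an illustration under that assumption and is not taken from the comment itself; the function name and the example populations are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def two_level_temperature(n_lower, n_upper, delta_e):
    """Temperature implied by the populations of two levels split by delta_e.

    Inverts n_upper / n_lower = exp(-delta_e / (K_B * T)).
    Equal populations correspond to infinite temperature and raise
    ZeroDivisionError here.
    """
    return delta_e / (K_B * math.log(n_lower / n_upper))


delta_e = 1e-21  # hypothetical level splitting in joules
t_normal = two_level_temperature(0.7, 0.3, delta_e)    # more atoms below
t_inverted = two_level_temperature(0.3, 0.7, delta_e)  # inverted population
```

With more population in the upper level the logarithm changes sign, so the implied temperature is negative, which is the situation described for inverted populations.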

preprint · 2014 · arXiv

Damping of Bloch oscillations: Variational solutions of the Boltzmann equation beyond linear response

Variational solutions of the Boltzmann equation usually rely on the concept of linear response. We extend the variational approach for tight-binding models at high entropies to a regime far beyond linear response. We analyze both weakly interacting fermions and incoherent bosons on a lattice. We consider a case where the particles are driven by a constant force, leading to the well-known Bloch oscillations, and we consider interactions that are weak enough not to overdamp these oscillations. This regime is computationally demanding and relevant for ultracold atoms in optical lattices. We derive a simple theory in terms of coupled dynamic equations for the particle density, energy density, current and heat current, allowing for analytic solutions. As an application, we identify damping coefficients for Bloch oscillations in the Hubbard model at weak interactions and compute them for a one-dimensional toy model. We also approximately solve the long-time dynamics of a weakly interacting, strongly Bloch-oscillating cloud of fermionic particles in a tilted lattice, leading to a subdiffusive scaling exponent.

preprint · 2014 · arXiv

Smoothed Gradients for Stochastic Variational Inference

Stochastic variational inference (SVI) lets us scale up Bayesian computation to massive data. It uses stochastic optimization to fit a variational distribution, following easy-to-compute noisy natural gradients. As with most traditional stochastic optimization methods, SVI takes precautions to use unbiased stochastic gradients whose expectations are equal to the true gradients. In this paper, we explore the idea of following biased stochastic gradients in SVI. Our method replaces the natural gradient with a similarly constructed vector that uses a fixed-window moving average of some of its previous terms. We will demonstrate the many advantages of this technique. First, its computational cost is the same as for SVI and storage requirements only multiply by a constant factor. Second, it enjoys significant variance reduction over the unbiased estimates, smaller bias than averaged gradients, and leads to smaller mean-squared error against the full gradient. We test our method on latent Dirichlet allocation with three large corpora.
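
The fixed-window moving average can be sketched on generic noisy gradient vectors. This is a simplified illustration: the class name, window length, and noise model are hypothetical, and the true gradient is held fixed so that only the variance reduction is visible. In actual SVI the parameters move between steps, so older terms are stale and the average becomes biased.

```python
import random
from collections import deque


class WindowSmoother:
    """Average the last `window` gradient vectors."""

    def __init__(self, window: int):
        # deque with maxlen drops the oldest entry automatically.
        self.buf = deque(maxlen=window)

    def update(self, grad):
        self.buf.append(list(grad))
        n = len(self.buf)
        return [sum(col) / n for col in zip(*self.buf)]


rng = random.Random(0)
true_grad = [1.0, -2.0]
smoother = WindowSmoother(window=10)
steps = 2000
raw_sq_err = 0.0
smooth_sq_err = 0.0
for _ in range(steps):
    # Unbiased but noisy estimate of the gradient.
    noisy = [g + rng.gauss(0.0, 1.0) for g in true_grad]
    smooth = smoother.update(noisy)
    raw_sq_err += sum((a - b) ** 2 for a, b in zip(noisy, true_grad))
    smooth_sq_err += sum((a - b) ** 2 for a, b in zip(smooth, true_grad))
raw_mse = raw_sq_err / steps
smooth_mse = smooth_sq_err / steps
```

Storage grows only by the window length, which matches the abstract's remark that storage requirements multiply by a constant factor.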

preprint · 2014 · arXiv

Stochastic Differential Equations for Quantum Dynamics of Spin-Boson Networks

The quantum dynamics of open many-body systems poses a challenge for computational approaches. Here we develop a stochastic scheme based on the positive P phase-space representation to study the nonequilibrium dynamics of coupled spin-boson networks that are driven and dissipative. Such problems are at the forefront of experimental research in cavity and solid state realizations of quantum optics, as well as cold atom physics, trapped ions and superconducting circuits. We demonstrate and test our method on a driven, dissipative two-site system, each site involving a spin coupled to a photonic mode, with photons hopping between the sites, where we find good agreement with Monte Carlo Wavefunction simulations. In addition to numerically reproducing features recently observed in an experiment [Phys. Rev. X 4, 031043 (2014)], we also predict a novel steady state quantum dynamical phase transition for an asymmetric configuration of drive and dissipation.

preprint2013arXiv

Relaxation towards negative temperatures in bosonic systems: Generalized Gibbs ensembles and beyond integrability

Motivated by the recent experimental observation of negative absolute temperature states in systems of ultracold atomic gases in optical lattices [Braun et al., Science 339, 52 (2013)], we investigate theoretically the formation of these states. More specifically, we consider the relaxation after a sudden inversion of the external parabolic confining potential in the one-dimensional inhomogeneous Bose-Hubbard model. First, we focus on the integrable hard-core boson limit which allows us to treat large systems and arbitrarily long times, providing convincing numerical evidence for relaxation to a generalized Gibbs ensemble at negative temperature T<0, a notion we define in this context. Second, going beyond one dimension, we demonstrate that the emergence of negative temperature states can be understood in a dual way in terms of positive temperatures, which relies on a dynamic symmetry of the Hubbard model. We complement the study by exact diagonalization simulations at finite values of the on-site interaction.

preprint2012arXiv

Fermionic transport in a homogeneous Hubbard model: Out-of-equilibrium dynamics with ultracold atoms

Transport properties are among the defining characteristics of many important phases in condensed matter physics. In the presence of strong correlations they are difficult to predict even for model systems like the Hubbard model. In real materials they are in general obscured by additional complications including impurities, lattice defects or multi-band effects. Ultracold atoms in contrast offer the possibility to study transport and out-of-equilibrium phenomena in a clean and well-controlled environment and can therefore act as a quantum simulator for condensed matter systems. Here we studied the expansion of an initially confined fermionic quantum gas in the lowest band of a homogeneous optical lattice. While we observe ballistic transport for non-interacting atoms, even small interactions render the expansion almost bimodal with a dramatically reduced expansion velocity. The dynamics is independent of the sign of the interaction, revealing a novel, dynamic symmetry of the Hubbard model.

preprint2011arXiv

Interacting Fermionic Atoms in Optical Lattices Diffuse Symmetrically Upwards and Downwards in a Gravitational Potential

We consider a cloud of fermionic atoms in an optical lattice described by a Hubbard model with an additional linear potential. While homogeneous interacting systems mainly show damped Bloch oscillations and heating, a finite cloud behaves differently: It expands symmetrically such that gains of potential energy at the top are compensated by losses at the bottom. Interactions stabilize the necessary heat currents by inducing gradients of the inverse temperature $1/T$, with $T<0$ at the bottom of the cloud. An analytic solution of hydrodynamic equations shows that the width of the cloud increases with $t^{1/3}$ for long times consistent with results from our Boltzmann simulations.

preprint2010arXiv

Equilibration rates and negative absolute temperatures for ultracold atoms in optical lattices

As highly tunable interacting systems, cold atoms in optical lattices are ideal to realize and observe negative absolute temperatures, T < 0. We show theoretically that by reversing the confining potential, stable superfluid condensates at finite momentum and T < 0 can be created with low entropy production for attractive bosons. They may serve as "smoking gun" signatures of equilibrated T < 0. For fermions, we analyze the time scales needed to equilibrate to T < 0. For moderate interactions, the equilibration time is proportional to the square of the radius of the cloud and grows with increasing interaction strengths as atoms and energy are transported by diffusive processes.

32 published item(s)

Skipping the Zeros in Diffusion Models for Sparse Data Generation

Lossy Image Compression with Conditional Diffusion Models

Hybridizing Physical and Data-driven Prediction Methods for Physicochemical Properties

Improving Sequential Latent Variable Models with Autoregressive Flows

Latent Outlier Exposure for Anomaly Detection with Contaminated Data

Learning to Simulate High Energy Particle Collisions from Unlabeled Data

Lossless Compression with Probabilistic Circuits

Making Thermodynamic Models of Mixtures Predictive by Machine Learning: Matrix Completion of Pair Interactions

Neural Transformation Learning for Deep Anomaly Detection Beyond Images

Raising the Bar in Graph-level Anomaly Detection

Structured Stochastic Gradient MCMC

Towards Empirical Sandwich Bounds on the Rate-Distortion Function

Improving Inference for Neural Image Compression

Scalable Gaussian Process Variational Autoencoders

Extreme Classification via Adversarial Softmax Approximation

GP-VAE: Deep Probabilistic Time Series Imputation

How Good is the Bayes Posterior in Deep Neural Networks Really?

Machine Learning in Thermodynamics: Prediction of Activity Coefficients by Matrix Completion

The k-tied Normal Distribution: A Compact Parameterization of Gaussian Mean Field Posteriors in Bayesian Neural Networks

Variational Bayesian Quantization

Tightening Bounds for Variational Inference by Revisiting Perturbation Theory

Improving Optimization for Models With Continuous Symmetry Breaking

Exponential Family Embeddings

Variational Tempering

Comment on "Consistent thermostatistics forbids negative absolute temperatures"

Damping of Bloch oscillations: Variational solutions of the Boltzmann equation beyond linear response

Smoothed Gradients for Stochastic Variational Inference

Stochastic Differential Equations for Quantum Dynamics of Spin-Boson Networks

Relaxation towards negative temperatures in bosonic systems: Generalized Gibbs ensembles and beyond integrability

Fermionic transport in a homogeneous Hubbard model: Out-of-equilibrium dynamics with ultracold atoms

Interacting Fermionic Atoms in Optical Lattices Diffuse Symmetrically Upwards and Downwards in a Gravitational Potential

Equilibration rates and negative absolute temperatures for ultracold atoms in optical lattices