Source author record

Thomas B. Schön

Thomas B. Schön appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, Computation, Systems and Control, math.OC, Computer Vision, Robotics, Methodology, Artificial Intelligence, eess.IV, eess.SP, Neural and Evolutionary Computing, Applications, Information Theory, math.DS, math.IT, physics.med-ph, q-fin.CP

Catalog footprint

What is connected

47 works

17 topics

4 close collaborators

preprint · 2026 · arXiv

How Do Electrocardiogram Models Scale?

While scaling laws have established a fundamental framework for foundation models in natural language processing, their applicability to electrocardiogram (ECG) models remains poorly characterized. Indeed, recent studies do not always yield consistent downstream gains as one increases the model size or pre-training dataset size of ECG models, leaving the exact roles of architectural inductive biases, pre-training paradigms, and expected improvements with size largely unanswered. In this work, we systematically investigate neural and loss-to-loss scaling laws within the ECG domain. By pre-training over $120$ models (ranging from $20$K to $200$M parameters) on the large-scale CODE dataset ($2.3$M records), we decouple the effects of model architecture (ResNet vs. Transformer) and pre-training paradigm, namely supervised learning (SL) versus self-supervised learning (SSL). We found that (i) SL models are data-bottlenecked in-distribution, whereas SSL models scale robustly across both model and data sizes; (ii) for out-of-distribution (OOD) generalization, ResNets are $1.3$ to $2.5$ times more parameter-efficient than Transformers, while SSL is up to $16$ times more data-efficient and achieves up to $7.6$ times higher transfer efficiency than SL on unseen clinical tasks; (iii) across the observed scales, ResNet-based models generally achieve the lowest OOD loss, with SSL dominating on unseen clinical tasks and self-supervised Transformers overtaking at very large model sizes. Our results suggest that the path to effective ECG foundation models lies in the strategic alignment of architecture and paradigm rather than brute-force scaling.

preprint · 2026 · arXiv

Structure-Preserving Gaussian Processes Via Discrete Euler-Lagrange Equations

In this paper, we propose Lagrangian Gaussian Processes (LGPs) for probabilistic and data-efficient learning of dynamics via discrete forced Euler-Lagrange equations. Importantly, the geometric structure of the Lagrange-d'Alembert principle, which governs the motion of dynamical systems, is preserved by construction in the absence of external forces. This allows learning physically consistent models that overcome erroneous drift in the system's energy, thereby providing stable long-term predictions. At the core of our approach lie linear operators for Gaussian process conditioning, constructed from discrete forced Euler-Lagrange equations and variational discretization schemes. Thereby and unlike prior work, the method enables learning dynamics from discrete position snapshots, i.e., without access to a system's velocities or momenta. This is particularly relevant for a large class of practical scenarios where only position measurements are available, for instance, in motion capture or visual servoing applications. We demonstrate the data-efficiency and generalization capabilities of the LGPs in various synthetic and real-world case studies, including a real-world soft robot with hysteresis. The experimental results underscore that the LGPs learn physically consistent dynamics with uncertainty quantification solely from sparse positional data and enable stable long-term predictions.

preprint · 2022 · arXiv

Efficient Learning of the Parameters of Non-Linear Models using Differentiable Resampling in Particle Filters

It has been widely documented that the sampling and resampling steps in particle filters cannot be differentiated. The reparameterisation trick was introduced to allow the sampling step to be reformulated into a differentiable function. We extend the reparameterisation trick to include the stochastic input to resampling, thereby limiting the discontinuities in the gradient calculation after this step. Knowing the gradients of the prior and likelihood allows us to run particle Markov Chain Monte Carlo (p-MCMC) and use the No-U-Turn Sampler (NUTS) as the proposal when estimating parameters. We compare the Metropolis-adjusted Langevin algorithm (MALA), Hamiltonian Monte Carlo with different numbers of steps and NUTS. We consider two state-space models and show that NUTS improves the mixing of the Markov chain and can produce more accurate results in less computational time.

preprint · 2021 · arXiv

How Convolutional Neural Networks Deal with Aliasing

The convolutional neural network (CNN) remains an essential tool in solving computer vision problems. Standard convolutional architectures consist of stacked layers of operations that progressively downscale the image. Aliasing is a well-known side-effect of downsampling that may take place: it causes high-frequency components of the original signal to become indistinguishable from its low-frequency components. While downsampling takes place in the max-pooling layers or in the strided convolutions in these models, there is no explicit mechanism that prevents aliasing from taking place in these layers. Due to the impressive performance of these models, it is natural to suspect that they, somehow, implicitly deal with this distortion. The question we aim to answer in this paper is simply: "how and to what extent do CNNs counteract aliasing?" We explore the question by means of two examples. In the first, we assess the CNN's capability of distinguishing oscillations at the input, showing that the redundancies in the intermediate channels play an important role in succeeding at the task. In the second, we show that an image classifier CNN, while in principle capable of implementing anti-aliasing filters, does not prevent aliasing from taking place in the intermediate layers.
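The aliasing effect described in this abstract can be illustrated with a small sketch in plain Python. It is not taken from the paper: the signal length `N`, the two frequencies and the helper names are assumptions chosen for illustration. Two cosines that are clearly different at the full sampling rate become sample-for-sample identical once every second sample is dropped without a low-pass filter, which is what a stride-2 layer does to its input grid.

```python
import math

def cosine(freq, n_samples):
    """Sample cos(2*pi*freq*n/n_samples) for n = 0..n_samples-1."""
    return [math.cos(2 * math.pi * freq * n / n_samples) for n in range(n_samples)]

def downsample(signal, stride=2):
    """Keep every `stride`-th sample, with no anti-aliasing filter."""
    return signal[::stride]

def max_abs_diff(a, b):
    """Largest pointwise difference between two equally long signals."""
    return max(abs(x - y) for x, y in zip(a, b))

N = 32                  # hypothetical signal length
low = cosine(3, N)      # 3 cycles per window
high = cosine(13, N)    # 13 cycles per window, still below the Nyquist limit of 16

# Clearly different at the full rate, indistinguishable after stride 2.
diff_full = max_abs_diff(low, high)
diff_strided = max_abs_diff(downsample(low), downsample(high))
```

After stride-2 downsampling the Nyquist limit drops to 8 cycles per window, so the 13-cycle component folds back onto 3 cycles.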

preprint · 2021 · arXiv

Probabilistic programming for birth-death models of evolution using an alive particle filter with delayed sampling

We consider probabilistic programming for birth-death models of evolution and introduce a new widely-applicable inference method that combines an extension of the alive particle filter (APF) with automatic Rao-Blackwellization via delayed sampling. Birth-death models of evolution are an important family of phylogenetic models of the diversification processes that lead to evolutionary trees. Probabilistic programming languages (PPLs) give phylogeneticists a new and exciting tool: their models can be implemented as probabilistic programs with just a basic knowledge of programming. The general inference methods in PPLs reduce the need for external experts, allow quick prototyping and testing, and accelerate the development and deployment of new models. We show how these birth-death models can be implemented as simple programs in existing PPLs, and demonstrate the usefulness of the proposed inference method for such models. For the popular BiSSE model the method yields an increase of the effective sample size and the conditional acceptance rate by a factor of 30 in comparison with a standard bootstrap particle filter. Although concentrating on phylogenetics, the extended APF is a general inference method that shows its strength in situations where particles are often assigned zero weight. In the case when the weights are always positive, the extra cost of using the APF rather than the bootstrap particle filter is negligible, making our method a suitable drop-in replacement for the bootstrap particle filter in probabilistic programming inference.

preprint · 2020 · arXiv

Automated learning with a probabilistic programming language: Birch

This work offers a broad perspective on probabilistic modeling and inference in light of recent advances in probabilistic programming, in which models are formally expressed in Turing-complete programming languages. We consider a typical workflow and how probabilistic programming languages can help to automate this workflow, especially in the matching of models with inference methods. We focus on two properties of a model that are critical in this matching: its structure (the conditional dependencies between random variables) and its form (the precise mathematical definition of those dependencies). While the structure and form of a probabilistic model are often fixed a priori, it is a curiosity of probabilistic programming that they need not be, and may instead vary according to random choices made during program execution. We introduce a formal description of models expressed as programs, and discuss some of the ways in which probabilistic programming languages can reveal the structure and form of these, in order to tailor inference methods. We demonstrate the ideas with a new probabilistic programming language called Birch, with a multiple object tracking example.

preprint · 2020 · arXiv

Automatic diagnosis of the 12-lead ECG using a deep neural network

The role of automatic electrocardiogram (ECG) analysis in clinical practice is limited by the accuracy of existing models. Deep Neural Networks (DNNs) are models composed of stacked transformations that learn tasks by examples. This technology has recently achieved striking success in a variety of tasks and there are great expectations on how it might improve clinical practice. Here we present a DNN model trained on a dataset with more than 2 million labeled exams analyzed by the Telehealth Network of Minas Gerais and collected under the scope of the CODE (Clinical Outcomes in Digital Electrocardiology) study. The DNN outperforms cardiology resident medical doctors in recognizing 6 types of abnormalities in 12-lead ECG recordings, with F1 scores above 80% and specificity over 99%. These results indicate ECG analysis based on DNNs, previously studied in a single-lead setup, generalizes well to 12-lead exams, taking the technology closer to the standard clinical practice.

preprint · 2020 · arXiv

Beyond exploding and vanishing gradients: analysing RNN training using attractors and smoothness

The exploding and vanishing gradient problem has been the major conceptual principle behind most architecture and training improvements in recurrent neural networks (RNNs) during the last decade. In this paper, we argue that this principle, while powerful, might need some refinement to explain recent developments. We refine the concept of exploding gradients by reformulating the problem in terms of the cost function smoothness, which gives insight into higher-order derivatives and the existence of regions with many close local minima. We also clarify the distinction between vanishing gradients and the need for the RNN to learn attractors to fully use its expressive power. Through the lens of these refinements, we shed new light on recent developments in the RNN field, namely stable RNNs and unitary (or orthogonal) RNNs.

preprint · 2020 · arXiv

Energy-Based Models for Deep Probabilistic Regression

While deep learning-based classification is generally tackled using standardized approaches, a wide variety of techniques are employed for regression. In computer vision, one particularly popular such technique is that of confidence-based regression, which entails predicting a confidence value for each input-target pair (x,y). While this approach has demonstrated impressive results, it requires important task-dependent design choices, and the predicted confidences lack a natural probabilistic meaning. We address these issues by proposing a general and conceptually simple regression method with a clear probabilistic interpretation. In our proposed approach, we create an energy-based model of the conditional target density p(y|x), using a deep neural network to predict the un-normalized density from (x,y). This model of p(y|x) is trained by directly minimizing the associated negative log-likelihood, approximated using Monte Carlo sampling. We perform comprehensive experiments on four computer vision regression tasks. Our approach outperforms direct regression, as well as other probabilistic and confidence-based methods. Notably, our model achieves a 2.2% AP improvement over Faster-RCNN for object detection on the COCO dataset, and sets a new state-of-the-art on visual tracking when applied for bounding box estimation. In contrast to confidence-based methods, our approach is also shown to be directly applicable to more general tasks such as age and head-pose estimation. Code is available at https://github.com/fregu856/ebms_regression.
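The training objective mentioned in this abstract, a negative log-likelihood whose normalizing constant is approximated by Monte Carlo sampling, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Gaussian importance-sampling proposal centred on the target, the sample count, and the function `toy_score` (a stand-in for the deep network that predicts the un-normalized log-density) are all assumptions.

```python
import math
import random

def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def gaussian_logpdf(y, mean, std):
    return -0.5 * math.log(2 * math.pi * std ** 2) - 0.5 * ((y - mean) / std) ** 2

def ebm_nll(score_fn, x, y, n_samples=20000, proposal_std=2.0, seed=0):
    """Monte Carlo estimate of -log p(y|x) for p(y|x) = exp(f(x, y)) / Z(x)."""
    rng = random.Random(seed)
    log_weights = []
    for _ in range(n_samples):
        # Assumed proposal: q(y') = N(y, proposal_std^2), centred on the target y.
        y_s = rng.gauss(y, proposal_std)
        log_weights.append(score_fn(x, y_s) - gaussian_logpdf(y_s, y, proposal_std))
    log_z = log_mean_exp(log_weights)   # importance-sampling estimate of log Z(x)
    return log_z - score_fn(x, y)

def toy_score(x, y):
    # Hypothetical stand-in for the network: un-normalized log-density of N(x, 1).
    return -0.5 * (y - x) ** 2

nll_at_mode = ebm_nll(toy_score, x=0.0, y=0.0)
```

For the toy score the exact value is known, 0.5 * log(2 * pi) at the mode, which gives a convenient sanity check on the estimator before swapping in a learned model.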

preprint · 2020 · arXiv

Evaluating Scalable Bayesian Deep Learning Methods for Robust Computer Vision

While deep neural networks have become the go-to approach in computer vision, the vast majority of these models fail to properly capture the uncertainty inherent in their predictions. Estimating this predictive uncertainty can be crucial, for example in automotive applications. In Bayesian deep learning, predictive uncertainty is commonly decomposed into the distinct types of aleatoric and epistemic uncertainty. The former can be estimated by letting a neural network output the parameters of a certain probability distribution. Epistemic uncertainty estimation is a more challenging problem, and while different scalable methods recently have emerged, no extensive comparison has been performed in a real-world setting. We therefore accept this task and propose a comprehensive evaluation framework for scalable epistemic uncertainty estimation methods in deep learning. Our proposed framework is specifically designed to test the robustness required in real-world computer vision applications. We also apply this framework to provide the first properly extensive and conclusive comparison of the two current state-of-the-art scalable methods: ensembling and MC-dropout. Our comparison demonstrates that ensembling consistently provides more reliable and practically useful uncertainty estimates. Code is available at https://github.com/fregu856/evaluating_bdl.
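One common way to turn an ensemble into aleatoric and epistemic estimates, sketched here under stated assumptions, is to let each member output a Gaussian mean and variance and to treat the ensemble as a uniform mixture. The split below follows from the law of total variance. The function name and the Gaussian output assumption are illustrative and are not taken from the paper's evaluation framework.

```python
def ensemble_predictive(means, variances):
    """Combine per-member Gaussian predictions for a single input.

    Returns the mixture mean, the aleatoric part (average predicted noise
    variance) and the epistemic part (spread of the member means).
    """
    k = len(means)
    mean = sum(means) / k
    aleatoric = sum(variances) / k
    epistemic = sum((m - mean) ** 2 for m in means) / k
    return mean, aleatoric, epistemic

# Two hypothetical ensemble members that disagree about the mean.
mean, aleatoric, epistemic = ensemble_predictive([1.0, 3.0], [0.5, 0.5])
total_variance = aleatoric + epistemic
```

The same function can be applied to MC-dropout by treating each stochastic forward pass as one member.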

preprint · 2020 · arXiv

How to Train Your Energy-Based Model for Regression

Energy-based models (EBMs) have become increasingly popular within computer vision in recent years. While they are commonly employed for generative image modeling, recent work has applied EBMs also for regression tasks, achieving state-of-the-art performance on object detection and visual tracking. Training EBMs is however known to be challenging. While a variety of different techniques have been explored for generative modeling, the application of EBMs to regression is not a well-studied problem. How EBMs should be trained for best possible regression performance is thus currently unclear. We therefore accept the task of providing the first detailed study of this problem. To that end, we propose a simple yet highly effective extension of noise contrastive estimation, and carefully compare its performance to six popular methods from the literature on the tasks of 1D regression and object detection. The results of this comparison suggest that our training method should be considered the go-to approach. We also apply our method to the visual tracking task, achieving state-of-the-art performance on five datasets. Notably, our tracker achieves 63.7% AUC on LaSOT and 78.7% Success on TrackingNet. Code is available at https://github.com/fregu856/ebms_regression.

preprint · 2020 · arXiv

On the smoothness of nonlinear system identification

We shed new light on the smoothness of optimization problems arising in prediction error parameter estimation of linear and nonlinear systems. We show that for regions of the parameter space where the model is not contractive, the Lipschitz constant and $\beta$-smoothness of the objective function might blow up exponentially with the simulation length, making it hard to numerically find minima within those regions or, even, to escape from them. In addition to providing theoretical understanding of this problem, this paper also proposes the use of multiple shooting as a viable solution. The proposed method minimizes the error between a prediction model and the observed values. Rather than running the prediction model over the entire dataset, multiple shooting splits the data into smaller subsets and runs the prediction model over each subset, making the simulation length a design parameter and making it possible to solve problems that would be infeasible using a standard approach. The equivalence to the original problem is obtained by including constraints in the optimization. The new method is illustrated by estimating the parameters of nonlinear systems with chaotic or unstable behavior, as well as neural networks. We also present a comparative analysis of the proposed method with multi-step-ahead prediction error minimization.
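The dependence on simulation length can be seen in a toy case. The sketch below is an illustration under assumptions, not the paper's analysis: it uses the scalar model x_{t+1} = a * x_t (a hypothetical example) and tracks the sensitivity of the simulated state with respect to the parameter `a`. For a non-contractive model (|a| > 1) the sensitivity grows exponentially with the number of simulated steps, whereas a short segment, as used in multiple shooting, keeps it moderate.

```python
def final_state_sensitivity(a, x0, n_steps):
    """Simulate x_{t+1} = a * x_t and return (x_T, d x_T / d a).

    The derivative is propagated forward with the product rule:
    d x_{t+1} / d a = x_t + a * d x_t / d a.
    """
    x, dx_da = x0, 0.0
    for _ in range(n_steps):
        x, dx_da = a * x, x + a * dx_da
    return x, dx_da

# Non-contractive model: sensitivity explodes with the simulation length.
_, unstable_short = final_state_sensitivity(1.5, 1.0, 5)
_, unstable_long = final_state_sensitivity(1.5, 1.0, 80)

# Contractive model: sensitivity stays bounded regardless of length.
_, stable_short = final_state_sensitivity(0.5, 1.0, 5)
_, stable_long = final_state_sensitivity(0.5, 1.0, 80)
```

In a multiple shooting formulation, each segment would be simulated for only a few steps from its own initial state, so the gradient magnitudes resemble `unstable_short` rather than `unstable_long`.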

preprint · 2020 · arXiv

Particle filter with rejection control and unbiased estimator of the marginal likelihood

We consider the combined use of resampling and partial rejection control in sequential Monte Carlo methods, also known as particle filters. While the variance reducing properties of rejection control are known, there has not been (to the best of our knowledge) any work on unbiased estimation of the marginal likelihood (also known as the model evidence or the normalizing constant) in this type of particle filter. Being able to estimate the marginal likelihood without bias is highly relevant for model comparison, computation of interpretable and reliable confidence intervals, and in exact approximation methods, such as particle Markov chain Monte Carlo. In the paper we present a particle filter with rejection control that enables unbiased estimation of the marginal likelihood.

preprint · 2020 · arXiv

Registration by tracking for sequential 2D MRI

Our anatomy is in constant motion. With modern MR imaging it is possible to record this motion in real-time during an ongoing radiation therapy session. In this paper we present an image registration method that exploits the sequential nature of 2D MR images to estimate the corresponding displacement field. The method employs several discriminative correlation filters that independently track specific points. Together with a sparse-to-dense interpolation scheme we can then estimate the displacement field. The discriminative correlation filters are trained online, and our method is modality agnostic. For the interpolation scheme we use a neural network with normalized convolutions that is trained using synthetic diffeomorphic displacement fields. The method is evaluated on a segmented cardiac dataset and when compared to two conventional methods we observe an improved performance. This improvement is especially pronounced when it comes to the detection of larger motions of small objects.

preprint · 2016 · arXiv

A Scalable and Distributed Solution to the Inertial Motion Capture Problem

In inertial motion capture, a multitude of body segments are equipped with inertial sensors, consisting of 3D accelerometers and 3D gyroscopes. Using an optimization-based approach to solve the motion capture problem allows for natural inclusion of biomechanical constraints and for modeling the connection of the body segments at the joint locations. The computational complexity of solving this problem grows both with the length of the data set and with the number of sensors and body segments considered. In this work, we present a scalable and distributed solution to this problem using tailored message passing, capable of exploiting the structure that is inherent in the problem. As a proof-of-concept we apply our algorithm to data from a lower body configuration.

preprint · 2016 · arXiv

Computationally Efficient Bayesian Learning of Gaussian Process State Space Models

Gaussian processes allow for flexible specification of prior assumptions of unknown dynamics in state space models. We present a procedure for efficient Bayesian learning in Gaussian process state space models, where the representation is formed by projecting the problem onto a set of approximate eigenfunctions derived from the prior covariance structure. Learning under this family of models can be conducted using a carefully crafted particle MCMC algorithm. This scheme is computationally efficient and yet allows for a fully Bayesian treatment of the problem. Compared to conventional system identification tools or existing learning methods, we show competitive performance and reliable quantification of uncertainties in the model.

preprint · 2016 · arXiv

Coupling of Particle Filters

Particle filters provide Monte Carlo approximations of intractable quantities such as point-wise evaluations of the likelihood in state space models. In many scenarios, the interest lies in the comparison of these quantities as some parameter or input varies. To facilitate such comparisons, we introduce and study methods to couple two particle filters in such a way that the correlation between the two underlying particle systems is increased. The motivation stems from the classic variance reduction technique of positively correlating two estimators. The key challenge in constructing such a coupling stems from the discontinuity of the resampling step of the particle filter. As our first contribution, we consider coupled resampling algorithms. Within bootstrap particle filters, they improve the precision of finite-difference estimators of the score vector and boost the performance of particle marginal Metropolis-Hastings algorithms for parameter inference. The second contribution arises from the use of these coupled resampling schemes within conditional particle filters, allowing for unbiased estimators of smoothing functionals. The result is a new smoothing strategy that operates by averaging a number of independent and unbiased estimators, which allows for 1) straightforward parallelization and 2) the construction of accurate error estimates. Neither of the above is possible with existing particle smoothers.

preprint · 2016 · arXiv

High-dimensional Filtering using Nested Sequential Monte Carlo

Sequential Monte Carlo (SMC) methods comprise one of the most successful approaches to approximate Bayesian filtering. However, SMC without good proposal distributions struggles in high dimensions. We propose nested sequential Monte Carlo (NSMC), a methodology that generalises the SMC framework by requiring only approximate, properly weighted, samples from the SMC proposal distribution, while still resulting in a correct SMC algorithm. This way we can exactly approximate the locally optimal proposal, and extend the class of models for which we can perform efficient inference using SMC. We show improved accuracy over other state-of-the-art methods on several spatio-temporal state space models.

preprint · 2016 · arXiv

Linear System Identification via EM with Latent Disturbances and Lagrangian Relaxation

In the application of the Expectation Maximization algorithm to identification of dynamical systems, internal states are typically chosen as latent variables, for simplicity. In this work, we propose a different choice of latent variables, namely, system disturbances. Such a formulation elegantly handles the problematic case of singular state space models, and is shown, under certain circumstances, to improve the fidelity of bounds on the likelihood, leading to convergence in fewer iterations. To access these benefits we develop a Lagrangian relaxation of the nonconvex optimization problems that arise in the latent disturbances formulation, and proceed via semidefinite programming.

preprint · 2016 · arXiv

Magnetometer calibration using inertial sensors

In this work we present a practical algorithm for calibrating a magnetometer for the presence of magnetic disturbances and for magnetometer sensor errors. To allow for combining the magnetometer measurements with inertial measurements for orientation estimation, the algorithm also corrects for misalignment between the magnetometer and the inertial sensor axes. The calibration algorithm is formulated as the solution to a maximum likelihood problem and the computations are performed offline. The algorithm is shown to give good results using data from two different commercially available sensor units. Using the calibrated magnetometer measurements in combination with the inertial sensors to determine the sensor's orientation is shown to lead to significantly improved heading estimates.

preprint · 2016 · arXiv

Mean and variance of the LQG cost function

Linear Quadratic Gaussian (LQG) systems are well-understood and methods to minimize the expected cost are readily available. Less is known about the statistical properties of the resulting cost function. The contribution of this paper is a set of analytic expressions for the mean and variance of the LQG cost function. These expressions are derived using two different methods, one using solutions to Lyapunov equations and the other using only matrix exponentials. Both the discounted and the non-discounted cost function are considered, as well as the finite-time and the infinite-time cost function. The derived expressions are successfully applied to an example system to reduce the probability of the cost exceeding a given threshold.

preprint · 2016 · arXiv

Particle-based Gaussian process optimization for input design in nonlinear dynamical models

We propose a novel approach to input design for identification of nonlinear state space models. The optimal input sequence is obtained by maximizing a scalar cost function of the Fisher information matrix. Since the Fisher information matrix is unavailable in closed form, it is estimated using particle methods. In addition, we make use of Gaussian process optimization to find the optimal input and to mitigate the problem of a large computational cost incurred by the particle filter, as the method reduces the number of functional evaluations. Numerical examples are provided to illustrate the performance of the resulting algorithm.

preprint · 2016 · arXiv

Sequential Monte Carlo Methods for System Identification

One of the key challenges in identifying nonlinear and possibly non-Gaussian state space models (SSMs) is the intractability of estimating the system state. Sequential Monte Carlo (SMC) methods, such as the particle filter (introduced more than two decades ago), provide numerical solutions to the nonlinear state estimation problems arising in SSMs. When combined with additional identification techniques, these algorithms provide solid solutions to the nonlinear system identification problem. We describe two general strategies for creating such combinations and discuss why SMC is a natural tool for implementing these strategies.
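As a concrete reference point for the particle filter mentioned here, the following is a minimal bootstrap particle filter in plain Python. It is a sketch under assumptions rather than any of the strategies described in the paper: the scalar linear Gaussian model and its parameters `a`, `q`, `r`, the N(0, 1) initial distribution, and multinomial resampling at every step are all illustrative choices. The function returns the filtered state means and an estimate of the log-likelihood, which is the quantity that parameter identification schemes typically build on.

```python
import math
import random

def bootstrap_particle_filter(ys, n_particles=500, a=0.9, q=1.0, r=1.0, seed=0):
    """Bootstrap particle filter for the assumed toy model
    x_t = a * x_{t-1} + N(0, q),  y_t = x_t + N(0, r),  x_0 ~ N(0, 1).
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    log_lik = 0.0
    filtered_means = []
    for y in ys:
        # Propagate each particle through the state dynamics.
        particles = [a * x + rng.gauss(0.0, math.sqrt(q)) for x in particles]
        # Weight by the observation density, in log space for stability.
        log_w = [-0.5 * math.log(2 * math.pi * r) - 0.5 * (y - x) ** 2 / r
                 for x in particles]
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        # Log of the average weight estimates log p(y_t | y_{1:t-1}).
        log_lik += m + math.log(total / n_particles)
        w_norm = [wi / total for wi in w]
        filtered_means.append(sum(wi * x for wi, x in zip(w_norm, particles)))
        # Multinomial resampling.
        particles = rng.choices(particles, weights=w_norm, k=n_particles)
    return filtered_means, log_lik

# Hypothetical observation sequence, for illustration only.
observations = [0.3, -0.1, 0.8, 1.2, 0.4, -0.5, -1.1, 0.2, 0.6, 0.0]
means, log_likelihood = bootstrap_particle_filter(observations, n_particles=2000)
```

Because the toy model is linear and Gaussian, the estimate can be checked against the exact Kalman filter log-likelihood. For nonlinear or non-Gaussian models no such closed form exists, which is where the particle approximation becomes useful.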

preprint · 2015 · arXiv

Accelerating pseudo-marginal Metropolis-Hastings by correlating auxiliary variables

Pseudo-marginal Metropolis-Hastings (pmMH) is a powerful method for Bayesian inference in models where the posterior distribution is analytically intractable or computationally costly to evaluate directly. It operates by introducing additional auxiliary variables into the model and forming an extended target distribution, which can then be evaluated point-wise. In many cases, the standard Metropolis-Hastings algorithm is then applied to sample from the extended target and the sought posterior can be obtained by marginalisation. However, in some implementations this approach suffers from poor mixing as the auxiliary variables are sampled from an independent proposal. We propose a modification to the pmMH algorithm in which a Crank-Nicolson (CN) proposal is used instead. This introduces a positive correlation in the auxiliary variables. We investigate how to tune the CN proposal and its impact on the mixing of the resulting pmMH sampler. The conclusion is that the proposed modification can have a beneficial effect on both the mixing of the Markov chain and the computational cost for each iteration of the pmMH algorithm.
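A Crank-Nicolson style proposal for the auxiliary variables can be sketched as below. This assumes the auxiliary variables are standard Gaussian and uses a correlation parameter `rho`. Both the parametrisation and the names are assumptions made for illustration, and the paper may parametrise the step size differently. With `rho = 0` the proposal reduces to the independent proposal. Values close to 1 keep the proposed variables close to the current ones while leaving the N(0, I) distribution invariant.

```python
import math
import random

def crank_nicolson_proposal(u, rho, rng):
    """Propose u' = rho * u + sqrt(1 - rho^2) * eps, with eps ~ N(0, I)."""
    scale = math.sqrt(1.0 - rho ** 2)
    return [rho * ui + scale * rng.gauss(0.0, 1.0) for ui in u]

def correlation(a, b):
    """Sample Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

rng = random.Random(0)
current = [rng.gauss(0.0, 1.0) for _ in range(20000)]   # current auxiliary variables
proposed = crank_nicolson_proposal(current, rho=0.9, rng=rng)
empirical_rho = correlation(current, proposed)
```

The empirical correlation between current and proposed variables should be close to `rho`, which is the positive correlation the abstract refers to.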

preprint · 2015 · arXiv

Data-Efficient Learning of Feedback Policies from Image Pixels using Deep Dynamical Models

Data-efficient reinforcement learning (RL) in continuous state-action spaces using very high-dimensional observations remains a key challenge in developing fully autonomous systems. We consider a particularly important instance of this challenge, the pixels-to-torques problem, where an RL agent learns a closed-loop control policy ("torques") from pixel information only. We introduce a data-efficient, model-based reinforcement learning algorithm that learns such a closed-loop policy directly from pixel information. The key ingredient is a deep dynamical model for learning a low-dimensional feature embedding of images jointly with a predictive model in this low-dimensional feature space. Joint learning is crucial for long-term predictions, which lie at the core of the adaptive nonlinear model predictive control strategy that we use for closed-loop control. Compared to state-of-the-art RL methods for continuous states and actions, our approach learns quickly, scales to high-dimensional state spaces, is lightweight, and is an important step toward fully autonomous end-to-end learning from pixels to torques.

preprint · 2015 · arXiv

From Pixels to Torques: Policy Learning with Deep Dynamical Models

Data-efficient learning in continuous state-action spaces using very high-dimensional observations remains a key challenge in developing fully autonomous systems. In this paper, we consider one instance of this challenge, the pixels to torques problem, where an agent must learn a closed-loop control policy from pixel information only. We introduce a data-efficient, model-based reinforcement learning algorithm that learns such a closed-loop policy directly from pixel information. The key ingredient is a deep dynamical model that uses deep auto-encoders to learn a low-dimensional embedding of images jointly with a predictive model in this low-dimensional feature space. Joint learning ensures that not only static but also dynamic properties of the data are accounted for. This is crucial for long-term predictions, which lie at the core of the adaptive model predictive control strategy that we use for closed-loop control. Compared to state-of-the-art reinforcement learning methods for continuous states and actions, our approach learns quickly, scales to high-dimensional state spaces and is an important step toward fully autonomous learning from pixels to torques.

preprint · 2015 · arXiv

Marginalizing Gaussian Process Hyperparameters using Sequential Monte Carlo

Gaussian process regression is a popular method for non-parametric probabilistic modeling of functions. The Gaussian process prior is characterized by so-called hyperparameters, which often have a large influence on the posterior model and can be difficult to tune. This work provides a method for numerical marginalization of the hyperparameters, relying on the rigorous framework of sequential Monte Carlo. Our method is well suited for online problems, and we demonstrate its ability to handle real-world problems with several dimensions and compare it to other marginalization methods. We also conclude that our proposed method is a competitive alternative to the commonly used point estimates maximizing the likelihood, both in terms of computational load and its ability to handle multimodal posteriors.

preprint · 2015 · arXiv

Nested Sequential Monte Carlo Methods

We propose nested sequential Monte Carlo (NSMC), a methodology to sample from sequences of probability distributions, even where the random variables are high-dimensional. NSMC generalises the SMC framework by requiring only approximate, properly weighted, samples from the SMC proposal distribution, while still resulting in a correct SMC algorithm. Furthermore, NSMC can in itself be used to produce such properly weighted samples. Consequently, one NSMC sampler can be used to construct an efficient high-dimensional proposal distribution for another NSMC sampler, and this nesting of the algorithm can be done to an arbitrary degree. This allows us to consider complex and high-dimensional models using SMC. We show results that motivate the efficacy of our approach on several filtering problems with dimensions in the order of 100 to 1 000.

preprint2015arXiv

Newton-based maximum likelihood estimation in nonlinear state space models

Maximum likelihood (ML) estimation using Newton's method in nonlinear state space models (SSMs) is a challenging problem due to the analytical intractability of the log-likelihood and its gradient and Hessian. We estimate the gradient and Hessian using Fisher's identity in combination with a smoothing algorithm. We explore two approximations of the log-likelihood and of the solution of the smoothing problem. The first is a linearization approximation which is computationally cheap, but the accuracy typically varies between models. The second is a sampling approximation which is asymptotically valid for any SSM but is more computationally costly. We demonstrate our approach for ML parameter estimation on simulated data from two different SSMs with encouraging results.
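For reference, Fisher's identity, which the abstract relies on for the gradient, can be written as below. The notation is an assumption made for illustration ($x_{1:T}$ for the latent states, $y_{1:T}$ for the observations, $\theta$ for the parameters) and is not taken from the paper itself. The expectation is over the smoothing distribution, which is why a smoothing algorithm is needed.

```latex
\nabla_\theta \log p_\theta(y_{1:T})
  = \mathbb{E}_\theta\!\left[ \nabla_\theta \log p_\theta(x_{1:T}, y_{1:T}) \,\middle|\, y_{1:T} \right]
  = \int \nabla_\theta \log p_\theta(x_{1:T}, y_{1:T}) \, p_\theta(x_{1:T} \mid y_{1:T}) \, \mathrm{d}x_{1:T}
```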

preprint2015arXiv

Nonlinear State Space Model Identification Using a Regularized Basis Function Expansion

This paper is concerned with black-box identification of nonlinear state space models. By using a basis function expansion within the state space model, we obtain a flexible structure. The model is identified using an expectation maximization approach, where the states and the parameters are updated iteratively in such a way that a maximum likelihood estimate is obtained. We use recent particle methods with sound theoretical properties to infer the states, whereas the model parameters can be updated using closed-form expressions by exploiting the fact that our model is linear in the parameters. To avoid over-fitting the flexible model to the data, we also propose a regularization scheme that does not increase the computational burden. Importantly, this opens the way for systematic use of regularization in nonlinear state space models. We conclude by evaluating our proposed approach on one simulation example and two real-data problems.

preprint2015arXiv

Nonlinear state space smoothing using the conditional particle filter

To estimate the smoothing distribution in a nonlinear state space model, we apply the conditional particle filter with ancestor sampling. This gives an iterative algorithm in a Markov chain Monte Carlo fashion, with asymptotic convergence results. The computational complexity is analyzed, and our proposed algorithm is successfully applied to the challenging problem of sensor fusion between ultra-wideband and accelerometer/gyroscope measurements for indoor positioning. It appears to be a competitive alternative to existing nonlinear smoothing algorithms, in particular the forward filtering-backward simulation smoother.

preprint2015arXiv

Particle ancestor sampling for near-degenerate or intractable state transition models

We consider Bayesian inference in sequential latent variable models in general, and in nonlinear state space models in particular (i.e., state smoothing). We work with sequential Monte Carlo (SMC) algorithms, which provide a powerful inference framework for addressing this problem. However, for certain challenging and common model classes the state-of-the-art algorithms still struggle. The work is motivated in particular by two such model classes: (i) models where the state transition kernel is (nearly) degenerate, i.e. (nearly) concentrated on a low-dimensional manifold, and (ii) models where point-wise evaluation of the state transition density is intractable. Both types of models arise in many applications of interest, including tracking, epidemiology, and econometrics. The difficulty with these types of models is that they essentially rule out forward-backward-based methods, which are known to be of great practical importance, not least to construct computationally efficient particle Markov chain Monte Carlo (PMCMC) algorithms. To alleviate this, we propose a "particle rejuvenation" technique to enable the use of the forward-backward strategy for (nearly) degenerate models and, by extension, for intractable models. We derive the proposed method specifically within the context of PMCMC, but we emphasise that it is applicable to any forward-backward-based Monte Carlo method.

preprint2015arXiv

Quasi-Newton particle Metropolis-Hastings

Particle Metropolis-Hastings enables Bayesian parameter inference in general nonlinear state space models (SSMs). However, many implementations use a random walk proposal, which can result in poor mixing unless it is tuned correctly using tedious pilot runs. Therefore, we consider a new proposal inspired by quasi-Newton algorithms that may achieve similar (or better) mixing with less tuning. An advantage compared to other Hessian-based proposals is that it only requires estimates of the gradient of the log-posterior. A possible application is parameter inference in the challenging class of SSMs with intractable likelihoods. We exemplify this application and the benefits of the new proposal by modelling log-returns of futures contracts on coffee by a stochastic volatility model with $α$-stable observations.

preprint2015arXiv

Rao-Blackwellized particle smoothers for conditionally linear Gaussian models

Sequential Monte Carlo (SMC) methods, such as the particle filter, are by now one of the standard computational techniques for addressing the filtering problem in general state-space models. However, many applications require post-processing of data offline. In such scenarios the smoothing problem--in which all the available data is used to compute state estimates--is of central interest. We consider the smoothing problem for a class of conditionally linear Gaussian models. We present a forward-backward-type Rao-Blackwellized particle smoother (RBPS) that is able to exploit the tractable substructure present in these models. Akin to the well known Rao-Blackwellized particle filter, the proposed RBPS marginalizes out a conditionally tractable subset of state variables, effectively making use of SMC only for the "intractable part" of the model. Compared to existing RBPS, two key features of the proposed method are: (i) it does not require structural approximations of the model, and (ii) the aforementioned marginalization is done both in the forward direction and in the backward direction.

preprint2014arXiv

A graph/particle-based method for experiment design in nonlinear systems

We propose an extended method for experiment design in nonlinear state space models. The proposed input design technique optimizes a scalar cost function of the information matrix, by computing the optimal stationary probability mass function (pmf) from which an input sequence is sampled. The feasible set of the stationary pmf is a polytope, allowing it to be expressed as a convex combination of its extreme points. The extreme points in the feasible set of pmf's can be computed using graph theory. Therefore, the final information matrix can be approximated as a convex combination of the information matrices associated with each extreme point. For nonlinear systems, the information matrices for each extreme point can be computed by using particle methods. Numerical examples show that the proposed technique can be successfully employed for experiment design in nonlinear systems.

preprint2014arXiv

A new structure exploiting derivation of recursive direct weight optimization

The recursive direct weight optimization method is used to solve challenging nonlinear system identification problems. This note provides a new derivation and a new interpretation of the method. The key underlying the note is to acknowledge and exploit a certain structure inherent in the problem.

preprint2014arXiv

Capacity estimation of two-dimensional channels using Sequential Monte Carlo

We derive a new Sequential-Monte-Carlo-based algorithm to estimate the capacity of two-dimensional channel models. The focus is on computing the noiseless capacity of the 2-D one-infinity run-length limited constrained channel, but the underlying idea is generally applicable. The proposed algorithm is profiled against a state-of-the-art method, yielding more than an order of magnitude improvement in estimation accuracy for a given computation time.

preprint2014arXiv

Identification of jump Markov linear models using particle filters

Jump Markov linear models consist of a finite number of linear state space models and a discrete variable encoding the jumps (or switches) between the different linear models. Identifying jump Markov linear models makes for a challenging problem lacking an analytical solution. We derive a new expectation maximization (EM) type algorithm that produces maximum likelihood estimates of the model parameters. Our development hinges upon recent progress in combining particle filters with Markov chain Monte Carlo methods in solving the nonlinear state smoothing problem inherent in the EM formulation. Key to our development is that we exploit a conditionally linear Gaussian substructure in the model, allowing for an efficient algorithm.
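To make the model class concrete, the following is a minimal simulation sketch of a scalar jump Markov linear model: a discrete mode follows a Markov chain, and each mode selects its own linear dynamics and measurement gain. The function name `simulate_jmlm`, the scalar parameterisation and all parameter names are assumptions made for illustration; they are not taken from the paper, and the sketch covers only data generation, not the EM identification algorithm.

```python
import random


def simulate_jmlm(T, a, c, transition, sigma_v, sigma_e, seed=0):
    """Simulate a scalar jump Markov linear model (illustrative sketch).

    s_t follows a Markov chain with row-stochastic matrix `transition`;
    x_t = a[s_t] * x_{t-1} + v_t,  v_t ~ N(0, sigma_v^2)
    y_t = c[s_t] * x_t + e_t,      e_t ~ N(0, sigma_e^2)
    """
    rng = random.Random(seed)
    n_modes = len(a)
    s, x = 0, 0.0  # assumed initial mode and state
    modes, states, observations = [], [], []
    for _ in range(T):
        # Jump (or stay) according to the current row of the transition matrix.
        s = rng.choices(range(n_modes), weights=transition[s], k=1)[0]
        # Linear Gaussian dynamics and measurement selected by the active mode.
        x = a[s] * x + rng.gauss(0.0, sigma_v)
        y = c[s] * x + rng.gauss(0.0, sigma_e)
        modes.append(s)
        states.append(x)
        observations.append(y)
    return modes, states, observations
```

Conditioned on the mode sequence, the model is linear and Gaussian, which is the substructure the abstract says the algorithm exploits.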

preprint2014arXiv

Learning deep dynamical models from image pixels

Modeling dynamical systems is important in many disciplines, e.g., control, robotics, or neurotechnology. Commonly the state of these systems is not directly observed, but only available through noisy and potentially high-dimensional observations. In these cases, system identification, i.e., finding the measurement mapping and the transition mapping (system dynamics) in latent space, can be challenging. For linear system dynamics and measurement mappings, efficient solutions for system identification are available. However, in practical applications, the linearity assumption does not hold, requiring non-linear system identification techniques. If additionally the observations are high-dimensional (e.g., images), non-linear system identification is inherently hard. To address the problem of non-linear system identification from high-dimensional observations, we combine recent advances in deep learning and system identification. In particular, we jointly learn a low-dimensional embedding of the observation by means of deep auto-encoders and a predictive transition model in this low-dimensional space. We demonstrate that our model enables learning good predictive models of dynamical systems from pixel information only.

preprint2014arXiv

Particle Gibbs with Ancestor Sampling

Particle Markov chain Monte Carlo (PMCMC) is a systematic way of combining the two main tools used for Monte Carlo statistical inference: sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC). We present a novel PMCMC algorithm that we refer to as particle Gibbs with ancestor sampling (PGAS). PGAS provides the data analyst with an off-the-shelf class of Markov kernels that can be used to simulate the typically high-dimensional and highly autocorrelated state trajectory in a state-space model. The ancestor sampling procedure enables fast mixing of the PGAS kernel even when using seemingly few particles in the underlying SMC sampler. This is important as it can significantly reduce the computational burden that is typically associated with using SMC. PGAS is conceptually similar to the existing PG with backward simulation (PGBS) procedure. Instead of using separate forward and backward sweeps as in PGBS, however, we achieve the same effect in a single forward sweep. This makes PGAS well suited for addressing inference problems not only in state-space models, but also in models with more complex dependencies, such as non-Markovian, Bayesian nonparametric, and general probabilistic graphical models.
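As a sketch of the ancestor sampling step in the special case of a Markovian state-space model, the ancestor index of the conditioned (reference) particle at time $t$ is drawn with the probabilities below. The notation is assumed for illustration: $x'_{1:T}$ is the reference trajectory, $w_{t-1}^{i}$ and $x_{t-1}^{i}$ are the weights and particles at time $t-1$, $f_\theta$ is the state transition density, and $N$ is the number of particles. The non-Markovian case treated in the paper needs a more general weight and is not shown here.

```latex
\mathbb{P}\!\left(a_t^{N} = i\right)
  = \frac{w_{t-1}^{i} \, f_\theta\!\left(x'_t \mid x_{t-1}^{i}\right)}
         {\sum_{j=1}^{N} w_{t-1}^{j} \, f_\theta\!\left(x'_t \mid x_{t-1}^{j}\right)},
  \qquad i = 1, \dots, N
```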

preprint2014arXiv

Particle Metropolis-Hastings using gradient and Hessian information

Particle Metropolis-Hastings (PMH) allows for Bayesian parameter inference in nonlinear state space models by combining Markov chain Monte Carlo (MCMC) and particle filtering. The latter is used to estimate the intractable likelihood. In its original formulation, PMH makes use of a marginal MCMC proposal for the parameters, typically a Gaussian random walk. However, this can lead to a poor exploration of the parameter space and an inefficient use of the generated particles. We propose a number of alternative versions of PMH that incorporate gradient and Hessian information about the posterior into the proposal. This information is more or less obtained as a byproduct of the likelihood estimation. Indeed, we show how to estimate the required information using a fixed-lag particle smoother, with a computational cost growing linearly in the number of particles. We conclude that the proposed methods can: (i) decrease the length of the burn-in phase, (ii) increase the mixing of the Markov chain at the stationary phase, and (iii) make the proposal distribution scale invariant which simplifies tuning.
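The likelihood estimation step mentioned above can be sketched with a bootstrap particle filter. The example below is a minimal illustration under stated assumptions: a toy scalar linear Gaussian model, multinomial resampling at every step, and the hypothetical function name `bootstrap_log_likelihood`. It is not the paper's implementation and does not include the gradient or Hessian estimation via a fixed-lag smoother.

```python
import math
import random


def bootstrap_log_likelihood(ys, phi, sigma_v, sigma_e, n_particles=200, seed=0):
    """Estimate log p(y_{1:T}) with a bootstrap particle filter for the toy model
        x_t = phi * x_{t-1} + v_t,  v_t ~ N(0, sigma_v^2)
        y_t = x_t + e_t,            e_t ~ N(0, sigma_e^2)
    (an assumed example model; requires |phi| < 1)."""
    rng = random.Random(seed)
    # Initial particles from the stationary distribution of the state process.
    std0 = sigma_v / math.sqrt(1.0 - phi * phi)
    particles = [rng.gauss(0.0, std0) for _ in range(n_particles)]
    log_lik = 0.0
    for t, y in enumerate(ys):
        if t > 0:
            # Propagate through the state transition (bootstrap proposal).
            particles = [phi * x + rng.gauss(0.0, sigma_v) for x in particles]
        # Log-weights from the Gaussian observation density.
        log_w = [
            -0.5 * math.log(2.0 * math.pi * sigma_e ** 2)
            - 0.5 * ((y - x) / sigma_e) ** 2
            for x in particles
        ]
        # Shift by the maximum log-weight for numerical stability.
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        # Accumulate the log of the average unnormalised weight.
        log_lik += m + math.log(sum(w) / n_particles)
        # Multinomial resampling.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return log_lik
```

In a PMH sampler, an estimate of this kind would be computed for each proposed parameter value and plugged into the Metropolis-Hastings acceptance ratio in place of the exact likelihood.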

preprint2014arXiv

Sequential Monte Carlo for Graphical Models

We propose a new framework for how to use sequential Monte Carlo (SMC) algorithms for inference in probabilistic graphical models (PGM). Via a sequential decomposition of the PGM we find a sequence of auxiliary distributions defined on a monotonically increasing sequence of probability spaces. By targeting these auxiliary distributions using SMC we are able to approximate the full joint distribution defined by the PGM. One of the key merits of the SMC sampler is that it provides an unbiased estimate of the partition function of the model. We also show how it can be used within a particle Markov chain Monte Carlo framework in order to construct high-dimensional block-sampling algorithms for general PGMs.
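The partition function estimate mentioned above takes the standard SMC form below, assuming resampling at every step. The notation is an assumption made for illustration: $K$ is the number of steps in the sequential decomposition, $N$ the number of particles, and $w_k^{i}$ the unnormalised weight of particle $i$ at step $k$.

```latex
\widehat{Z} = \prod_{k=1}^{K} \left( \frac{1}{N} \sum_{i=1}^{N} w_k^{i} \right),
\qquad \mathbb{E}\big[\widehat{Z}\big] = Z
```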

preprint2013arXiv

Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC

State-space models are successfully used in many areas of science, engineering and economics to model time series and dynamical systems. We present a fully Bayesian approach to inference and learning (i.e. state estimation and system identification) in nonlinear nonparametric state-space models. We place a Gaussian process prior over the state transition dynamics, resulting in a flexible model able to capture complex dynamical phenomena. To enable efficient inference, we marginalize over the transition dynamics function and infer directly the joint smoothing distribution using specially tailored Particle Markov Chain Monte Carlo samplers. Once a sample from the smoothing distribution is computed, the state transition predictive distribution can be formulated analytically. Our approach preserves the full nonparametric expressivity of the model and can make use of sparse Gaussian processes to greatly reduce computational complexity.

preprint2013arXiv

Identification of Gaussian Process State-Space Models with Particle Stochastic Approximation EM

Gaussian process state-space models (GP-SSMs) are a very flexible family of models of nonlinear dynamical systems. They comprise a Bayesian nonparametric representation of the dynamics of the system and additional (hyper-)parameters governing the properties of this nonparametric representation. The Bayesian formalism enables systematic reasoning about the uncertainty in the system dynamics. We present an approach to maximum likelihood identification of the parameters in GP-SSMs, while retaining the full nonparametric description of the dynamics. The method is based on a stochastic approximation version of the EM algorithm that employs recent developments in particle Markov chain Monte Carlo for efficient identification.

preprint2013arXiv

Inference in Gaussian models with missing data using Equalisation Maximisation

Equalisation Maximisation (EqM) is an algorithm for estimating parameters in auto-regressive (AR) models where some fraction of the data is missing. It has previously been shown that the EqM algorithm is a competitive alternative to expectation maximisation, estimating models with equal predictive capability at a lower computational cost. The EqM algorithm has previously been motivated as a heuristic. In this paper, we instead show that EqM can be viewed as an approximation of a proximal point algorithm. We also derive the method for the entire class of Gaussian models and exemplify its use for estimation of ARMA models with missing data. The resulting method is evaluated in numerical simulations, yielding results similar to those obtained for AR processes.

preprint2012arXiv

Ancestor Sampling for Particle Gibbs

We present a novel method in the family of particle MCMC methods that we refer to as particle Gibbs with ancestor sampling (PG-AS). Similarly to the existing PG with backward simulation (PG-BS) procedure, we use backward sampling to (considerably) improve the mixing of the PG kernel. Instead of using separate forward and backward sweeps as in PG-BS, however, we achieve the same effect in a single forward sweep. We apply the PG-AS framework to the challenging class of non-Markovian state-space models. We develop a truncation strategy of these models that is applicable in principle to any backward-simulation-based method, but which is particularly well suited to the PG-AS framework. In particular, as we show in a simulation study, PG-AS can yield an order-of-magnitude improved accuracy relative to PG-BS due to its robustness to the truncation error. Several application examples are discussed, including Rao-Blackwellized particle smoothing and inference in degenerate state-space models.

preprint2012arXiv

On the use of backward simulation in particle Markov chain Monte Carlo methods

Recently, Andrieu, Doucet and Holenstein (2010) introduced a general framework for using particle filters (PFs) to construct proposal kernels for Markov chain Monte Carlo (MCMC) methods. This framework, termed Particle Markov chain Monte Carlo (PMCMC), was shown to provide powerful methods for joint Bayesian state and parameter inference in nonlinear/non-Gaussian state-space models. However, the mixing of the resulting MCMC kernels can be quite sensitive, both to the number of particles used in the underlying PF and to the number of observations in the data. In the discussion following (Andrieu et al., 2010), Whiteley suggested a modified version of one of the PMCMC samplers, namely the particle Gibbs (PG) sampler, and argued that this should improve its mixing. In this paper we explore the consequences of this modification and show that it leads to a method which is much more robust to a low number of particles as well as a large number of observations. Furthermore, we discuss how the modified PG sampler can be used as a basis for alternatives to all three PMCMC samplers derived in (Andrieu et al., 2010). We evaluate these methods on several challenging inference problems in a simulation study. One of these is the identification of an epidemiological model for predicting influenza epidemics, based on search engine query data.

47 published item(s)

How Do Electrocardiogram Models Scale?

Structure-Preserving Gaussian Processes Via Discrete Euler-Lagrange Equations

Efficient Learning of the Parameters of Non-Linear Models using Differentiable Resampling in Particle Filters

How Convolutional Neural Networks Deal with Aliasing

Probabilistic programming for birth-death models of evolution using an alive particle filter with delayed sampling

Automated learning with a probabilistic programming language: Birch

Automatic diagnosis of the 12-lead ECG using a deep neural network

Beyond exploding and vanishing gradients: analysing RNN training using attractors and smoothness

Energy-Based Models for Deep Probabilistic Regression

Evaluating Scalable Bayesian Deep Learning Methods for Robust Computer Vision

How to Train Your Energy-Based Model for Regression

On the smoothness of nonlinear system identification

Particle filter with rejection control and unbiased estimator of the marginal likelihood

Registration by tracking for sequential 2D MRI

A Scalable and Distributed Solution to the Inertial Motion Capture Problem

Computationally Efficient Bayesian Learning of Gaussian Process State Space Models

Coupling of Particle Filters

High-dimensional Filtering using Nested Sequential Monte Carlo

Linear System Identification via EM with Latent Disturbances and Lagrangian Relaxation

Magnetometer calibration using inertial sensors

Mean and variance of the LQG cost function

Particle-based Gaussian process optimization for input design in nonlinear dynamical models

Sequential Monte Carlo Methods for System Identification

Accelerating pseudo-marginal Metropolis-Hastings by correlating auxiliary variables

Data-Efficient Learning of Feedback Policies from Image Pixels using Deep Dynamical Models

From Pixels to Torques: Policy Learning with Deep Dynamical Models

Marginalizing Gaussian Process Hyperparameters using Sequential Monte Carlo

Nested Sequential Monte Carlo Methods

Newton-based maximum likelihood estimation in nonlinear state space models

Nonlinear State Space Model Identification Using a Regularized Basis Function Expansion

Nonlinear state space smoothing using the conditional particle filter

Particle ancestor sampling for near-degenerate or intractable state transition models

Quasi-Newton particle Metropolis-Hastings

Rao-Blackwellized particle smoothers for conditionally linear Gaussian models

A graph/particle-based method for experiment design in nonlinear systems

A new structure exploiting derivation of recursive direct weight optimization

Capacity estimation of two-dimensional channels using Sequential Monte Carlo

Identification of jump Markov linear models using particle filters

Learning deep dynamical models from image pixels

Particle Gibbs with Ancestor Sampling

Particle Metropolis-Hastings using gradient and Hessian information

Sequential Monte Carlo for Graphical Models

Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC

Identification of Gaussian Process State-Space Models with Particle Stochastic Approximation EM

Inference in Gaussian models with missing data using Equalisation Maximisation

Ancestor Sampling for Particle Gibbs

On the use of backward simulation in particle Markov chain Monte Carlo methods