Source author record

Moritz Schauer

Moritz Schauer appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Methodology, Computation, Machine Learning, math.PR, Applications, cs.CY, math.NA, math.ST, Mathematical Software, Numerical Analysis, Programming Languages, Statistics Theory

Catalog footprint

What is connected

7 works

12 topics

4 close collaborators

Preprint · 2023 · arXiv

Automatic Differentiation of Programs with Discrete Randomness

Automatic differentiation (AD), a technique for constructing new programs which compute the derivative of an original program, has become ubiquitous throughout scientific computing and deep learning due to the improved performance afforded by gradient-based optimization. However, AD systems have been restricted to the subset of programs that have a continuous dependence on parameters. Programs that have discrete stochastic behaviors governed by distribution parameters, such as flipping a coin with probability $p$ of being heads, pose a challenge to these systems because the connection between the result (heads vs tails) and the parameters ($p$) is fundamentally discrete. In this paper we develop a new reparameterization-based methodology that allows for generating programs whose expectation is the derivative of the expectation of the original program. We showcase how this method gives an unbiased and low-variance estimator which is as automated as traditional AD mechanisms. We demonstrate unbiased forward-mode AD of discrete-time Markov chains, agent-based models such as Conway's Game of Life, and unbiased reverse-mode AD of a particle filter. Our code package is available at https://github.com/gaurav-arya/StochasticAD.jl.
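To make the coin-flip example concrete, here is a minimal hand-written Python sketch in the spirit of the stochastic-derivative idea, assuming the coupling $X = 1\{U < p\}$ with $U$ uniform on $[0,1)$. Under this coupling a small increase in $p$ can only turn a 0 into a 1, so a draw of $X = 0$ contributes the weighted difference $(f(1) - f(0))/(1-p)$ and a draw of $X = 1$ contributes 0. The expectation is $f(1) - f(0)$, the exact derivative of $p f(1) + (1-p) f(0)$. This is not the StochasticAD.jl API (a Julia package), and the function names are hypothetical.

```python
import random


def bernoulli_derivative_estimate(f, p, rng):
    """One-sample unbiased estimate of d/dp E[f(X)] for X ~ Bernoulli(p).

    Uses the coupling X = 1{U < p}: raising p slightly can only flip a 0
    into a 1, which happens at rate 1 / (1 - p) given X = 0.
    """
    u = rng.random()
    x = 1 if u < p else 0
    if x == 0:
        # Weighted finite difference towards the alternative outcome X = 1.
        return (f(1) - f(0)) / (1 - p)
    # Given X = 1, a small increase in p cannot change the outcome.
    return 0.0


def mean_derivative(f, p, n_samples=100_000, seed=0):
    """Average many one-sample estimates to approximate the derivative."""
    rng = random.Random(seed)
    total = sum(bernoulli_derivative_estimate(f, p, rng) for _ in range(n_samples))
    return total / n_samples
```

For example, `mean_derivative(lambda x: 3 * x + 1, 0.3)` should come out close to the exact value 3.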

Preprint · 2022 · arXiv

Applied Measure Theory for Probabilistic Modeling

Probabilistic programming and statistical computing are vibrant areas in the development of the Julia programming language, but the underlying infrastructure dramatically predates recent developments. The goal of MeasureTheory.jl is to provide Julia with the right vocabulary and tools for these tasks. The package introduces a well-chosen set of notions from the foundations of probability together with powerful combinators and transforms, and this article gives a gentle introduction to these concepts. This is achieved foremost by recognizing the measure as the central object, which enables us to develop a proper concept of densities as objects relating measures to each other. As densities provide a local perspective on measures, they are the key to efficient implementations. The need to preserve this computationally important locality leads to the new notion of a locally-dominated measure, which solves the so-called base measure problem and makes working with densities and distributions in Julia easier and more flexible.
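A minimal sketch of a density as an object relating two measures, written in plain Python rather than MeasureTheory.jl (whose API is not reproduced here; the helper names are hypothetical). Two normal measures are both dominated by Lebesgue measure $\lambda$, so the density of one with respect to the other follows from the chain rule $\frac{d\mu}{d\nu} = \frac{d\mu}{d\lambda} \Big/ \frac{d\nu}{d\lambda}$, computed here on the log scale.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) with respect to Lebesgue measure."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def logdensity_rel(x, mu_params, nu_params):
    """log (d mu / d nu)(x) for two normal measures given as (mean, sd).

    Both measures share Lebesgue measure as a dominating base measure, so
    the relative log-density is the difference of their Lebesgue log-densities.
    """
    return normal_logpdf(x, *mu_params) - normal_logpdf(x, *nu_params)
```

For example, the log-density of $N(1,1)$ with respect to $N(0,1)$ is $x - 1/2$, so `logdensity_rel(2.0, (1.0, 1.0), (0.0, 1.0))` gives 1.5 up to rounding.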

Preprint · 2022 · arXiv

Flexible Group Fairness Metrics for Survival Analysis

Algorithmic fairness is an increasingly important field concerned with detecting and mitigating biases in machine learning models. There is a wealth of literature on algorithmic fairness in regression and classification; however, there has been little exploration of the field for survival analysis. Survival analysis is the prediction task in which one attempts to predict the probability of an event occurring over time. Survival predictions are particularly important in sensitive settings, such as when utilising machine learning for the diagnosis and prognosis of patients. In this paper we explore how to utilise existing survival metrics to measure bias with group fairness metrics. We explore this in an empirical experiment with 29 survival datasets and 8 measures. We find that measures of discrimination are able to capture bias well, whereas there is less clarity with measures of calibration and scoring rules. We suggest further areas for research, including prediction-based fairness metrics for distribution predictions.
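One simple way to turn an existing survival metric into a group fairness metric is to compute it separately for each protected group and report the gap. The sketch below does this with Harrell's C-index, a common measure of discrimination; the choice of metric, the max-minus-min gap and the function names are assumptions for illustration, not necessarily the definitions used in the paper.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's C-index: a higher risk score should mean an earlier event.

    Pairs with equal observed times are skipped for simplicity.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # Comparable only if the earlier time is an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def group_disparity(times, events, risks, groups):
    """Gap between the best and worst per-group C-index (0 means parity)."""
    scores = {}
    for g in sorted(set(groups)):
        idx = [k for k, gk in enumerate(groups) if gk == g]
        scores[g] = harrell_c_index(
            [times[k] for k in idx],
            [events[k] for k in idx],
            [risks[k] for k in idx],
        )
    return max(scores.values()) - min(scores.values()), scores
```

A gap near 0 means the model ranks patients about equally well in every group; other survival metrics could be plugged in the same way.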

Preprint · 2020 · arXiv

Fast and scalable non-parametric Bayesian inference for Poisson point processes

We study the problem of non-parametric Bayesian estimation of the intensity function of a Poisson point process. The observations are $n$ independent realisations of a Poisson point process on the interval $[0,T]$. We propose two related approaches. In both approaches we model the intensity function as piecewise constant on $N$ bins forming a partition of the interval $[0,T]$. In the first approach the coefficients of the intensity function are assigned independent gamma priors, leading to a closed form posterior distribution. On the theoretical side, we prove that as $n\rightarrow\infty,$ the posterior asymptotically concentrates around the "true", data-generating intensity function at an optimal rate for $h$-Hölder regular intensity functions ($0 < h\leq 1$). In the second approach we employ a gamma Markov chain prior on the coefficients of the intensity function. The posterior distribution is no longer available in closed form, but inference can be performed using a straightforward version of the Gibbs sampler. Both approaches scale well with sample size, but the second is much less sensitive to the choice of $N$. Practical performance of our methods is first demonstrated via synthetic data examples. We compare our second method with other existing approaches on the UK coal mining disasters data. Furthermore, we apply it to the US mass shootings data and Donald Trump's Twitter data.
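The closed-form posterior in the first approach follows from Poisson-gamma conjugacy. A minimal sketch, assuming $N$ bins of equal width $T/N$ and independent $\mathrm{Gamma}(\alpha, \beta)$ priors (shape $\alpha$, rate $\beta$) on the bin intensities: if $H_k$ is the total number of events falling in bin $k$ across the $n$ realisations, the posterior for bin $k$ is $\mathrm{Gamma}(\alpha + H_k, \beta + nT/N)$. The function name and default hyperparameters are illustrative choices, not taken from the paper.

```python
def gamma_posterior(realisations, T, N, alpha=1.0, beta=1.0):
    """Per-bin Gamma(shape, rate) posterior for a piecewise constant intensity.

    realisations: list of n lists of event times, each time in [0, T].
    Prior: bin intensities i.i.d. Gamma(shape=alpha, rate=beta).
    """
    n = len(realisations)
    width = T / N
    counts = [0] * N
    for times in realisations:
        for t in times:
            # Bin index of t; an event exactly at T goes into the last bin.
            k = min(int(t / width), N - 1)
            counts[k] += 1
    # Each bin is observed for a total time of n * width across realisations.
    exposure = n * width
    return [(alpha + c, beta + exposure) for c in counts]
```

The posterior mean of each bin is shape divided by rate; for the two realisations `[[0.1, 0.2, 0.9], [0.3, 0.6]]` on $[0,1]$ with $N = 2$, this gives 2.0 and 1.5.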

Preprint · 2019 · arXiv

Bayesian wavelet de-noising with the caravan prior

According to both domain expert knowledge and empirical evidence, wavelet coefficients of real signals tend to exhibit clustering patterns, in that they contain connected regions of coefficients of similar magnitude (large or small). A wavelet de-noising approach that takes such a feature of the signal into account may in practice outperform other, more vanilla methods, both in terms of the estimation error and the visual appearance of the estimates. Motivated by this observation, we present a Bayesian approach to wavelet de-noising, where dependencies between neighbouring wavelet coefficients are a priori modelled via a Markov chain-based prior, which we term the caravan prior. Posterior computations in our method are performed via the Gibbs sampler. Using representative synthetic and real data examples, we conduct a detailed comparison of our approach with a benchmark empirical Bayes de-noising method (due to Johnstone and Silverman). We show that the caravan prior fares well and is therefore a useful addition to the wavelet de-noising toolbox.

Preprint · 2019 · arXiv

Nonparametric Bayesian estimation of a Hölder continuous diffusion coefficient

We consider a nonparametric Bayesian approach to estimate the diffusion coefficient of a stochastic differential equation given discrete time observations over a fixed time interval. As a prior on the diffusion coefficient, we employ a histogram-type prior with piecewise constant realisations on bins forming a partition of the time interval. Specifically, these constants are realisations of independent inverse Gamma distributed random variables. We justify our approach by deriving the rate at which the corresponding posterior distribution asymptotically concentrates around the data-generating diffusion coefficient. This posterior contraction rate turns out to be optimal for estimation of a Hölder-continuous diffusion coefficient with smoothness parameter $0<\lambda\leq 1$. Our approach is straightforward to implement, as the posterior distributions turn out to be inverse Gamma again, and leads to good practical results in a wide range of simulation examples. Finally, we apply our method to exchange rate data sets.
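The inverse Gamma posterior can be sketched under simplifying assumptions that are ours, not necessarily the paper's: the prior is placed on the squared coefficient $\sigma_k^2$ in each bin, the drift is ignored, and each increment is treated as $X_{t_{i+1}} - X_{t_i} \sim N(0, \sigma_k^2 (t_{i+1} - t_i))$. With an $\mathrm{IG}(\alpha, \beta)$ prior (shape, scale), bin $k$ then has posterior $\mathrm{IG}(\alpha + m_k/2,\ \beta + \sum_i (\Delta X_i)^2 / (2\Delta t_i))$, where $m_k$ is the number of increments assigned to bin $k$ and the sum runs over those increments. The function name is hypothetical.

```python
def inverse_gamma_posterior(times, values, T, N, alpha=1.0, beta=1.0):
    """Per-bin inverse Gamma (shape, scale) posterior for sigma^2 on [0, T].

    Assumes dX = sigma(t) dW with no drift, and approximates each increment
    X[i+1] - X[i] as N(0, sigma_k^2 * (t[i+1] - t[i])).
    """
    width = T / N
    shapes = [alpha] * N
    scales = [beta] * N
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        dx = values[i + 1] - values[i]
        # Assign the increment to the bin containing its left endpoint.
        k = min(int(times[i] / width), N - 1)
        shapes[k] += 0.5
        scales[k] += dx * dx / (2.0 * dt)
    return list(zip(shapes, scales))
```

When the posterior shape exceeds 1, the posterior mean of $\sigma_k^2$ is scale divided by (shape minus 1), which gives a simple piecewise constant point estimate.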

Preprint · 2019 · arXiv

Simulation of elliptic and hypo-elliptic conditional diffusions

Suppose $X$ is a multidimensional diffusion process. Assume that at time zero the state of $X$ is fully observed, but at time $T>0$ only linear combinations of its components are observed. That is, one only observes the vector $L X_T$ for a given matrix $L$. In this paper we show how samples from the conditioned process can be generated. The main contribution of this paper is to prove that guided proposals, introduced in Schauer et al. (2017), can be used in a unified way for both uniformly elliptic and hypo-elliptic diffusions, also when $L$ is not the identity matrix. This is illustrated by excellent performance in two challenging cases: a partially observed twice integrated diffusion with multiple wells and the partially observed FitzHugh-Nagumo model.
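Guided proposals add a guiding drift term to the diffusion so that simulated paths become consistent with the observation. A minimal Python sketch of the simplest special case only, assuming a scalar driftless diffusion $dX_t = \sigma\, dW_t$, full observation ($L = 1$) and an auxiliary process equal to the target itself: the guiding drift $\sigma^2 \partial_x \log p(t, x; T, v)$ then becomes $(v - x)/(T - t)$, and the proposal is the Brownian bridge. The paper's construction for partial observations $L X_T$ and hypo-elliptic models is not reproduced here, and the function name is hypothetical.

```python
import math
import random


def guided_bridge(x0, v, T, n_steps, sigma=1.0, seed=0):
    """Euler simulation of a guided proposal in the simplest possible case.

    Target: scalar dX = sigma dW started at x0 and conditioned on X_T = v
    (full observation, L = 1). The guiding term sigma^2 * d/dx log p(t, x; T, v)
    with the Gaussian transition density p reduces to (v - x) / (T - t),
    so the proposal here is a Brownian bridge.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    x = x0
    path = [x0]
    for i in range(n_steps - 1):
        t = i * dt
        drift = (v - x) / (T - t)
        x = x + drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    # The conditioned process ends at v, so the last grid point is set to v.
    path.append(v)
    return path
```

In general the proposal law differs from the true conditioned process, and the two are related through a likelihood ratio used as an importance weight or inside a Metropolis-Hastings step; in this special case the proposal is exact.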
