Researcher profile

Michael P. H. Stumpf

Michael P. H. Stumpf contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
9 works
0 followers
6 topics
4 close collaborators

Published work

9 published item(s)

preprint · 2022 · arXiv

A group theoretic approach to model comparison with simplicial representations

The complexity of biological systems, and the increasingly large amount of associated experimental data, necessitate that we develop mathematical models to further our understanding of these systems. As biological systems are generally not well understood, most mathematical models of these systems are based on experimental data, resulting in a seemingly heterogeneous collection of models that ostensibly represent the same system. To understand the system we therefore need to know how the different models are related, with a view to obtaining a unified mathematical description. This goal is complicated by the fact that distinct mathematical formalisms may be used to represent the same system, making direct comparison of the models very difficult. In previous work we developed an appropriate framework for model comparison where we represent models as labelled simplicial complexes and compare them with two general methodologies: comparison by distance or equivalence. In this article we continue the development of our model comparison methodology in two directions. First, we present a rigorous and automatable methodology for the core process of comparison by equivalence, namely determining the vertices in a simplicial representation, corresponding to model components, that are conceptually related and the identification of these vertices via simplicial operations. Our methodology is based on considerations of vertex symmetry in the simplicial representation, for which we develop the required mathematical theory of group actions on simplicial complexes. This methodology greatly simplifies and expedites the process of determining model equivalence. Second, we provide an alternative mathematical framework for our model-comparison methodology by representing models as groups, which allows for the direct application of group-theoretic techniques within our model-comparison methodology.

preprint · 2022 · arXiv

Open Problems in Mathematical Biology

Biology is data-rich, and it is equally rich in concepts and hypotheses. Part of trying to understand biological processes and systems is therefore to confront our ideas and hypotheses with data using statistical methods to determine the extent to which our hypotheses agree with reality. But doing so in a systematic way is becoming increasingly challenging as our hypotheses become more detailed, and our data becomes more complex. Mathematical methods are therefore gaining in importance across the life- and biomedical sciences. Mathematical models allow us to test our understanding, make testable predictions about future behaviour, and gain insights into how we can control the behaviour of biological systems. It has been argued that mathematical methods can be of great benefit to biologists to make sense of data. But mathematics and mathematicians are set to benefit equally from considering the often bewildering complexity inherent to living systems. Here we present a small selection of open problems and challenges in mathematical biology. We have chosen these open problems because they are of both biological and mathematical interest.

preprint · 2012 · arXiv

Information Geometry and Sequential Monte Carlo

This paper explores the application of methods from information geometry to the sequential Monte Carlo (SMC) sampler. In particular, the Riemannian manifold Metropolis-adjusted Langevin algorithm (mMALA) is adapted for the transition kernels in SMC. Similar to its function in Markov chain Monte Carlo methods, the mMALA is a fully adaptable kernel which allows for efficient sampling of high-dimensional and highly correlated parameter spaces. We set up the theoretical framework for its use in SMC with a focus on the application to the problem of sequential Bayesian inference for dynamical systems as modelled by sets of ordinary differential equations. In addition, we argue that defining the sequence of distributions on geodesics optimises the effective sample sizes in the SMC run. We illustrate the application of the methodology by inferring the parameters of simulated Lotka-Volterra and FitzHugh-Nagumo models. In particular, we demonstrate that compared to employing a standard adaptive random walk kernel, the SMC sampler with an information geometric kernel design attains a higher level of statistical robustness in the inferred parameters of the dynamical systems.

preprint · 2012 · arXiv

On optimality of kernels for approximate Bayesian computation using sequential Monte Carlo

Approximate Bayesian computation (ABC) has gained popularity over the past few years for the analysis of complex models arising in population genetics, epidemiology and systems biology. Sequential Monte Carlo (SMC) approaches have become workhorses in ABC. Here we discuss how to construct the perturbation kernels that are required in ABC SMC approaches, in order to construct a set of distributions that start out from a suitably defined prior and converge towards the unknown posterior. We derive optimality criteria for different kernels, which are based on the Kullback-Leibler divergence between a distribution and the distribution of the perturbed particles. We will show that for many complicated posterior distributions, locally adapted kernels tend to show the best performance. In cases where it is possible to estimate the Fisher information we can construct particularly efficient perturbation kernels. We find that the added moderate cost of adapting kernel functions is easily regained in terms of the higher acceptance rate. We demonstrate the computational efficiency gains in a range of toy examples which illustrate some of the challenges faced in real-world applications of ABC, before turning to two demanding parameter inference problems in molecular biology, which highlight the huge increases in efficiency that can be gained from choice of optimal models. We conclude with a general discussion of rational choice of perturbation kernels in ABC SMC settings.

preprint · 2012 · arXiv

Optimizing Threshold-Schedules for Approximate Bayesian Computation Sequential Monte Carlo Samplers: Applications to Molecular Systems

The likelihood-free sequential Approximate Bayesian Computation (ABC) algorithms are increasingly popular inference tools for complex biological models. Such algorithms proceed by constructing a succession of probability distributions over the parameter space conditional upon the simulated data lying in an $\epsilon$-ball around the observed data, for decreasing values of the threshold $\epsilon$. While in theory the distributions (starting from a suitably defined prior) will converge towards the unknown posterior as $\epsilon$ tends to zero, the exact sequence of thresholds can impact upon the computational efficiency and success of a particular application. In particular, we show here that the current preferred method of choosing thresholds as a pre-determined quantile of the distances between simulated and observed data from the previous population can lead to the inferred posterior distribution being very different to the true posterior. Threshold selection thus remains an important challenge. Here we propose an automated and adaptive method that allows us to balance the need to minimise the threshold with computational efficiency. Moreover, our method, which centres around predicting the threshold-acceptance rate curve using the unscented transform, enables us to avoid local minima, a problem that has plagued previous threshold schemes.
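
The trade-off described in this abstract, a tighter threshold against a lower acceptance rate, can be illustrated with a minimal sketch. It is not the adaptive method of the paper and not a full SMC sampler: it runs plain ABC rejection independently at each threshold of a fixed, hand-picked schedule. The toy Normal simulator, the uniform prior, the use of the sample mean as a summary statistic, and all function names are assumptions made for illustration only.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy simulator (assumption): n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def distance(sim, obs):
    """Compare data sets through their sample means (a summary statistic)."""
    return abs(statistics.fmean(sim) - statistics.fmean(obs))


def abc_rejection(obs, eps, n_particles, rng, prior=(-5.0, 5.0)):
    """Keep prior draws whose simulated data lie in the eps-ball around obs."""
    accepted, attempts = [], 0
    while len(accepted) < n_particles:
        attempts += 1
        theta = rng.uniform(*prior)  # uniform prior (assumption)
        if distance(simulate(theta, len(obs), rng), obs) <= eps:
            accepted.append(theta)
    return accepted, attempts


def run_schedule(obs, thresholds, n_particles=200, seed=1):
    """Return (eps, posterior mean, posterior sd, acceptance rate) per threshold."""
    rng = random.Random(seed)
    results = []
    for eps in thresholds:
        particles, attempts = abc_rejection(obs, eps, n_particles, rng)
        results.append((eps,
                        statistics.fmean(particles),
                        statistics.pstdev(particles),
                        n_particles / attempts))
    return results


if __name__ == "__main__":
    observed = simulate(2.0, 50, random.Random(0))
    for eps, mean, sd, rate in run_schedule(observed, [2.0, 1.0, 0.5, 0.25]):
        print(f"eps={eps:<5} mean={mean:.3f} sd={sd:.3f} acceptance={rate:.3f}")
```

In this toy setting the approximate posterior narrows as the threshold shrinks while the acceptance rate falls, which is the cost that a threshold schedule has to balance.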

preprint · 2011 · arXiv

Considerate Approaches to Achieving Sufficiency for ABC model selection

For nearly any challenging scientific problem, evaluation of the likelihood is problematic if not impossible. Approximate Bayesian computation (ABC) allows us to employ the whole Bayesian formalism to problems where we can use simulations from a model, but cannot evaluate the likelihood directly. When summary statistics of real and simulated data are compared (rather than the data directly), information is lost unless the summary statistics are sufficient. Here we employ an information-theoretical framework that can be used to construct (approximately) sufficient statistics by combining different statistics until the loss of information is minimized. Such sufficient sets of statistics are constructed for both parameter estimation and model selection problems. We apply our approach to a range of illustrative and real-world model selection problems.

preprint · 2011 · arXiv

Decomposing Noise in Biochemical Signalling Systems Highlights the Role of Protein Degradation

The phenomena of stochasticity in biochemical processes have been intriguing life scientists for the past few decades. We now know that living cells take advantage of stochasticity in some cases and counteract stochastic effects in others. The sources of intrinsic stochasticity in biomolecular systems are random timings of individual reactions, which cumulatively drive the variability in outputs of such systems. Despite the acknowledged relevance of stochasticity in the functioning of living cells, no rigorous method has been proposed to precisely identify sources of variability. In this paper we propose a novel methodology that allows us to calculate contributions of individual reactions to the variability of a system's output. We demonstrate that some reactions have dramatically different effects on noise than others. Surprisingly, in the class of open conversion systems that serve as an approximate model of signal transduction, the degradation of an output contributes half of the total noise. We also demonstrate the importance of degradation in other relevant systems and propose a degradation feedback control mechanism that is capable of effective noise suppression. Application of our method to some well studied biochemical systems such as gene expression, Michaelis-Menten enzyme kinetics, and the p53 system indicates that our methodology reveals an unprecedented insight into the origins of variability in biochemical systems. For many systems an analytical decomposition is not available; therefore the method has been implemented as a Matlab package and is available from the authors upon request.
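
The claim that degradation contributes half of the total noise can be checked by hand in the simplest case. The sketch below is an illustration under stated assumptions, not the paper's derivation or its Matlab package: it takes a one-species birth-death process (production at constant rate k, degradation at rate gamma times the copy number) and splits the steady-state variance by reaction, using the fact that the variance equation is linear in the per-reaction diffusion terms. The function name and parameters are hypothetical.

```python
def birth_death_noise(k, gamma):
    """Steady-state mean and per-reaction variance contributions for the
    reactions 0 -> X (rate k) and X -> 0 (rate gamma * x).

    The variance obeys d(var)/dt = -2 * gamma * var + D, where D sums
    (stoichiometry ** 2) * (reaction rate at the mean) over reactions.
    Because the equation is linear in D, each reaction's term can be
    solved for separately at steady state.
    """
    mean = k / gamma
    d_production = (+1) ** 2 * k               # birth reaction
    d_degradation = (-1) ** 2 * gamma * mean   # death reaction, equals k at steady state
    var_production = d_production / (2 * gamma)
    var_degradation = d_degradation / (2 * gamma)
    return mean, var_production, var_degradation


if __name__ == "__main__":
    mean, v_prod, v_deg = birth_death_noise(k=10.0, gamma=0.5)
    total = v_prod + v_deg
    print(f"mean={mean}, total variance={total}")
    print(f"degradation share={v_deg / total}")
```

The two contributions are equal, and their sum matches the mean, as expected for the Poisson steady state of this process.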

preprint · 2010 · arXiv

Simulation-based model selection for dynamical systems in systems and population biology

Computer simulations have become an important tool across the biomedical sciences and beyond. For many important problems several different models or hypotheses exist and choosing which one best describes reality or observed data is not straightforward. We therefore require suitable statistical tools that allow us to choose rationally between different mechanistic models of e.g. signal transduction or gene regulation networks. This is particularly challenging in systems biology where only a small number of molecular species can be assayed at any given time and all measurements are subject to measurement uncertainty. Here we develop such a model selection framework based on approximate Bayesian computation and employing sequential Monte Carlo sampling. We show that our approach can be applied across a wide range of biological scenarios, and we illustrate its use on real data describing influenza dynamics and the JAK-STAT signalling pathway. Bayesian model selection strikes a balance between the complexity of the simulation models and their ability to describe observed data. The present approach enables us to employ the whole formal apparatus to any system that can be (efficiently) simulated, even when exact likelihoods are computationally intractable.
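
The idea of simulation-based model selection can be sketched with plain ABC rejection over a joint space of model index and parameter. This is a simplified stand-in, not the sequential Monte Carlo scheme of the paper. The two toy Normal models, the priors, the sample mean as summary statistic, and the function names are all assumptions for illustration.

```python
import random
import statistics


def simulate(model, theta, n, rng):
    """Toy models (assumptions): model 0 is Normal(0, 1) with no free
    parameter; model 1 is Normal(theta, 1)."""
    mu = 0.0 if model == 0 else theta
    return [rng.gauss(mu, 1.0) for _ in range(n)]


def abc_model_selection(obs, eps, n_accept, seed=0):
    """Estimate posterior model probabilities by ABC rejection.

    Each proposal draws a model index from a uniform model prior and a
    parameter from its prior, simulates data, and is accepted when the
    simulated sample mean lies within eps of the observed one.
    """
    rng = random.Random(seed)
    obs_mean = statistics.fmean(obs)
    counts = [0, 0]
    while sum(counts) < n_accept:
        model = rng.randrange(2)         # uniform prior over the two models
        theta = rng.uniform(-3.0, 3.0)   # prior on theta, ignored by model 0
        sim = simulate(model, theta, len(obs), rng)
        if abs(statistics.fmean(sim) - obs_mean) <= eps:
            counts[model] += 1
    # The accepted fraction per model approximates its posterior probability.
    return [c / n_accept for c in counts]


if __name__ == "__main__":
    observed = simulate(1, 1.5, 30, random.Random(42))
    probs = abc_model_selection(observed, eps=0.2, n_accept=200)
    print(f"P(model 0 | data) ~ {probs[0]:.3f}, P(model 1 | data) ~ {probs[1]:.3f}")
```

With data generated far from zero, almost all accepted particles come from model 1. When the data sit near zero, the parameter-free model 0 is accepted far more often per proposal, which is one way to see the balance between complexity and fit that the abstract mentions.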