Source author record

Jean-Michel Marin

Jean-Michel Marin appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computation, Methodology, math.ST, Statistics Theory, Applications, Machine Learning, math.PR, Quantitative Methods, Populations and Evolution

Catalog footprint

What is connected

23 works

9 topics

4 close collaborators


preprint, 2016, arXiv

A fully objective Bayesian approach for the Behrens-Fisher problem using historical studies

For in vivo research experiments with small sample sizes and available historical data, we propose a sequential Bayesian method for the Behrens-Fisher problem. We consider it as a model choice question with two models in competition: one for which the two expectations are equal and one for which they are different. The choice between the two models is performed through a Bayesian analysis, based on a robust choice of combined objective and subjective priors, set on the parameter space and on the model space. Three steps are necessary to evaluate the posterior probability of each model using two historical datasets similar to the one of interest. Starting from the Jeffreys prior, a posterior using a first historical dataset is deduced and is used to calibrate the Normal-Gamma informative priors for the analysis of the second historical dataset, in addition to a uniform prior on the model space. From this second step, a new posterior on the parameter space and the model space can be used as the objective informative prior for the last Bayesian analysis. Bayesian and frequentist methods have been compared on simulated and real data. In accordance with FDA recommendations, control of type I and type II error rates has been evaluated. The proposed method controls them even if the historical experiments are not completely similar to the one of interest.

preprint, 2016, arXiv

Likelihood-free Model Choice

This document is an invited chapter covering the specificities of ABC model choice, intended for the forthcoming Handbook of ABC by Sisson, Fan, and Beaumont (2017). Beyond exposing the potential pitfalls of ABC-based posterior probabilities, the review mostly emphasizes the solution proposed by Pudlo et al. (2016) on the use of random forests for aggregating summary statistics and for estimating the posterior probability of the most likely model via a secondary random forest.

preprint, 2015, arXiv

Bayesian Essentials with R: The Complete Solution Manual

This is the collection of solutions for all the exercises proposed in Bayesian Essentials with R (2014).

preprint, 2015, arXiv

Reliable ABC model choice via random forests

Approximate Bayesian computation (ABC) methods provide an elaborate approach to Bayesian inference on complex models, including model choice. Both theoretical arguments and simulation experiments indicate, however, that model posterior probabilities may be poorly evaluated by standard ABC techniques. We propose a novel approach based on a machine learning tool named random forests to conduct selection among the highly complex models covered by ABC algorithms. We thus modify the way Bayesian model selection is both understood and operated, in that we rephrase the inferential goal as a classification problem, first predicting the model that best fits the data with random forests and postponing the approximation of the posterior probability of the predicted MAP for a second stage also relying on random forests. Compared with earlier implementations of ABC model choice, the ABC random forest approach offers several potential improvements: (i) it often has a larger discriminative power among the competing models, (ii) it is more robust against the number and choice of statistics summarizing the data, (iii) the computing effort is drastically reduced (with a gain in computation efficiency of at least a factor of fifty), and (iv) it includes an approximation of the posterior probability of the selected model. The call to random forests will undoubtedly extend the range of size of datasets and complexity of models that ABC can handle. We illustrate the power of this novel methodology by analyzing controlled experiments as well as genuine population genetics datasets. The proposed methodologies are implemented in the R package abcrf available on CRAN.

preprint, 2014, arXiv

Consistency of the Adaptive Multiple Importance Sampling

Among Monte Carlo techniques, importance sampling requires fine tuning of a proposal distribution, which is now routinely handled through iterative schemes. The Adaptive Multiple Importance Sampling (AMIS) of Cornuet et al. (2012) provides a significant improvement in stability and effective sample size due to the introduction of a recycling procedure. However, the consistency of the AMIS estimator remains largely open. In this work we prove the convergence of the AMIS, at the cost of a slight modification in the learning process. Contrary to Douc et al. (2007a), results are obtained here in the asymptotic regime where the number of iterations is going to infinity while the number of drawings per iteration is a fixed, but growing sequence of integers. Hence some of the results shed new light on adaptive population Monte Carlo algorithms in that last regime.

preprint, 2013, arXiv

Efficient learning in ABC algorithms

Approximate Bayesian Computation has been successfully used in population genetics to bypass the calculation of the likelihood. These methods provide accurate estimates of the posterior distribution by comparing the observed dataset to a sample of datasets simulated from the model. Although parallelization is easily achieved, computation times for ensuring a suitable approximation quality of the posterior distribution are still high. To alleviate the computational burden, we propose an adaptive, sequential algorithm that runs faster than other ABC algorithms but maintains accuracy of the approximation. This proposal relies on the sequential Monte Carlo sampler of Del Moral et al. (2012) but is calibrated to reduce the number of simulations from the model. The paper concludes with numerical experiments on a toy example and on a population genetic study of Apis mellifera, where our algorithm was shown to be faster than traditional ABC schemes.

preprint, 2012, arXiv

Bounding rare event probabilities in computer experiments

We are interested in bounding probabilities of rare events in the context of computer experiments. These rare events depend on the output of a physical model with random input variables. Since the model is only known through an expensive black box function, standard efficient Monte Carlo methods designed for rare events cannot be used. We then propose a strategy to deal with this difficulty based on importance sampling methods. This proposal relies on Kriging metamodeling and is able to achieve sharp upper confidence bounds on the rare event probabilities. The variability due to the Kriging metamodeling step is properly taken into account. The proposed methodology is applied to a toy example and compared to more standard Bayesian bounds. Finally, a challenging real case study is analyzed. It consists of finding an upper bound of the probability that the trajectory of an airborne load will collide with the aircraft that has released it.

preprint, 2012, arXiv

Some discussions of D. Fearnhead and D. Prangle's Read Paper "Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation"

This report is a collection of comments on the Read Paper of Fearnhead and Prangle (2011), to appear in the Journal of the Royal Statistical Society Series B, along with a reply from the authors.

preprint, 2011, arXiv

A new semi-parametric family of probability distributions for survival analysis

In the context of survival analysis, Marshall and Olkin (1997) introduced families of distributions by adding a scalar parameter to a given survival function, parameterized or not. In this paper, we generalize their approach. We show how it is possible to add more than a single parameter to a given distribution. We then introduce very flexible families of distributions for which we calculate some moments. Notably, we give some tractable expressions of these moments when the given baseline distribution is Log-logistic. Finally, we demonstrate how to generate samples from these new families.
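For orientation, the one-parameter Marshall and Olkin (1997) construction that this abstract takes as its starting point can be written as follows. The notation ($\bar F$ for the baseline survival function, $\alpha$ for the added parameter) is chosen here for illustration, and the multi-parameter generalization proposed in the paper is not reproduced.

```latex
% Marshall-Olkin family built on a baseline survival function \bar F(x) = 1 - F(x)
\bar G(x;\alpha) \;=\; \frac{\alpha\,\bar F(x)}{1-(1-\alpha)\,\bar F(x)},
\qquad \alpha > 0 .
% Setting \alpha = 1 recovers the baseline: \bar G(x;1) = \bar F(x).
% Inversion sampling: if U \sim \mathcal{U}(0,1), solving \bar G(X;\alpha) = U gives
\bar F(X) \;=\; \frac{U}{\alpha + (1-\alpha)\,U}.
```

With a Log-logistic baseline, $\bar F$ has a closed-form inverse, so the last identity yields a direct sampler for the one-parameter family.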

preprint, 2011, arXiv

Adaptive Multiple Importance Sampling

The Adaptive Multiple Importance Sampling (AMIS) algorithm is aimed at an optimal recycling of past simulations in an iterated importance sampling scheme. The difference with earlier adaptive importance sampling implementations like Population Monte Carlo is that the importance weights of all simulated values, past as well as present, are recomputed at each iteration, following the technique of the deterministic multiple mixture estimator of Owen and Zhou (2000). Although the convergence properties of the algorithm cannot be fully investigated, we demonstrate through a challenging banana shape target distribution and a population genetics example that the improvement brought by this technique is substantial.
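The recycling idea described in this abstract can be illustrated with a minimal sketch. It assumes a one-dimensional target, Gaussian proposals adapted by weighted moment matching, and an equal number of draws per iteration; the function names `normal_pdf` and `amis_sketch` and all default values are hypothetical choices made here, not the authors' implementation.

```python
import numpy as np

def normal_pdf(x, mean, std):
    """Density of a N(mean, std^2) distribution evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def amis_sketch(target_pdf, n_iter=10, n_per_iter=500,
                init_mean=0.0, init_std=5.0, seed=0):
    """Illustrative AMIS-style loop (assumed setup, not the published code)."""
    rng = np.random.default_rng(seed)
    means, stds = [init_mean], [init_std]  # proposal parameters used so far
    samples = np.empty(0)
    for _ in range(n_iter):
        # Draw a new batch from the current proposal and keep all past draws.
        new = rng.normal(means[-1], stds[-1], size=n_per_iter)
        samples = np.concatenate([samples, new])
        # Deterministic mixture: every draw, past and present, is reweighted
        # against the average of all proposals used up to this iteration
        # (a plain average because batch sizes are equal).
        mixture = np.mean([normal_pdf(samples, m, s)
                           for m, s in zip(means, stds)], axis=0)
        weights = target_pdf(samples) / mixture
        weights /= weights.sum()
        # Adapt the next proposal by weighted moment matching.
        mean = np.sum(weights * samples)
        std = np.sqrt(np.sum(weights * (samples - mean) ** 2))
        means.append(mean)
        stds.append(max(std, 1e-3))  # guard against a degenerate proposal
    return samples, weights

# Usage: estimate the mean of a N(3, 1) target from all recycled draws.
draws, w = amis_sketch(lambda x: normal_pdf(x, 3.0, 1.0))
estimate = np.sum(w * draws)
```

Recomputing the weights of earlier batches is the step that distinguishes this scheme from a Population Monte Carlo loop, where each batch keeps the weights it received when it was drawn.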

preprint, 2011, arXiv

An empirical Bayes procedure for the selection of Gaussian graphical models

A new methodology for model determination in decomposable graphical Gaussian models is developed. The Bayesian paradigm is used and, for each given graph, a hyper inverse Wishart prior distribution on the covariance matrix is considered. This prior distribution depends on hyper-parameters. It is well-known that the model's posterior distribution is sensitive to the specification of these hyper-parameters and no completely satisfactory method is available. In order to avoid this problem, we suggest adopting an empirical Bayes strategy, that is, a strategy for which the values of the hyper-parameters are determined using the data. Typically, the hyper-parameters are fixed to their maximum likelihood estimates. In order to calculate these maximum likelihood estimates, we suggest a Markov chain Monte Carlo version of the Stochastic Approximation EM algorithm. Moreover, we introduce a new sampling scheme in the space of graphs that improves the add and delete proposal of Armstrong et al. (2009). We illustrate the efficiency of this new scheme on simulated and real datasets.

preprint, 2011, arXiv

Approximate Bayesian Computational methods

Also known as likelihood-free methods, approximate Bayesian computational (ABC) methods have appeared in the past ten years as the most satisfactory approach to intractable likelihood problems, first in genetics then in a broader spectrum of applications. However, these methods suffer to some degree from calibration difficulties that make them rather volatile in their implementation and thus render them suspect to the users of more traditional Monte Carlo methods. In this survey, we study the various improvements and extensions made to the original ABC algorithm over the recent years.
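As a point of reference for the "original ABC algorithm" mentioned in this abstract, a basic rejection sampler can be sketched as below. The toy model (normal data with unit variance and unknown mean), the uniform prior on [-5, 5], the sample mean as summary statistic, and the tolerance are all assumptions made for illustration; `abc_rejection` is a hypothetical name.

```python
import numpy as np

def abc_rejection(observed, n_draws=20000, tolerance=0.1, seed=0):
    """Basic ABC rejection sampler for a toy normal-mean model (assumed setup)."""
    rng = np.random.default_rng(seed)
    obs_summary = observed.mean()  # summary statistic of the observed data
    # 1. Draw candidate parameters from the (assumed) uniform prior.
    theta = rng.uniform(-5.0, 5.0, size=n_draws)
    # 2. Simulate one pseudo-dataset per candidate, without using the likelihood.
    simulated = rng.normal(theta[:, None], 1.0, size=(n_draws, observed.size))
    # 3. Keep candidates whose simulated summary is close to the observed one.
    distance = np.abs(simulated.mean(axis=1) - obs_summary)
    return theta[distance <= tolerance]

# Usage: the accepted values approximate a sample from the posterior of the mean.
data = np.random.default_rng(1).normal(1.5, 1.0, size=20)
posterior_sample = abc_rejection(data)
```

The tolerance and the choice of summary statistic are exactly the calibration choices the abstract describes as delicate: a smaller tolerance reduces the approximation error but lowers the acceptance rate.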

preprint, 2011, arXiv

Lack of confidence in ABC model choice

Approximate Bayesian computation (ABC) has become an essential tool for the analysis of complex stochastic models. Earlier, Grelaud et al. (2009) advocated the use of ABC for Bayesian model choice in the specific case of Gibbs random fields, relying on an inter-model sufficiency property to show that the approximation was legitimate. Having implemented ABC-based model choice in a wide range of phylogenetic models in the DIYABC software (Cornuet et al., 2008), we now present theoretical background as to why a generic use of ABC for model choice is ungrounded, since it depends on an unknown amount of information loss induced by the use of insufficient summary statistics. The approximation error of the posterior probabilities of the models under comparison may thus be unrelated to the computational effort spent in running an ABC algorithm. We then conclude that additional empirical verifications of the performances of the ABC procedure, such as those available in DIYABC, are necessary to conduct model choice.

preprint, 2011, arXiv

Maximin design on non hypercube domain and kernel interpolation

In the paradigm of computer experiments, the choice of an experimental design is an important issue. When no information is available about the black-box function to be approximated, an exploratory design has to be used. In this context, two dispersion criteria are usually considered: the minimax and the maximin ones. In the case of a hypercube domain, a standard strategy consists of taking the maximin design within the class of Latin hypercube designs. However, in a non hypercube context, it does not make sense to use the Latin hypercube strategy. Moreover, whatever the design is, the black-box function is typically approximated by kernel interpolation. Here, we first provide a theoretical justification for the maximin criterion with respect to kernel interpolations. Then, we propose simulated annealing algorithms to determine maximin designs in any bounded connected domain. We prove the convergence of the different schemes.
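The maximin criterion and a simulated annealing search on a non hypercube domain can be illustrated with the sketch below. It is not one of the schemes analyzed in the paper: the unit disk as domain, the Gaussian single-point move, the cooling schedule `t0 / (1 + k)`, and the function names are all assumptions made here.

```python
import numpy as np

def min_pairwise_distance(points):
    """Maximin criterion: smallest Euclidean distance between two design points."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(points), k=1)  # each pair counted once
    return dist[upper].min()

def in_unit_disk(point):
    """Membership test for the assumed domain (unit disk in the plane)."""
    return point[0] ** 2 + point[1] ** 2 <= 1.0

def maximin_annealing(n_points=8, n_steps=5000, step=0.2, t0=0.1, seed=0):
    """Illustrative annealing search for a maximin design in the unit disk."""
    rng = np.random.default_rng(seed)
    # Initial design: uniform draws in the disk by rejection from the square.
    start = []
    while len(start) < n_points:
        candidate = rng.uniform(-1.0, 1.0, size=2)
        if in_unit_disk(candidate):
            start.append(candidate)
    points = np.array(start)
    current = min_pairwise_distance(points)
    best_points, best = points.copy(), current
    for k in range(n_steps):
        temperature = t0 / (1 + k)  # assumed cooling schedule
        i = rng.integers(n_points)
        proposal = points.copy()
        proposal[i] = points[i] + rng.normal(0.0, step, size=2)
        if not in_unit_disk(proposal[i]):
            continue  # moves that leave the domain are rejected
        value = min_pairwise_distance(proposal)
        # Accept improvements always, deteriorations with Boltzmann probability.
        if value >= current or rng.random() < np.exp((value - current) / temperature):
            points, current = proposal, value
            if current > best:
                best_points, best = points.copy(), current
    return best_points, best

# Usage: an 8-point design spread out inside the unit disk.
design, criterion = maximin_annealing()
```

Only the membership test depends on the domain, which is why a scheme of this kind applies to bounded domains where Latin hypercube constructions are not available.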

preprint, 2011, arXiv

Regularization in regression: comparing Bayesian and frequentist methods in a poorly informative situation

Using a collection of simulated and real benchmarks, we compare Bayesian and frequentist regularization approaches under a low-information constraint, when the number of variables is almost equal to the number of observations. This comparison includes new global noninformative approaches for Bayesian variable selection built on Zellner's g-priors that are similar to Liang et al. (2008). The interest of those calibration-free proposals is discussed. The numerical experiments we present highlight the appeal of Bayesian regularization methods, when compared with non-Bayesian alternatives. They dominate frequentist methods in the sense that they provide smaller prediction errors while selecting the most relevant variables in a parsimonious way.

preprint, 2011, arXiv

Why approximate Bayesian computational (ABC) methods cannot handle model choice problems

Approximate Bayesian computation (ABC) methods, also known as likelihood-free methods, have become a favourite tool for the analysis of complex stochastic models, primarily in population genetics but also in financial analyses. We advocated in Grelaud et al. (2009) the use of ABC for Bayesian model choice in the specific case of Gibbs random fields (GRF), relying on a sufficiency property mainly enjoyed by GRFs to show that the approach was legitimate. Despite having previously suggested the use of ABC for model choice in a wider range of models in the DIYABC software (Cornuet et al., 2008), we present theoretical evidence that the general use of ABC for model choice is fraught with danger in the sense that no amount of computation, however large, can guarantee a proper approximation of the posterior probabilities of the models under comparison.

preprint, 2010, arXiv

Bayesian Inference

This chapter provides an overview of Bayesian inference, mostly emphasising that it is a universal method for summarising uncertainty and making estimates and predictions using probability statements conditional on observed data and an assumed model (Gelman 2008). The Bayesian perspective is thus applicable to all aspects of statistical inference, while being open to the incorporation of information items resulting from earlier experiments and from expert opinions. We provide here the basic elements of Bayesian analysis when considered for standard models, referring to Marin and Robert (2007) and to Robert (2007) for book-length entries. In the following, we refrain from embarking upon philosophical discussions about the nature of knowledge (see, e.g., Robert 2007, Chapter 10), opting instead for a mathematically sound presentation of an eminently practical statistical methodology. We indeed believe that the most convincing arguments for adopting a Bayesian version of data analyses are in the versatility of this tool and in the large range of existing applications, rather than in those polemical arguments.

preprint, 2010, arXiv

Discussions on "Riemann manifold Langevin and Hamiltonian Monte Carlo methods"

This is a collection of discussions of "Riemann manifold Langevin and Hamiltonian Monte Carlo methods" by Girolami and Calderhead, to appear in the Journal of the Royal Statistical Society, Series B.

preprint, 2010, arXiv

On computational tools for Bayesian data analysis

While Robert and Rousseau (2010) addressed the foundational aspects of Bayesian analysis, the current chapter details its practical aspects through a review of the computational methods available for approximating Bayesian procedures. Recent innovations like Markov chain Monte Carlo, sequential Monte Carlo methods and, more recently, Approximate Bayesian Computation techniques have considerably increased the potential for Bayesian applications, and they have also opened new avenues for Bayesian inference, first and foremost Bayesian model choice.

preprint, 2010, arXiv

On Particle Learning

This document is the aggregation of six discussions of Lopes et al. (2010) that we submitted to the proceedings of the Ninth Valencia Meeting, held in Benidorm, Spain, on June 3-8, 2010, in conjunction with Hedibert Lopes' talk at this meeting, and of a further discussion of the rejoinder by Lopes et al. (2010). The main point in those discussions is the potential for degeneracy in the particle learning methodology, related to the exponential forgetting of the past simulations. We illustrate in particular the resulting difficulties in the case of mixtures.

preprint, 2010, arXiv

On resolving the Savage-Dickey paradox

The Savage-Dickey ratio is known as a specialised representation of the Bayes factor (O'Hagan and Forster, 2004) that allows for a functional plugging approximation of this quantity. We demonstrate here that the Savage-Dickey representation is in fact a generic representation of the Bayes factor that relies on specific measure-theoretic versions of the densities involved in the ratio, instead of a special identity imposing the above constraints on the prior distributions. We completely clarify the measure-theoretic foundations of the representation as well as the generalisation of Verdinelli and Wasserman (1995) and propose a comparison of this new approximation with their version, as well as with bridge sampling and Chib's approaches.
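For readers unfamiliar with the representation discussed in this abstract, the classical Savage-Dickey identity can be stated as follows. The notation is chosen here for illustration ($\theta$ the parameter under test, $\psi$ a nuisance parameter, $\pi_0$ and $\pi_1$ the priors under the null and the alternative), and the measure-theoretic refinement developed in the paper is not reproduced.

```latex
% Testing H_0 : \theta = \theta_0 against H_1 : \theta \neq \theta_0,
% with nuisance parameter \psi. Under the classical constraint on the priors
\pi_0(\psi) \;=\; \pi_1(\psi \mid \theta = \theta_0),
% the Bayes factor in favour of H_0 reduces to a ratio of marginal densities at \theta_0:
B_{01}(x) \;=\; \frac{\pi_1(\theta_0 \mid x)}{\pi_1(\theta_0)} .
```

The numerator is the marginal posterior density of $\theta$ under the alternative, evaluated at $\theta_0$, which is why the ratio lends itself to plug-in approximations based on posterior simulations.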

preprint, 2009, arXiv

Adaptive approximate Bayesian computation

Sequential techniques can enhance the efficiency of the approximate Bayesian computation algorithm, as in Sisson et al.'s (2007) partial rejection control version. While this method is based upon the theoretical works of Del Moral et al. (2006), the application to approximate Bayesian computation results in a bias in the approximation to the posterior. An alternative version based on genuine importance sampling arguments bypasses this difficulty, in connection with the population Monte Carlo method of Cappe et al. (2004), and it includes an automatic scaling of the forward kernel. When applied to a population genetics example, it compares favourably with two other versions of the approximate algorithm.

preprint, 2008, arXiv

On some difficulties with a posterior probability approximation technique

In Scott (2002) and Congdon (2006), a new method is advanced to compute posterior probabilities of models under consideration. It is based solely on MCMC outputs restricted to single models, i.e., it bypasses reversible jump and other model exploration techniques. While it is indeed possible to approximate posterior probabilities based solely on MCMC outputs from single models, as demonstrated by Gelfand and Dey (1994) and Bartolucci et al. (2006), we show that the proposals of Scott (2002) and Congdon (2006) are biased and advance several arguments towards this thesis, the primary one being the confusion between model-based posteriors and joint pseudo-posteriors. From a practical point of view, the bias in Scott's (2002) approximation appears to be much more severe than the one in Congdon's (2006), the latter being often of the same magnitude as the posterior probability it approximates, although we also exhibit an example where the divergence from the true posterior probability is extreme.
