Seminar Session

Probability and Statistics Seminar

Monday 17 November 2014 at 15:00 - UM2 - Building 09 - Conference room (1st floor)
Jean-Michel Marin (Université Montpellier 2)

Reliable ABC model choice via random forests

Approximate Bayesian computation (ABC) methods provide an elaborate approach to Bayesian inference on complex models, including model choice. Both theoretical arguments and simulation experiments indicate, however, that model posterior probabilities are poorly evaluated by ABC. We propose a novel approach based on a machine learning tool named random forests to conduct selection among the highly complex models covered by ABC algorithms. We strongly shift the way Bayesian model selection is both understood and operated: instead of using model posterior probabilities as evidence, we use random forests to predict the model that best fits the data and compute an associated posterior error rate. Compared with past implementations of ABC model choice, the ABC random forest approach offers several improvements: (i) it has a larger discriminative power among the competing models, (ii) it is robust to the number and choice of statistics summarizing the data, (iii) the computing effort is drastically reduced, and (iv) it includes an embedded and cost-free error evaluation conditional on the actual analyzed dataset. Random forests will undoubtedly extend the range of dataset sizes and model complexity that ABC can handle. We illustrate the power of the ABC random forest methodology by analyzing controlled experiments as well as real population genetics datasets.

Joint work with Pierre Pudlo, Arnaud Estoup, Jean-Marie Cornuet, Mathieu Gauthier, and Christian P. Robert.
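
To make the idea in the abstract concrete, here is a minimal sketch of model choice by random forest on an ABC reference table. It is not the authors' implementation. Everything specific in it is an assumption chosen for illustration: the two toy models (Gaussian versus Laplace, both with variance 1), the prior on the location parameter, the three summary statistics, the table size, and the small pure-Python forest. The out-of-bag error it reports is a global misclassification estimate, used here as a simple stand-in; it is not the error rate conditional on the observed dataset that the abstract describes.

```python
import random
import statistics
from collections import Counter

def simulate(model, n, rng):
    """Simulate n points: model 0 is Gaussian, model 1 is Laplace (both variance 1)."""
    mu = rng.gauss(0.0, 2.0)  # assumed prior on the location parameter
    if model == 0:
        return [rng.gauss(mu, 1.0) for _ in range(n)]
    rate = 2 ** 0.5  # difference of two Exp(rate) draws is Laplace with scale 1/rate
    return [mu + rng.expovariate(rate) - rng.expovariate(rate) for _ in range(n)]

def summaries(x):
    """Assumed summary statistics: mean, variance, median absolute deviation."""
    med = statistics.median(x)
    mad = statistics.median([abs(v - med) for v in x])
    return [statistics.fmean(x), statistics.pvariance(x), mad]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, rng, mtry, depth=0, max_depth=6, min_size=5):
    """Grow a classification tree; each node tries mtry randomly chosen features."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(rows) < min_size or len(set(labels)) == 1:
        return majority
    best = None
    for j in rng.sample(range(len(rows[0])), mtry):
        # Try up to 10 candidate thresholds drawn from the observed feature values.
        for t in rng.sample([r[j] for r in rows], min(10, len(rows))):
            left = [i for i, r in enumerate(rows) if r[j] <= t]
            right = [i for i, r in enumerate(rows) if r[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:
        return majority
    _, j, t, left, right = best
    kids = [build_tree([rows[i] for i in part], [labels[i] for i in part],
                       rng, mtry, depth + 1, max_depth, min_size)
            for part in (left, right)]
    return (j, t, kids[0], kids[1])

def predict_tree(node, x):
    # Internal nodes are tuples (feature, threshold, left, right); leaves are labels.
    while isinstance(node, tuple):
        j, t, lo, hi = node
        node = lo if x[j] <= t else hi
    return node

def predict_forest(trees, x):
    return Counter(predict_tree(tr, x) for tr in trees).most_common(1)[0][0]

def fit_forest(rows, labels, rng, n_trees=25, mtry=2):
    """Bagged trees with random feature selection, plus the out-of-bag error."""
    n = len(rows)
    trees, oob_votes = [], [Counter() for _ in range(n)]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        tree = build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng, mtry)
        trees.append(tree)
        for i in set(range(n)) - set(idx):  # rows this tree never saw
            oob_votes[i][predict_tree(tree, rows[i])] += 1
    scored = [(v.most_common(1)[0][0], labels[i]) for i, v in enumerate(oob_votes) if v]
    return trees, sum(p != y for p, y in scored) / len(scored)

rng = random.Random(0)
n_obs, n_sim = 100, 300
# Reference table: model index from a uniform prior, then simulated summaries.
labels = [rng.randrange(2) for _ in range(n_sim)]
table = [summaries(simulate(m, n_obs, rng)) for m in labels]
trees, oob_error = fit_forest(table, labels, rng)

observed = simulate(1, n_obs, rng)  # stand-in for a real dataset
chosen = predict_forest(trees, summaries(observed))
print("selected model:", chosen, "| out-of-bag error:", round(oob_error, 3))
```

The forest is trained only on simulated summaries, so no likelihood is ever evaluated, and the selected model is a majority vote rather than an estimated posterior probability. In practice one would likely swap the hand-written forest for an established random forest library and use a much larger reference table.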