Probability and Statistics Seminar:
17 January 2011 at 15:00 - SupAgro, room 11/104 (château)
Presented by Rigail Guillem - AgroParisTech
Detection of multiple change-points in the mean, with application to DNA copy number analysis
A DNA copy number profile can be viewed as a succession of segments representing regions of the genome that share the same DNA copy number. Multiple-change-point detection methods constitute a natural framework for analysing such profiles and locating their change-points. Assessing the quality of a segmentation, and in particular the confidence we have in a particular change-point, is a difficult problem. In a Bayesian context, I will present exact and explicit formulas for the posterior distribution of variables such as the number of change-points or their positions. I will also show that several Bayesian model selection criteria (BIC, ICL, IC) can be computed exactly. These results are based on an efficient strategy to explore the whole segmentation space. Due to the increasing size of DNA copy number profiles (n > 10^6), the computational burden is now one of the foremost issues when analysing them. The fastest exact algorithm is in O(n^2), which is prohibitive for large signals. I will present a fast algorithm, in O(n log(n)), to recover the optimal segmentation (w.r.t. maximum likelihood) in 1 to K segments for models which have one parameter per segment.
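For readers unfamiliar with exact segmentation, the quadratic-time baseline mentioned in the abstract can be illustrated with the classical dynamic programming recursion. The sketch below is not the speaker's fast algorithm. It assumes a Gaussian change-in-mean model with constant variance, so that maximum likelihood reduces to least squares, and the names segment_mean_dp and change_points are hypothetical. It costs O(K n^2) time for 1 to K segments.

```python
import numpy as np

def segment_mean_dp(y, K):
    """Least-squares segmentation of y into 1..K segments by dynamic programming."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Prefix sums give the cost of any segment y[i:j] in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def cost(i, j):
        # Residual sum of squares of y[i:j] around its own mean (requires j > i).
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

    # best[k, j]: minimal cost of splitting y[:j] into exactly k segments.
    # last[k, j]: start index of the final segment in that optimal split.
    best = np.full((K + 1, n + 1), np.inf)
    last = np.zeros((K + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, K + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1, i] + cost(i, j)
                if c < best[k, j]:
                    best[k, j] = c
                    last[k, j] = i
    return best, last

def change_points(last, k, n):
    """Backtrack the start indices of segments 2..k in the optimal k-segment split."""
    starts = []
    j = n
    for kk in range(k, 0, -1):
        j = int(last[kk, j])
        starts.append(j)
    return sorted(starts[:-1])  # drop the start of the first segment (index 0)

# Toy profile with three plateaus.
best, last = segment_mean_dp([0, 0, 0, 5, 5, 5, 1, 1], 3)
print(change_points(last, 3, 8))
```

The triple loop makes the quadratic dependence on n explicit: for each of the K levels, every end point j is compared against every candidate start i. With n above 10^6 this is on the order of 10^12 cost evaluations per level, which is the burden the talk proposes to avoid.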