Seminar Session

Probability and Statistics Seminar

Monday, 30 January 2017 at 13:45 - UM - Building 09 - Conference room (1st floor)

Philippe Saint-Pierre (Université de Toulouse)

Correlation and functional data analysis

The random forest algorithm provides an ensemble predictor built from a set of randomized decision trees. Its good performance in practice explains the growing interest in this approach. However, there is still a need to better understand the algorithm and the related importance measures. We first study the permutation importance measure in the presence of correlated predictors. We describe how the correlation between predictors impacts the permutation importance in an additive model. Our results motivate the use of the Recursive Feature Elimination (RFE) algorithm for variable selection in this context. We then propose an extension of the permutation importance to groups of variables. This original criterion is used in a functional data analysis framework for selecting functional variables. Based on a wavelet decomposition, the wavelet coefficients of a given functional variable are grouped together, and a selection algorithm based on the grouped importance is proposed. Various other groupings, which take advantage of the frequency and time localization of the wavelet basis, can also be proposed. These methods were developed jointly with the startup Safety Line for aviation safety purposes. The aim was to predict and explain the risk of long landings using data from flight data recorders.
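To make the idea of a grouped permutation importance concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the speaker's implementation: the function names (mse, grouped_permutation_importance), the toy data and the fixed additive predictor are all hypothetical. In a random forest the error increase would typically be measured on the out-of-bag samples of each tree; here a known prediction function stands in for the fitted forest so the example stays self-contained.

```python
import random


def mse(predict, X, y):
    """Mean squared error of `predict` on rows X with targets y."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def grouped_permutation_importance(predict, X, y, group, n_repeats=20, seed=0):
    """Average increase in MSE when the columns in `group` are permuted jointly.

    The same row shuffle is applied to every column of the group, so the
    dependence inside the group is kept while its link to y is broken.
    """
    rng = random.Random(seed)
    baseline = mse(predict, X, y)
    n = len(X)
    increases = []
    for _ in range(n_repeats):
        perm = list(range(n))
        rng.shuffle(perm)
        X_perm = [list(row) for row in X]  # copy, leave X untouched
        for i, src in enumerate(perm):
            for j in group:
                X_perm[i][j] = X[src][j]
        increases.append(mse(predict, X_perm, y) - baseline)
    return sum(increases) / n_repeats


# Toy data (assumption): three independent Gaussian predictors.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(500)]


def predict(row):
    # Hypothetical additive model standing in for a fitted forest;
    # it uses columns 0 and 1 and ignores column 2.
    return 2.0 * row[0] + 1.0 * row[1]


y = [predict(row) + data_rng.gauss(0, 0.1) for row in X]

imp_signal = grouped_permutation_importance(predict, X, y, group=[0, 1])
imp_noise = grouped_permutation_importance(predict, X, y, group=[2])
print("group {0, 1}:", imp_signal)
print("group {2}:   ", imp_noise)
```

In the functional setting described above, a group would hold the indices of all wavelet coefficients of one functional variable, so that a single importance score is obtained per curve rather than per coefficient.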