Seminar Session

Probability and Statistics Seminar

Monday, 2 December 2024 at 13:45 - UM - Building 09 - Room 109 (1st floor)

Charles Tiller (Université de Versailles Saint-Quentin-en-Yvelines)

Infinite random forests for imbalanced classification tasks

In this talk, we investigate predictive probability inference for classification tasks using random forests on imbalanced data. In this setting, we analyze the asymptotic properties of simplified versions of Breiman's original algorithm, namely subsampling and under-sampling Infinite Random Forests (IRFs), and establish the asymptotic normality of both models. Under-sampling IRFs, which tackle the challenge of predicting the minority class by downsampling the majority class, exhibit an asymptotic bias. To address this problem, we introduce a new estimator based on an Importance Sampling debiasing procedure built upon odds ratios. We apply our results using 1-Nearest Neighbors (1-NN) as the individual trees of the IRFs. The resulting bagged 1-NN estimator achieves the same asymptotic rate in all three configurations, but with a different asymptotic variance in each. Finally, we conduct simulations to validate the empirical relevance of our theoretical findings.
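To give a concrete picture of where the bias of under-sampling comes from, here is a minimal sketch under stated assumptions; it is not the speaker's estimator. It assumes that each base learner is a 1-NN classifier fitted on a random subsample in which every minority point (label 1) is kept and each majority point (label 0) is kept independently with probability `beta`. Under that scheme the under-sampled odds equal the true odds divided by `beta`, so the averaged vote `p_s` can be mapped back with the classical odds correction p = beta * p_s / (beta * p_s + 1 - p_s). The function names (`undersample`, `one_nn_predict`, `bagged_1nn_undersampled`, `odds_correction`) and parameters (`beta`, `n_trees`, `subsample`) are hypothetical, and the talk's Importance Sampling procedure may differ from this simple correction.

```python
import random

# Hypothetical helpers illustrating an under-sampled bagged 1-NN estimator
# followed by the classical odds-ratio correction (an assumption, not the
# talk's Importance Sampling estimator).

def undersample(X, y, beta, rng):
    """Keep every minority point (y == 1); keep each majority point with probability beta."""
    keep = [i for i in range(len(y)) if y[i] == 1 or rng.random() < beta]
    return [X[i] for i in keep], [y[i] for i in keep]

def one_nn_predict(X, y, x):
    """Return the label of the training point closest to x (squared Euclidean distance)."""
    best = min(range(len(X)), key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)))
    return y[best]

def bagged_1nn_undersampled(X, y, x, beta, n_trees=200, subsample=50, seed=0):
    """Average the votes of n_trees 1-NN learners, each fitted on an under-sampled subsample.

    The result estimates P(y = 1 | x) under the *under-sampled* distribution,
    so it over-estimates the minority-class probability when beta < 1.
    """
    rng = random.Random(seed)
    votes = 0
    for _ in range(n_trees):
        Xu, yu = undersample(X, y, beta, rng)
        idx = rng.sample(range(len(Xu)), min(subsample, len(Xu)))
        votes += one_nn_predict([Xu[i] for i in idx], [yu[i] for i in idx], x)
    return votes / n_trees

def odds_correction(p_s, beta):
    """Map an under-sampled probability p_s back to the original class balance.

    Under-sampling the majority class with rate beta divides the odds by beta,
    so the true odds are beta * p_s / (1 - p_s).
    """
    return beta * p_s / (beta * p_s + 1.0 - p_s)
```

For example, with `beta = 0.1` an under-sampled vote of `p_s = 0.5` corresponds to a corrected probability of 1/11 (about 0.09). Keeping the correction as a separate post-processing step makes it easy to compare the raw under-sampled estimate with its debiased counterpart, in the spirit of the comparison described in the abstract.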

Seminar in Room 109, also streamed on Zoom: https://umontpellier-fr.zoom.us/j/7156708132