Probability and Statistics Seminar
Monday, 11 March 2024 at 13:45 - Institut Agro 11/101 (building 11, level 1)
Timothee Mathieu (INRIA Lille)
Robust Multivariate Mean estimation with M-estimators with applications in classification and regression
Mean estimation is a fundamental problem in statistics: many statistical procedures, particularly in machine learning, are built on it. In the well-controlled case of Gaussian (or sub-Gaussian) random variables, the empirical mean is known to perform fairly well. On the other hand, as soon as the distribution becomes heavy-tailed or corrupted, things get complicated. This can be a major difficulty because, in practice, many datasets contain outliers (in the life sciences, typically most datasets do). In this presentation, I will first present the usual methods for robust mean estimation in dimension one. I will then explain how to use M-estimators for mean estimation in a multivariate setting and, finally, how to use robust univariate mean estimators for empirical risk minimization in classification and regression, with some illustrations on real datasets. I will also discuss the problems that remain open since, for now, no estimator optimally solves the problem of robust mean estimation, even in dimension one.
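To make the contrast between the empirical mean and robust alternatives concrete, here is a minimal Python sketch of two common textbook estimators in dimension one: median-of-means and a Huber M-estimator. The abstract does not say which methods the talk covers, so these are illustrative choices only. The function names (`median_of_means`, `huber_mean`), the tuning constant `c = 1.345`, the MAD-based scale, and the simulated data are all assumptions of this sketch, not taken from the talk.

```python
import numpy as np


def median_of_means(x, n_blocks=50, seed=0):
    """Shuffle the sample, split it into blocks, and return the median of the block means."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(x), n_blocks)
    return float(np.median([b.mean() for b in blocks]))


def huber_mean(x, c=1.345, n_iter=100, tol=1e-10):
    """Huber M-estimator of location, computed by iteratively reweighted averaging."""
    x = np.asarray(x, dtype=float)
    mu = np.median(x)
    # Scale held fixed at the normalized median absolute deviation (a choice of this sketch).
    scale = 1.4826 * np.median(np.abs(x - mu))
    if scale == 0:
        return float(mu)
    for _ in range(n_iter):
        r = np.abs(x - mu) / scale
        # Huber weights: 1 for small residuals, c/|r| for large ones.
        w = np.minimum(1.0, c / np.maximum(r, 1e-12))
        new_mu = np.sum(w * x) / np.sum(w)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return float(mu)


# Simulated data (hypothetical): true mean 1.0, with 1% gross outliers.
rng = np.random.default_rng(42)
sample = rng.normal(loc=1.0, scale=1.0, size=1000)
sample[:10] = 1000.0

emp = float(sample.mean())
mom = median_of_means(sample)
hub = huber_mean(sample)
print(f"empirical mean: {emp:.2f}, median-of-means: {mom:.2f}, Huber: {hub:.2f}")
```

In this simulation the ten outliers pull the empirical mean far from 1, while both robust estimates stay close to it. Median-of-means only helps when the number of blocks exceeds twice the number of outliers, which is why 50 blocks are used here.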