Séminaire des Doctorant·e·s
Wednesday, December 12, 2018 at 3 pm - Room 109
Mathurin Massias (Télécom ParisTech)
Celer: Exploiting structure in sparse Generalized Linear Models
Generalized Linear Models (GLMs) are a wide class of regression and classification models, where the predicted variable is obtained from a linear combination of the input variables. For statistical inference in high dimensions, sparsity-inducing regularization has proven useful while offering statistical guarantees. However, solving the resulting optimization problems can be challenging: even for popular iterative algorithms such as coordinate descent, one needs to loop over a large number of variables. To mitigate this, techniques known as screening rules and working sets reduce the size of the optimization problem at hand, either by progressively removing variables or by solving a growing sequence of smaller problems. For both techniques, the significant variables are identified through convex duality. In this work, we show that the dual iterates of a GLM exhibit Vector AutoRegressive (VAR) behavior after support identification, when the primal problem is solved with proximal gradient descent or cyclic coordinate descent. Exploiting this regularity, one can construct dual points that offer tighter control of optimality, enhancing the performance of screening rules and helping to design a competitive working set algorithm.
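The sketch below illustrates the core idea on the Lasso, the simplest sparse GLM: after enough passes of cyclic coordinate descent, the residuals (which are dual iterates up to a rescaling) follow a VAR, so an Anderson-type affine combination of the last few residuals yields a dual point with a much smaller duality gap than the last residual alone. This is a minimal NumPy illustration written for this announcement, not the authors' implementation (the actual solver is available as the `celer` Python package); the toy data, the value of K, and helper names such as `extrapolated_point` are illustrative assumptions.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * |.|: shrink x toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def primal(X, y, beta, alpha):
    r = y - X @ beta
    return 0.5 * r @ r + alpha * np.abs(beta).sum()

def dual(y, theta):
    # Lasso dual objective: D(theta) = 0.5*||y||^2 - 0.5*||y - theta||^2
    return 0.5 * y @ y - 0.5 * np.sum((y - theta) ** 2)

def rescale_to_feasible(X, r, alpha):
    # Shrink a residual into the dual feasible set {theta: ||X^T theta||_inf <= alpha}
    return r / max(1.0, np.max(np.abs(X.T @ r)) / alpha)

def extrapolated_point(residuals):
    """Anderson-type extrapolation of the last K+1 residuals.

    If the residuals follow a VAR, an affine combination of past iterates
    approximates the limit far better than the last iterate does.
    """
    R = np.column_stack(residuals)
    U = np.diff(R, axis=1)                    # successive differences
    try:
        z = np.linalg.solve(U.T @ U, np.ones(U.shape[1]))
    except np.linalg.LinAlgError:
        return residuals[-1]                  # degenerate differences: fall back
    c = z / z.sum()
    return R[:, 1:] @ c                       # affine combination of residuals

# Toy sparse regression problem (illustrative data, not from the paper)
rng = np.random.default_rng(0)
n, p, K = 100, 200, 5
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 1.0
y = X @ beta_true + 0.1 * rng.standard_normal(n)
alpha = 0.1 * np.max(np.abs(X.T @ y))         # fraction of alpha_max
lips = (X ** 2).sum(axis=0)                   # per-coordinate Lipschitz constants

beta = np.zeros(p)
r = y.copy()
last_residuals = []
for epoch in range(50):
    for j in range(p):                        # one pass of cyclic coordinate descent
        old = beta[j]
        beta[j] = soft_threshold(old + X[:, j] @ r / lips[j], alpha / lips[j])
        if beta[j] != old:
            r += X[:, j] * (old - beta[j])
    last_residuals.append(r.copy())
    if len(last_residuals) > K + 1:
        last_residuals.pop(0)
    if epoch % 10 == 0 and len(last_residuals) == K + 1:
        p_obj = primal(X, y, beta, alpha)
        theta_res = rescale_to_feasible(X, r, alpha)
        theta_acc = rescale_to_feasible(X, extrapolated_point(last_residuals), alpha)
        print(f"epoch {epoch:3d}: gap(residual) = {p_obj - dual(y, theta_res):.2e}, "
              f"gap(extrapolated) = {p_obj - dual(y, theta_acc):.2e}")
```

On this kind of toy problem, once the support is identified the extrapolated dual point typically gives a duality gap orders of magnitude smaller than the rescaled residual. A tighter gap certifies optimality earlier, which is precisely what makes screening rules more aggressive and lets a working set algorithm stop its inner subproblems sooner.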