Thesis defenses
Tuesday, June 7, 2022, at 13:30 - 10.01
Yinneth-Lorena Leon-Valasco
About the link function in generalized linear models for categorical responses
The logit, proportional-odds logit, and multinomial logit models are the most common models for binary, ordinal, and nominal responses, respectively. Although these models have desirable properties, they are highly sensitive to outliers, and they do not capture specific characteristics of categorical data, such as the type of ordering or potential grouping relationships among categories. The link function is a key component of generalized linear models (GLMs) for addressing these particularities. The purpose of this thesis is to study this link function in its various forms for categorical regression models. We first investigate the robustness of the Student link function for binary outcomes under different data-separation settings. For more than two categories, we then propose, within the framework of a unified R package, a practical guide for identifying the most suitable model for ordered categories according to the nature of the data and the properties of the model. Finally, assuming a binary hierarchical structure among the categories, we develop a two-step methodology to infer it. The first step constructs a partition tree using an agglomerative hierarchical clustering algorithm. The second step is a search algorithm based on tree rotations that efficiently explores the space of partition trees. Overall, this thesis aims to explore, popularize, and extend the range of regression models for categorical responses.
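To illustrate how the link function enters a categorical GLM, here is a minimal sketch. It assumes the cumulative specification P(Y <= j | x) = F(alpha_j + x^T beta) with strictly increasing thresholds alpha_j. With the logistic CDF as F and a common beta across thresholds, this is the proportional-odds logit model, and with two categories it reduces to a binary model. The function names are hypothetical. The Student link is restricted to 1 degree of freedom (the Cauchy case) only because that CDF has a closed form. This is not the thesis's implementation or the R package's API.

```python
import math
from typing import Callable, List, Sequence


def logistic_cdf(eta: float) -> float:
    """Logistic CDF; used as inverse link F, it gives the logit link."""
    # Branch on the sign of eta so math.exp never overflows.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def student1_cdf(eta: float) -> float:
    """Student-t CDF with 1 degree of freedom (Cauchy), in closed form."""
    return 0.5 + math.atan(eta) / math.pi


def cumulative_probs(
    alphas: Sequence[float],
    x: Sequence[float],
    beta: Sequence[float],
    cdf: Callable[[float], float],
) -> List[float]:
    """P(Y = j | x) for j = 1..J under P(Y <= j | x) = F(alpha_j + x^T beta).

    alphas holds the J - 1 thresholds and must be strictly increasing so
    that the cumulative probabilities are non-decreasing in j.
    """
    if any(a >= b for a, b in zip(alphas, alphas[1:])):
        raise ValueError("thresholds must be strictly increasing")
    lin = sum(xi * bi for xi, bi in zip(x, beta))  # linear predictor x^T beta
    cum = [cdf(a + lin) for a in alphas] + [1.0]  # P(Y <= J | x) = 1
    # Category probabilities are successive differences of cumulative ones.
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


def neg_log_lik_category1(eta: float, cdf: Callable[[float], float]) -> float:
    """Loss -log F(eta) of one observation in category 1 (binary case J = 2)."""
    return -math.log(cdf(eta))


if __name__ == "__main__":
    links = [("logit", logistic_cdf), ("Student(1)", student1_cdf)]
    # Three ordered categories, one covariate: same parameters, two links.
    for name, f in links:
        print(name, [round(p, 4) for p in cumulative_probs([-1.0, 1.0], [0.5], [2.0], f)])
    # An observation far on the wrong side of the boundary (eta = -10).
    for name, f in links:
        print(name, round(neg_log_lik_category1(-10.0, f), 4))
```

With identical parameters, only the choice of F changes the fitted category probabilities. For the single misfit observation at eta = -10, the logit loss is about 10.0 and keeps growing roughly linearly in |eta|. The Cauchy loss is about 3.45 and grows only logarithmically. This is one common intuition for why heavier-tailed links are less driven by outliers. The sketch does not reproduce the thesis's study of data-separation settings.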