Seminar Session

Probability and Statistics Seminar

Monday 24 September 2018 at 13:45 - UM - Building 09 - Conference room (1st floor)
Julien Jacques (Université Lyon Lumière)

Classification, clustering and co-clustering for ordinal data

A model-based approach for analyzing and modeling ordinal data is presented. The model relies on the latent block model, embedding a probability distribution specific to ordinal data (the so-called BOS, or Binary Ordinal Search, distribution). Classification, clustering and co-clustering algorithms are derived from the proposed model. Model inference relies on a stochastic EM algorithm coupled with a Gibbs sampler, and the ICL-BIC criterion is used to select the number of clusters in clustering, the number of co-clusters in co-clustering, and the level of parsimony in classification. The main advantages of these models dedicated to ordinal data are their parsimony, the interpretability of their parameters (mode, precision) and the possibility of taking missing data into account. All these algorithms are available in the ordinalClust package for R. The usefulness of the proposed algorithms is illustrated by analyzing a psychological survey of women affected by a breast tumor.

References

Biernacki C. and Jacques J. (2016). Model-based clustering of multivariate ordinal data relying on a stochastic binary search algorithm. Statistics and Computing, 26(5), 929-943.

Jacques J. and Biernacki C. (2018). Model-based co-clustering for ordinal data. Computational Statistics and Data Analysis, 123, 101-115.

Selosse M., Jacques J. and Biernacki C. (2017). ordinalClust: a package for analyzing ordinal data. Preprint HAL n°01678800.

Selosse M., Jacques J. and Biernacki C. (2017). ordinalClust. R package version 1.2, available at https://CRAN.R-project.org/package=ordinalClust.

Selosse M., Jacques J., Biernacki C. and Cousson-Gélie F. (2017). Analyzing health quality survey using constrained co-clustering model for ordinal data and some dynamic implication. Preprint HAL n°01643910.