Kernel and dissimilarity methods for exploratory analysis in a social context

Open archive

Mariette, Jérôme, J. | Olteanu, Madalina | Vialaneix, Nathalie

Published by CCSD

International audience. While most statistical methods for prediction or data mining were built for data consisting of independent observations of a common set of p numerical variables, many real-world applications do not fit this framework. A more common and general situation is one where a relevant similarity or dissimilarity can be computed between observations, providing a summary of their relations to each other. This setting is related to the kernel framework, which has made it possible to extend most standard supervised and unsupervised statistical methods to any type of data for which a relevant kernel can be obtained. This chapter presents kernel methods in general, with a specific focus on the less-studied unsupervised framework. We illustrate the usefulness of this framework by describing the extension of self-organizing maps and by proposing an approach to combine kernels efficiently. The overall approach is illustrated on categorical time series in a social-science context, showing how the choice of a given dissimilarity or group of dissimilarities can influence the output of the exploratory analysis.
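As an illustration of how a dissimilarity can be linked to the kernel framework, the sketch below applies classical double centering to a squared dissimilarity matrix. This is a standard textbook construction and is not necessarily the one used in the chapter; the function name `dissimilarity_to_similarity` and the toy data are assumptions made for this example only.

```python
def dissimilarity_to_similarity(d):
    """Double-center a symmetric dissimilarity matrix (list of lists).

    Returns s with s[i][j] = -0.5 * (d2[i][j] - row_mean[i] - col_mean[j] + grand_mean),
    where d2 holds the squared dissimilarities. When d contains Euclidean
    distances, s is the Gram matrix of the mean-centered observations.
    """
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    return [
        [-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + grand_mean) for j in range(n)]
        for i in range(n)
    ]


# Toy data (assumed for illustration): three observations on a line.
points = [0.0, 1.0, 3.0]
dissim = [[abs(a - b) for b in points] for a in points]
sim = dissimilarity_to_similarity(dissim)
```

For a general, non-Euclidean dissimilarity the resulting matrix may fail to be positive semi-definite, so it should be read as a similarity rather than as a guaranteed kernel.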

Suggestions

By the same author

Efficient interpretable variants of online SOM for large dissimilarity data

Open archive | Mariette, Jérôme, J. | CCSD

International audience. Self-organizing maps (SOM) are a useful tool for exploring data. In its original version, the SOM algorithm was designed for numerical vectors. Since then, several extensions have been propos...

Unsupervised multiple kernel learning for heterogeneous data integration

Open archive | Mariette, Jérôme, J. | CCSD

International audience. Motivation: Recent high-throughput sequencing advances have expanded the breadth of available omics datasets and the integrated analysis of multiple datasets obtained on the same samples has ...

Des noyaux pour les omiques [Kernels for omics]

Open archive | Mariette, Jérôme, J. | CCSD

International audience. The development of high-throughput sequencing techniques generates a rapidly growing volume of data at relatively low cost. These data are often very high-dimensional, h...
