Are labels informative in semi-supervised learning? Estimating and leveraging the missing-data mechanism

Open archive

Sportisse, Aude | Schmutz, Hugo | Humbert, Olivier | Bouveyron, Charles | Mattei, Pierre-Alexandre

Published by CCSD

International audience. Semi-supervised learning (SSL) is a powerful technique for leveraging unlabeled data to improve machine learning models, but it can be affected by the presence of “informative” labels, which occur when some classes are more likely to be labeled than others. In the missing data literature, such labels are called missing not at random. In this paper, we propose a novel approach to address this issue by estimating the missing-data mechanism and using inverse propensity weighting to debias any SSL algorithm, including those using data augmentation. We also propose a likelihood ratio test to assess whether or not labels are indeed informative. Finally, we demonstrate the performance of the proposed methods on different datasets, in particular on two medical datasets for which we design pseudo-realistic missing data scenarios.
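As an illustration of the inverse propensity weighting idea mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a class-dependent labeling mechanism in which an example of class k is labeled with probability phi[k], and it assumes phi is already known (the paper estimates it; that step is not shown here). The function names `ipw_class_prior` and `ipw_risk` are hypothetical.

```python
import numpy as np

def ipw_class_prior(y_labeled, phi, n_total):
    """Estimate class proportions from labeled examples only.

    Each labeled example of class k is weighted by 1 / phi[k], where phi[k]
    is the assumed probability that a class-k example receives a label.
    """
    y_labeled = np.asarray(y_labeled, dtype=int)
    phi = np.asarray(phi, dtype=float)
    weights = 1.0 / phi[y_labeled]
    weighted_counts = np.bincount(y_labeled, weights=weights, minlength=len(phi))
    return weighted_counts / n_total

def ipw_risk(per_example_loss, y_labeled, phi, n_total):
    """Inverse-propensity-weighted average of per-example losses on labeled data."""
    y_labeled = np.asarray(y_labeled, dtype=int)
    phi = np.asarray(phi, dtype=float)
    weights = 1.0 / phi[y_labeled]
    return float(np.sum(weights * np.asarray(per_example_loss, dtype=float)) / n_total)

# Toy scenario (assumed numbers): 1000 examples, 500 per class.
# Class 0 is labeled with probability 0.8, class 1 with probability 0.2,
# so the labeled set holds 400 class-0 and 100 class-1 examples.
y_lab = np.array([0] * 400 + [1] * 100)
phi = [0.8, 0.2]

naive = np.bincount(y_lab, minlength=2) / len(y_lab)
debiased = ipw_class_prior(y_lab, phi, n_total=1000)
print("naive labeled frequencies:", naive)
print("IPW estimate:", debiased)
```

In this toy scenario the naive labeled frequencies are 0.8 and 0.2, while the weighted estimate recovers the balanced proportions 0.5 and 0.5. The same weights applied to per-example losses, as in `ipw_risk`, give a weighted supervised risk.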

Suggestions

By the same author

18FDG PET/CT and Machine Learning for the prediction of lung cancer response to immunotherapy

Open archive | Schmutz, Hugo | CCSD

International audience. In patients with non-small cell lung cancer (NSCLC) treated with immunotherapy, individual biological and PET imaging prognostic biomarkers have been recently identified. However, combination...

Don't fear the unlabelled: safe deep semi-supervised learning via simple debiasing

Open archive | Schmutz, Hugo | CCSD

International audience. Semi-supervised learning (SSL) provides an effective means of leveraging unlabelled data to improve a model's performance. Even though the domain has received a considerable amount of attenti...

Model-agnostic out-of-distribution detection using combined statistical tests

Open archive | Bergamin, Federico | CCSD

International audience. We present simple methods for out-of-distribution detection using a trained generative model. These techniques, based on classical statistical tests, are model-agnostic in the sense that they...
