On Binary Classification in Extreme Regions

Open archive

Jalalzai, Hamid | Clémençon, Stéphan | Sabourin, Anne

Published by CCSD

International audience.

In pattern recognition, a random label Y is to be predicted based upon observing a random vector X valued in ℝ^d, with d ≥ 1, by means of a classification rule with minimum probability of error. In a wide variety of applications, ranging from finance/insurance to environmental sciences through teletraffic data analysis, extreme (i.e. very large) observations X are of crucial importance, yet they contribute only negligibly to the (empirical) error, simply because of their rarity. As a consequence, empirical risk minimizers generally perform very poorly in extreme regions. It is the purpose of this paper to develop a general framework for classification in the extremes. Precisely, under non-parametric heavy-tail assumptions for the class distributions, we prove that a natural and asymptotic notion of risk, accounting for predictive performance in extreme regions of the input space, can be defined. Using maximal deviation inequalities in low probability regions, we then show that minimizers of an empirical version of a non-asymptotic approximant of this dedicated risk, based on a fraction of the largest observations, lead to classification rules with good generalization capacity. Beyond theoretical results, numerical experiments are presented in order to illustrate the relevance of the approach developed.
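The central step of the abstract, minimizing an empirical risk computed only on a fraction of the largest observations, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The following choices are all illustrative assumptions:

- the Euclidean norm as the measure of "largeness";
- the projection onto the unit sphere θ(x) = x/‖x‖;
- the finite, randomly drawn family of linear rules;
- the toy data generator, whose label/angle link is deliberately flipped between the bulk and the tail;
- every function name (`top_k_by_norm`, `angle`, `erm_on_sphere`, `predict`, `toy_data`) and the values of `k` and `n`.

```python
import numpy as np


def top_k_by_norm(X, y, k):
    """Keep the k training points with the largest Euclidean norm ||x|| (assumed norm)."""
    idx = np.argsort(np.linalg.norm(X, axis=1))[-k:]
    return X[idx], y[idx]


def angle(X):
    """Angular component theta(x) = x / ||x||, a point on the unit sphere (assumption)."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)


def erm_on_sphere(theta, y, n_candidates=500, seed=0):
    """Empirical 0-1 risk minimizer over a finite, randomly drawn family of
    rules g(theta) = 1{<w, theta> > b}. This is a toy hypothesis class, not the paper's."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_candidates, theta.shape[1]))
    W /= np.linalg.norm(W, axis=1, keepdims=True)   # unit-norm directions w
    b = rng.uniform(-1.0, 1.0, size=n_candidates)   # offsets b
    preds = (theta @ W.T > b).astype(int)           # shape (n, n_candidates)
    errors = (preds != y[:, None]).mean(axis=0)     # empirical 0-1 risk of each rule
    best = int(np.argmin(errors))
    return W[best], b[best]


def predict(w, b, X):
    """Apply the rule g(theta(x)) = 1{<w, theta(x)> > b} to raw inputs X."""
    return (angle(X) @ w > b).astype(int)


def toy_data(n, seed=1):
    """Hypothetical heavy-tailed data in R^2: Pareto(2) radius, P(R > t) = t**-2 for t >= 1.
    Tail points (R >= 3) put class 1 on the upper arc of angles [0.9, 1.4];
    bulk points (R < 3) put class 1 on the lower arc [0.1, 0.6] instead."""
    rng = np.random.default_rng(seed)
    r = 1.0 + rng.pareto(2.0, size=n)
    y = rng.integers(0, 2, size=n)
    upper = (y == 1) ^ (r < 3.0)                    # which arc each point's angle falls in
    phi = rng.uniform(0.1, 0.6, size=n) + 0.8 * upper
    X = r[:, None] * np.column_stack([np.cos(phi), np.sin(phi)])
    return X, y


X, y = toy_data(5000)
Xk, yk = top_k_by_norm(X, y, k=200)

w_ext, b_ext = erm_on_sphere(angle(Xk), yk)   # ERM on the k largest observations only
w_all, b_all = erm_on_sphere(angle(X), y)     # ordinary ERM on the whole sample

err_ext = np.mean(predict(w_ext, b_ext, Xk) != yk)   # error of each rule on the extremes
err_all = np.mean(predict(w_all, b_all, Xk) != yk)
print(f"extreme-region error: top-k ERM {err_ext:.3f}, full-sample ERM {err_all:.3f}")
```

On this hypothetical data, the two rules should behave very differently. The rule fitted on the k largest points should classify the extreme points almost perfectly. The rule fitted on the whole sample instead follows the bulk pattern and so misclassifies most extreme points. This mirrors the abstract's remark that extremes weigh negligibly in the overall empirical error. As is usual in extreme value statistics, the choice of k trades bias (admitting moderate, non-extreme points) against variance (too few points to fit the rule).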

Suggestions

By the same author

Concentration bounds for the empirical angular measure with statistical learning applications

Open archive | Clémençon, Stéphan | CCSD

International audience. The angular measure on the unit sphere characterizes the first-order dependence structure of the components of a random vector in extreme regions and is defined in terms of standardized margi...

Heavy-tailed Representations, Text Polarity Classification & Data Augmentation

Open archive | Jalalzai, Hamid | CCSD

The dominant approaches to text representation in natural language rely on learning embeddings on massive corpora which have convenient properties such as compositionality and distance preservation. In this paper, we develop a nov...

A Multivariate Extreme Value Theory Approach to Anomaly Clustering and Visualization

Open archive | Chiapino, Maël | CCSD

International audience. In a wide variety of situations, anomalies in the behaviour of a complex system, whose health is monitored through the observation of a random vector X = (X_1, ..., X_d) valued in ℝ^d, corre...
