Concentration bounds for the empirical angular measure with statistical learning applications

Open archive

Clémençon, Stéphan | Jalalzai, Hamid | Lhaut, Stéphane | Sabourin, Anne | Segers, Johan

Published by CCSD; Bernoulli Society for Mathematical Statistics and Probability

International audience. The angular measure on the unit sphere characterizes the first-order dependence structure of the components of a random vector in extreme regions and is defined in terms of standardized margins. Its statistical recovery is an important step in learning problems involving observations far away from the center. In the common situation where the components of the vector have different distributions, the rank transformation offers a convenient and robust way of standardizing the data in order to build an empirical version of the angular measure from the most extreme observations. However, the sampling distribution of the resulting empirical angular measure is challenging to study. The purpose of this paper is to establish finite-sample bounds for the maximal deviations between the empirical and true angular measures, uniformly over classes of Borel sets of controlled combinatorial complexity. The bounds are valid with high probability and, up to logarithmic factors, scale as the square root of the effective sample size. The bounds are applied to provide performance guarantees for two statistical learning procedures tailored to extreme regions of the input space and built upon the empirical angular measure: binary classification in extreme regions through empirical risk minimization, and unsupervised anomaly detection through minimum-volume sets on the sphere.
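To make the rank-based construction concrete, the sketch below computes an empirical angular measure of a single region. It is a minimal illustration under stated assumptions, not the paper's exact definition: it assumes the common convention of standardizing each margin as V̂_ij = n / (n + 1 - R_ij), where R_ij is the rank of X_ij within column j (1 = smallest), uses the sup-norm, treats the k observations per margin above the threshold n/k as "most extreme", and normalizes by k. The function names (`ranks`, `rank_standardize`, `empirical_angular_measure`) are hypothetical, and the region is passed as a predicate on the angle rather than as an element of a class of Borel sets.

```python
from typing import Callable, List, Sequence


def ranks(column: Sequence[float]) -> List[int]:
    """Rank of each entry within its column (1 = smallest).

    Assumes no ties; tied values would simply be ranked in input order.
    """
    order = sorted(range(len(column)), key=lambda i: column[i])
    r = [0] * len(column)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r


def rank_standardize(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """Map each margin to a Pareto-like scale: V_ij = n / (n + 1 - R_ij).

    Values lie in [1, n]; the largest observation in a column maps to n.
    """
    n, d = len(data), len(data[0])
    col_ranks = [ranks([row[j] for row in data]) for j in range(d)]
    return [[n / (n + 1 - col_ranks[j][i]) for j in range(d)] for i in range(n)]


def empirical_angular_measure(
    data: Sequence[Sequence[float]],
    k: int,
    region: Callable[[List[float]], bool],
) -> float:
    """Hypothetical estimator: (1/k) * #{i : ||V_i|| >= n/k and V_i/||V_i|| in region}.

    Uses the sup-norm, so angles live on the unit sphere of that norm.
    """
    n = len(data)
    threshold = n / k
    count = 0
    for row in rank_standardize(data):
        radius = max(row)  # sup-norm; all entries are positive
        if radius >= threshold:  # keep only the "most extreme" observations
            angle = [x / radius for x in row]
            if region(angle):
                count += 1
    return count / k
```

For example, with four bivariate observations and k = 2, the mass of the whole sphere and of the half where the first standardized coordinate dominates can be computed as follows:

```python
data = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0], [4.0, 40.0]]
total = empirical_angular_measure(data, 2, lambda a: True)            # 1.5
first_dominates = empirical_angular_measure(data, 2, lambda a: a[0] >= a[1])  # 1.0
```

With the sup-norm and this normalization, the total mass always lies between 1 and d, since at least k and at most d·k observations exceed the threshold. Because only ranks enter the computation, the estimate is unchanged by any strictly increasing transformation of a margin, which is the robustness property that motivates the rank transformation.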

Suggestions

By the same author

On Binary Classification in Extreme Regions

Open archive | Jalalzai, Hamid | CCSD

International audience. In pattern recognition, a random label Y is to be predicted based upon observing a random vector X valued in ℝ^d with d ≥ 1 by means of a classification rule with minimum probability of error. ...

Heavy-tailed Representations, Text Polarity Classification & Data Augmentation

Open archive | Jalalzai, Hamid | CCSD

The dominant approaches to text representation in natural language rely on learning embeddings on massive corpora which have convenient properties such as compositionality and distance preservation. In this paper, we develop a nov...

Marginal Standardization of Upper Semicontinuous Processes, with Application to Max-Stable Processes

Open archive | Sabourin, Anne | CCSD

Extreme-value theory for random vectors and stochastic processes with continuous trajectories is usually formulated for random objects all of whose univariate marginal distributions are identical. In the spirit of Sklar's theorem ...
