Heavy-tailed Representations, Text Polarity Classification & Data Augmentation

Open archive

Jalalzai, Hamid | Colombo, Pierre | Clavel, Chloé | Gaussier, Éric | Varni, Giovanna | Vignon, Emmanuel | Sabourin, Anne

Published by CCSD

The dominant approaches to text representation in natural language rely on learning embeddings on massive corpora; these embeddings have convenient properties such as compositionality and distance preservation. In this paper, we develop a novel method to learn a heavy-tailed embedding with desirable regularity properties regarding the distributional tails, which makes it possible to analyze points far from the bulk of the distribution using the framework of multivariate extreme value theory. In particular, we obtain a classifier dedicated to the tails of the proposed embedding; it exhibits a scale invariance property, which we exploit in a novel text generation method for label-preserving dataset augmentation. Experiments on synthetic and real text data show the relevance of the proposed framework and confirm that this method generates meaningful sentences with a controllable attribute, e.g. positive or negative sentiment.
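The scale invariance mentioned in the abstract can be illustrated with a small sketch. This is not the authors' method: it only assumes a hypothetical classifier that depends on a point solely through its direction x / ||x||, so that multiplying an extreme point by any positive factor leaves the predicted label unchanged. The names `angular_classifier`, `augment_by_scaling`, the weight vector `w`, and the sample point `x` are all illustrative assumptions.

```python
import numpy as np

def angular_classifier(x, w):
    """Hypothetical tail classifier: uses x only through its direction x / ||x||."""
    theta = x / np.linalg.norm(x)  # projection onto the unit sphere
    return 1 if float(theta @ w) >= 0.0 else -1

def augment_by_scaling(x, scales):
    """Return rescaled copies of the point x, one per positive factor in `scales`."""
    return [lam * x for lam in scales]

# Illustrative values (assumptions, not taken from the paper).
w = np.array([1.0, -1.0, 0.5])   # weights of the toy angular classifier
x = np.array([3.0, 1.0, 2.0])    # stands in for an embedding far from the bulk

label = angular_classifier(x, w)
augmented = augment_by_scaling(x, [1.5, 2.0, 5.0])
labels = [angular_classifier(z, w) for z in augmented]
```

Because rescaling does not change the direction of `x`, every entry of `labels` equals `label`. In the setting described by the abstract, this kind of invariance is what would allow rescaled representations to keep their label when used for augmentation.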

Suggestions

By the same author

Guiding attention in sequence-to-sequence models for dialogue act prediction

Open archive | Chapuis, Emile | CCSD

International audience. Dialogue act (DA) prediction based on conversational dialogue is a key component in the development of conversational agents. Accurate DA prediction requires a ...

Automatic Text Evaluation through the Lens of Wasserstein Barycenters

Open archive | Colombo, Pierre | CCSD

International audience. A new metric, BaryScore, is introduced to evaluate text generation based on deep contextualized embeddings (e.g., BERT, RoBERTa, ELMo). This metric is motivated by a new framework relying on op...

Concentration bounds for the empirical angular measure with statistical learning applications

Open archive | Clémençon, Stéphan | CCSD

International audience. The angular measure on the unit sphere characterizes the first-order dependence structure of the components of a random vector in extreme regions and is defined in terms of standardized margi...
