Evaluating Diversity of Multiword Expressions in Annotated Text

Open archive

Lion-Bouton, Adam | Öztürk, Yağmur | Savary, Agata | Antoine, Jean-Yves

Published by CCSD

International audience. Diversity can be decomposed into three distinct concepts: variety, balance and disparity. This paper borrows from the extensive formalization and measures of diversity developed in ecology in order to evaluate the variety and balance of multiword expression annotations produced by automatic annotation systems. Four measures are considered: richness, normalized richness, and two variations of Hill's evenness. We observe how these measures behave on progressively smaller samples of gold multiword expression annotations, and we use this behavior to validate or invalidate their relevance for multiword expressions in annotated text. We apply the validated measures to annotations in 14 languages produced by systems during the PARSEME shared task on automatic identification of multiword expressions, and to the gold versions of the corpora. We also explore the limits of such an evaluation by studying the impact of lemmatization errors in the Turkish corpus used in the shared task.
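The measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formulas, so the sketch assumes the standard ecological definitions. Richness is taken as the number of distinct MWE types, normalized richness as richness divided by the number of annotated occurrences, and the two evenness variants as ratios of Hill numbers (order 1 over order 0, and order 2 over order 1). The function names and the choice of ratios are hypothetical.

```python
import math
from collections import Counter


def hill_number(counts, q):
    """Hill number of order q for a list of positive type counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    if q == 1:
        # Order 1 is defined as a limit: the exponential of Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1.0 / (1.0 - q))


def diversity_report(mwe_types):
    """Variety and balance of a list of MWE type labels (e.g. lemmatized MWEs).

    Assumed definitions, not taken from the paper:
      richness            = number of distinct types
      normalized_richness = richness / number of occurrences
      evenness_1_0        = Hill(1) / Hill(0)
      evenness_2_1        = Hill(2) / Hill(1)
    """
    if not mwe_types:
        raise ValueError("at least one annotated MWE is required")
    counts = list(Counter(mwe_types).values())
    richness = len(counts)  # equals the Hill number of order 0
    return {
        "richness": richness,
        "normalized_richness": richness / len(mwe_types),
        "evenness_1_0": hill_number(counts, 1) / richness,
        "evenness_2_1": hill_number(counts, 2) / hill_number(counts, 1),
    }


if __name__ == "__main__":
    sample = ["take place"] * 3 + ["give up"]
    print(diversity_report(sample))
```

Under these assumed definitions, a sample in which every type occurs equally often yields an evenness of 1 for both ratios, and a sample dominated by a few types yields values below 1.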

Suggestions

By the same author

Enhancing the PARSEME Turkish Corpus of Verbal Multiword Expressions

Open archive | Ozturk, Yagmur | CCSD

International audience. The PARSEME (Parsing and Multiword Expressions) project proposes multilingual corpora annotated for multiword expressions (MWEs). In this case study, we focus on the Turkish corpus of PARSEME...

Combining Automatic Parsing and Manual Revision for the Constitution of a Spontaneous Speech Treebank: Experience Feedback on the ODIL_Syntaxe Corpus. Combiner parseur automatique et révision manuelle pour la constitution d'un corpus arboré de parole spontanée : retour d'expérience sur le corpus ODIL_syntaxe

Open archive | Wang, Ilaine | CCSD

International audience. This paper describes a syntactic annotation platform (Contemplata) that integrates a parser (specifically, the Stanford Parser) to automatically annotate written text or oral transcriptions and then ...

Seen2Unseen at PARSEME Shared Task 2020: All Roads do not Lead to Unseen Verb-Noun VMWEs

Open archive | Pasquer, Caroline | CCSD

International audience. We describe the Seen2Unseen system that participated in edition 1.2 of the PARSEME shared task on automatic identification of verbal multiword expressions (VMWEs). The identification of VMWEs...
