A French corpus annotated for multiword expressions and named entities

Open archive

Candito, Marie | Constant, Mathieu | Ramisch, Carlos | Savary, Agata | Guillaume, Bruno | Parmentier, Yannick | Cordeiro, Silvio Ricardo

Published by CCSD; Institute of Computer Science, Polish Academy of Sciences, Poland

Available at https://jlm.ipipan.waw.pl/index.php/JLM/article/view/265. International audience. We present the enrichment of a French treebank of various genres with a new annotation layer for multiword expressions (MWEs) and named entities (NEs). Our contribution with respect to previous work on NE and MWE annotation is the particular care taken to use formal criteria, organized into decision flowcharts, which sheds some light on the interactions between NEs and MWEs. Moreover, to cope with the well-known difficulty of drawing a clear-cut boundary between compositional expressions and MWEs, we chose to use sufficient criteria only. As a result, annotated MWEs satisfy a varying number of sufficient criteria, which accounts for the scalar nature of MWE status. In addition to the span of the elements, the annotation includes the subcategory of NEs (e.g., person, location) and one matching sufficient criterion for non-verbal MWEs (e.g., lexical substitution). The 3,099 sentences of the treebank were double-annotated and adjudicated, and we paid attention to cross-type consistency and compatibility with the syntactic layer. Overall inter-annotator agreement on non-verbal MWEs and NEs reached 71.1%. The released corpus contains 3,112 annotated NEs and 3,440 MWEs, and is distributed under an open license.
