Enhancing the PARSEME Turkish Corpus of Verbal Multiword Expressions

Open archive

Ozturk, Yagmur | Hadj Mohamed, Najet | Lion-Bouton, Adam | Savary, Agata

Published by CCSD

International audience. The PARSEME (Parsing and Multiword Expressions) project provides multilingual corpora annotated for multiword expressions (MWEs). In this case study, we focus on the Turkish PARSEME corpus. Turkish is an agglutinative language with highly productive inflection and derivation, which can cause problems for automatic morphosyntactic annotation. We provide an overview of the problems observed in the morphosyntactic annotation of the Turkish PARSEME corpus. These problems mostly affect the lemmas, which are important for approximating the type of an MWE. We propose modifications of the original corpus, with enhancements to the lemmas and parts of speech. The enhancements are then evaluated with Seen2Seen, an MWE identification system from the PARSEME Shared Task 1.2. The results show an increase in the F-measure for MWE identification, emphasizing the need for robust morphosyntactic annotation in MWE processing, especially for languages with high surface variability.

Suggestions

By the same author

Evaluating Diversity of Multiword Expressions in Annotated Text

Open archive | Lion-Bouton, Adam | CCSD

International audience. Diversity can be decomposed into three distinct concepts, namely: variety, balance and disparity. This paper borrows from the extensive formalization and measures of diversity developed in ec...

PARSEME corpus release 1.3

Open archive | Savary, Agata | CCSD

International audience

Annotating Verbal Multiword Expressions in Arabic: Assessing the Validity of a Multilingual Annotation Procedure

Open archive | Hadj Mohamed, Najet | CCSD

International audience. This paper describes our efforts to extend the PARSEME framework to Modern Standard Arabic. The applicability of the PARSEME guidelines was tested by measuring the inter-annotator agreement i...
