Annotating Verbal Multiword Expressions in Arabic: Assessing the Validity of a Multilingual Annotation Procedure

Open archive

Hadj Mohamed, Najet | Ben Khelil, Cherifa | Savary, Agata | Keskes, Iskandar | Antoine, Jean-Yves | Hadrich Belguith, Lamia

Published by CCSD

International audience. This paper describes our efforts to extend the PARSEME framework to Modern Standard Arabic. We tested the applicability of the PARSEME guidelines by measuring inter-annotator agreement in the early annotation stage. A subset of 1,062 sentences from the Prague Arabic Dependency Treebank (PADT) was selected and annotated independently by two native speakers of Arabic. Their annotations yielded a new Arabic corpus with over 1,250 annotated verbal multiword expressions (VMWEs). This corpus already exceeds the smallest corpora of the PARSEME suite and enables first observations. We discuss our annotation guidelines, which show that full MWE annotation is feasible in Arabic, with good inter-annotator agreement.
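The abstract does not say which agreement measures were used, so the following is only a minimal sketch of how agreement between two annotators could be quantified, assuming an exact-match F-score over annotated expressions and Cohen's kappa over category labels. The function names, the toy annotations, and the choice of measures are illustrative assumptions, not the authors' procedure.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical representation: one annotated expression is a pair of
# (sentence id, set of token indices belonging to the expression).
Annotation = Tuple[int, FrozenSet[int]]


def span_f_score(ann_a: Set[Annotation], ann_b: Set[Annotation]) -> float:
    """Exact-match F-score of annotator B measured against annotator A."""
    if not ann_a or not ann_b:
        return 0.0
    matched = len(ann_a & ann_b)
    precision = matched / len(ann_b)
    recall = matched / len(ann_a)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(labels_a: List[str], labels_b: List[str]) -> float:
    """Cohen's kappa for two equally long, non-empty label sequences."""
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    # Chance agreement from each annotator's own label distribution.
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy data (invented for illustration only).
annotator_a = {(1, frozenset({2, 3})), (2, frozenset({0, 4})), (3, frozenset({1, 2}))}
annotator_b = {(1, frozenset({2, 3})), (2, frozenset({0, 4})), (4, frozenset({5, 6}))}
f_score = span_f_score(annotator_a, annotator_b)

# Category labels on expressions that both annotators identified.
cats_a = ["VID", "LVC.full", "VID", "LVC.full"]
cats_b = ["VID", "LVC.full", "LVC.full", "LVC.full"]
kappa = cohen_kappa(cats_a, cats_b)
```

In this sketch the F-score reflects how often the two annotators mark the same expressions, while kappa corrects the categorization agreement for chance. Kappa is computed only over expressions both annotators marked, so the two figures answer different questions.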

