ODIL Syntax : a Free Spontaneous Spoken French Treebank Annotated with Constituent Trees

Open archive

Wang, Ilaine | Pelletier, Aurore | Antoine, Jean-Yves | Halftermeyer, Anaïs

Published by CCSD

International audience. This paper describes ODIL Syntax, a French treebank built on spontaneous speech transcripts. The syntactic structure of every speech turn is represented by constituent trees, produced by a procedure that combines automatic annotation by a parser (here, the Stanford Parser) with manual revision. ODIL Syntax follows the annotation scheme designed for the French TreeBank (FTB), with additional annotation guidelines that aim to represent specific features of spoken language, such as speech disfluencies. The corpus will be freely distributed by January 2020 under a Creative Commons licence. It will serve as the basis for a further semantic enrichment dedicated to the representation of temporal entities and temporal relations, as a second phase of the ODIL@Temporal project. The paper details the annotation scheme we followed, with an emphasis on the representation of speech disfluencies. We then present the annotation procedure, which was carried out on the Contemplata annotation platform. In the last section, we provide some distributional characteristics of the annotated corpus (POS distribution, multiword expressions).

Suggestions

By the same author

Annoter la parole spontanée en arbres de constituants pour les besoins de l’analyse temporelle : résultats et comparaison français parlé / français écrit

Open archive | Wang, Ilaine | CCSD

International audience. This paper presents the main results drawn from the syntactic part of Temporal@ODIL, a project whose objective is the construction of a temporally annotated corpus of spontaneous speech for F...

Combining Automatic Parsing and Manual Revision for the Constitution of a Spontaneous Speech Treebank : Experience Feedback on the ODIL_Syntaxe Corpus. Combiner parseur automatique et révision manuelle pour la constitution d'un corpus arboré de parole spontanée : retour d'expérience sur le corpus ODIL_syntaxe

Open archive | Wang, Ilaine | CCSD

International audience. This paper describes a syntactic annotation platform (Contemplata) that integrates a parser (Stanford Parser precisely) to automatically annotate written text or oral transcriptions and then ...

A Methodology for the Comparison of Human Judgments With Metrics for Coreference Resolution

Open archive | Borovikova, Mariya | CCSD

International audience. We propose a method for investigating the interpretability of metrics used for the coreference resolution task through comparisons with human judgments. We provide a corpus with annotations o...
