Annoter la parole spontanée en arbres de constituants pour les besoins de l’analyse temporelle : résultats et comparaison français parlé / français écrit

Open archive

Wang, Ilaine | Antoine, Jean-Yves | Abouda, Lotfi | Waszczuk, Jakub | Pelletier, Aurore | Halftermeyer, Anaïs

Published by CCSD

International audience. English title: Constituency annotation of spontaneous speech for temporal analysis needs: results and comparison between spoken and written French.

Abstract. This paper presents the main results of the syntactic part of Temporal@ODIL, a project that aims to build a temporally annotated corpus of spontaneous spoken French. We describe ODIL_Syntax, the constituency treebank on which our temporal annotation is grounded, which is freely distributed under a Creative Commons license. The syntactic annotation was performed with Contemplata, a Web-based annotation platform developed specifically for the project; it is also freely distributed and integrates a syntactic parser, which allows semi-automatic annotation. The paper describes the annotation guidelines and the annotation procedure using Contemplata, and gives a statistical description of the corpus, compared with the French Treebank (FTB), the largest constituency-based resource for written French.

Suggestions

By the same author

Combining Automatic Parsing and Manual Revision for the Constitution of a Spontaneous Speech Treebank : Experience Feedback on the ODIL_Syntaxe Corpus. Combiner parseur automatique et révision manuelle pour la constitution d'un corpus arboré de parole spontanée : retour d'expérience sur le corpus ODIL_syntaxe

Open archive | Wang, Ilaine | CCSD

International audience. This paper describes a syntactic annotation platform (Contemplata) that integrates a parser (specifically, the Stanford Parser) to automatically annotate written text or oral transcriptions and then ...

ODIL Syntax : a Free Spontaneous Spoken French Treebank Annotated with Constituent Trees

Open archive | Wang, Ilaine | CCSD

International audience. This paper describes ODIL Syntax, a French treebank built on spontaneous speech transcripts. The syntactic structure of every speech turn is represented by constituent trees, through a proced...

A Methodology for the Comparison of Human Judgments With Metrics for Coreference Resolution

Open archive | Borovikova, Mariya | CCSD

International audience. We propose a method for investigating the interpretability of metrics used for the coreference resolution task through comparisons with human judgments. We provide a corpus with annotations o...
