Unifying annotated discourse hierarchies to create a gold standard

Marco Carbone, Ya'akov Gal, Stuart Shieber, Barbara Grosz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

Human annotation of discourse corpora typically results in segmentation hierarchies that vary in their degree of agreement. This paper presents several techniques for unifying multiple discourse annotations into a single hierarchy, deemed a "gold standard" - the segmentation that best captures the underlying linguistic structure of the discourse. It proposes and analyzes methods that consider the level of embeddedness of a segmentation as well as methods that do not. A corpus containing annotated hierarchical discourses, the Boston Directions Corpus, was used to evaluate the "goodness" of each technique, by comparing the similarity of the segmentation it derives to the original annotations in the corpus. Several metrics of similarity between hierarchical segmentations are computed: precision/recall of matching utterances, pairwise inter-reliability scores (κ), and non-crossing-brackets. A novel method for unification that minimizes conflicts among annotators outperforms methods that require consensus among a majority for the κ and precision metrics, while capturing much of the structure of the discourse. When high recall is preferred, methods requiring a majority are preferable to those that demand full consensus among annotators.
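The contrast between majority-based and full-consensus unification can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not define its data structures or metrics in detail, so the representation of a segmentation as a set of inclusive (start, end) utterance spans, the function names `unify`, `precision_recall`, and `crosses`, and the toy annotations are all assumptions made for illustration. The novel conflict-minimizing method and the κ computation are not shown.

```python
from collections import Counter

def unify(annotations, min_votes):
    """Keep every segment (start, end) proposed by at least min_votes annotators.

    Hypothetical voting scheme: min_votes = len(annotations) models full
    consensus, min_votes > len(annotations) / 2 models a majority.
    """
    votes = Counter(seg for ann in annotations for seg in set(ann))
    return {seg for seg, n in votes.items() if n >= min_votes}

def precision_recall(derived, reference):
    """Precision and recall of exactly matching segments (assumed definition)."""
    matched = len(derived & reference)
    precision = matched / len(derived) if derived else 0.0
    recall = matched / len(reference) if reference else 0.0
    return precision, recall

def crosses(a, b):
    """True if two inclusive spans overlap without one containing the other."""
    return a[0] < b[0] <= a[1] < b[1] or b[0] < a[0] <= b[1] < a[1]

# Toy data: three annotators segmenting a ten-utterance discourse (0..9).
annotations = [
    {(0, 9), (0, 4), (5, 9)},
    {(0, 9), (0, 4), (5, 9), (5, 6)},
    {(0, 9), (0, 3), (4, 9)},
]

majority = unify(annotations, min_votes=2)
consensus = unify(annotations, min_votes=3)

print(sorted(majority))                              # [(0, 4), (0, 9), (5, 9)]
print(sorted(consensus))                             # [(0, 9)]
print(precision_recall(majority, annotations[1]))    # (1.0, 0.75)
print(precision_recall(consensus, annotations[1]))   # (1.0, 0.25)
```

On this toy input, the majority rule recovers more of the second annotator's segments than the consensus rule at the same precision, which matches the direction of the abstract's closing remark about recall. A vote threshold alone does not guarantee a well-formed hierarchy, so in practice the `crosses` check would be needed to detect spans that overlap without nesting.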

Original language: English
Title of host publication: Proceedings of the SIGDIAL 2004 Workshop - 5th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Publisher: Association for Computational Linguistics (ACL)
Pages: 118-126
Number of pages: 9
ISBN (Electronic): 1932432272, 9781932432275
State: Published - 1 Jan 2004
Externally published: Yes
Event: 5th Annual Meeting of the Special Interest Group on Discourse and Dialogue, SIGDIAL 2004 Workshop - Cambridge, United States
Duration: 30 Apr 2004 to 1 May 2004

Publication series

Name: Proceedings of the SIGDIAL 2004 Workshop - 5th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Conference

Conference: 5th Annual Meeting of the Special Interest Group on Discourse and Dialogue, SIGDIAL 2004 Workshop
Country/Territory: United States
City: Cambridge
Period: 30/04/04 to 1/05/04

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Computer Vision and Pattern Recognition
  • Modeling and Simulation
  • Human-Computer Interaction
