An unsupervised approach to biography production using Wikipedia

  • Fadi Biadsy
  • Julia Hirschberg
  • Elena Filatova

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

49 Scopus citations

Abstract

We describe an unsupervised approach to multi-document sentence-extraction based summarization for the task of producing biographies. We utilize Wikipedia to automatically construct a corpus of biographical sentences and TDT4 to construct a corpus of non-biographical sentences. We build a biographical-sentence classifier from these corpora and an SVM regression model for sentence ordering from the Wikipedia corpus. We evaluate our work on the DUC2004 evaluation data and with human judges. Overall, our system significantly outperforms all systems that participated in DUC2004, according to the ROUGE-L metric, and is preferred by human subjects.

Original language: English
Title of host publication: ACL-08
Subtitle of host publication: HLT - 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference
Pages: 807-815
Number of pages: 9
State: Published - 1 Dec 2008
Externally published: Yes
Event: 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-08: HLT - Columbus, OH, United States
Duration: 15 Jun 2008 to 20 Jun 2008

Publication series

Name: ACL-08: HLT - 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference

Conference

Conference: 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-08: HLT
Country/Territory: United States
City: Columbus, OH
Period: 15/06/08 to 20/06/08

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Networks and Communications
  • Linguistics and Language
