A new approach to improving multilingual summarization using a genetic algorithm

Marina Litvak, Mark Last, Menahem Friedman

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

77 Scopus citations

Abstract

Automated summarization methods can be defined as "language-independent" if they are not based on any language-specific knowledge. Such methods can be used for multilingual summarization, defined by Mani (2001) as "processing several languages, with summary in the same language as input." In this paper, we introduce MUSE, a language-independent approach for extractive summarization based on the linear optimization of several sentence ranking measures using a genetic algorithm. We tested our methodology on two languages, English and Hebrew, and evaluated its performance with ROUGE-1 Recall against state-of-the-art extractive summarization approaches. Our results show that MUSE performs better than the best known multilingual approach (TextRank) in both languages. Moreover, our experimental results on a bilingual (English and Hebrew) document collection suggest that MUSE does not need to be retrained on each language and that the same model can be used across at least two different languages.
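The idea named in the abstract, a linear combination of sentence ranking measures whose weights are tuned by a genetic algorithm against ROUGE-1 Recall, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the feature values, and the genetic operators (truncation selection, uniform crossover, Gaussian mutation of one weight) are assumptions chosen for brevity, and the abstract does not specify which operators MUSE uses.

```python
import random
from collections import Counter


def rouge1_recall(candidate_tokens, reference_tokens):
    """Share of reference unigrams (counted with multiplicity) found in the candidate."""
    ref = Counter(reference_tokens)
    cand = Counter(candidate_tokens)
    overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0


def summarize(features, weights, k):
    """Score each sentence as a weighted sum of its features; return the top-k indices in document order."""
    scores = [sum(w * f for w, f in zip(weights, row)) for row in features]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def fitness(weights, sentences, features, reference, k):
    """ROUGE-1 Recall of the extract produced by this weight vector."""
    chosen = summarize(features, weights, k)
    tokens = [tok for i in chosen for tok in sentences[i]]
    return rouge1_recall(tokens, reference)


def evolve(sentences, features, reference, k, pop_size=20, generations=30, seed=0):
    """Hypothetical GA loop: keep the better half, refill with mutated crossovers."""
    rng = random.Random(seed)
    n = len(features[0])
    pop = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(pop_size)]

    def score(w):
        return fitness(w, sentences, features, reference, k)

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection, so the best vector survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child[rng.randrange(n)] += rng.gauss(0, 0.1)  # mutate one weight
            children.append(child)
        pop = parents + children
    return max(pop, key=score)


# Toy data (invented for illustration): three tokenized sentences, two features each.
sentences = [["cat", "sat"], ["dog", "ran"], ["bird", "flew"]]
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
reference = ["dog", "ran"]
best = evolve(sentences, features, reference, k=1)
print(summarize(features, best, 1), fitness(best, sentences, features, reference, 1))
```

Because the weights act only on language-neutral feature values, a weight vector learned this way could in principle be reused on documents in another language, which is the property the abstract reports for English and Hebrew.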

Original language: English
Title of host publication: ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Pages: 927-936
Number of pages: 10
State: Published - 1 Dec 2010
Event: 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010 - Uppsala, Sweden
Duration: 11 Jul 2010 to 16 Jul 2010

Publication series

Name: ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference

Conference

Conference: 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010
Country/Territory: Sweden
City: Uppsala
Period: 11/07/10 to 16/07/10

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
