Language-independent Techniques for Automated Text Summarization.

Mark Last, Marina Litvak

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

Text summarization is the process of distilling the most important information from source/sources to produce an abridged version for a particular user/users and task/tasks. Automatically generated summaries can significantly reduce the information overload on intelligence analysts in their daily work. Moreover, automated text summarization can be utilized for automated classification and filtering of text documents, information search over the Internet, content recommendation systems, online social networks, etc. The increasing trend of cross-border globalization accompanied by the growing multi-linguality of the Internet requires text summarization techniques to work equally well on multiple languages. However, only some of the automated summarization methods proposed in the literature can be defined as “multi-lingual” or “language-independent,” as they are not based on any morphological analysis of the summarized text.
In this chapter, we present a novel approach called MUSE (MUltilingual Sentence Extractor) to “language-independent” extractive summarization, which represents the summary as a collection of the most informative fragments of the summarized document without any language-specific text analysis. We use a Genetic Algorithm to find the best linear combination of 31 sentence scoring metrics based on vector and graph representations of text documents. Our summarization methodology is evaluated on two monolingual corpora of English and Hebrew documents, and, in addition, on a bilingual collection of English and Hebrew documents. The results are compared to 15 statistical sentence scoring methods for extractive single-document summarization found in the literature and to several state-of-the-art summarization tools. These bilingual experiments show that the MUSE methodology significantly outperforms the existing approaches and tools in both languages.
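The idea of scoring sentences by a weighted linear combination of metrics and extracting the top-ranked ones can be illustrated with a minimal Python sketch. It is not the MUSE implementation: the three metrics (position, length, average term frequency), the naive punctuation-based sentence splitter, and the function names are all assumptions made for illustration, standing in for the 31 metrics described in the chapter. The weights are passed in by hand here, whereas in the chapter they are what the Genetic Algorithm searches for.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive split after sentence-final punctuation (illustrative assumption,
    # not the chapter's method).
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def metric_scores(sentences):
    """Return, per sentence, a list of three hypothetical metrics in [0, 1]."""
    tokens = [s.lower().split() for s in sentences]
    tf = Counter(w for sent in tokens for w in sent)
    max_len = max(len(t) for t in tokens)
    avg_tf = [sum(tf[w] for w in t) / len(t) for t in tokens]
    max_tf = max(avg_tf)
    n = len(sentences)
    rows = []
    for i, t in enumerate(tokens):
        position = 1.0 - i / n         # earlier sentences score higher
        length = len(t) / max_len      # longer sentences score higher
        frequency = avg_tf[i] / max_tf  # sentences with frequent words score higher
        rows.append([position, length, frequency])
    return rows


def summarize(text, weights, k=2):
    """Pick the k sentences with the highest weighted metric sum, in document order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    rows = metric_scores(sentences)
    scores = [sum(w * m for w, m in zip(weights, row)) for row in rows]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(text, [0.5, 0.2, 0.3], k=2)` returns two sentences from `text` in their original order. Because the sketch relies only on whitespace tokens and counts, it performs no morphological analysis, which is the sense of "language-independent" used in the abstract.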
Original language: English
Title of host publication: Web Intelligence and Security
Subtitle of host publication: Advances in Data and Text Mining Techniques for Detecting and Preventing Terrorist Activities on the Web
Editors: Mark Last, Abraham Kandel
Publisher: IOS Press
Pages: 207-237
Number of pages: 31
Volume: 27
ISBN (Electronic): 9781607506119
ISBN (Print): 9781607506102
State: Published - Oct 2010

Publication series

Name: NATO Science for Peace and Security Series - D: Information and Communication Security
Volume: 27