MDL Approach for Unsupervised Multilingual Document Summarization

Natalia Vanetik, Marina Litvak

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

In this chapter, we describe an approach to extractive summarization based on the minimum description length (MDL) principle and relying on the Krimp dataset compression algorithm. We represent a text as a transactional dataset, with sentences as transactions and normalized words as items, and then describe the dataset by frequent itemsets of different types that provide the best compressed representation. The summary is compiled from the sentences that best describe the document. The problem of extractive summarization is thereby reduced to the maximal coverage problem, under the assumption that a summary that best describes the original text should cover most of the itemsets describing the document. We test this approach on generic summarization tasks in English and Chinese, and on a query-based summarization (QS) task in English.
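
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's method: it does not implement Krimp or any MDL-based code table, and it substitutes brute-force counting of small word sets for itemset mining and a plain greedy heuristic for the coverage step. The function names (`to_transactions`, `frequent_itemsets`, `greedy_summary`) and the parameters `min_support`, `max_size`, and `max_sentences` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def to_transactions(sentences):
    """Turn each sentence into a set of lowercased tokens.

    Lowercasing and whitespace splitting are a crude stand-in for the
    word normalization mentioned in the abstract (an assumption).
    """
    return [frozenset(s.lower().split()) for s in sentences]


def frequent_itemsets(transactions, min_support=2, max_size=2):
    """Brute-force count of word sets up to max_size items.

    This is NOT Krimp; it only keeps itemsets that occur in at least
    min_support transactions.
    """
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(t), k):
                counts[frozenset(combo)] += 1
    return {s for s, c in counts.items() if c >= min_support}


def greedy_summary(sentences, max_sentences=2, min_support=2):
    """Greedy heuristic for maximal coverage.

    Repeatedly pick the sentence that covers the most itemsets
    not yet covered by previously chosen sentences.
    """
    transactions = to_transactions(sentences)
    uncovered = set(frequent_itemsets(transactions, min_support))
    chosen = []
    while uncovered and len(chosen) < max_sentences:
        best, best_gain = None, 0
        for i, t in enumerate(transactions):
            if i in chosen:
                continue
            gain = sum(1 for s in uncovered if s <= t)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining sentence covers anything new
            break
        chosen.append(best)
        uncovered -= {s for s in uncovered if s <= transactions[best]}
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `greedy_summary` on a list of sentence strings returns a shortened list in document order. In the actual approach, the itemsets would come from the compressed representation found under the MDL principle rather than from a fixed support threshold.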

Original language: English
Title of host publication: Multilingual Text Analysis
Subtitle of host publication: Challenges, Models, and Approaches
Publisher: World Scientific Publishing Co.
Pages: 81-117
Number of pages: 37
ISBN (Electronic): 9789813274884
ISBN (Print): 9789813274877
State: Published - 1 Jan 2019
Externally published: Yes

ASJC Scopus subject areas

  • General Computer Science
