An unsupervised constrained optimization approach to compressive summarization

Natalia Vanetik, Marina Litvak, Elena Churkin, Mark Last

Research output: Contribution to journal › Article › peer-review

18 Scopus citations

Abstract

Automatic summarization is typically aimed at selecting as much information as possible from text documents using a predefined number of words. Extracting complete sentences into a summary is not an optimal way to solve this problem because some sentences contain redundant information. Removing the redundant information and compiling a summary from compressed sentences should provide a much more accurate result. Major challenges of compressive approaches include the cost of creating large summarization corpora for training supervised methods, the linguistic quality of compressed sentences, the coverage of the relevant content, and the time complexity of the compression procedure. In this work, we attempt to address these challenges by proposing an unsupervised polynomial-time compressive summarization algorithm. The proposed algorithm iteratively removes redundant parts from original sentences. It uses constituency-based parse trees and hand-crafted rules to generate elementary discourse units (EDUs) from their subtrees, which stand for phrases, and selects those with a sufficient tree gain. We define the gain of a parse tree as a weighted function of its node weights, which can be computed by any extractive summarization model capable of assigning importance weights to terms. The results of automatic evaluations on a single-document summarization task confirm that the proposed sentence compression procedure helps to avoid redundant information in the generated summaries. Furthermore, the results of human evaluations confirm that linguistic quality, in terms of readability and coherence, is preserved in the compressed summaries while their coverage improves. However, the same evaluations show that compression in general harms the grammatical correctness of compressed sentences, although in most cases this effect is not significant for the proposed compression procedure.
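
The idea of scoring subtrees by a gain and dropping low-gain ones can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the gain formula, the hand-crafted rules, or the budget handling. Here the gain is assumed to be the mean term weight of the words under a node, the set `REMOVABLE` is a hypothetical stand-in for the hand-crafted rules, and `Node`, `tree_gain`, `compress`, the example tree, and the term weights are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A constituency-tree node (hypothetical minimal representation)."""
    label: str                      # constituent or POS label, e.g. "NP", "PP"
    word: str = ""                  # non-empty only for leaves
    children: List["Node"] = field(default_factory=list)

    def words(self) -> List[str]:
        """Return the words under this node, left to right."""
        if not self.children:
            return [self.word] if self.word else []
        return [w for child in self.children for w in child.words()]


def tree_gain(node: Node, term_weights: Dict[str, float]) -> float:
    """Assumed gain: mean term weight of the words under the node."""
    ws = node.words()
    if not ws:
        return 0.0
    return sum(term_weights.get(w.lower(), 0.0) for w in ws) / len(ws)


# Assumption: labels a rule set might allow to be dropped.
REMOVABLE = {"PP", "SBAR", "ADVP"}


def compress(node: Node, term_weights: Dict[str, float], min_gain: float) -> Node:
    """Return a copy of the tree without removable subtrees of low gain."""
    kept = []
    for child in node.children:
        if child.label in REMOVABLE and tree_gain(child, term_weights) < min_gain:
            continue  # drop the whole low-gain subtree
        kept.append(compress(child, term_weights, min_gain))
    return Node(node.label, node.word, kept)


# Toy sentence: "The committee approved the budget in a long meeting"
pp = Node("PP", children=[
    Node("IN", "in"),
    Node("NP", children=[Node("DT", "a"), Node("JJ", "long"), Node("NN", "meeting")]),
])
tree = Node("S", children=[
    Node("NP", children=[Node("DT", "The"), Node("NN", "committee")]),
    Node("VP", children=[
        Node("VBD", "approved"),
        Node("NP", children=[Node("DT", "the"), Node("NN", "budget")]),
        pp,
    ]),
])
# Invented term weights; in practice these would come from an extractive model.
weights = {"committee": 0.9, "approved": 0.8, "budget": 1.0, "meeting": 0.1}

compressed = compress(tree, weights, min_gain=0.2)
print(" ".join(compressed.words()))
```

In this toy case the prepositional phrase has a low assumed gain and is dropped, while the rest of the sentence is kept. A faithful implementation would also need the word budget and the grammaticality-preserving rules that the abstract mentions.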

Original language: English
Pages (from-to): 22-35
Number of pages: 14
Journal: Information Sciences
Volume: 509
State: Published - 1 Jan 2020

Keywords

  • Budgeted sentence compression
  • Compressive summarization
  • Polytope model

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence
