Sentence Compression as a Supervised Learning with a Rich Feature Space

Elena Churkin, Mark Last, Marina Litvak, Natalia Vanetik

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

We present a novel supervised approach to sentence compression, based on classification and removal of word sequences generated from subtrees of the original sentence dependency tree. Our system may use any known classifier like Support Vector Machines or Logistic Model Tree to identify word sequences that can be removed without compromising the grammatical correctness of the compressed sentence. We trained our system using several classifiers on a small annotated dataset of 100 sentences, which included around 1500 manually labeled subtrees (removal candidates) represented by 25 features. The highest cross-validation classification accuracy of 80% was obtained with the SMO (Normalized Poly Kernel) algorithm. We evaluated the readability and the informativeness of the sentences compressed by the SMO-based classification model with the help of human raters using a separate benchmark dataset of 200 sentences.
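The candidate-generation and removal steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Token` class, the hand-written toy parse, the choice to treat every non-root subtree as a removal candidate, and the `toy_rule` function are all assumptions made for illustration. In particular, `toy_rule` is only a placeholder for a trained classifier (such as SMO) applied to the 25 features, which the abstract does not list.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int    # position in the sentence
    word: str
    head: int   # index of the head token, -1 for the root
    dep: str    # dependency relation to the head (hypothetical labels)


def subtree_indices(tokens, root_idx):
    """Return the sorted token indices of the subtree rooted at root_idx."""
    children = {}
    for t in tokens:
        children.setdefault(t.head, []).append(t.idx)
    stack, seen = [root_idx], []
    while stack:
        i = stack.pop()
        seen.append(i)
        stack.extend(children.get(i, []))
    return sorted(seen)


def removal_candidates(tokens):
    """Assumption: every non-root subtree is one removal candidate."""
    return {t.idx: subtree_indices(tokens, t.idx)
            for t in tokens if t.head != -1}


def compress(tokens, should_remove):
    """Drop every candidate word sequence that should_remove accepts."""
    removed = set()
    for root_idx, span in removal_candidates(tokens).items():
        if should_remove(tokens, root_idx, span):
            removed.update(span)
    return " ".join(t.word for t in tokens if t.idx not in removed)


def toy_rule(tokens, root_idx, span):
    """Placeholder for a trained classifier: remove adverbial modifiers."""
    return tokens[root_idx].dep == "advmod"


# Hand-written toy parse of "The cat sat quietly on the mat".
tokens = [
    Token(0, "The", 1, "det"),
    Token(1, "cat", 2, "nsubj"),
    Token(2, "sat", -1, "root"),
    Token(3, "quietly", 2, "advmod"),
    Token(4, "on", 6, "case"),
    Token(5, "the", 6, "det"),
    Token(6, "mat", 2, "obl"),
]

compressed = compress(tokens, toy_rule)
print(compressed)
```

In a real system the `should_remove` callback would wrap a classifier's prediction on the candidate's feature vector; keeping it as a plain callable lets any classifier be swapped in, in line with the abstract's statement that the system may use any known classifier.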

Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing - 19th International Conference, CICLing 2018, Revised Selected Papers
Editors: Alexander Gelbukh
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 261-271
Number of pages: 11
ISBN (Print): 9783031238031
State: Published - 1 Jan 2023
Event: 19th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2018 - Hanoi, Viet Nam
Duration: 18 Mar 2018 to 24 Mar 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13397 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 19th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2018
Country/Territory: Viet Nam
City: Hanoi
Period: 18/03/18 to 24/03/18

Keywords

  • Sentence compression
  • Supervised learning
  • Syntactic dependencies

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
