Topic concentration in query focused summarization datasets

Tal Baumel, Raphael Cohen, Michael Elhadad

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

12 Scopus citations


Query-Focused Summarization (QFS) summarizes a document cluster in response to a specific input query. QFS algorithms must combine query relevance assessment, central content identification, and redundancy avoidance. Frustratingly, state-of-the-art algorithms designed for QFS do not significantly improve upon generic summarization methods, which ignore query relevance, when evaluated on traditional QFS datasets. We hypothesize that this lack of success stems from the nature of the datasets. We define a task-based method to quantify topic concentration in datasets, i.e., the ratio of sentences within the dataset that are relevant to the query, and observe that the DUC 2005, 2006, and 2007 datasets suffer from very high topic concentration. We introduce TD-QFS, a new QFS dataset with controlled levels of topic concentration. We compare competitive baseline algorithms on TD-QFS and report strong improvement in ROUGE performance for algorithms that properly model query relevance, as opposed to generic summarizers. We further present three new and simple QFS algorithms, RelSum, ThresholdSum, and TFIDF-KLSum, that outperform state-of-the-art QFS algorithms on the TD-QFS dataset by a large margin.
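To make the ratio in the abstract concrete, the following is a minimal sketch of topic concentration as "relevant sentences divided by all sentences". It is not the task-based method the paper defines, which the abstract does not spell out. The relevance test used here (simple word overlap with the query) and the names `tokenize`, `is_relevant`, and `topic_concentration` are assumptions made purely for illustration.

```python
import re
from typing import Callable, Iterable, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_relevant(sentence: str, query: str, min_overlap: int = 1) -> bool:
    """Hypothetical relevance test: the sentence shares at least
    `min_overlap` tokens with the query. This is a stand-in, not the
    relevance judgment used in the paper."""
    return len(tokenize(sentence) & tokenize(query)) >= min_overlap


def topic_concentration(
    sentences: Iterable[str],
    query: str,
    relevant: Callable[[str, str], bool] = is_relevant,
) -> float:
    """Fraction of sentences judged relevant to the query (0.0 if empty)."""
    sentences = list(sentences)
    if not sentences:
        return 0.0
    hits = sum(1 for s in sentences if relevant(s, query))
    return hits / len(sentences)


if __name__ == "__main__":
    cluster = [
        "Asthma inhalers relieve symptoms.",
        "The weather was cold.",
        "Children with asthma need care.",
        "Stocks fell on Monday.",
    ]
    # Two of the four sentences mention a query term.
    print(topic_concentration(cluster, "asthma treatment"))
```

Under this toy definition, a cluster in which nearly every sentence matches the query has a concentration close to 1.0. That is the situation the abstract reports for the DUC datasets, and in such a cluster a generic summarizer is unlikely to pick off-topic sentences even though it ignores the query.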

Original language: English
Title of host publication: 30th AAAI Conference on Artificial Intelligence, AAAI 2016
Publisher: AAAI Press
Number of pages: 7
ISBN (Electronic): 9781577357605
State: Published - 1 Jan 2016
Event: 30th AAAI Conference on Artificial Intelligence, AAAI 2016 - Phoenix, United States
Duration: 12 Feb 2016 to 17 Feb 2016

Publication series

Name: 30th AAAI Conference on Artificial Intelligence, AAAI 2016


Conference: 30th AAAI Conference on Artificial Intelligence, AAAI 2016
Country/Territory: United States

ASJC Scopus subject areas

  • Artificial Intelligence

