The Likelihood Gain of a Language Model as a Metric for Text Summarization

Dana Levin, Alon Kipnis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

The gain in the log-likelihood (LLG) of a text under a language model (LM) when the text's summary is provided as a context to the LM, compared to no summary in the context, has been proposed as a reference-free index for the relevance of the summary to the text. We provide an information-theoretic interpretation of the LLG and an empirical analysis of the parts of speech affecting it most. We first show that the LLG describes the reduction in the binary codelength when the summary text is provided as side information to a lossless text compression system involving the LM and an entropy encoder. Consequently, under proper normalization, LLG is a form of the Normalized Compression Distance (NCD) and thus adheres to a universal information distance that is motivated by algorithmic information theory. Empirical results show that an NCD based on LLG is better correlated with human annotators than a gzip-based NCD. Additionally, we empirically show that LLG is affected almost exclusively by tokens associated with the text's content rather than tokens associated with its structure. Our findings support LLG as a natural and useful metric for evaluating text summarization methods.
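
A compact formalization may help orient the reader; the notation below is ours and may differ from the paper's (x, s, L, and the base of the logarithm are assumptions, not taken from the text). Writing P_LM for the language model's probability of a text, the log-likelihood gain of a text x when its summary s is provided as context is

LLG(x; s) = \log P_{LM}(x \mid s) - \log P_{LM}(x).

With base-2 logarithms and the ideal codelengths L(x) = -\log_2 P_{LM}(x) and L(x \mid s) = -\log_2 P_{LM}(x \mid s) of an entropy coder driven by the LM, the gain is exactly the number of bits saved by the side information: LLG(x; s) = L(x) - L(x \mid s). For comparison, the standard Normalized Compression Distance of Li et al., for a compressor with codelength C, is

NCD(x, y) = \frac{C(xy) - \min\{C(x), C(y)\}}{\max\{C(x), C(y)\}},

and the abstract's claim is that, under proper normalization, the LLG-based quantity takes this form when C is realized by the LM together with an entropy encoder.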

Original language: English
Title of host publication: 2024 IEEE International Symposium on Information Theory, ISIT 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers
Pages: 2044-2049
Number of pages: 6
ISBN (Electronic): 9798350382846
DOIs
State: Published - 1 Jan 2024
Externally published: Yes
Event: 2024 IEEE International Symposium on Information Theory, ISIT 2024 - Athens, Greece
Duration: 7 Jul 2024 - 12 Jul 2024

Publication series

Name: IEEE International Symposium on Information Theory - Proceedings
ISSN (Print): 2157-8095

Conference

Conference: 2024 IEEE International Symposium on Information Theory, ISIT 2024
Country/Territory: Greece
City: Athens
Period: 7/07/24 - 12/07/24

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Information Systems
  • Modeling and Simulation
  • Applied Mathematics
