Information rate of meaningful communication

Doron Sivan, Misha Tsodyks

Research output: Contribution to journal › Article › peer-review

Abstract

In Shannon’s seminal paper, the entropy of printed English, treated as a stationary stochastic process, was estimated to be roughly 1 bit per character. However, considered as a means of communication, language differs considerably from its printed form: i) the units of information are not characters or even words but clauses, i.e., shortest meaningful parts of speech; and ii) what is transmitted is principally the meaning of what is being said or written, while the precise phrasing that was used to communicate the meaning is typically ignored. In this study, we show that one can leverage recently developed large language models to quantify information communicated in meaningful narratives in terms of bits of meaning per clause.
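The quantity described here, bits of meaning per clause, can be illustrated with a minimal surprisal calculation. The sketch below is not the paper's method; it simply shows how per-clause probabilities (which the study obtains from large language models) convert into bits via the standard Shannon surprisal, −log₂ p. The probability values are made up for illustration.

```python
import math

def bits_per_clause(clause_probs):
    """Shannon surprisal in bits for each clause: -log2 p(clause | context).

    clause_probs: hypothetical probabilities a language model assigns to the
    meaning of each successive clause given the narrative so far.
    """
    return [-math.log2(p) for p in clause_probs]

# Illustrative (invented) per-clause probabilities for a three-clause narrative.
probs = [0.25, 0.5, 0.125]
info = bits_per_clause(probs)
print(info)                    # [2.0, 1.0, 3.0]
print(sum(info) / len(info))   # average information rate: 2.0 bits per clause
```

A more probable (more predictable) clause carries fewer bits, so the average over a narrative gives an information rate in bits of meaning per clause.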

Original language: English
Article number: e2502353122
Journal: Proceedings of the National Academy of Sciences of the United States of America
Volume: 122
Issue number: 25
State: Published - 24 Jun 2025
Externally published: Yes

Keywords

  • information theory
  • large language models
  • semantics

ASJC Scopus subject areas

  • General

