Optimized Couplings for Watermarking Large Language Models

Carol Xuan Long, Dor Tsur, Claudio Mayrink Verdun, Hsiang Hsu, Haim Permuter, Flavio P. Calmon

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

Abstract

Large language models (LLMs) are now able to produce text that is indistinguishable from human-generated content. This has fueled the development of watermarks that imprint a 'signal' in LLM-generated text with minimal perturbation of an LLM's output. This paper provides an analysis of text watermarking in a one-shot setting. Through the lens of hypothesis testing with side information, we formulate and analyze the fundamental trade-off between watermark detection power and distortion in the quality of the generated text. We argue that a key component in watermark design is generating a coupling between the side information shared with the watermark detector and a random partition of the LLM vocabulary. Our analysis identifies the optimal coupling and randomization strategy under the worst-case LLM next-token distribution that satisfies a min-entropy constraint. We provide a closed-form expression of the resulting detection rate under the proposed scheme and quantify the cost in a max-min sense. Finally, we numerically compare the proposed scheme with the theoretical optimum.
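To make the coupling idea in the abstract concrete, here is a minimal Python sketch under stated assumptions; it is not the paper's optimized scheme. It assumes side information ζ drawn uniformly from k values and a fixed vocabulary partition `partition` (in practice presumably derived pseudo-randomly from a key shared with the detector). It uses the standard maximal coupling between ζ and the group G of the next token, which leaves the next-token distribution p unchanged while maximizing P(G = ζ) = Σ_g min(q_g, 1/k), where q_g is the mass p places on group g. The detector is a plain one-proportion z-test of H0: P(match) = 1/k, also an assumption. All function names are hypothetical.

```python
import math
import random


def min_entropy(p):
    """Min-entropy H_min(p) = -log2(max_x p(x)), in bits."""
    return -math.log2(max(p))


def group_masses(p, partition, k):
    """Mass q_g that the next-token distribution p places on each group g."""
    q = [0.0] * k
    for token, prob in enumerate(p):
        q[partition[token]] += prob
    return q


def match_probability(q):
    """P(G = zeta) under a maximal coupling of G ~ q and zeta ~ Uniform{0..k-1}."""
    k = len(q)
    return sum(min(qg, 1.0 / k) for qg in q)


def sample_watermarked(p, partition, k, zeta, rng):
    """Sample a token with marginal law p, maximally coupled with side info zeta.

    Hypothetical illustration of a coupling-based watermark, not the paper's scheme.
    """
    q = group_masses(p, partition, k)
    # Leftover mass (q_g - 1/k)^+ on over-weighted groups, used when G != zeta.
    residual = [max(qg - 1.0 / k, 0.0) for qg in q]
    # Stay in group zeta with probability P(G = zeta | zeta) = min(k * q_zeta, 1).
    stay = min(k * q[zeta], 1.0)
    if rng.random() < stay or sum(residual) == 0.0:
        g = zeta
    else:
        g = rng.choices(range(k), weights=residual)[0]
    # Draw the token from p restricted to group g (renormalized by choices()).
    members = [t for t in range(len(p)) if partition[t] == g]
    return rng.choices(members, weights=[p[t] for t in members])[0]


def z_score(matches, n, k):
    """One-proportion z-statistic for H0: each token matches zeta w.p. 1/k."""
    p0 = 1.0 / k
    return (matches - n * p0) / math.sqrt(n * p0 * (1 - p0))


if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.4, 0.2, 0.15, 0.1, 0.1, 0.05]  # toy next-token distribution
    partition = [0, 0, 1, 1, 0, 1]  # toy 2-way vocabulary partition
    n, matches = 20000, 0
    for _ in range(n):
        zeta = rng.randrange(2)  # side information shared with the detector
        token = sample_watermarked(p, partition, 2, zeta, rng)
        matches += partition[token] == zeta
    print(min_entropy(p), match_probability(group_masses(p, partition, 2)))
    print(matches / n, z_score(matches, n, 2))
```

In this toy construction the match probability equals one minus the total-variation distance between q and the uniform distribution on k groups, so it is highest when p spreads its mass evenly across groups; a peaked (low min-entropy) p leaves little room to embed a detectable signal.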

Original language: English
Title of host publication: ISIT 2025 - 2025 IEEE International Symposium on Information Theory, Proceedings
Publisher: Institute of Electrical and Electronics Engineers
ISBN (Electronic): 9798331543990
State: Published - 1 Jan 2025
Event: 2025 IEEE International Symposium on Information Theory, ISIT 2025 - Ann Arbor, United States
Duration: 22 Jun 2025 → 27 Jun 2025

Publication series

Name: IEEE International Symposium on Information Theory - Proceedings
ISSN (Print): 2157-8095

Conference

Conference: 2025 IEEE International Symposium on Information Theory, ISIT 2025
Country/Territory: United States
City: Ann Arbor
Period: 22/06/25 → 27/06/25

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Information Systems
  • Modeling and Simulation
  • Applied Mathematics
