NYTWIT: A Dataset of Novel Words in the New York Times

Yuval Pinter, Cassandra L. Jacobs, Max Bittker

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

4 Scopus citations

Abstract

We present the New York Times Word Innovation Types dataset, or NYTWIT, a collection of over 2,500 novel English words published in the New York Times between November 2017 and March 2019, manually annotated for their class of novelty (such as lexical derivation, dialectal variation, blending, or compounding). We present baseline results for both uncontextual and contextual prediction of novelty class, showing that there is room for improvement even for state-of-the-art NLP systems. We hope this resource will prove useful for linguists and NLP practitioners by providing a real-world environment of novel word appearance.

Original languageEnglish
Title of host publicationCOLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference
EditorsDonia Scott, Nuria Bel, Chengqing Zong
PublisherAssociation for Computational Linguistics (ACL)
Pages6509-6515
Number of pages7
ISBN (Electronic)9781952148279
StatePublished - 1 Jan 2020
Externally publishedYes
Event28th International Conference on Computational Linguistics, COLING 2020 - Virtual, Online, Spain
Duration: 8 Dec 202013 Dec 2020

Publication series

NameCOLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference

Conference

Conference28th International Conference on Computational Linguistics, COLING 2020
Country/TerritorySpain
CityVirtual, Online
Period8/12/2013/12/20

ASJC Scopus subject areas

  • Computer Science Applications
  • Computational Theory and Mathematics
  • Theoretical Computer Science

Fingerprint

Dive into the research topics of 'NYTWIT: A Dataset of Novel Words in the New York Times'. Together they form a unique fingerprint.

Cite this