NYTWIT: A Dataset of Novel Words in the New York Times

Yuval Pinter, Cassandra L. Jacobs, Max Bittker

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We present the New York Times Word Innovation Types dataset, or NYTWIT, a collection of over 2,500 novel English words published in the New York Times between November 2017 and March 2019, manually annotated for their class of novelty (such as lexical derivation, dialectal variation, blending, or compounding). We present baseline results for both uncontextual and contextual prediction of novelty class, showing that there is room for improvement even for state-of-the-art NLP systems. We hope this resource will prove useful for linguists and NLP practitioners by providing a real-world environment of novel word appearance.
Original language: English
Title of host publication: Proceedings of the 28th International Conference on Computational Linguistics (Online)
Place of Publication: Barcelona
Publisher: International Committee on Computational Linguistics
Pages: 6509-6515
Number of pages: 7
State: Published - 1 Dec 2020
