On the role of lexical features in sequence labeling

Yoav Goldberg, Michael Elhadad

Research output: Contribution to conference, Paper, peer-review

9 Scopus citations

Abstract

We use the technique of SVM anchoring to demonstrate that lexical features extracted from a training corpus are not necessary to obtain state-of-the-art results on tasks such as Named Entity Recognition and Chunking. While standard models require as many as 100K distinct features, we derive models with as few as 1K features that perform as well as or better than standard models on different domains. These robust reduced models indicate that the way rare lexical features contribute to classification in NLP is not fully understood. Contrastive error analysis (with and without lexical features) indicates that lexical features do contribute to resolving some semantic and complex syntactic ambiguities, but we find that this contribution does not generalize outside the training corpus. As a general strategy, we believe lexical features should not be derived directly from a training corpus but should instead be carefully inferred and selected from other sources.
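To make the distinction between lexical and non-lexical features concrete, the following is a minimal sketch of windowed feature extraction for a token-level sequence labeler. It is an illustration under stated assumptions, not the authors' feature set, and it does not implement SVM anchoring. The input format (a sentence as a list of `(word, POS)` pairs), the window size of two, the coarse `word_shape` function and all function names are hypothetical. The `lexical` flag toggles word-form features, which are the ones whose count grows with the training vocabulary.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical input format: a sentence is a list of (word, POS tag) pairs.
Token = Tuple[str, str]


def word_shape(word: str) -> str:
    """Coarse shape: uppercase -> X, lowercase -> x, digit -> d; runs collapsed."""
    shape: List[str] = []
    for ch in word:
        if ch.isupper():
            c = "X"
        elif ch.islower():
            c = "x"
        elif ch.isdigit():
            c = "d"
        else:
            c = ch  # punctuation and other symbols are kept as-is
        if not shape or shape[-1] != c:
            shape.append(c)
    return "".join(shape)


def token_features(sent: Sequence[Token], i: int, lexical: bool) -> List[str]:
    """String features for token i over a +/-2 window (hypothetical templates)."""
    feats: List[str] = []
    for off in (-2, -1, 0, 1, 2):
        j = i + off
        word, pos = sent[j] if 0 <= j < len(sent) else ("<PAD>", "<PAD>")
        feats.append(f"pos[{off}]={pos}")                 # non-lexical
        feats.append(f"shape[{off}]={word_shape(word)}")  # non-lexical
        if lexical:
            # Lexical feature: one distinct feature per word form seen in training.
            feats.append(f"word[{off}]={word.lower()}")
    return feats


def feature_space(corpus: Sequence[Sequence[Token]], lexical: bool) -> Set[str]:
    """Set of distinct features a model trained on `corpus` would see."""
    space: Set[str] = set()
    for sent in corpus:
        for i in range(len(sent)):
            space.update(token_features(sent, i, lexical))
    return space


if __name__ == "__main__":
    corpus = [[("EMNLP", "NNP"), ("2009", "CD"), ("was", "VBD"),
               ("held", "VBN"), ("in", "IN"), ("Singapore", "NNP"), (".", ".")]]
    print(len(feature_space(corpus, lexical=True)),
          len(feature_space(corpus, lexical=False)))
```

On a realistic training corpus the `word[...]` templates generate one feature per observed word form and window position, which is why the lexical feature space is typically far larger than the POS- and shape-based one.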

Original language: English
Pages: 1142-1151
Number of pages: 10
State: Published - 1 Jan 2009
Event: 2009 Conference on Empirical Methods in Natural Language Processing, EMNLP 2009, Held in Conjunction with ACL-IJCNLP 2009 - Singapore, Singapore
Duration: 6 Aug 2009 to 7 Aug 2009

Conference

Conference: 2009 Conference on Empirical Methods in Natural Language Processing, EMNLP 2009, Held in Conjunction with ACL-IJCNLP 2009
Country/Territory: Singapore
City: Singapore
Period: 6/08/09 to 7/08/09

ASJC Scopus subject areas

  • Information Systems
  • Computational Theory and Mathematics
  • Computer Science Applications
