Domain adaptation of a dependency parser with a class-class selectional preference model

Raphael Cohen, Yoav Goldberg, Michael Elhadad

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

When porting parsers to a new domain, many of the errors are related to wrong attachment of out-of-vocabulary words. Since there is no available annotated data to learn the attachment preferences of the target domain words, we attack this problem using a model of selectional preferences based on domain-specific word classes. Our method uses Latent Dirichlet Allocation (LDA) to learn a domain-specific Selectional Preference model in the target domain using unannotated data. The model provides features that capture the affinities among pairs of words in the domain. To incorporate these new features in the parsing model, we adopt the co-training approach and retrain the parser with the selectional preference features. We apply this method to adapt Easy First, a fast non-directional parser trained on WSJ, to the biomedical domain (Genia Treebank). The Selectional Preference features reduce error by 4.5% over the co-training baseline.
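
The idea of a class-based affinity feature for a word pair can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the class labels, the probability values, the affinity table, and the scoring formula (an expected affinity over the two words' class distributions) are all hypothetical. In practice the class distributions would come from a model such as LDA trained on unannotated target-domain text.

```python
# Illustrative sketch only: the class distributions, affinity table, and
# scoring formula below are assumptions, not the paper's actual model.

def pair_affinity(head_classes, mod_classes, class_affinity):
    """Expected class-class affinity for a (head, modifier) word pair.

    head_classes / mod_classes map class label -> P(class | word).
    class_affinity maps (head_class, mod_class) -> affinity score.
    Class pairs missing from the table contribute 0.
    """
    score = 0.0
    for h_cls, h_p in head_classes.items():
        for m_cls, m_p in mod_classes.items():
            score += h_p * m_p * class_affinity.get((h_cls, m_cls), 0.0)
    return score

# Hypothetical P(class | word) values for three target-domain words.
word_classes = {
    "phosphorylates": {"PROCESS": 0.9, "OTHER": 0.1},
    "kinase": {"PROTEIN": 0.8, "OTHER": 0.2},
    "yesterday": {"OTHER": 1.0},
}

# Hypothetical affinities between head classes and modifier classes.
class_affinity = {
    ("PROCESS", "PROTEIN"): 0.7,
    ("PROCESS", "OTHER"): 0.1,
    ("OTHER", "PROTEIN"): 0.1,
    ("OTHER", "OTHER"): 0.2,
}

strong = pair_affinity(
    word_classes["phosphorylates"], word_classes["kinase"], class_affinity
)
weak = pair_affinity(
    word_classes["phosphorylates"], word_classes["yesterday"], class_affinity
)
print(f"phosphorylates -> kinase:    {strong:.3f}")
print(f"phosphorylates -> yesterday: {weak:.3f}")
```

Because the score depends only on word classes, a word never seen in the annotated source-domain training data can still receive a meaningful attachment feature, provided it appears in the unannotated target-domain text used to induce the classes.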
Original language: English GB
Title of host publication: Proceedings of ACL 2012 Student Research Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 43-48
Number of pages: 6
State: Published - 2012
