Enhancing unlexicalized parsing performance using a wide coverage lexicon, fuzzy tag-set mapping, and EM-HMM-based lexical probabilities

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    20 Scopus citations

    Abstract

    We present a framework for interfacing a PCFG parser with lexical information from an external resource that follows a different tagging scheme from that of the treebank. This is achieved by defining a stochastic mapping layer between the two resources. Lexical probabilities for rare events are estimated in a semi-supervised manner from a lexicon and large unannotated corpora. We show that this solution greatly enhances the performance of an unlexicalized Hebrew PCFG parser, resulting in state-of-the-art Hebrew parsing results both when a segmentation oracle is assumed and in a real-world scenario of parsing unsegmented tokens.
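
One way to picture a stochastic mapping layer between two tag sets is as a marginalization over the external resource's tags. The sketch below is only an illustration under that assumption and is not taken from the paper: the function name `map_lexical_probs`, the tag names, the placeholder words, and all probabilities are made up. It combines a hypothetical table P(lexicon tag | treebank tag) with hypothetical emission probabilities P(word | lexicon tag) to obtain P(word | treebank tag).

```python
from collections import defaultdict


def map_lexical_probs(p_lex_given_tb, p_word_given_lex):
    """Return P(word | treebank tag) by summing over lexicon tags.

    Assumed model (illustrative, not the paper's exact formulation):
        P(w | t_tb) = sum over t_lex of P(t_lex | t_tb) * P(w | t_lex)
    """
    p_word_given_tb = defaultdict(dict)
    for tb_tag, lex_dist in p_lex_given_tb.items():
        for lex_tag, p_map in lex_dist.items():
            # Lexicon tags with no emission table contribute nothing.
            for word, p_emit in p_word_given_lex.get(lex_tag, {}).items():
                previous = p_word_given_tb[tb_tag].get(word, 0.0)
                p_word_given_tb[tb_tag][word] = previous + p_map * p_emit
    return {tag: dict(dist) for tag, dist in p_word_given_tb.items()}


# Toy, made-up numbers: one treebank tag mapped softly onto two lexicon tags.
mapping = {"NN": {"noun": 0.9, "participle": 0.1}}
emissions = {
    "noun": {"word_a": 0.5, "word_b": 0.5},
    "participle": {"word_c": 1.0},
}
probs = map_lexical_probs(mapping, emissions)
```

Because each row of the mapping and each emission table sums to one, the resulting distribution for a treebank tag also sums to one, so it could in principle stand in for treebank-estimated lexical probabilities when a word is rare or unseen.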

    Original language: English
    Title of host publication: EACL 2009 - 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 327-335
    Number of pages: 9
    ISBN (Print): 9781932432169
    State: Published - 1 Jan 2009
    Event: 12th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2009 Student Research Workshop - Athens, Greece
    Duration: 30 Mar 2009 - 3 Apr 2009

    Publication series

    Name: EACL 2009 - 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings

    Conference

    Conference: 12th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2009 Student Research Workshop
    Country/Territory: Greece
    City: Athens
    Period: 30/03/09 - 3/04/09

    ASJC Scopus subject areas

    • Language and Linguistics
    • Linguistics and Language
