An unsupervised morpheme-based HMM for Hebrew morphological disambiguation

Meni Adler, Michael Elhadad

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

45 Scopus citations

Abstract

Morphological disambiguation is the process of assigning one set of morphological features to each individual word in a text. When the word is ambiguous (there are several possible analyses for the word), a disambiguation procedure based on the word context must be applied. This paper deals with morphological disambiguation of the Hebrew language, which combines morphemes into a word in both agglutinative and fusional ways. We present an unsupervised stochastic model - the only resource we use is a morphological analyzer - which deals with the data sparseness problem caused by the affixational morphology of the Hebrew language. We present a text encoding method for languages with affixational morphology in which the knowledge of word formation rules (which are quite restricted in Hebrew) helps in the disambiguation. We adapt HMM algorithms for learning and searching this text representation, in such a way that segmentation and tagging can be learned in parallel in one step. Results on a large scale evaluation indicate that this learning improves disambiguation for complex tag sets. Our method is applicable to other languages with affix morphology.

Original languageEnglish
Title of host publicationCOLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
PublisherAssociation for Computational Linguistics (ACL)
Pages665-672
Number of pages8
ISBN (Print)1932432655, 9781932432657
DOIs
StatePublished - 1 Jan 2006
Event21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, COLING/ACL 2006 - Sydney, NSW, Australia
Duration: 17 Jul 200621 Jul 2006

Publication series

NameCOLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Volume1

Conference

Conference21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, COLING/ACL 2006
Country/TerritoryAustralia
CitySydney, NSW
Period17/07/0621/07/06

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Fingerprint

Dive into the research topics of 'An unsupervised morpheme-based HMM for Hebrew morphological disambiguation'. Together they form a unique fingerprint.

Cite this