Restoring Hebrew Diacritics Without a Dictionary

Elazar Gershuni, Yuval Pinter

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

We demonstrate that it is feasible to accurately diacritize Hebrew script without any human-curated resources other than plain diacritized text. We present NAKDIMON, a two-layer character-level LSTM that performs on par with much more complicated curation-dependent systems across a diverse array of modern Hebrew sources. The model is accompanied by a training set and a test set, collected from diverse sources.
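The abstract does not describe how the training data is prepared, so the following is only a minimal sketch of how plain diacritized text could, on its own, yield character-level training pairs: each base character is paired with the combining marks that follow it. The function names `split_diacritics` and `strip_diacritics` are hypothetical, and treating every Unicode nonspacing mark (category `Mn`) as a diacritic is an assumption made for illustration, not the label scheme NAKDIMON uses.

```python
import unicodedata


def split_diacritics(text: str) -> list[tuple[str, str]]:
    """Pair each base character with the combining marks that follow it.

    Assumption: any Unicode nonspacing mark (category "Mn") counts as a
    diacritic. This is an illustrative choice, not the paper's scheme.
    """
    pairs: list[tuple[str, str]] = []
    for ch in text:
        if unicodedata.category(ch) == "Mn" and pairs:
            # Attach the mark to the most recent base character.
            base, marks = pairs[-1]
            pairs[-1] = (base, marks + ch)
        else:
            # A base character (or a stray leading mark) starts a new pair.
            pairs.append((ch, ""))
    return pairs


def strip_diacritics(text: str) -> str:
    """Return the undiacritized text, i.e. the input a model would see."""
    return "".join(base for base, _ in split_diacritics(text))


# "shalom" written with diacritics, spelled out as escapes for clarity:
# shin + qamats + shin dot, lamed, vav + holam, final mem.
word = "\u05E9\u05B8\u05C1\u05DC\u05D5\u05B9\u05DD"
pairs = split_diacritics(word)
plain = strip_diacritics(word)
```

Under these assumptions, `plain` would serve as the model input and the second element of each pair as the per-character target, so no dictionary or morphological analyzer is needed to build the supervision signal.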

Original language: English
Title of host publication: Findings of the Association for Computational Linguistics
Subtitle of host publication: NAACL 2022 - Findings
Publisher: Association for Computational Linguistics (ACL)
Pages: 1010-1018
Number of pages: 9
ISBN (Electronic): 9781955917766
State: Published - 1 Jan 2022
Event: 2022 Findings of the Association for Computational Linguistics: NAACL 2022 - Seattle, United States
Duration: 10 Jul 2022 → 15 Jul 2022

Publication series

Name: Findings of the Association for Computational Linguistics: NAACL 2022

Conference

Conference: 2022 Findings of the Association for Computational Linguistics: NAACL 2022
Country/Territory: United States
City: Seattle
Period: 10/07/22 → 15/07/22

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
