Embible: Reconstruction of Ancient Hebrew and Aramaic Texts Using Transformers

Niv Fono, Harel Moshayof, Eldar Karol, Itay Asraf, Mark Last

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review


Abstract

Hebrew and Aramaic inscriptions are an essential source of information on the ancient history of the Near East. Unfortunately, some parts of the inscribed texts become illegible over time. Specialists called epigraphists use time-consuming manual procedures to estimate the missing content. This problem can be considered an extended masked language modeling task, where the damaged content can comprise single characters, character n-grams (partial words), single complete words, and multi-word n-grams. This study is the first attempt to apply the masked language modeling approach to corrupted inscriptions in Hebrew and Aramaic, both of which are written in the Hebrew alphabet, which consists mostly of consonant symbols. In our experiments, we evaluate several transformer-based models, which are fine-tuned on Biblical texts and tested on three different percentages of randomly masked parts in the test corpus. For every masking percentage, the highest text completion accuracy is obtained with a novel ensemble of word and character prediction models.
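The evaluation setup described in the abstract (randomly masking a share of the test text, then scoring how much of it is restored) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `mask_characters` and `completion_accuracy`, the `?` placeholder, and the example fractions are all assumptions made for illustration, and only character-level masking is shown.

```python
import random

MASK = "?"  # hypothetical placeholder standing in for an illegible character


def mask_characters(text, fraction, seed=0):
    """Replace roughly `fraction` of the non-space characters with MASK.

    Returns the masked text and the sorted list of masked positions.
    """
    rng = random.Random(seed)
    # Only non-space characters are candidates, so word boundaries survive.
    positions = [i for i, ch in enumerate(text) if not ch.isspace()]
    k = round(len(positions) * fraction)
    chosen = set(rng.sample(positions, k))
    masked = "".join(MASK if i in chosen else ch for i, ch in enumerate(text))
    return masked, sorted(chosen)


def completion_accuracy(original, predicted, masked_positions):
    """Share of masked positions where the prediction matches the original."""
    if not masked_positions:
        return 1.0
    hits = sum(original[i] == predicted[i] for i in masked_positions)
    return hits / len(masked_positions)


# Illustrative use with assumed masking fractions (not the paper's settings).
sample = "בראשית ברא אלהים"
for fraction in (0.1, 0.3, 0.5):
    masked_text, positions = mask_characters(sample, fraction)
    # A real system would pass masked_text to a model; here the unrestored
    # text is scored directly, which serves as a trivial baseline.
    score = completion_accuracy(sample, masked_text, positions)
```

In a real experiment the masked text would be passed to a word-level or character-level model, and `completion_accuracy` would be computed on that model's output.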

Original language: English
Title of host publication: EACL 2024 - 18th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2024
Editors: Yvette Graham, Matthew Purver
Publisher: Association for Computational Linguistics (ACL)
Pages: 846-852
Number of pages: 7
ISBN (Electronic): 9798891760936
State: Published - 1 Jan 2024
Event: 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024 - Findings of EACL 2024 - St. Julian's, Malta
Duration: 17 Mar 2024 to 22 Mar 2024

Publication series

Name: EACL 2024 - 18th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2024

Conference

Conference: 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024 - Findings of EACL 2024
Country/Territory: Malta
City: St. Julian's
Period: 17/03/24 to 22/03/24

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software
  • Linguistics and Language
