Is a deep learning algorithm effective for the classification of medieval Hebrew Scripts?

Daria Vasyutinsky Shapira, Irina Rabaev, Ahmad Droby, Berat Kurar Barakat, Jihad El Sana

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

In this research, we apply deep-learning techniques to Hebrew paleography to automatically classify and process medieval Hebrew manuscripts. Our work is based on contemporary Hebrew paleography (Malachi Beit-Arié, Colette Sirat, Norman Golb, Ada Yardeni, Benjamin Richler), which recognizes fifteen subtypes of medieval Hebrew script. Automatic recognition of these scripts makes it possible to determine the approximate origin and date of undated, fragmentary, and damaged manuscripts. To train the deep neural network, we compiled the Visual Media Lab – Hebrew Paleography (VML-HP) dataset, which contains 537 high-resolution manuscript page images. The images were hand-picked from the SfarData (http://sfardata.nli.org.il/) dataset; in some rare cases, we also included pages from other manuscript collections. To test the model, we define two kinds of test sets: typical and blind. The typical test set consists of unseen pages from the manuscripts used in training. The blind test set, by contrast, consists of pages from unseen manuscripts and thus simulates a real-life scenario. The model was trained on patches extracted from the document pages. To filter out irrelevant patches (empty patches or patches containing decorations), we developed a clean patch generation algorithm that produces patches containing only text regions (for the VML-HP dataset, we generated 150K training patches). In all experiments, we trained the network on the training set and tested it on both test sets, typical and blind. The training objective, cross-entropy loss, was minimized using the Adam optimizer. Training continued until the validation loss failed to improve for five consecutive epochs (early stopping with a patience of five), and the model with the lowest validation loss was used for testing.
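The difference between the typical and blind test sets amounts to splitting by page versus splitting by manuscript. The following is a minimal sketch, assuming each page is identified by a (manuscript_id, page_id) pair; the function name `split_typical_blind`, the held-out fractions, and the random seed are illustrative assumptions and are not taken from the chapter.

```python
import random
from collections import defaultdict


def split_typical_blind(pages, blind_fraction=0.2, typical_fraction=0.1, seed=0):
    """Split (manuscript_id, page_id) pairs into train, typical-test, and blind-test sets.

    Blind test: every page of a held-out manuscript, so that manuscript is never seen in training.
    Typical test: held-out pages of manuscripts that also contribute training pages.
    The fractions and seed are hypothetical defaults, not values from the chapter.
    """
    rng = random.Random(seed)
    by_ms = defaultdict(list)
    for ms_id, page_id in pages:
        by_ms[ms_id].append((ms_id, page_id))

    # Choose whole manuscripts for the blind set.
    manuscripts = sorted(by_ms)
    rng.shuffle(manuscripts)
    n_blind = max(1, round(blind_fraction * len(manuscripts)))
    blind_ms = set(manuscripts[:n_blind])

    train, typical, blind = [], [], []
    for ms_id in sorted(by_ms):
        if ms_id in blind_ms:
            blind.extend(by_ms[ms_id])
            continue
        # For the remaining manuscripts, hold out a few pages for the typical set.
        ms_pages = by_ms[ms_id][:]
        rng.shuffle(ms_pages)
        n_typical = int(typical_fraction * len(ms_pages))  # may be 0 for short manuscripts
        typical.extend(ms_pages[:n_typical])
        train.extend(ms_pages[n_typical:])
    return train, typical, blind
```

With 10 manuscripts of 20 pages each and the default fractions, two whole manuscripts (40 pages) go to the blind set, and two pages from each of the remaining eight manuscripts go to the typical set.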
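The chapter does not describe how its clean patch generation algorithm works. As a purely hypothetical illustration of the idea, the sketch below keeps a patch only if its fraction of dark (ink) pixels lies within a chosen range, treating near-empty patches as background and very dense patches as likely decorations. The ink-ratio heuristic, the function name `extract_clean_patches`, and all thresholds are assumptions, not the authors' method.

```python
def extract_clean_patches(image, patch_size, stride, ink_threshold=128,
                          min_ink=0.05, max_ink=0.5):
    """Return (top, left) corners of patches whose ink ratio lies in [min_ink, max_ink].

    image: 2-D list of grayscale values in 0..255, where dark pixels are ink.
    Hypothetical heuristic: too little ink -> empty patch; too much ink -> likely decoration.
    """
    height, width = len(image), len(image[0])
    kept = []
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            # Count pixels darker than the ink threshold inside this patch.
            dark = sum(
                1
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
                if image[r][c] < ink_threshold
            )
            ratio = dark / (patch_size * patch_size)
            if min_ink <= ratio <= max_ink:
                kept.append((top, left))
    return kept
```

A real system would more likely combine such a density test with layout analysis, since ink density alone cannot reliably separate script from ornament.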
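The stopping rule described in the abstract (stop once the validation loss has not improved for five consecutive epochs, then test the model with the lowest validation loss) can be sketched in a framework-agnostic way. The callbacks `train_one_epoch` and `validation_loss` are hypothetical placeholders: the first would run one epoch of cross-entropy minimization with Adam, and the second would return the loss on the validation set.

```python
import copy
import math


def train_with_early_stopping(model, train_one_epoch, validation_loss,
                              patience=5, max_epochs=1000):
    """Train until validation loss has not improved for `patience` consecutive epochs.

    train_one_epoch(model) -> None   (hypothetical callback: one training pass)
    validation_loss(model) -> float  (hypothetical callback: loss on validation data)
    Returns (best_model, best_loss, epochs_run), where best_model is a deep copy
    of the model taken at the epoch with the lowest validation loss.
    """
    best_loss = math.inf
    best_model = copy.deepcopy(model)
    epochs_without_improvement = 0
    epochs_run = 0
    for _ in range(max_epochs):
        train_one_epoch(model)
        epochs_run += 1
        loss = validation_loss(model)
        if loss < best_loss:
            # New best: snapshot the model and reset the patience counter.
            best_loss = loss
            best_model = copy.deepcopy(model)
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_model, best_loss, epochs_run
```

In PyTorch, one would usually snapshot `model.state_dict()` rather than deep-copying the whole model object.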
Original language: English
Title of host publication: Jewish Studies in the Digital Age
Publisher: De Gruyter Oldenbourg
Pages: 349-362
Number of pages: 14
ISBN (Electronic): 978-3-11-074482-8, 978-3-11-074488-0
ISBN (Print): 9783110744699
State: Published, 2022
