TY - JOUR
T1 - Tikkoun Sofrim
T2 - Making Ancient Manuscripts Digitally Accessible: The Case of Midrash Tanhuma
AU - Wecker, Alan J.
AU - Raziel-Kretzmer, Vered
AU - Kiessling, Benjamin
AU - Stökl Ben Ezra, Daniel
AU - Lavee, Moshe
AU - Kuflik, Tsvi
AU - Elovits, Dror
AU - Schorr, Moshe
AU - Schor, Uri
AU - Jablonski, Pawel
N1 - Publisher Copyright: © 2022 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2022/6/1
Y1 - 2022/6/1
N2 - Making ancient handwritten manuscripts accessible to the general public is challenging, for several reasons. Foremost, they are handwritten. Each and every one is unique, so there is a need for manual transcription for providing enough examples for training a machine-learning-based algorithm to automatically transcribe the handwritten text. Moreover, the quality of the text is diverse: over time the ink faded, pages were damaged, and so forth. Furthermore, the boundaries of the textual regions on a page and the lines of text are not standard. Sometimes there are corrections above the lines, the lines are curved, there are comments and annotations on the margins, and more. A possible solution for these challenges is having a "person in the loop." However, manual correction brings with it another challenge: how to address disagreement between annotations (as usually several corrections are considered before a decision is taken about the correct transcription). Tikkoun-Sofrim is a system that integrates automatic handwritten text recognition with manual, crowdsourced error correction, introducing an automatic decision process about when to stop asking for additional transcription and selecting the best transcription, declaring it as the recommended agreed reading. The system was applied to several manuscripts of "Midrash Tanhuma," a medieval Hebrew rabbinic homiletic text, achieving a high level of success.
AB - Making ancient handwritten manuscripts accessible to the general public is challenging, for several reasons. Foremost, they are handwritten. Each and every one is unique, so there is a need for manual transcription for providing enough examples for training a machine-learning-based algorithm to automatically transcribe the handwritten text. Moreover, the quality of the text is diverse: over time the ink faded, pages were damaged, and so forth. Furthermore, the boundaries of the textual regions on a page and the lines of text are not standard. Sometimes there are corrections above the lines, the lines are curved, there are comments and annotations on the margins, and more. A possible solution for these challenges is having a "person in the loop." However, manual correction brings with it another challenge: how to address disagreement between annotations (as usually several corrections are considered before a decision is taken about the correct transcription). Tikkoun-Sofrim is a system that integrates automatic handwritten text recognition with manual, crowdsourced error correction, introducing an automatic decision process about when to stop asking for additional transcription and selecting the best transcription, declaring it as the recommended agreed reading. The system was applied to several manuscripts of "Midrash Tanhuma," a medieval Hebrew rabbinic homiletic text, achieving a high level of success.
KW - CATTI
KW - HTR
KW - crowd-sourcing
KW - handwritten text recognition
KW - transcription
UR - https://www.scopus.com/pages/publications/85133705422
U2 - 10.1145/3476776
DO - 10.1145/3476776
M3 - Article
AN - SCOPUS:85133705422
SN - 1556-4673
VL - 15
JO - Journal on Computing and Cultural Heritage
JF - Journal on Computing and Cultural Heritage
IS - 2
M1 - 20
ER -