The HHD Dataset

Irina Rabaev, Berat Kurar Barakat, Alexander Churkin, Jihad El-Sana

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Scopus citations

Abstract

Benchmark datasets are important in the field of document image processing because they allow different approaches to be analyzed and their performance compared fairly. Benchmark datasets exist for several alphabets, such as Latin, Arabic, and Chinese, but not for the Hebrew alphabet. In this paper, a handwritten Hebrew dataset, HHD, is introduced. The HHD dataset is collected from hand-filled forms and is accompanied by ground truth at the character, word, and text-line levels. Presently, the dataset contains around 1000 document images, and we continue to enlarge it. To the best of our knowledge, this is the first comprehensive corpus of Hebrew handwritten documents, and we believe it will help advance Hebrew document processing and document processing in general. The dataset can be useful for various research applications, such as word spotting, word recognition, text line alignment, and writer identification. An initial small subset of the HHD for character classification can be downloaded from https://www.cs.bgu.ac.illr-vberatldatalhhd-dataset.zip, together with its training and test set subdivisions. We also provide baseline results for character classification on this initial subset. In the near future, the full HHD dataset will be made freely available to the research community.

Original language: English
Title of host publication: Proceedings - 2020 17th International Conference on Frontiers in Handwriting Recognition, ICFHR 2020
Publisher: Institute of Electrical and Electronics Engineers
Pages: 228-233
Number of pages: 6
ISBN (Electronic): 9781728199665
State: Published - 1 Sep 2020
Event: 17th International Conference on Frontiers in Handwriting Recognition, ICFHR 2020 - Dortmund, Germany
Duration: 7 Sep 2020 to 10 Sep 2020

Publication series

Name: Proceedings of International Conference on Frontiers in Handwriting Recognition, ICFHR
Volume: 2020-September
ISSN (Print): 2167-6445
ISSN (Electronic): 2167-6453

Conference

Conference: 17th International Conference on Frontiers in Handwriting Recognition, ICFHR 2020
Country/Territory: Germany
City: Dortmund
Period: 7/09/20 to 10/09/20

Keywords

  • Ground truth
  • Handwritten document image dataset
  • Hebrew handwritten documents

ASJC Scopus subject areas

  • Computer Science Applications
  • Computer Vision and Pattern Recognition
