The Pinkas dataset

Berat Kurar Barakat, Jihad El-Sana, Irina Rabaev

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

13 Scopus citations

Abstract

In historical document image processing, datasets account for a significant part of any research effort. They are crucial for the diversity and abundance of experimental results, which in turn contribute to the development of new algorithms that meet new challenges, and they are very important for benchmarking processing algorithms. Numerous publicly available document image datasets in different languages have emerged. However, current segmentation and recognition performance is nearly saturated on the publicly available datasets, and collecting and labelling new historical document images is a burden on researchers in the field. This paper introduces a public historical document image dataset, the Pinkas dataset, which poses new challenges that open room for improvement and help identify the strengths and weaknesses of available processing algorithms. It is the first dataset in medieval handwritten Hebrew and is fully labeled at the word, line and page level by an expert in historical Hebrew manuscripts. The Pinkas dataset thus contributes to the diversity of benchmarking standards. In this paper we present meta features of the Pinkas dataset and apply recent word spotting algorithms to analyze the room for improvement in terms of performance.

Original language: English
Title of host publication: Proceedings - 15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019
Publisher: Institute of Electrical and Electronics Engineers
Pages: 732-737
Number of pages: 6
ISBN (Electronic): 9781728128610
State: Published - 1 Sep 2019
Event: 15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019 - Sydney, Australia
Duration: 20 Sep 2019 to 25 Sep 2019

Publication series

Name: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
ISSN (Print): 1520-5363

Conference

Conference: 15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019
Country/Territory: Australia
City: Sydney
Period: 20/09/19 to 25/09/19

Keywords

  • Handwritten dataset
  • Handwritten Hebrew dataset
  • Historical document image analysis

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
