VML-HD: The historical Arabic documents dataset for recognition systems

  • Majeed Kassis
  • Alaa Abdalhaleem
  • Ahmad Droby
  • Reem Alaasam
  • Jihad El-Sana

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

    43 Scopus citations

    Abstract

    In this paper we present a new database of handwritten Arabic script. It is based on five books written by different writers between the years 1088 and 1451. We took 680 pages from these five books and fully annotated them at the sub-word level. On each page we manually drew a bounding box around every sub-word and annotated its sequence of characters. The database consists of 121,636 sub-word occurrences, comprising 244,553 characters, drawn from a vocabulary of 1,731 sub-word forms. The database is described in detail and is designed for training and testing recognition systems for handwritten Arabic sub-words. It is available for research purposes, and we encourage researchers to develop and test new methods using it.

    Original language: English
    Title of host publication: 1st IEEE International Workshop on Arabic Script Analysis and Recognition, ASAR 2017
    Publisher: Institute of Electrical and Electronics Engineers
    Pages: 11-14
    Number of pages: 4
    ISBN (Electronic): 9781509066285
    State: Published - 13 Oct 2017
    Event: 1st IEEE International Workshop on Arabic Script Analysis and Recognition, ASAR 2017 - Nancy, France
    Duration: 3 Apr 2017 to 5 Apr 2017

    Publication series

    Name: 1st IEEE International Workshop on Arabic Script Analysis and Recognition, ASAR 2017

    Conference

    Conference: 1st IEEE International Workshop on Arabic Script Analysis and Recognition, ASAR 2017
    Country/Territory: France
    City: Nancy
    Period: 3/04/17 to 5/04/17

    ASJC Scopus subject areas

    • Computer Vision and Pattern Recognition
    • Linguistics and Language
    • Computer Science Applications
