VML-HD: The historical Arabic documents dataset for recognition systems

Majeed Kassis, Alaa Abdalhaleem, Ahmad Droby, Reem Alaasam, Jihad El-Sana

Research output: Contribution to conference › Paper › peer-review

19 Scopus citations

Abstract

In this paper we present a new database of handwritten Arabic script. It is based on five books written by different writers between the years 1088 and 1451. We took 680 pages from these five books and fully annotated them at the sub-word level. For each page, we manually drew bounding boxes around the individual sub-words and annotated each with its sequence of characters. The database contains 121,636 sub-word occurrences, comprising 244,553 characters, drawn from a vocabulary of 1,731 sub-word forms. The database is described in detail and is designed for training and testing recognition systems for handwritten Arabic sub-words. It is available for research purposes, and we encourage researchers to develop and test new methods using it.
Original language: English
Pages: 11-14
Number of pages: 4
State: Published - 13 Oct 2017
Event: 1st IEEE International Workshop on Arabic Script Analysis and Recognition, ASAR 2017 - Nancy, France
Duration: 3 Apr 2017 to 5 Apr 2017

Conference

Conference: 1st IEEE International Workshop on Arabic Script Analysis and Recognition, ASAR 2017
Country/Territory: France
City: Nancy
Period: 3/04/17 to 5/04/17

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Linguistics and Language
  • Computer Science Applications
