Comprehensive synthetic Arabic database for on/off-line script recognition research

Raid M. Saabni, Jihad A. El-Sana

Research output: Contribution to journal > Article > peer-review

24 Scopus citations

Abstract

Developing and maintaining large, comprehensive databases for script recognition that include different shapes for each word in the lexicon is expensive and difficult. In this paper, we present an efficient system that automatically generates prototypes for each word in a lexicon using multiple appearances of each letter. Large sets of different shapes are created for each letter in each position. These sets are then used to generate valid shapes for each word-part. The number of valid permutations for each word is large, which makes training and searching impractical for tasks such as script recognition and word spotting. We apply dimensionality reduction and clustering techniques to maintain a compact representation of these databases without affecting their ability to represent the wide variety of handwriting styles. In addition, a database for off-line script recognition is generated from the on-line strokes using a standard dilation technique, with special effort made to resemble the pen's path. We also examined and used several layout techniques for producing words from the generated word-parts. Our experimental results show that the proposed system can automatically generate large databases whose quality is at least as good as that of manually generated ones.
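
The generate-then-compact idea in the abstract can be illustrated with a small sketch. It is not the authors' system: the letter shapes are made-up polylines, and the function names (resample, join_letters, pca_project, kmeans, representatives) and all parameter values are hypothetical. It assumes that every combination of letter variants is a valid word-part shape, uses a plain SVD-based PCA and a basic Lloyd-style k-means (the record's keywords mention PCA and Kmeans, but the paper's exact procedure is not given here), and keeps one candidate per cluster as a prototype.

```python
import itertools
import numpy as np

def resample(stroke, n_points=16):
    """Resample a polyline (k x 2) to n_points spaced evenly by arc length."""
    stroke = np.asarray(stroke, dtype=float)
    seg = np.linalg.norm(np.diff(stroke, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg)])
    targets = np.linspace(0.0, arc[-1], n_points)
    x = np.interp(targets, arc, stroke[:, 0])
    y = np.interp(targets, arc, stroke[:, 1])
    return np.column_stack([x, y])

def join_letters(letter_shapes):
    """Chain letter strokes so each starts where the previous one ended."""
    parts, offset = [], np.zeros(2)
    for shape in letter_shapes:
        moved = np.asarray(shape, dtype=float)
        moved = moved - moved[0] + offset
        # Skip the duplicated junction point for every letter after the first.
        parts.append(moved if not parts else moved[1:])
        offset = moved[-1]
    return np.vstack(parts)

def pca_project(X, n_components):
    """Project rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def kmeans(X, k, n_iter=50, seed=0):
    """Basic Lloyd iterations; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), centers

def representatives(X, labels, centers):
    """Index of the sample closest to each non-empty cluster center."""
    reps = []
    for j in range(len(centers)):
        idx = np.flatnonzero(labels == j)
        if idx.size:
            d = np.linalg.norm(X[idx] - centers[j], axis=1)
            reps.append(int(idx[d.argmin()]))
    return reps

# Hypothetical data: 3 letters, 5 noisy variants each (not real Arabic shapes).
rng = np.random.default_rng(42)
base_letters = [
    np.array([[0, 0], [1, 1], [2, 0]], dtype=float),
    np.array([[0, 0], [1, -1], [2, 0]], dtype=float),
    np.array([[0, 0], [1, 0.5], [2, 0]], dtype=float),
]
variants = [[b + rng.normal(scale=0.1, size=b.shape) for _ in range(5)]
            for b in base_letters]

# Every combination of variants yields one candidate word-part: 5**3 = 125.
candidates = [join_letters(c) for c in itertools.product(*variants)]
X = np.array([resample(c).ravel() for c in candidates])  # one row per shape
Z = pca_project(X, n_components=4)                       # reduced features
labels, centers = kmeans(Z, k=8)
reps = representatives(Z, labels, centers)               # compact prototype set
print(len(candidates), "candidates ->", len(reps), "prototypes")
```

Even in this toy setting the candidate count grows as the product of the per-letter variant counts, which is why some compaction step is needed before training or searching; the number of components and clusters here is arbitrary and would have to be tuned on real data.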

Original language: English
Pages (from-to): 285-294
Number of pages: 10
Journal: International Journal on Document Analysis and Recognition
Volume: 16
Issue number: 3
State: Published - 1 Sep 2013

Keywords

  • Arabic
  • Database
  • Kmeans
  • PCA
  • Recognition
  • Script
  • Synthetic

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
