TY - GEN
T1 - Efficient generation of comprehensive database for online Arabic script recognition
AU - Saabni, Raid
AU - El-sana, Jihad
PY - 2009/12/10
Y1 - 2009/12/10
AB - The difficulties in segmenting cursive words into individual characters have shifted the focus of handwriting recognition research from segmentation-based approaches to segmentation-free (holistic) methods. However, maintaining and training a large number of prototypes (models) that represent the words in the dictionary makes the training process extremely expensive and demanding in computing resources. In this paper we present an efficient system that automatically generates prototypes for each word in a given dictionary using multiple appearances of each letter shape. Multiple appearances allow for many permutations of shapes for each word and thus complicate searching for the right prototype. To simplify the training, reduce the number of maintained prototypes, and avoid overfitting, we used dimensionality reduction followed by clustering techniques to reduce the size of these sets without affecting their ability to represent the wide variations of handwriting styles. A set of generated fonts is created by professional writers imitating all handwriting styles for each character in each position. These fonts are used to generate all shapes for writing each word-part in a comprehensive dictionary. Principal component analysis and k-means clustering are applied to select the minimal number of shapes representing the wide variations of handwriting styles for a word-part. Experimental results using an online recognition system demonstrate the credibility of this process compared to manually generated databases.
UR - http://www.scopus.com/inward/record.url?scp=71249163782&partnerID=8YFLogxK
U2 - 10.1109/ICDAR.2009.258
DO - 10.1109/ICDAR.2009.258
M3 - Conference contribution
AN - SCOPUS:71249163782
SN - 9780769537252
T3 - Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
SP - 1231
EP - 1235
BT - ICDAR2009 - 10th International Conference on Document Analysis and Recognition
T2 - ICDAR2009 - 10th International Conference on Document Analysis and Recognition
Y2 - 26 July 2009 through 29 July 2009
ER -