TY - GEN
T1 - Layout analysis for Arabic historical document images using machine learning
AU - Bukhari, Syed Saqib
AU - Breuel, Thomas M.
AU - Asi, Abedelkadir
AU - El-Sana, Jihad
PY - 2012/12/1
Y1 - 2012/12/1
N2 - Page layout analysis is a fundamental step in any document image understanding system. We introduce an approach that segments text appearing in page margins (a.k.a. side-notes text) from manuscripts with complex layout formats. Simple and discriminative features are extracted at the connected-component level, and robust feature vectors are subsequently generated. A multilayer perceptron classifier is used to assign each connected component to the relevant text class. A voting scheme is then applied to refine the resulting segmentation and produce the final classification. In contrast to state-of-the-art segmentation approaches, this method is independent of both block segmentation and pixel-level analysis. The proposed method has been trained and tested on a dataset that contains a variety of complex side-notes layout formats, achieving a segmentation accuracy of about 95%.
AB - Page layout analysis is a fundamental step in any document image understanding system. We introduce an approach that segments text appearing in page margins (a.k.a. side-notes text) from manuscripts with complex layout formats. Simple and discriminative features are extracted at the connected-component level, and robust feature vectors are subsequently generated. A multilayer perceptron classifier is used to assign each connected component to the relevant text class. A voting scheme is then applied to refine the resulting segmentation and produce the final classification. In contrast to state-of-the-art segmentation approaches, this method is independent of both block segmentation and pixel-level analysis. The proposed method has been trained and tested on a dataset that contains a variety of complex side-notes layout formats, achieving a segmentation accuracy of about 95%.
UR - http://www.scopus.com/inward/record.url?scp=84874270401&partnerID=8YFLogxK
U2 - 10.1109/ICFHR.2012.227
DO - 10.1109/ICFHR.2012.227
M3 - Conference contribution
AN - SCOPUS:84874270401
SN - 9780769547749
T3 - Proceedings - International Workshop on Frontiers in Handwriting Recognition, IWFHR
SP - 639
EP - 644
BT - Proceedings - 13th International Conference on Frontiers in Handwriting Recognition, ICFHR 2012
T2 - 13th International Conference on Frontiers in Handwriting Recognition, ICFHR 2012
Y2 - 18 September 2012 through 20 September 2012
ER -