TY - GEN
T1 - Robust and efficient text‐line extraction by local minimal sub-seams
AU - Saabni, Raid
N1 - Publisher Copyright:
© 2018 Association for Computing Machinery.
PY - 2018/9/21
Y1 - 2018/9/21
AB - Robust text-line extraction from document images is a vital prerequisite for any successful text recognition or analysis process. Most previously proposed algorithms for this task assume some kind of binarization pre-processing step to ensure good performance. In this paper, we present a novel, robust, and efficient algorithm that extracts text lines directly from gray-level document images. The algorithm tracks minimal-energy sub-seams and accumulates them into full local minimal/maximal seams, namely separating seams and medial seams, that define the text lines. To improve seam extraction, we enhance the image using a double-sided adaptive local density projection profile followed by a multi-scale anisotropic second-derivative-of-Gaussian filter bank. Following the observation that the centers of text lines are more reliable to follow, we first extract seams that follow the line centers and use them to constrain the evolution of the separating seams. The algorithm is parameter-free: its free parameters are estimated directly by analyzing the image properties and the pixel distribution. We have tested our approach on various multilingual datasets covering a range of image qualities and obtained very encouraging results that outperform state-of-the-art algorithms.
KW - Document Image Analyzing
KW - Line Extraction
KW - Local projection profile
KW - Minimal Seams
KW - Multi-scale anisotropic Gaussian filter bank
UR - http://www.scopus.com/inward/record.url?scp=85059964522&partnerID=8YFLogxK
U2 - 10.1145/3284557.3284705
DO - 10.1145/3284557.3284705
M3 - Conference contribution
AN - SCOPUS:85059964522
T3 - ACM International Conference Proceeding Series
BT - Proceedings of the 2nd International Symposium on Computer Science and Intelligent Control, ISCSIC 2018
PB - Association for Computing Machinery
T2 - 2nd International Symposium on Computer Science and Intelligent Control, ISCSIC 2018
Y2 - 21 September 2018 through 23 September 2018
ER -