TY - JOUR
T1 - Evolution maps and applications
AU - Biller, Ofer
AU - Rabaev, Irina
AU - Kedem, Klara
AU - Dinstein, Its'hak
AU - El-Sana, Jihad J.
N1 - Funding Information:
This research was supported in part by the DFG-Trilateral Grant no. 8716. We thank Prof. Uri Ehrlich and Uri Safrai from the Goldstein-Goren Department of Jewish Thought, Ben-Gurion University of the Negev, for their assistance in generating the ground truth.
Funding Information:
This project was funded by the German Research Foundation under contract FI 1494/3-2. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Publisher Copyright:
© 2016 Biller et al.
PY - 2016/1/1
Y1 - 2016/1/1
N2 - Common tasks in document analysis, such as binarization and line extraction, are still considered difficult for highly degraded text documents. Having reliable fundamental information regarding the characters of the document, such as the distribution of character dimensions and stroke width, can significantly improve the performance of these tasks. We introduce a novel perspective of the image data which maps the evolution of connected components along the change in gray scale threshold. The maps reveal significant information about the sets of elements in the document, such as characters, noise, stains, and words. The information is further employed to improve a state-of-the-art binarization algorithm and to automatically achieve character size estimation, line extraction, stroke width estimation, and feature distribution analysis, all of which are hard tasks for highly degraded documents.
AB - Common tasks in document analysis, such as binarization and line extraction, are still considered difficult for highly degraded text documents. Having reliable fundamental information regarding the characters of the document, such as the distribution of character dimensions and stroke width, can significantly improve the performance of these tasks. We introduce a novel perspective of the image data which maps the evolution of connected components along the change in gray scale threshold. The maps reveal significant information about the sets of elements in the document, such as characters, noise, stains, and words. The information is further employed to improve a state-of-the-art binarization algorithm and to automatically achieve character size estimation, line extraction, stroke width estimation, and feature distribution analysis, all of which are hard tasks for highly degraded documents.
KW - Connected component analysis
KW - Degraded documents
KW - Historical documents
KW - Text document analysis
UR - http://www.scopus.com/inward/record.url?scp=85029886912&partnerID=8YFLogxK
U2 - 10.7717/peerj-cs.39
DO - 10.7717/peerj-cs.39
M3 - Article
AN - SCOPUS:85029886912
SN - 2376-5992
VL - 2016
JO - PeerJ Computer Science
JF - PeerJ Computer Science
IS - 1
M1 - e39
ER -