TY - GEN
T1 - Document image binarization using "multi-scale" predefined filters
AU - Saabni, Raid M.
N1 - Publisher Copyright:
© 2018 SPIE.
PY - 2018/1/1
Y1 - 2018/1/1
AB - Reading text or searching for keywords within a historical document is a very challenging task. One of the first steps of this task is binarization, which separates the foreground (text, figures and drawings) from the background. The outcome of this step often determines whether the subsequent steps succeed or fail, so it is vital to the overall task of reading and analyzing the content of a document image. Historical document images are generally of poor quality because of their storage conditions and degradation over time, which mostly result in varying contrast, stains, dirt and ink seeping through from the reverse side. In this paper, we use banks of predefined anisotropic filters at different scales and orientations to develop a binarization method for degraded documents and manuscripts. Exploiting the fact that handwritten strokes may vary in scale and orientation, we use predefined filter banks with various scales, weights and orientations and seek a compact set of filters and weights that generates different layers of foreground and background. The responses of these filters, convolved locally with the gray-level image, are weighted and accumulated to enhance the original image. Based on these layers, seeds of components in the gray-level image and a learning process, we present an improved binarization algorithm that separates the background from the layers of foreground. In a second phase, foreground layers caused by seeping ink, degradation or other factors are also separated from the true foreground. Promising experimental results were obtained on the DIBCO2011, DIBCO2013 and H-DIBCO2016 data sets and on a collection of images of real historical documents.
KW - "Multi-scale" filters
KW - Ada-Boosting
KW - Binarization
KW - Document Image Analysis
UR - https://www.scopus.com/pages/publications/85046495366
U2 - 10.1117/12.2303604
DO - 10.1117/12.2303604
M3 - Conference contribution
AN - SCOPUS:85046495366
T3 - Proceedings of SPIE - The International Society for Optical Engineering
BT - Ninth International Conference on Graphic and Image Processing, ICGIP 2017
A2 - Yu, Hui
A2 - Dong, Junyu
PB - SPIE
T2 - 9th International Conference on Graphic and Image Processing, ICGIP 2017
Y2 - 14 October 2017 through 16 October 2017
ER -
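The enhancement step summarized in the abstract (convolving a bank of anisotropic filters at several scales and orientations with the gray-level image, then weighting and accumulating the responses) can be illustrated with a minimal NumPy sketch. It is not the author's method: the elongated zero-sum Gaussian kernel, the two (long, short) sigma pairs, the eight orientations, the fixed uniform weights (the paper instead seeks and learns filters and weights) and the global Otsu threshold standing in for the seed-based, learned classification are all assumptions, and the helper names (`anisotropic_kernel`, `filter2d`, `enhance`, `otsu_threshold`, `binarize`) are hypothetical. The second phase that separates seeping-ink layers from the true foreground is not shown.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def anisotropic_kernel(sigma_long, sigma_short, theta):
    """Hypothetical filter: zero-sum elongated Gaussian oriented at theta (radians).

    It responds strongly to thin strokes running along theta and gives a
    (numerically) zero response on flat regions.
    """
    half = int(np.ceil(3 * sigma_long))
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    u = x * np.cos(theta) + y * np.sin(theta)    # coordinate along the stroke
    v = -x * np.sin(theta) + y * np.cos(theta)   # coordinate across the stroke
    g = np.exp(-0.5 * ((u / sigma_long) ** 2 + (v / sigma_short) ** 2))
    g /= g.sum()
    return g - g.mean()                          # make the kernel sum to zero


def filter2d(img, kernel):
    """Same-size 2-D filtering with edge padding.

    The kernel is point-symmetric, so correlation and convolution coincide.
    """
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    windows = sliding_window_view(padded, (kh, kw))
    return np.einsum("ijkl,kl->ij", windows, kernel)


def enhance(gray, scales=((3.0, 1.0), (5.0, 1.5)), n_orient=8, weights=None):
    """Weighted multi-scale, multi-orientation stroke-response map (assumed design)."""
    ink = 1.0 - gray.astype(float) / 255.0        # dark ink -> high values
    if weights is None:                           # fixed placeholder weights, not learned
        weights = np.full(len(scales), 1.0 / len(scales))
    layers = []
    for s_long, s_short in scales:
        responses = [filter2d(ink, anisotropic_kernel(s_long, s_short, k * np.pi / n_orient))
                     for k in range(n_orient)]
        layers.append(np.max(responses, axis=0))  # best-matching orientation per pixel
    enhanced = np.tensordot(np.asarray(weights, dtype=float), np.stack(layers), axes=1)
    return np.clip(enhanced, 0.0, None)           # keep only ink-like evidence


def otsu_threshold(values, bins=256):
    """Global Otsu threshold: maximize between-class variance over histogram cuts."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist).astype(float)            # pixel count at or below each cut
    w1 = w0[-1] - w0                              # pixel count above each cut
    m0 = np.cumsum(hist * centers)
    mu0 = m0 / np.maximum(w0, 1)
    mu1 = (m0[-1] - m0) / np.maximum(w1, 1)
    between = w0 * w1 * (mu0 - mu1) ** 2
    return edges[1:][np.argmax(between)]          # upper edge of the best cut's bin


def binarize(gray, **kwargs):
    """Return a boolean mask where True marks foreground (ink) pixels."""
    enhanced = enhance(gray, **kwargs)
    return enhanced > otsu_threshold(enhanced)


if __name__ == "__main__":
    page = np.full((60, 60), 255, dtype=np.uint8)
    page[29:32, 5:55] = 30                        # synthetic horizontal stroke
    page[5:55, 44:47] = 30                        # synthetic vertical stroke
    mask = binarize(page)
    print(mask[30, 20], mask[0, 0])               # stroke pixel vs. far background
```

Taking the maximum over orientations lets each scale respond to strokes in any direction, and clipping negative responses discards the dark halo that a zero-sum kernel produces beside strokes before thresholding. In a fuller treatment the per-scale weights would be learned rather than fixed, as the abstract describes.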