Using scale-space anisotropic smoothing for text line extraction in historical documents

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

18 Scopus citations

Abstract

Text line extraction is a vital prerequisite for various document processing tasks. This paper presents a novel approach to text line extraction based on Gaussian scale space and a dedicated binarization that utilizes the inherent structure of smoothed text document images. It enhances the text lines in the image using a multiscale anisotropic second-derivative-of-Gaussian filter bank at the average height of the text line. It then applies a binarization that is based on a component tree and tailored towards line extraction. The final stage of the algorithm uses an energy minimization framework to remove spurious text lines and assign connected components to lines. We have tested our approach on various datasets written in different languages at a range of image qualities and obtained high detection rates, which outperform state-of-the-art algorithms. Our MATLAB code is publicly available (http://www.cs.bgu.ac.il/~rafico/LineExtraction.zip).
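The enhancement step described in the abstract can be illustrated with a minimal sketch. It is not the authors' MATLAB implementation: it assumes roughly horizontal text lines (a single axis-aligned orientation instead of a full filter bank), dark ink on a light background, a hypothetical elongation factor of 3 between the horizontal and vertical scales, and a simple sigma-squared scale normalization. The function names (`gaussian_1d`, `filter_rows`, `line_response`, `multiscale_response`) and the synthetic page are illustrative only. The component-tree binarization and the energy minimization stage are not sketched here.

```python
import math

def gaussian_1d(sigma, order=0):
    """Sampled 1-D Gaussian (order=0) or its second derivative (order=2)."""
    radius = int(math.ceil(3 * sigma))
    taps = []
    for k in range(-radius, radius + 1):
        g = math.exp(-k * k / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        if order == 2:
            # d^2/dk^2 of the Gaussian: (k^2 / sigma^4 - 1 / sigma^2) * g(k)
            g *= k * k / sigma ** 4 - 1 / sigma ** 2
        taps.append(g)
    return taps

def filter_rows(image, taps):
    """Correlate every row with the taps, replicating border pixels."""
    radius = len(taps) // 2
    width = len(image[0])
    return [
        [
            sum(t * row[min(max(x + k - radius, 0), width - 1)]
                for k, t in enumerate(taps))
            for x in range(width)
        ]
        for row in image
    ]

def transpose(image):
    return [list(column) for column in zip(*image)]

def line_response(image, sigma_y, elongation=3.0):
    """Anisotropic second-derivative-of-Gaussian response at one scale.

    Smooths along x with a wide Gaussian (elongation * sigma_y), takes the
    second derivative along y, and multiplies by sigma_y^2 so that scales
    are comparable (an assumed normalization).
    """
    smoothed = filter_rows(image, gaussian_1d(elongation * sigma_y))
    d2 = transpose(filter_rows(transpose(smoothed), gaussian_1d(sigma_y, order=2)))
    return [[sigma_y ** 2 * value for value in row] for row in d2]

def multiscale_response(image, sigmas, elongation=3.0):
    """Per-pixel maximum of the single-scale responses."""
    responses = [line_response(image, s, elongation) for s in sigmas]
    height, width = len(image), len(image[0])
    return [
        [max(r[y][x] for r in responses) for x in range(width)]
        for y in range(height)
    ]

# Synthetic page: white background (1.0) with two dark "text lines" (0.0),
# each four pixels high, at rows 8-11 and 20-23.
HEIGHT, WIDTH = 32, 24
page = [
    [0.0 if 8 <= y <= 11 or 20 <= y <= 23 else 1.0 for _ in range(WIDTH)]
    for y in range(HEIGHT)
]

response = multiscale_response(page, sigmas=[1.5, 2.0, 3.0])
```

In this sketch the response is positive along the dark bands and negative in the gap between them, so thresholding it would give a rough line mask. The abstract's dedicated binarization replaces such a naive threshold.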

Original language: English
Title of host publication: Image Analysis and Recognition - 11th International Conference, ICIAR 2014, Proceedings
Editors: Mohamed Kamel, Aurélio Campilho
Publisher: Springer Verlag
Pages: 349-358
Number of pages: 10
ISBN (Electronic): 9783319117577
State: Published - 1 Jan 2014
Event: 11th International Conference on Image Analysis and Recognition, ICIAR 2014 - Vilamoura, Portugal
Duration: 22 Oct 2014 to 24 Oct 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8814
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 11th International Conference on Image Analysis and Recognition, ICIAR 2014
Country/Territory: Portugal
City: Vilamoura
Period: 22/10/14 to 24/10/14

Keywords

  • Historical document processing
  • Text lines extraction

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
