Text line detection in corrupted and damaged historical manuscripts

Irina Rabaev, Ofer Biller, Jihad El-Sana, Klara Kedem, Itshak Dinstein

Research output: Contribution to journal › Conference article › peer-review

12 Scopus citations

Abstract

Most algorithms proposed for text line detection are designed to take binary images as input. For severely degraded documents, binarization often introduces significant noise and other artifacts. In this work we present a novel method designed to detect text lines directly in grayscale images. The method consists of two stages. In the first stage, potential characters are detected by analyzing the evolution maps of connected components obtained with a sliding threshold. In the second stage, the detected potential characters are grouped into text lines using a sweep-line approach. The suggested method is especially powerful when applied to torn and damaged documents that other algorithms cannot handle.
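
The abstract names the two stages but does not define the evolution maps or the sweep-line rule, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a "potential character" is a connected component whose bounding box stays unchanged over several consecutive thresholds, and that the sweep visits boxes from left to right and attaches each one to the first line it overlaps vertically. All function names, the `min_stable` parameter, and the toy page are hypothetical.

```python
from collections import deque


def components(img, t):
    """Bounding boxes (x0, y0, x1, y1) of 4-connected regions darker than t."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if seen[sy][sx] or img[sy][sx] >= t:
                continue
            seen[sy][sx] = True
            queue = deque([(sx, sy)])
            x0, y0, x1, y1 = sx, sy, sx, sy
            while queue:  # breadth-first flood fill of one component
                x, y = queue.popleft()
                x0, y0, x1, y1 = min(x0, x), min(y0, y), max(x1, x), max(y1, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and not seen[ny][nx] and img[ny][nx] < t):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def stable_boxes(img, thresholds, min_stable=3):
    """Stage 1 (assumed criterion): keep boxes that persist unchanged
    over at least min_stable consecutive thresholds."""
    run = {}  # box -> length of its current run of consecutive thresholds
    kept = set()
    for t in thresholds:
        current = set(components(img, t))
        run = {box: run.get(box, 0) + 1 for box in current}
        kept.update(box for box, n in run.items() if n >= min_stable)
    return sorted(kept)


def group_into_lines(boxes):
    """Stage 2 (assumed rule): sweep left to right and attach each box
    to the first open line whose vertical extent it overlaps."""
    lines = []  # each entry: {"top": int, "bottom": int, "boxes": [...]}
    for box in sorted(boxes):  # tuples sort by x0 first
        _, y0, _, y1 = box
        for line in lines:
            if y0 <= line["bottom"] and y1 >= line["top"]:
                line["boxes"].append(box)
                line["top"] = min(line["top"], y0)
                line["bottom"] = max(line["bottom"], y1)
                break
        else:
            lines.append({"top": y0, "bottom": y1, "boxes": [box]})
    return [line["boxes"] for line in lines]


def toy_page():
    """Hypothetical 12x8 grayscale page: two rows of dark blobs and one faint speck."""
    img = [[200] * 12 for _ in range(8)]
    for x0, y0 in ((1, 1), (5, 1), (9, 1), (1, 5), (5, 5)):
        for y in (y0, y0 + 1):
            for x in (x0, x0 + 1):
                img[y][x] = 50
    img[4][7] = 170  # faint noise, darker than the threshold only at t = 180
    return img
```

Calling `group_into_lines(stable_boxes(toy_page(), range(60, 200, 20)))` groups the five dark blobs into two lines, while the faint speck is discarded because it appears at a single threshold only. A real manuscript would need a more careful stability measure and a tolerance for skewed or curved lines, neither of which this sketch attempts.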

Original language: English
Article number: 6628731
Pages (from-to): 812-816
Number of pages: 5
Journal: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
State: Published - 11 Dec 2013
Event: 12th International Conference on Document Analysis and Recognition, ICDAR 2013 - Washington, DC, United States
Duration: 25 Aug 2013 → 28 Aug 2013

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
