Evolution maps and applications

Ofer Biller, Irina Rabaev, Klara Kedem, Its'hak Dinstein, Jihad J. El-Sana

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Common tasks in document analysis, such as binarization and line extraction, are still considered difficult for highly degraded text documents. Reliable fundamental information about the characters of a document, such as the distribution of character dimensions and stroke width, can significantly improve the performance of these tasks. We introduce a novel perspective on the image data that maps the evolution of connected components as the gray-scale threshold changes. The maps reveal significant information about the sets of elements in the document, such as characters, noise, stains, and words. This information is further employed to improve a state-of-the-art binarization algorithm and to automatically achieve character size estimation, line extraction, stroke width estimation, and feature distribution analysis, all of which are hard tasks for highly degraded documents.
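
The core idea, tracking connected components while the gray-scale threshold changes, can be illustrated with a minimal sketch. This is not the authors' evolution-map construction. It is a toy under stated assumptions: dark pixels are treated as foreground (gray value <= threshold), components are 4-connected, and the names `component_sizes`, `component_evolution`, and `TOY_IMAGE` are hypothetical and chosen only for this illustration.

```python
from collections import deque


def component_sizes(image, threshold):
    """Return the sorted sizes of 4-connected components of pixels
    whose gray value is <= threshold (assumption: dark ink is foreground)."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] > threshold:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] <= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sorted(sizes)


def component_evolution(image, thresholds):
    """Map each threshold to the component sizes observed at that threshold."""
    return {t: component_sizes(image, t) for t in thresholds}


# Hypothetical 4x4 gray-scale patch: two dark strokes, a mid-gray stain
# row that joins them at a high threshold, and one isolated speck.
TOY_IMAGE = [
    [20, 200, 200, 30],
    [20, 200, 200, 30],
    [120, 120, 120, 120],
    [200, 200, 200, 90],
]

EVOLUTION = component_evolution(TOY_IMAGE, [50, 100, 150])
for t, sizes in EVOLUTION.items():
    print(t, sizes)
```

In this toy patch the two strokes stay separate at low thresholds, a speck appears as a new small component at an intermediate threshold, and everything merges once the stain row falls below the threshold. Changes of this kind, components appearing, growing, and merging, are what a threshold-indexed view of the image exposes.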

Original language: English
Article number: e39
Journal: PeerJ Computer Science
Volume: 2016
Issue number: 1
State: Published - 1 Jan 2016

Keywords

  • Connected component analysis
  • Degraded documents
  • Historical documents
  • Text document analysis

ASJC Scopus subject areas

  • General Computer Science
