Robust text and drawing segmentation algorithm for historical documents

Rafi Cohen, Abedelkadir Asi, Klara Kedem, Jihad El-Sana, Itshak Dinstein

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

40 Scopus citations

Abstract

We present a method to segment historical document images into regions of different content. First, we separate text elements from non-text elements using a binarized version of the document. Then, we refine the segmentation of the non-text regions into drawings, background and noise. At this stage, spatial and color features are exploited to guarantee coherent regions in the final segmentation. Experiments show that the suggested approach achieves better segmentation quality than other methods. We evaluate the segmentation quality on 252 pages of a historical manuscript, on which the suggested method achieves about 92% and 90% segmentation accuracy for drawings and text elements, respectively.
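The two-stage idea described in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the keywords suggest the paper works on superpixels with a CRF, whereas this sketch swaps in simpler stand-ins, namely a fixed global binarization threshold, a bounding-box height rule for separating text from non-text components, and pixel-level iterated conditional modes (ICM) on a Potts-model energy to produce spatially coherent labels for the non-text regions. All function names and threshold values are hypothetical, and color is reduced to a single gray value per pixel.

```python
from collections import deque

# Sketch only: not the method of the paper. Thresholds are hypothetical;
# the abstract does not give any values.
INK_THRESHOLD = 128   # gray levels below this count as ink
MIN_TEXT_HEIGHT = 2   # components shorter than this are treated as specks
MAX_TEXT_HEIGHT = 4   # components taller than this are treated as non-text

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity offsets


def binarize(gray, threshold=INK_THRESHOLD):
    """Map a grayscale image (list of rows) to 1 for ink and 0 for paper."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def connected_components(binary):
    """Return 4-connected ink components as lists of (y, x) pixels, in scan order."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy, dx in NEIGHBOURS:
                    ny, nx = cy + dy, cx + dx
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def classify_components(components, min_h=MIN_TEXT_HEIGHT, max_h=MAX_TEXT_HEIGHT):
    """Stage 1 stand-in: a component is text if its bounding-box height fits a glyph."""
    labels = []
    for pixels in components:
        ys = [y for y, _ in pixels]
        height = max(ys) - min(ys) + 1
        labels.append("text" if min_h <= height <= max_h else "non-text")
    return labels


def icm_smooth(values, means, beta=1.0, iterations=10):
    """Stage 2 stand-in: label pixels by minimising a Potts-model energy with ICM.

    Unary term: squared distance between the pixel value and each class mean.
    Pairwise term: beta for every 4-neighbour that carries a different label.
    """
    h, w = len(values), len(values[0])
    classes = range(len(means))

    def unary(v, c):
        return (v - means[c]) ** 2

    # Initialise with the color-only (unary) decision.
    labels = [[min(classes, key=lambda c: unary(values[y][x], c)) for x in range(w)]
              for y in range(h)]
    for _ in range(iterations):
        changed = False
        for y in range(h):
            for x in range(w):
                nbrs = [labels[y + dy][x + dx] for dy, dx in NEIGHBOURS
                        if 0 <= y + dy < h and 0 <= x + dx < w]
                best = min(classes, key=lambda c: unary(values[y][x], c)
                           + beta * sum(n != c for n in nbrs))
                if best != labels[y][x]:
                    labels[y][x] = best
                    changed = True
        if not changed:  # ICM has reached a local minimum
            break
    return labels
```

For example, on a white 10x8 page containing a 3-pixel-tall stroke, a 7-pixel-tall stroke and a one-pixel speck, `classify_components(connected_components(binarize(page)))` labels only the short stroke as text. In `icm_smooth`, raising `beta` trades color fidelity for spatial coherence: with `beta=50`, a lone pixel whose value matches the second class mean inside a 5x5 field of the first class is relabelled to agree with its neighbours, while with `beta=0` it keeps its color-only label.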

Original language: English
Title of host publication: HIP 2013 - Proceedings of the 2013 Workshop on Historical Document Imaging and Processing
Pages: 110-117
Number of pages: 8
State: Published - 23 Dec 2013
Event: 2nd International Workshop on Historical Document Imaging and Processing, HIP 2013 - Washington, DC, United States
Duration: 24 Aug 2013 - 24 Aug 2013

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2nd International Workshop on Historical Document Imaging and Processing, HIP 2013
Country/Territory: United States
City: Washington, DC
Period: 24/08/13 - 24/08/13

Keywords

  • CRF
  • Historical documents
  • Layout
  • Segmentation
  • Superpixel

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

