Unsupervised Deep Learning for Handwritten Page Segmentation

Ahmad Droby, Berat Kurar Barakat, Borak Madi, Reem Alaasam, Jihad El-Sana

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

8 Scopus citations

Abstract

Segmenting handwritten document images into regions with homogeneous patterns is an important pre-processing step for many document image analysis tasks. Hand-labeling data to train a deep learning model for layout analysis requires significant human effort. In this paper, we present an unsupervised deep learning method for page segmentation, which removes the need for annotated images. A siamese neural network is trained to differentiate between patches using their measurable properties, such as the number of foreground pixels and the average component height and width. The network is trained under the assumption that spatially nearby patches are similar. The network's learned features are then used for page segmentation, where patches are classified as main text or side text based on the extracted features. We tested the method on a dataset of handwritten document images with complex layouts. Our experiments show that the proposed unsupervised method is as effective as typical supervised methods.
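The measurable patch properties named in the abstract can be illustrated with a minimal sketch. It assumes a binarized patch given as rows of 0/1 values (1 = ink) and 4-connected components; the function name `patch_properties`, the dictionary keys, and the connectivity choice are hypothetical and are not taken from the paper. The abstract does not specify how these properties drive the siamese training, so that step is not shown.

```python
from collections import deque


def patch_properties(patch):
    """Compute simple layout cues for a binary patch (list of rows of 0/1)."""
    rows = len(patch)
    cols = len(patch[0]) if rows else 0
    foreground = sum(sum(row) for row in patch)

    seen = [[False] * cols for _ in range(rows)]
    heights, widths = [], []
    for r in range(rows):
        for c in range(cols):
            if patch[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search over one 4-connected component,
            # tracking its bounding box as pixels are visited.
            queue = deque([(r, c)])
            seen[r][c] = True
            top = bottom = r
            left = right = c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and patch[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            heights.append(bottom - top + 1)
            widths.append(right - left + 1)

    count = len(heights)
    return {
        "foreground_pixels": foreground,
        "avg_component_height": sum(heights) / count if count else 0.0,
        "avg_component_width": sum(widths) / count if count else 0.0,
    }


# Toy patch: a 2x2 blob and a 3-pixel vertical stroke.
example_patch = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
print(patch_properties(example_patch))
```

For the toy patch, the two components have bounding boxes of 2x2 and 3x1 (height x width), so the averages are 2.5 and 1.5 with 7 foreground pixels in total.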

Original language: English
Title of host publication: Proceedings - 2020 17th International Conference on Frontiers in Handwriting Recognition, ICFHR 2020
Publisher: Institute of Electrical and Electronics Engineers
Pages: 240-245
Number of pages: 6
ISBN (Electronic): 9781728199665
State: Published - 1 Sep 2020
Event: 17th International Conference on Frontiers in Handwriting Recognition, ICFHR 2020 - Dortmund, Germany
Duration: 7 Sep 2020 to 10 Sep 2020

Publication series

Name: Proceedings of International Conference on Frontiers in Handwriting Recognition, ICFHR
Volume: 2020-September
ISSN (Print): 2167-6445
ISSN (Electronic): 2167-6453

Conference

Conference: 17th International Conference on Frontiers in Handwriting Recognition, ICFHR 2020
Country/Territory: Germany
City: Dortmund
Period: 7/09/20 to 10/09/20

Keywords

  • Siamese network
  • deep-learning
  • documents
  • hand-written
  • historical
  • layout analysis
  • page segmentation
  • segmentation
  • unsupervised

ASJC Scopus subject areas

  • Computer Science Applications
  • Computer Vision and Pattern Recognition
