Word spotting for handwritten documents using Chamfer Distance and Dynamic Time Warping

Raid M. Saabni, Jihad A. El-Sana

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

2 Scopus citations

Abstract

A large number of handwritten historical documents are held in libraries around the world. The desire to access, search, and explore these documents paves the way for a new age of knowledge sharing and promotes collaboration and understanding between human societies. Currently, the indexes for these documents are generated manually, which is very tedious and time-consuming. State-of-the-art techniques for converting complete images of handwritten documents into textual representations do not yet produce sufficient results. Therefore, word-spotting methods have been developed to archive and index images of handwritten documents in order to enable efficient searching within documents. In this paper, we present a new matching algorithm for word-spotting tasks in historical Arabic documents. The algorithm is based on the Chamfer Distance and computes the similarity between shapes of word-parts. Matching results are used to cluster images of Arabic word-parts into different classes using the Nearest Neighbor rule. To compute the distance between two word-part images, the algorithm subdivides each image into equal-sized slices (windows). A modified version of the Chamfer Distance, incorporating geometric gradient features and distance transform data, is used as the distance between slices. Finally, the Dynamic Time Warping (DTW) algorithm is used to measure the distance between two images of word-parts. Using DTW enables our system to cluster similar word-parts even when they are deformed non-linearly, as is typical of handwriting. We tested our implementation of the presented methods on various documents in different writing styles, taken from the Juma'a Al Majid Center in Dubai, and obtained encouraging results.
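The pipeline described in the abstract (slice each word-part image, compare slices with a Chamfer-style distance, align the slice sequences with DTW, then classify by nearest neighbor) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it uses a plain symmetric Chamfer distance computed by brute force, without the geometric gradient features or distance-transform data of the modified version in the paper. The function names, the `width` parameter, and the `empty_penalty` value are all assumptions made for illustration.

```python
import math

def slices(img, width):
    """Split a binary image (list of rows of 0/1) into vertical windows of `width` columns."""
    n_cols = len(img[0])
    return [[row[c:c + width] for row in img] for c in range(0, n_cols, width)]

def points(window):
    """Coordinates (row, col) of the foreground pixels in a window."""
    return [(r, c) for r, row in enumerate(window) for c, v in enumerate(row) if v]

def directed_chamfer(a, b):
    """Mean distance from each point in `a` to its nearest point in `b`."""
    return sum(min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in b) for p in a) / len(a)

def chamfer(win_a, win_b, empty_penalty=10.0):
    """Plain symmetric Chamfer distance between two windows.

    `empty_penalty` is a hypothetical cost for comparing an empty window
    with a non-empty one; the paper does not specify this case.
    """
    a, b = points(win_a), points(win_b)
    if not a and not b:
        return 0.0
    if not a or not b:
        return empty_penalty
    return 0.5 * (directed_chamfer(a, b) + directed_chamfer(b, a))

def dtw(seq_a, seq_b, cost):
    """Classic DTW: minimum cumulative cost of aligning two sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(seq_a[i - 1], seq_b[j - 1])
            d[i][j] = c + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def word_part_distance(img_a, img_b, width=2):
    """Distance between two word-part images: DTW over their slice sequences."""
    return dtw(slices(img_a, width), slices(img_b, width), chamfer)

def nearest_label(query, labeled, width=2):
    """Nearest Neighbor rule: label of the closest (image, label) pair in `labeled`."""
    return min(labeled, key=lambda pair: word_part_distance(query, pair[0], width))[1]
```

Because DTW may match one slice against several consecutive slices of the other image, a stroke that is stretched horizontally can still align at low cost, which is the property the abstract relies on for handwriting. A practical system would likely replace the brute-force nearest-point search with a precomputed distance transform.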

Original language: English
Title of host publication: Proceedings of SPIE-IS and T Electronic Imaging - Document Recognition and Retrieval XVIII
State: Published - 12 May 2011
Externally published: Yes
Event: Document Recognition and Retrieval XVIII - San Francisco, CA, United States
Duration: 26 Jan 2011 to 27 Jan 2011

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 7874
ISSN (Print): 0277-786X

Conference

Conference: Document Recognition and Retrieval XVIII
Country/Territory: United States
City: San Francisco, CA
Period: 26/01/11 to 27/01/11

Keywords

  • Chamfer Distance
  • Dynamic Time Warping
  • Handwriting Recognition
  • Word Spotting

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Condensed Matter Physics
  • Computer Science Applications
  • Applied Mathematics
  • Electrical and Electronic Engineering
