Aligning transcript of historical documents using dynamic programming

Irina Rabaev, Rafi Cohen, Jihad El-Sana, Klara Kedem

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

6 Scopus citations

Abstract

We present a simple and accurate approach for aligning historical documents with their corresponding transcription. First, a representative of each letter in the historical document is cropped. Then, the transcription is transformed to synthetic word images by representing the letters in the transcription by the cropped letters. These synthetic word images are aligned to groups of connected components in the original text, along each line, using dynamic programming. For measuring image similarities we experimented with a variety of feature extraction and matching methods. The presented alignment algorithm was tested on two historical datasets and provided excellent results.
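The line-level alignment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `align_words_to_components`, the `max_group` limit, and the toy width-based cost are all assumptions made for illustration. In the paper the dissimilarity would come from image features (for example HOG or LBP) computed on the synthetic word images and the component groups; here a plain callable stands in for it.

```python
from typing import Callable, List, Tuple


def align_words_to_components(
    n_words: int,
    n_components: int,
    cost: Callable[[int, int, int], float],
    max_group: int = 10,  # hypothetical cap on components per word
) -> List[Tuple[int, int]]:
    """Split components 0..n_components-1 of one text line into n_words
    contiguous, non-empty groups with minimal total cost.

    cost(w, start, end) is the dissimilarity between word w and the
    group of components in the half-open range [start, end).
    Returns one (start, end) range per word, in reading order.
    """
    inf = float("inf")
    # best[w][k]: minimal cost of matching the first w words
    # to the first k components.
    best = [[inf] * (n_components + 1) for _ in range(n_words + 1)]
    back = [[-1] * (n_components + 1) for _ in range(n_words + 1)]
    best[0][0] = 0.0

    for w in range(1, n_words + 1):
        for k in range(w, n_components + 1):
            # The last group is [start, k); earlier words need at
            # least w - 1 components, and the group size is capped.
            for start in range(max(w - 1, k - max_group), k):
                if best[w - 1][start] == inf:
                    continue
                total = best[w - 1][start] + cost(w - 1, start, k)
                if total < best[w][k]:
                    best[w][k] = total
                    back[w][k] = start

    if best[n_words][n_components] == inf:
        raise ValueError("no feasible alignment")

    # Walk the back-pointers from the end of the line to recover groups.
    groups = []
    k = n_components
    for w in range(n_words, 0, -1):
        start = back[w][k]
        groups.append((start, k))
        k = start
    groups.reverse()
    return groups


# Toy data (assumed): pixel widths of connected components on a line
# and of the synthetic word images built from the transcription.
component_widths = [5, 6, 4, 10, 3, 3]
word_widths = [15, 10, 6]


def width_cost(w: int, start: int, end: int) -> float:
    # Stand-in for an image-feature distance: compare total widths only.
    return abs(word_widths[w] - sum(component_widths[start:end]))


groups = align_words_to_components(
    len(word_widths), len(component_widths), width_cost
)
print(groups)  # [(0, 3), (3, 4), (4, 6)]
```

Each returned pair gives the range of connected components assigned to one transcription word. Passing the cost as a callable keeps the dynamic program independent of the feature extraction and matching method, which is the part the abstract says was varied in the experiments.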

Original language: English
Title of host publication: Proceedings of SPIE-IS and T Electronic Imaging - Document Recognition and Retrieval XXII
Editors: Bart Lamiroy, Eric K. Ringger
Publisher: SPIE
ISBN (Electronic): 9781628414929
State: Published - 1 Jan 2015
Event: 22nd Document Recognition and Retrieval Conference, DRR 2015 - San Francisco, United States
Duration: 11 Feb 2015 - 12 Feb 2015

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 9402
ISSN (Print): 0277-786X
ISSN (Electronic): 1996-756X

Conference

Conference: 22nd Document Recognition and Retrieval Conference, DRR 2015
Country/Territory: United States
City: San Francisco
Period: 11/02/15 - 12/02/15

Keywords

  • GSC features
  • HOG features
  • LBP features
  • alignment
  • dynamic programming
  • historical documents
  • profile-based features

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Condensed Matter Physics
  • Computer Science Applications
  • Applied Mathematics
  • Electrical and Electronic Engineering
