Segmentation-free keyword retrieval in historical document images

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Scopus citations

Abstract

We present a segmentation-free method for retrieving keywords from degraded historical documents. The proposed method works directly on the grayscale representation and does not require any pre-processing to enhance the document images. The document images are subdivided into overlapping patches of varying sizes, and each patch is described by a bag-of-visual-words descriptor. The resulting patch descriptors are hashed into several hash tables using a kernelized locality-sensitive hashing scheme for efficient retrieval. In such a scheme, the search for a keyword is reduced to a small fraction of the patches, namely those stored in the matching entries of the hash tables. Because we need to capture handwriting variations and the availability of historical documents is limited, we synthesize a small number of samples from the given query to improve the retrieval results. We have tested our approach on historical document images in Hebrew from the Cairo Genizah collection and obtained impressive results.
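The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function and class names are hypothetical, visual-word ids are assumed to be already computed for each patch, and plain random-hyperplane hashing stands in for the kernelized locality-sensitive hashing scheme used in the paper.

```python
import random
from collections import defaultdict


def patch_windows(width, height, sizes, overlap=0.5):
    """Yield (x, y, w, h) windows of several sizes, overlapping by `overlap`."""
    for w, h in sizes:
        step_x = max(1, int(w * (1 - overlap)))
        step_y = max(1, int(h * (1 - overlap)))
        for y in range(0, height - h + 1, step_y):
            for x in range(0, width - w + 1, step_x):
                yield (x, y, w, h)


def bovw_histogram(word_ids, vocab_size):
    """L1-normalised histogram of the visual-word ids found in one patch."""
    hist = [0.0] * vocab_size
    for wid in word_ids:
        hist[wid] += 1.0
    total = sum(hist)
    return [v / total for v in hist] if total else hist


class HyperplaneLSH:
    """Several hash tables keyed by sign patterns of random projections.

    Assumption: this is ordinary (non-kernelized) LSH, used only to show
    how hashing narrows the search to a few candidate patches.
    """

    def __init__(self, dim, n_tables=4, n_bits=8, seed=0):
        rng = random.Random(seed)
        self.planes = [
            [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]

    @staticmethod
    def _key(planes, vec):
        # One bit per hyperplane: which side of the plane the vector lies on.
        return tuple(
            sum(p * v for p, v in zip(plane, vec)) >= 0 for plane in planes
        )

    def add(self, item, vec):
        """Store `item` in the matching bucket of every table."""
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, vec)].append(item)

    def candidates(self, vec):
        """Return items sharing a bucket with `vec` in at least one table."""
        found = set()
        for planes, table in zip(self.planes, self.tables):
            found.update(table.get(self._key(planes, vec), []))
        return found
```

In use, every window produced by `patch_windows` would be described with `bovw_histogram` and stored with `add`; a query keyword (and any samples synthesized from it) would then be compared only against the patches returned by `candidates`, not against every patch in the collection.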

Original language: English
Title of host publication: Image Analysis and Recognition - 11th International Conference, ICIAR 2014, Proceedings
Editors: Mohamed Kamel, Aurélio Campilho
Publisher: Springer Verlag
Pages: 369-378
Number of pages: 10
ISBN (Electronic): 9783319117577
State: Published - 1 Jan 2014
Event: 11th International Conference on Image Analysis and Recognition, ICIAR 2014 - Vilamoura, Portugal
Duration: 22 Oct 2014 to 24 Oct 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8814
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 11th International Conference on Image Analysis and Recognition, ICIAR 2014
Country/Territory: Portugal
City: Vilamoura
Period: 22/10/14 to 24/10/14

Keywords

  • Bag-of-visual-words
  • Historical document processing
  • Kernelized locality-sensitive hashing
  • Segmentation-free retrieval

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
