Language-independent text lines extraction using seam carving

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    49 Scopus citations

    Abstract

    In this paper, we present a novel language-independent algorithm for extracting text-lines from handwritten document images. Our algorithm is based on the seam carving approach for content-aware image resizing. We adopted the signed distance transform to generate the energy map, where extreme points indicate the layout of text-lines. Dynamic programming is then used to compute the minimum energy left-to-right paths (seams), which pass along the "middle" of the text-lines. Each path intersects a set of components, which determine the extracted text-line and estimate its height. The estimated height determines the text-line's region, which guides the splitting of touching components among consecutive lines. Unassigned components that fall within the region of a text-line are added to the components list of the line. The components between two consecutive lines are processed once both lines have been extracted, and each is assigned to the closest text-line based on the attributes of the extracted lines and the sizes and positions of the components. Our experimental results on Arabic, Chinese, and English historical documents show that our approach manages to separate multi-skew text blocks into lines at high success rates.
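
The dynamic-programming step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the energy map is already available as a 2-D grid of numbers and that a seam may move at most one row up or down per column, as in standard seam carving; the paper's signed-distance-transform energy and its component handling are not reproduced here. The function name `min_energy_horizontal_seam` and the toy grid are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def min_energy_horizontal_seam(energy: List[List[float]]) -> Tuple[List[int], float]:
    """Return (rows, total) for the cheapest left-to-right seam.

    rows[c] is the row the seam occupies in column c; total is the sum
    of the energies along the seam. Assumes a non-empty rectangular grid.
    """
    n_rows, n_cols = len(energy), len(energy[0])

    # cost[r][c]: cheapest cumulative energy of any seam ending at (r, c).
    cost = [[0.0] * n_cols for _ in range(n_rows)]
    # back[r][c]: row in column c-1 from which that cheapest seam arrives.
    back = [[0] * n_cols for _ in range(n_rows)]

    for r in range(n_rows):
        cost[r][0] = energy[r][0]

    for c in range(1, n_cols):
        for r in range(n_rows):
            # A seam may come from the row above, the same row, or the row below.
            lo, hi = max(0, r - 1), min(n_rows - 1, r + 1)
            best = min(range(lo, hi + 1), key=lambda p: cost[p][c - 1])
            cost[r][c] = energy[r][c] + cost[best][c - 1]
            back[r][c] = best

    # Start from the cheapest cell in the last column and walk back to column 0.
    r = min(range(n_rows), key=lambda p: cost[p][n_cols - 1])
    total = cost[r][n_cols - 1]
    rows = [0] * n_cols
    for c in range(n_cols - 1, -1, -1):
        rows[c] = r
        r = back[r][c]
    return rows, total


# Toy energy map (illustrative values only): low values trace a zigzag path.
toy_energy = [
    [5, 5, 5, 5],
    [1, 5, 1, 5],
    [5, 1, 5, 1],
    [5, 5, 5, 5],
]
seam_rows, seam_cost = min_energy_horizontal_seam(toy_energy)
print(seam_rows, seam_cost)
```

On the toy grid the seam follows the low-energy cells, alternating between rows 1 and 2. In a text-line setting, the energy map would be built so that the cheapest paths run through the text rather than through the gaps between lines; how that map is constructed is specific to the paper and is only assumed here.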

    Original language: English
    Title of host publication: Proceedings - 11th International Conference on Document Analysis and Recognition, ICDAR 2011
    Pages: 563-568
    Number of pages: 6
    DOIs
    State: Published - 2 Dec 2011
    Event: 11th International Conference on Document Analysis and Recognition, ICDAR 2011 - Beijing, China
    Duration: 18 Sep 2011 to 21 Sep 2011

    Publication series

    Name: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
    ISSN (Print): 1520-5363

    Conference

    Conference: 11th International Conference on Document Analysis and Recognition, ICDAR 2011
    Country/Territory: China
    City: Beijing
    Period: 18/09/11 to 21/09/11

    Keywords

    • Dynamic programming
    • Handwriting
    • Line Extraction
    • Multilingual
    • Seam Carving
    • Signed Distance Transform

    ASJC Scopus subject areas

    • Computer Vision and Pattern Recognition
