TY - GEN
T1 - Fast key-word searching via embedding and active-DTW
AU - Saabni, Raid
AU - Bronstein, Alex
PY - 2011/12/2
Y1 - 2011/12/2
N2 - In this paper we present a novel approach for fast search of handwritten Arabic word-parts within large lexicons. The algorithm proceeds in three steps. First, it warps multiple appearances of each word-part in the lexicon so that they can be embedded into the same Euclidean space; the embedding is based on the warping path produced by the Dynamic Time Warping (DTW) process while calculating the similarity distance. Next, all samples of the different word-parts are resampled uniformly to the same size and stored in a kd-tree, and a fast approximate k-nearest-neighbor search generates a short list of candidates for the next step. In the third step, the Active-DTW [15] algorithm examines each sample in the short list to produce the final, accurate result. We demonstrate our method on a database of 23,500 word-part images extracted from the IFN/ENIT database and 22,000 images collected from 93 writers. Our method achieves a speedup of five orders of magnitude over the exact method, at the cost of only a 3.8% reduction in accuracy.
AB - In this paper we present a novel approach for fast search of handwritten Arabic word-parts within large lexicons. The algorithm proceeds in three steps. First, it warps multiple appearances of each word-part in the lexicon so that they can be embedded into the same Euclidean space; the embedding is based on the warping path produced by the Dynamic Time Warping (DTW) process while calculating the similarity distance. Next, all samples of the different word-parts are resampled uniformly to the same size and stored in a kd-tree, and a fast approximate k-nearest-neighbor search generates a short list of candidates for the next step. In the third step, the Active-DTW [15] algorithm examines each sample in the short list to produce the final, accurate result. We demonstrate our method on a database of 23,500 word-part images extracted from the IFN/ENIT database and 22,000 images collected from 93 writers. Our method achieves a speedup of five orders of magnitude over the exact method, at the cost of only a 3.8% reduction in accuracy.
KW - Dynamic Time Warping
KW - Embedding
KW - Handwriting Recognition
KW - Nearest Neighbor
KW - Word Searching
UR - https://www.scopus.com/pages/publications/82355190334
U2 - 10.1109/ICDAR.2011.23
DO - 10.1109/ICDAR.2011.23
M3 - Conference contribution
AN - SCOPUS:82355190334
SN - 9780769545202
T3 - Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
SP - 68
EP - 72
BT - Proceedings - 11th International Conference on Document Analysis and Recognition, ICDAR 2011
T2 - 11th International Conference on Document Analysis and Recognition, ICDAR 2011
Y2 - 18 September 2011 through 21 September 2011
ER -