TY - JOUR
T1 - Speeding Up HMM Decoding and Training by Exploiting Sequence Repetitions
AU - Lifshits, Yury
AU - Mozes, Shay
AU - Weimann, Oren
AU - Ziv-Ukelson, Michal
N1 - Funding Information:
Y. Lifshits’ research was supported by the Center for the Mathematics of Information and the Lee Center for Advanced Networking.
PY - 2009/7/1
Y1 - 2009/7/1
N2 - We present a method to speed up the dynamic programming algorithms used for solving the HMM decoding and training problems for discrete time-independent HMMs. We discuss the application of our method to Viterbi's decoding and training algorithms (IEEE Trans. Inform. Theory IT-13:260-269, 1967), as well as to the forward-backward and Baum-Welch (Inequalities 3:1-8, 1972) algorithms. Our approach is based on identifying repeated substrings in the observed input sequence. Initially, we show how to exploit repetitions of all sufficiently small substrings (this is similar to the Four Russians method). Then, we describe four algorithms based alternatively on run length encoding (RLE), Lempel-Ziv (LZ78) parsing, grammar-based compression (SLP), and byte pair encoding (BPE). Compared to Viterbi's algorithm, we achieve speedups of Θ(log n) using the Four Russians method, Ω(r/log r) using RLE, Ω(log n/k) using LZ78, Ω(r/k) using SLP, and Ω(r) using BPE, where k is the number of hidden states, n is the length of the observed sequence and r is its compression ratio (under each compression scheme). Our experimental results demonstrate that our new algorithms are indeed faster in practice. We also discuss a parallel implementation of our algorithms.
AB - We present a method to speed up the dynamic programming algorithms used for solving the HMM decoding and training problems for discrete time-independent HMMs. We discuss the application of our method to Viterbi's decoding and training algorithms (IEEE Trans. Inform. Theory IT-13:260-269, 1967), as well as to the forward-backward and Baum-Welch (Inequalities 3:1-8, 1972) algorithms. Our approach is based on identifying repeated substrings in the observed input sequence. Initially, we show how to exploit repetitions of all sufficiently small substrings (this is similar to the Four Russians method). Then, we describe four algorithms based alternatively on run length encoding (RLE), Lempel-Ziv (LZ78) parsing, grammar-based compression (SLP), and byte pair encoding (BPE). Compared to Viterbi's algorithm, we achieve speedups of Θ(log n) using the Four Russians method, Ω(r/log r) using RLE, Ω(log n/k) using LZ78, Ω(r/k) using SLP, and Ω(r) using BPE, where k is the number of hidden states, n is the length of the observed sequence and r is its compression ratio (under each compression scheme). Our experimental results demonstrate that our new algorithms are indeed faster in practice. We also discuss a parallel implementation of our algorithms.
KW - Compression
KW - Dynamic programming
KW - HMM
KW - Viterbi
UR - http://www.scopus.com/inward/record.url?scp=67349186481&partnerID=8YFLogxK
U2 - 10.1007/s00453-007-9128-0
DO - 10.1007/s00453-007-9128-0
M3 - Article
AN - SCOPUS:67349186481
SN - 0178-4617
VL - 54
SP - 379
EP - 399
JO - Algorithmica
JF - Algorithmica
IS - 3
ER -