TY - GEN
T1 - Modeling Non-Linguistic Contextual Signals in LSTM Language Models Via Domain Adaptation
AU - Ma, Min
AU - Kumar, Shankar
AU - Biadsy, Fadi
AU - Nirschl, Michael
AU - Vykruta, Tomas
AU - Moreno, Pedro
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/9/10
Y1 - 2018/9/10
AB - Language Models (LMs) for Automatic Speech Recognition (ASR) can benefit from utilizing non-linguistic contextual signals in modeling. Examples of these signals include the geographical location of the user speaking to the system and/or the identity of the application (app) being spoken to. In practice, the vast majority of input speech queries lack annotations of such signals, which makes it difficult to train domain-specific LMs directly. To obtain robust domain LMs, an LM pre-trained on general data is typically adapted to specific domains. We propose four domain adaptation schemes that improve the domain performance of Long Short-Term Memory (LSTM) LMs by incorporating app-based contextual signals of voice search queries. We show that most of our adaptation strategies are effective, reducing word perplexity by up to 21% relative to a fine-tuned baseline on a held-out domain-specific development set. Initial experiments using a state-of-the-art Italian ASR system show a 3% relative reduction in WER on top of an unadapted 5-gram LM. In addition, human evaluations show significant improvements on sub-domains when app signals are used.
KW - Domain adaptation
KW - Language model adaptation
KW - Neural network based language models
KW - Speech recognition
UR - https://www.scopus.com/pages/publications/85054224666
U2 - 10.1109/ICASSP.2018.8461979
DO - 10.1109/ICASSP.2018.8461979
M3 - Conference contribution
AN - SCOPUS:85054224666
SN - 9781538646588
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 6094
EP - 6098
BT - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers
T2 - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Y2 - 15 April 2018 through 20 April 2018
ER -