Abstract
We present a new method for estimating the sparse non-negative model (SNM) by using a small amount of held-out data and the multinomial loss that is natural for language modeling; we validate it experimentally against the previous estimation method which uses leave-one-out on training data and a binary loss function and show that it performs equally well. Being able to train on held-out data is very important in practical situations where training data is mismatched from held-out/test data. We find that fairly small amounts of held-out data (on the order of 30-70 thousand words) are sufficient for training the adjustment model, which is the only model component estimated using gradient descent; the bulk of model parameters are relative frequencies counted on training data. A second contribution is a comparison between SNM and the related class of Maximum Entropy language models. While much cheaper computationally, we show that SNM achieves slightly better perplexity results for the same feature set and same speech recognition accuracy on voice search and short message dictation.
| Original language | English |
|---|---|
| Pages (from-to) | 2725-2729 |
| Number of pages | 5 |
| Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| Volume | 2017-August |
| DOIs | |
| State | Published - 1 Jan 2017 |
| Externally published | Yes |
| Event | 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden Duration: 20 Aug 2017 → 24 Aug 2017 |
Keywords
- Language modeling
- Machine learning
- Maximum entropy
- Speech recognition
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modeling and Simulation
Fingerprint
Dive into the research topics of 'Sparse non-negative matrix language modeling: Maximum entropy flexibility on the cheap'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver