Abstract
In this paper, we introduce a new approach to dialect recognition that relies on context-dependent (CD) phonetic differences between dialects as well as phonotactics. Given a speech utterance, we obtain the phone sequence using a CD-phone recognizer. We then identify the most likely dialect of these CD-phones using SVM classifiers. Augmenting these phones with the output of these classifiers, we extract augmented phonotactic features which are subsequently given to a logistic regression classifier to obtain a dialect detection score. We test our approach on the task of detecting four Arabic dialects from 30-second utterances. We compare our performance to two baselines, PRLM and GMM-UBM, as well as to our own improved version of GMM-UBM which employs fMLLR adaptation. Our approach performs significantly better than all three baselines at 5% absolute Equal Error Rate (EER). The overall EER of our system is 6%.
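The augmentation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the phone symbols, the dialect labels (`egy`, `lev`), the `/` separator, and the choice of bigrams are all assumptions made for illustration, and the per-phone labels stand in for the output of the SVM classifiers.

```python
from collections import Counter
from typing import List


def augment_phones(phones: List[str], labels: List[str]) -> List[str]:
    """Attach a per-phone dialect label to each phone symbol."""
    if len(phones) != len(labels):
        raise ValueError("phones and labels must have the same length")
    return [f"{phone}/{label}" for phone, label in zip(phones, labels)]


def ngram_counts(tokens: List[str], n: int = 2) -> Counter:
    """Count n-grams (as tuples) over a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical recognizer output and hypothetical per-phone classifier decisions.
phones = ["b", "a", "b", "a"]
labels = ["egy", "egy", "egy", "lev"]

augmented = augment_phones(phones, labels)
features = ngram_counts(augmented, n=2)
```

In this sketch the same phone `a` yields two distinct tokens (`a/egy` and `a/lev`), so the n-gram counts carry both phonotactic and phone-level dialect information. Such counts could then be passed as a feature vector to a classifier such as logistic regression.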
| Original language | English |
|---|---|
| Pages | 263-270 |
| Number of pages | 8 |
| State | Published - 1 Jan 2010 |
| Externally published | Yes |
| Event | Speaker and Language Recognition Workshop, Odyssey 2010 - Brno, Czech Republic; Duration: 28 Jun 2010 → 1 Jul 2010 |
Conference
| Conference | Speaker and Language Recognition Workshop, Odyssey 2010 |
|---|---|
| Country/Territory | Czech Republic |
| City | Brno |
| Period | 28/06/10 → 1/07/10 |
ASJC Scopus subject areas
- Signal Processing
- Software
- Human-Computer Interaction