Abstract
The assessment of performance for any number of speech processing tasks calls for the use of a suitably large, representative dataset. Dataset design is crucial so as to ensure that any significant variation unrelated to the task in hand is adequately normalised or marginalised. Most datasets are partitioned into training, development and evaluation subsets. Depending on the task, the nature of these three subsets should normally be close to identical. With speech signals being subject to a multitude of different influences, e.g. speaker gender and age, language, dialect, utterance length, etc., the design and validation of speech datasets can become especially challenging. Even if many sources of variation unrelated to the task in hand can easily be marginalised, other sources of more subtle variation can easily be overlooked. Imbalances between training, development and evaluation partitions, can bring into question findings derived from their use. Stringent dataset validation procedures are required. This paper reports a particularly straightforward approach to dataset validation that is based upon waveform entropy.
| Original language | English |
|---|---|
| Pages (from-to) | 2773-2777 |
| Number of pages | 5 |
| Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| Volume | 2018-September |
| DOIs | |
| State | Published - 1 Jan 2018 |
| Externally published | Yes |
| Event | 19th Annual Conference of the International Speech Communication, INTERSPEECH 2018 - Hyderabad, India Duration: 2 Sep 2018 → 6 Sep 2018 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
-
SDG 5 Gender Equality
Keywords
- Database assessment
- Entropy
- Waveform
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modeling and Simulation
Fingerprint
Dive into the research topics of 'Speech database and protocol validation using waveform entropy'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver