Abstract
One inherent deficiency of most diarization systems is their inability to handle co-channel or overlapped speech. Most of the suggested algorithms perform under singular conditions, require high computational complexity in both time and frequency domains. In this study, frame based entropy analysis of the audio data in the time domain serves as a single feature for an overlapped speech detection algorithm. Identification of overlapped speech segments is performed using Gaussian Mixture Modeling (GMM) along with well known classification algorithms applied on two speaker conversations. By employing this methodology, the proposed method eliminates the need for setting a hard threshold for each conversation or database. LDC CALLHOME American English corpus is used for evaluation of the suggested algorithm. The proposed method successfully detects 63.2% of the frames labeled as overlapped speech by the manual segmentation, while keeping a 5.4% false-alarm rate.
Original language | English |
---|---|
Pages (from-to) | 916-919 |
Number of pages | 4 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
State | Published - 27 Nov 2009 |
Event | 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009 - Brighton, United Kingdom Duration: 6 Sep 2009 → 10 Sep 2009 |
Keywords
- Co-channel
- Diarization
- Overlapped speech
ASJC Scopus subject areas
- Human-Computer Interaction
- Signal Processing
- Software
- Sensory Systems