Entropy based overlapped speech detection as a pre-processing stage for speaker diarization

Oshry Ben-Harush, Itshak Lapidot, Hugo Guterman

Research output: Contribution to journal › Conference article › peer-review

11 Scopus citations

Abstract

One inherent deficiency of most diarization systems is their inability to handle co-channel or overlapped speech. Most of the suggested algorithms perform only under singular conditions and require high computational complexity in both the time and frequency domains. In this study, frame-based entropy analysis of the audio data in the time domain serves as the single feature for an overlapped speech detection algorithm. Overlapped speech segments are identified using Gaussian Mixture Modeling (GMM) along with well-known classification algorithms applied to two-speaker conversations. This methodology eliminates the need to set a hard threshold for each conversation or database. The LDC CALLHOME American English corpus is used to evaluate the suggested algorithm. The proposed method successfully detects 63.2% of the frames labeled as overlapped speech by the manual segmentation, while keeping a 5.4% false-alarm rate.
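
The idea of a single time-domain entropy feature classified by a GMM can be illustrated with a minimal sketch. This is not the authors' implementation. The histogram-based entropy estimate, the bin count, the frame and hop lengths, the two-component mixture, and the rule that the higher-entropy component marks overlap candidates are all assumptions made for illustration. The function names are hypothetical.

```python
import math


def frame_entropy(frame, n_bins=32, lo=-1.0, hi=1.0):
    """Shannon entropy (bits) of one frame's amplitude histogram (assumed estimator)."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for x in frame:
        idx = min(max(int((x - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    total = len(frame)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def frame_entropies(signal, frame_len=200, hop=100):
    """Entropy per frame; frame_len and hop are illustrative values."""
    last_start = len(signal) - frame_len
    return [frame_entropy(signal[i:i + frame_len])
            for i in range(0, last_start + 1, hop)]


def gauss(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(values, n_iter=50):
    """Fit a two-component 1-D GMM with EM; returns (weights, means, variances)."""
    s = sorted(values)
    means = [s[len(s) // 4], s[3 * len(s) // 4]]  # quartile initialisation
    mean_all = sum(values) / len(values)
    var_all = sum((v - mean_all) ** 2 for v in values) / len(values) + 1e-6
    variances = [var_all, var_all]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each value
        resp = []
        for v in values:
            p = [weights[k] * gauss(v, means[k], variances[k]) for k in range(2)]
            z = sum(p) or 1e-300
            resp.append([pk / z for pk in p])
        # M-step: re-estimate weights, means, and variances
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            variances[k] = sum(r[k] * (v - means[k]) ** 2
                               for r, v in zip(resp, values)) / nk + 1e-6
    return weights, means, variances


def label_frames(values, gmm):
    """True where the higher-mean component is more likely (assumed overlap rule)."""
    weights, means, variances = gmm
    high = 0 if means[0] > means[1] else 1
    low = 1 - high
    return [weights[high] * gauss(v, means[high], variances[high])
            > weights[low] * gauss(v, means[low], variances[low])
            for v in values]
```

Because the decision boundary comes from the mixture fitted to each recording's own entropy values, no fixed entropy threshold is set by hand, which mirrors the motivation stated in the abstract.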

Original language: English
Pages (from-to): 916-919
Number of pages: 4
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
State: Published - 27 Nov 2009
Event: 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009 - Brighton, United Kingdom
Duration: 6 Sep 2009 → 10 Sep 2009

Keywords

  • Co-channel
  • Diarization
  • Overlapped speech

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Sensory Systems
