Frame level entropy based overlapped speech detection as a pre-processing stage for speaker diarization

Oshry Ben-Harush, Hugo Guterman, Itshak Lapidot

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Scopus citations

Abstract

Speaker diarization systems attempt to assign temporal speech segments in a conversation to the appropriate speaker, and non-speech segments to non-speech. In essence, they answer the question "Who spoke when?". One inherent deficiency of most current systems is their inability to handle co-channel or overlapped speech. During the past few years, several studies have addressed the detection and separation of overlapped or co-channel speech; however, most of the suggested algorithms work only under specific conditions, have high computational complexity, and require both time-domain and frequency-domain analysis of the audio data. In this study, frame-based entropy analysis of the audio data in the time domain serves as the single feature for an overlapped speech detection algorithm. Overlapped speech segments are identified using Gaussian Mixture Modeling (GMM) together with well-known classification algorithms applied to two-speaker conversations. This methodology eliminates the need to set a hard threshold for each conversation or database. The LDC CALLHOME American English corpus is used to evaluate the suggested algorithm. The proposed method successfully detects 60.0% of the frames labeled as overlapped speech by the baseline (ground-truth) segmentation, while keeping a 5% false-alarm rate.
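
The abstract does not state how the per-frame entropy is estimated, so the following is only a minimal sketch of one plausible reading: the Shannon entropy of an amplitude histogram computed over short time-domain frames. The function names, the histogram estimator, the bin count, the amplitude range of [-1, 1], and the frame and hop lengths (200 and 80 samples, i.e. 25 ms and 10 ms if the audio is sampled at 8 kHz) are all assumptions made for illustration and are not taken from the paper.

```python
import math


def frame_entropy(frame, n_bins=32, lo=-1.0, hi=1.0):
    """Shannon entropy (in bits) of the amplitude histogram of one frame.

    Assumption: samples are floats normalised to [lo, hi]; the histogram
    estimator and n_bins are illustrative choices, not the paper's.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for x in frame:
        # Clamp to the histogram range, then map the sample to a bin index.
        x = min(max(x, lo), hi)
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1

    total = len(frame)
    entropy = 0.0
    for c in counts:
        if c:  # empty bins contribute nothing to the sum
            p = c / total
            entropy -= p * math.log2(p)
    return entropy


def frame_entropies(signal, frame_len=200, hop=80, **kwargs):
    """One entropy value per frame of `signal` (hypothetical framing)."""
    return [
        frame_entropy(signal[start:start + frame_len], **kwargs)
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]
```

The resulting sequence of one value per frame is the kind of single feature that could then be passed to a GMM-based classifier, as the abstract describes; that modeling stage is not sketched here because the abstract gives no details of its configuration.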

Original language: English
Title of host publication: Machine Learning for Signal Processing XIX - Proceedings of the 2009 IEEE Signal Processing Society Workshop, MLSP 2009
State: Published - 1 Dec 2009
Event: Machine Learning for Signal Processing XIX - 2009 IEEE Signal Processing Society Workshop, MLSP 2009 - Grenoble, France
Duration: 2 Sep 2009 to 4 Sep 2009

Publication series

Name: Machine Learning for Signal Processing XIX - Proceedings of the 2009 IEEE Signal Processing Society Workshop, MLSP 2009

Conference

Conference: Machine Learning for Signal Processing XIX - 2009 IEEE Signal Processing Society Workshop, MLSP 2009
Country/Territory: France
City: Grenoble
Period: 2/09/09 to 4/09/09

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Signal Processing
  • Education
