Unknown-multiple speaker clustering using HMM

J. Ajmera, H. Bourlard, I. Lapidot, I. McCowan

Research output: Contribution to conference › Paper › peer-review

84 Scopus citations

Abstract

An HMM-based speaker clustering framework is presented, where the number of speakers and the segmentation boundaries are unknown a priori. Ideally, the system aims to create one pure cluster for each speaker. The HMM is ergodic in nature with a minimum duration topology. The final number of clusters is determined automatically by merging the closest clusters and retraining the merged cluster, until a decrease in likelihood is observed. In the same framework, we also examine the effect of using only the features from highly voiced frames as a means of improving the robustness and reducing the computational complexity of the algorithm. The proposed system is assessed on the 1996 HUB-4 evaluation test set in terms of both cluster and speaker purity. It is shown that the number of clusters found often corresponds to the actual number of speakers.
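
The merge-and-stop loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' HMM system. It swaps in one 1-D Gaussian per cluster in place of HMM states, uses the distance between cluster means as the "closest" criterion, and scores a clustering by total log-likelihood minus a per-cluster penalty (a BIC-style term). All of these choices, and every function name and the toy data, are assumptions made for illustration only.

```python
import math


def gaussian_loglik(xs):
    """Log-likelihood of 1-D samples under a Gaussian fitted to them."""
    n = len(xs)
    mean = sum(xs) / n
    sq = sum((x - mean) ** 2 for x in xs)
    var = max(sq / n, 1e-6)  # variance floor to avoid log(0)
    return -0.5 * n * math.log(2 * math.pi * var) - sq / (2 * var)


def score(clusters, penalty):
    """Total log-likelihood minus an assumed per-cluster penalty."""
    return sum(gaussian_loglik(c) for c in clusters) - penalty * len(clusters)


def closest_pair(clusters):
    """Indices of the two clusters whose means are nearest (assumed distance)."""
    means = [sum(c) / len(c) for c in clusters]
    pairs = [(abs(means[i] - means[j]), i, j)
             for i in range(len(means)) for j in range(i + 1, len(means))]
    _, i, j = min(pairs)
    return i, j


def merge_until_score_drops(clusters, penalty):
    """Greedily merge the closest pair; stop when the score would decrease."""
    clusters = [list(c) for c in clusters]
    current = score(clusters, penalty)
    while len(clusters) > 1:
        i, j = closest_pair(clusters)
        merged = [c for k, c in enumerate(clusters) if k not in (i, j)]
        merged.append(clusters[i] + clusters[j])  # "retrain" = refit on pooled data
        candidate = score(merged, penalty)
        if candidate < current:
            break  # merging hurt the score: keep the current clustering
        clusters, current = merged, candidate
    return clusters


# Toy data (hypothetical): two "speakers", each over-segmented into two clusters.
initial = [
    [0.0, 0.2, -0.1, 0.1],
    [0.1, -0.2, 0.3, 0.0],
    [10.0, 10.2, 9.9, 10.1],
    [9.8, 10.1, 10.3, 10.0],
]
result = merge_until_score_drops(initial, penalty=math.log(16))
print(len(result))
```

The penalty is needed only because of the simplified model: pooling two single Gaussians into one can never raise the raw likelihood, so an unpenalized score would stop before the first merge. How the HMM system in the paper makes the likelihood comparison meaningful is not stated in the abstract.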

Original language: English
Pages: 573-576
Number of pages: 4
State: Published - 1 Jan 2002
Externally published: Yes
Event: 7th International Conference on Spoken Language Processing, ICSLP 2002 - Denver, United States
Duration: 16 Sep 2002 to 20 Sep 2002

Conference

Conference: 7th International Conference on Spoken Language Processing, ICSLP 2002
Country/Territory: United States
City: Denver
Period: 16/09/02 to 20/09/02

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
