Resolution limitation in speakers clustering and segmentation problems

Itshak Lapidot, Hugo Guterman

Research output: Contribution to conference › Paper › peer-review

4 Scopus citations


When a conversation is unlabeled and unsegmented, i.e. no a priori knowledge about the speakers' identities or the segment boundaries is provided, it is important to cluster the conversation (segment and label it) with the best possible resolution. At low resolution, i.e. when the segments are long, a segment might contain data from several speakers. On the other hand, when short segments are used (high resolution), they do not provide enough statistics to allow a correct decision about the identity of the speaker. In this work, the performance of a system that employs different segment lengths is presented. We assumed that the number of speakers, R, is known, and high-quality conversations were used. Each speaker was modeled by a Self-Organizing Map (SOM). An iterative algorithm allows the data to move from one model to another and adjusts the SOMs. The restriction that the data can move only in small groups, rather than one feature vector at a time, forces the SOMs to adjust to speakers (instead of phonemes or other vocal events). We found that the optimal segment duration was half a second. The system has a clustering performance of about 90% for two-speaker conversations and over 80% for three-speaker conversations.
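
The iterative scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no details of the SOM topology, the features, or the update schedule, so the small 1-D SOM, the learning-rate and neighbourhood decay, the random initial labeling, and all names (`train_som`, `segment_error`, `cluster_segments`, `seg_len`, `n_units`) are assumptions made for illustration. The sketch only shows the core idea that whole fixed-length segments, not individual feature vectors, are reassigned between speaker models.

```python
import numpy as np

def train_som(data, n_units=4, epochs=10, seed=0):
    """Train a tiny 1-D SOM (chain topology, an assumption) and return its codebook."""
    rng = np.random.default_rng(seed)
    # Initialise the codebook from randomly chosen training vectors.
    codebook = data[rng.choice(len(data), size=n_units, replace=True)].astype(float)
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = 0.5 * decay                           # decaying learning rate
        radius = max(decay, 1e-3) * n_units / 2    # decaying neighbourhood width
        for x in data[rng.permutation(len(data))]:
            winner = np.argmin(((codebook - x) ** 2).sum(axis=1))
            dist = np.abs(np.arange(n_units) - winner)
            h = np.exp(-(dist ** 2) / (2 * radius ** 2))
            codebook += lr * h[:, None] * (x - codebook)
    return codebook

def segment_error(segment, codebook):
    """Mean squared distance from each frame to its best-matching unit."""
    d = ((segment[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()

def cluster_segments(features, seg_len, n_speakers, n_iter=10, seed=0):
    """Assign fixed-length segments to speaker models, then retrain, repeatedly."""
    n_segs = len(features) // seg_len
    dim = features.shape[1]
    # Cut the frame sequence into equal-length segments; trailing frames are dropped.
    segments = features[: n_segs * seg_len].reshape(n_segs, seg_len, dim)
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_speakers, size=n_segs)   # random start (an assumption)
    for _ in range(n_iter):
        codebooks = []
        for r in range(n_speakers):
            member = segments[labels == r]
            if len(member) == 0:
                # Re-seed an empty model with one random segment.
                member = segments[rng.integers(n_segs, size=1)]
            codebooks.append(train_som(member.reshape(-1, dim), seed=seed + r))
        # Each segment moves as a whole to the model that represents it best.
        errors = np.array([[segment_error(s, cb) for cb in codebooks]
                           for s in segments])
        new_labels = errors.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                    # no segment changed model
        labels = new_labels
    return labels
```

Here `seg_len` is the resolution parameter: a larger value gives each decision more frames but risks mixing speakers within a segment, while a smaller value does the opposite. How many frames correspond to the half-second optimum reported above depends on the frame rate of the feature extraction, which the abstract does not state.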

Original language: English
Number of pages: 6
State: Published - 1 Jan 2001
Event: Speaker Recognition Workshop 2001: A Speaker Odyssey, ODYSSEY 2001 - Crete, Greece
Duration: 18 Jun 2001 to 22 Jun 2001


Conference: Speaker Recognition Workshop 2001: A Speaker Odyssey, ODYSSEY 2001

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Human-Computer Interaction

