Online diarization of telephone conversations

Oshry Ben-Harush, Itshak Lapidot, Hugo Guterman

Research output: Contribution to conference › Paper › peer-review

2 Scopus citations

Abstract

Speaker diarization systems attempt to segment and label a conversation between R speakers when no prior information about the conversation is given. In essence, diarization systems try to answer the question “Who spoke when?”. Most state-of-the-art diarization systems operate in off-line mode; that is, all samples of the audio stream are required before the diarization algorithm is applied. Off-line diarization algorithms generally rely on a dendrogram or hierarchical clustering approach. Several on-line diarization systems have been suggested previously; however, most require some prior information or off-line trained speaker and background models to conduct all or part of the diarization process. A new two-stage on-line diarization algorithm for telephone conversations is suggested in this study. In the first stage, a fully unsupervised diarization algorithm is applied to an initial training set of the conversation. This stage generates the speaker and non-speech models and tunes a hyper-state Hidden Markov Model (HMM) to be used in the second, on-line stage of diarization. On-line diarization is then applied by means of time-series clustering using the Viterbi dynamic programming algorithm. This approach provides diarization results a few milliseconds after either a user request or the conclusion of the conversation. To evaluate diarization performance, the diarization system was applied to 2048 two-speaker conversations, each 5 min long, extracted from the NIST 2005 Speaker Recognition Evaluation. The on-line Diarization Error Rate (DER) is shown to approach the “optimal” DER (achieved by applying unsupervised diarization to the entire conversation) as the length of the initial training set increases. Using an initial training set of 2 min and applying on-line diarization to the entire conversation incurred an increase of approximately 4% in DER compared to the “optimal” DER.
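
As a rough illustration of the Viterbi decoding step mentioned in the abstract, the following minimal sketch finds the most likely state sequence for a plain three-state HMM (non-speech, speaker A, speaker B). It is not the authors' implementation: the function name `viterbi`, the three-state layout, and all probabilities are assumptions made up for this example, and the paper's hyper-state HMM and its trained models are not reproduced here.

```python
import math


def viterbi(log_emissions, log_trans, log_init):
    """Return the most likely state sequence for a sequence of frames.

    log_emissions[t][s]: log-likelihood of frame t under state s
    log_trans[i][j]:     log-probability of moving from state i to state j
    log_init[s]:         log-probability of starting in state s
    """
    n_states = len(log_init)
    # Best log score of any path that ends in each state at the current frame.
    score = [log_init[s] + log_emissions[0][s] for s in range(n_states)]
    backptr = []  # backptr[t][j]: best predecessor of state j at frame t + 1
    for frame in log_emissions[1:]:
        prev, score, pointers = score, [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best_i)
            score.append(prev[best_i] + log_trans[best_i][j] + frame[j])
        backptr.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backptr):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical toy setup: state 0 = non-speech, 1 = speaker A, 2 = speaker B.
STAY, SWITCH = 0.9, 0.05  # assumed "sticky" transition probabilities
log_trans = [[math.log(STAY if i == j else SWITCH) for j in range(3)]
             for i in range(3)]
log_init = [math.log(1.0 / 3.0)] * 3

# Assumed per-frame likelihoods; the fourth frame is ambiguous between A and B.
frames = [
    [0.98, 0.01, 0.01],
    [0.01, 0.98, 0.01],
    [0.01, 0.98, 0.01],
    [0.02, 0.48, 0.50],
    [0.01, 0.98, 0.01],
    [0.01, 0.01, 0.98],
    [0.01, 0.01, 0.98],
]
log_em = [[math.log(p) for p in frame] for frame in frames]

path = viterbi(log_em, log_trans, log_init)
print(path)  # [0, 1, 1, 1, 1, 2, 2]
```

In this toy run the ambiguous fourth frame slightly favors speaker B on its own, but the sticky transitions keep the decoded path on speaker A. The forward scores are updated one frame at a time, and the trace-back runs only when a result is wanted.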

Original language: English
Pages: 125-130
Number of pages: 6
State: Published - 1 Jan 2010
Event: Speaker and Language Recognition Workshop, Odyssey 2010 - Brno, Czech Republic
Duration: 28 Jun 2010 to 1 Jul 2010

Conference

Conference: Speaker and Language Recognition Workshop, Odyssey 2010
Country/Territory: Czech Republic
City: Brno
Period: 28/06/10 to 1/07/10

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Human-Computer Interaction
