Automatic detection of speaker state: Lexical, prosodic, and phonetic approaches to level-of-interest and intoxication classification

  • William Yang Wang
  • Fadi Biadsy
  • Andrew Rosenberg
  • Julia Hirschberg

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

Traditional studies of speaker state focus primarily upon one-stage classification techniques using standard acoustic features. In this article, we investigate multiple novel features and approaches to two recent tasks in speaker state detection: level-of-interest (LOI) detection and intoxication detection. In the task of LOI prediction, we propose a novel Discriminative TFIDF feature to capture important lexical information and a novel Prosodic Event detection approach using AuToBI; we combine these with acoustic features for this task using a new multilevel multistream prediction feedback and similarity-based hierarchical fusion learning approach. Our experimental results outperform published results of all systems in the 2010 Interspeech Paralinguistic Challenge - Affect Subchallenge. In the intoxication detection task, we evaluate the performance of Prosodic Event-based, phone duration-based, phonotactic, and phonetic-spectral-based approaches, finding that a combination of the phonotactic and phonetic-spectral approaches achieves significant improvement over the 2011 Interspeech Speaker State Challenge - Intoxication Subchallenge baseline. We discuss our results using these new features and approaches and their implications for future research.

Original language: English
Pages (from-to): 168-189
Number of pages: 22
Journal: Computer Speech and Language
Volume: 27
Issue number: 1
State: Published - 1 Jan 2013
Externally published: Yes

Keywords

  • Emotional speech
  • Paralinguistic
  • Speaker state

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Human-Computer Interaction
