Perceptual time-varying modelling of speech signals for ASR and compression application

Amir Leibman, Ilan D. Shallom

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

1 Scopus citation

Abstract

Perceptual audio coders and Automatic Speech Recognition (ASR) systems are commonly based on short-time analysis. This paper presents a generalized model for time-varying coefficients based on psychoacoustic properties of the human ear. The proposed model is evaluated in the framework of speaker-independent speech recognition using Hidden Markov Models (HMMs). The generalized model is compared with the traditional, most popular MFCC features. The comparison is made with respect to each model's baud rate and the total error rate measured in an extensive speech recognition experiment. The recognition experiments are based on the well-established speech recognition development environment HTK and use the TIDIGITS corpus as the evaluation database. The time-varying model achieves a better recognition rate than MFCC, while its baud rate is about one third of the baud rate used for MFCC. In addition, a preliminary evaluation of the model's robustness to noise was carried out and is presented.

Original language: English
Title of host publication: 13th European Signal Processing Conference, EUSIPCO 2005
Pages: 1720-1723
Number of pages: 4
State: Published - 1 Dec 2005
Event: 13th European Signal Processing Conference, EUSIPCO 2005 - Antalya, Turkey
Duration: 4 Sep 2005 to 8 Sep 2005

Publication series

Name: 13th European Signal Processing Conference, EUSIPCO 2005

Conference

Conference: 13th European Signal Processing Conference, EUSIPCO 2005
Country/Territory: Turkey
City: Antalya
Period: 4/09/05 to 8/09/05

ASJC Scopus subject areas

  • Signal Processing
