Time frequency representation for speech recognition

Avishay Amsalem, Ilan D. Shallom

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

1 Scopus citation

Abstract

In speech recognition, incorporating the dynamics of speech has been shown to improve recognition accuracy. This idea is embodied in Mel Frequency Cepstral Coefficients (MFCC) and their derivatives, which together represent both the static and the dynamic characteristics of the vocal tract. In this paper, a new method for capturing the dynamic features of non-stationary speech signals is presented. The proposed approach isolates each cepstral band and projects it onto an orthogonal space spanned by a set of well-defined orthogonal functions. The main idea is to capture and represent energy transitions between successive short-term speech frames along a non-stationary segment of about 100 ms. Non-stationary speech segments are represented by Time-Frequency Representations (TFR), and the analysis is modified to fit two-dimensional data. Evaluation of the proposed features on the TIDIGITS corpus showed an average 58% improvement in word error rate compared with MFCC and its derivatives, in the context of isolated speech recognition in noisy environments.
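The projection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the basis family, so a Legendre polynomial basis (orthonormalized on the frame grid) is assumed here, as are the 10 ms frame hop, the 13 cepstral bands, and the function names `orthonormal_basis` and `project_segment`, all of which are hypothetical.

```python
import numpy as np


def orthonormal_basis(num_frames, num_functions):
    """Build an orthonormal basis sampled at num_frames points.

    Assumption: Legendre polynomials, re-orthonormalized on the discrete
    grid with a QR decomposition. The paper's actual basis may differ.
    """
    t = np.linspace(-1.0, 1.0, num_frames)
    # Columns are P_0(t), P_1(t), ..., P_{num_functions-1}(t).
    vander = np.polynomial.legendre.legvander(t, num_functions - 1)
    # Q has orthonormal columns spanning the same nested subspaces.
    q, _ = np.linalg.qr(vander)
    return q  # shape: (num_frames, num_functions)


def project_segment(segment, basis):
    """Project each cepstral band's time trajectory onto the basis.

    segment: (num_frames, num_bands) cepstral coefficients for one
             roughly 100 ms segment.
    returns: (num_functions, num_bands) projection coefficients.
    """
    return basis.T @ segment


# Stand-in data: 10 frames (about 100 ms at an assumed 10 ms hop),
# 13 cepstral bands. Real input would come from an MFCC front end.
rng = np.random.default_rng(0)
segment = rng.standard_normal((10, 13))

basis = orthonormal_basis(num_frames=10, num_functions=4)
features = project_segment(segment, basis)
print(features.shape)  # (4, 13)
```

Under these assumptions, each column of `features` summarizes how one cepstral band evolves across the segment: the first coefficient tracks its average level, and higher-order coefficients track progressively faster transitions.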

Original language: English
Title of host publication: ITRE 2006 - 4th International Conference on Information Technology
Subtitle of host publication: Research and Education, Proceedings
Publisher: Institute of Electrical and Electronics Engineers
Pages: 99-103
Number of pages: 5
ISBN (Print): 1424408598, 9781424408597
State: Published - 1 Jan 2006
Event: ITRE 2006 - 4th International Conference on Information Technology: Research and Education - Tel-Aviv, Israel
Duration: 17 Oct 2006 to 18 Oct 2006

Publication series

Name: ITRE 2006 - 4th International Conference on Information Technology: Research and Education, Proceedings

Conference

Conference: ITRE 2006 - 4th International Conference on Information Technology: Research and Education
Country/Territory: Israel
City: Tel-Aviv
Period: 17/10/06 to 18/10/06

Keywords

  • Automatic speech recognition
  • Basis function families
  • Mel frequency filter bank
  • Orthogonal projection
  • Speech processing

ASJC Scopus subject areas

  • General Computer Science
  • Information Systems
  • Information Systems and Management
