
Conformer Parrotron: A faster and stronger end-to-end speech conversion and recognition model for atypical speech

  • Zhehuai Chen
  • Bhuvana Ramabhadran
  • Fadi Biadsy
  • Xia Zhang
  • Youzheng Chen
  • Liyang Jiang
  • Fang Chu
  • Rohan Doshi
  • Pedro J. Moreno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Scopus citations

Abstract

Parrotron is an end-to-end personalizable model that performs many-to-one voice conversion (VC) and automatic speech recognition (ASR) simultaneously for atypical speech. In this work, we present the next-generation Parrotron model, which improves overall accuracy as well as training and inference speed. The proposed architecture builds on the recent Conformer encoder used in ASR, which is made of convolution-based and attention-based blocks. We introduce architectural modifications that subsample encoder activations to speed up training and inference. We show that jointly improving ASR and voice conversion quality requires a corresponding upsampling after the Conformer encoder blocks. We provide an in-depth analysis of how the proposed approach can maximize the efficiency of a speech-to-speech conversion model in the context of atypical speech. Experiments on both many-to-one and one-to-one dysarthric speech conversion tasks show that we can achieve up to a 7× speedup and a 35% relative reduction in WER over the previous best Transformer Parrotron.
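To illustrate the subsample-then-upsample idea in the abstract, here is a minimal sketch in plain Python. It assumes two simple operations: subsampling by frame stacking, which concatenates each group of k consecutive frames into one wider frame, and upsampling by frame repetition. Both are assumptions made for illustration. The paper's actual layers may differ, and the function names `subsample_by_stacking` and `upsample_by_repetition` are hypothetical. Self-attention compares every pair of frames, so shortening the sequence by a factor k shrinks the attention score matrix by roughly k². Upsampling afterwards restores one encoder output per input frame for a frame-rate-sensitive decoder such as a spectrogram generator.

```python
from typing import List, Optional

Frame = List[float]


def subsample_by_stacking(frames: List[Frame], factor: int) -> List[Frame]:
    """Hypothetical helper: cut the frame rate by `factor` by concatenating
    each group of `factor` consecutive frames into one wider frame.
    Assumption for this sketch: trailing frames that do not fill a full
    group are zero-padded."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    dim = len(frames[0]) if frames else 0
    padded = list(frames)
    remainder = len(padded) % factor
    if remainder:
        padded += [[0.0] * dim for _ in range(factor - remainder)]
    out: List[Frame] = []
    for start in range(0, len(padded), factor):
        stacked: Frame = []
        for frame in padded[start:start + factor]:
            stacked.extend(frame)  # concatenate features along the feature axis
        out.append(stacked)
    return out


def upsample_by_repetition(frames: List[Frame], factor: int,
                           target_len: Optional[int] = None) -> List[Frame]:
    """Hypothetical helper: restore the frame rate by repeating each frame
    `factor` times, then trim to `target_len` (e.g. the original length)."""
    out = [list(frame) for frame in frames for _ in range(factor)]
    return out if target_len is None else out[:target_len]


if __name__ == "__main__":
    # 10 input frames with 2 features each, standing in for spectrogram frames.
    mel = [[float(t), float(t) + 0.5] for t in range(10)]
    enc = subsample_by_stacking(mel, 4)  # 3 frames of dimension 8
    dec_in = upsample_by_repetition(enc, 4, target_len=len(mel))  # 10 frames of dimension 8
    print(len(enc), len(enc[0]), len(dec_in), len(dec_in[0]))
```

In a trained model, a learned projection would normally follow the stacking and repetition steps to map features back to the model dimension. It is omitted here to keep the time-axis bookkeeping visible.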

Original language: English
Title of host publication: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Publisher: International Speech Communication Association
Pages: 3101-3105
Number of pages: 5
ISBN (Electronic): 9781713836902
DOIs
State: Published - 1 Jan 2021
Externally published: Yes
Event: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 - Brno, Czech Republic
Duration: 30 Aug 2021 → 3 Sep 2021

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 4
ISSN (Print): 2308-457X
ISSN (Electronic): 2958-1796

Conference

Conference: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Country/Territory: Czech Republic
City: Brno
Period: 30/08/21 → 3/09/21

Keywords

  • Sequence-to-sequence model
  • Speech impairments
  • Speech recognition
  • Voice conversion

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Language and Linguistics
  • Modeling and Simulation
  • Human-Computer Interaction

