Fitting new speakers based on a short untranscribed sample

Eliya Nachmani, Adam Polyak, Yaniv Taigman, Lior Wolf

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

29 Scopus citations

Abstract

Learning-based Text To Speech systems have the potential to generalize from one speaker to the next and thus require a relatively short sample of any new voice. However, this promise is currently largely unrealized. We present a method that is designed to capture a new speaker from a short untranscribed audio sample. This is done by employing an additional network that, given an audio sample, places the speaker in the embedding space. This network is trained as part of the speech synthesis system using various consistency losses. Our results demonstrate a greatly improved performance on both the dataset speakers, and, more importantly, when fitting new voices, even from very short samples.
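To make the idea in the abstract concrete, the sketch below shows one way such an audio-to-embedding network and a consistency loss could be wired up. It is an illustrative assumption only: the module names, network shape, and the simple L2 consistency term are placeholders and do not reproduce the paper's actual architecture or loss functions.

```python
# Hypothetical sketch: a speaker-encoding network maps a short audio sample
# (as acoustic frames) to a point in the speaker embedding space, and a
# consistency loss ties that prediction to the embedding the synthesizer
# already uses for the same speaker. All names and hyperparameters are
# illustrative, not the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpeakerEncoder(nn.Module):
    """Maps a sequence of acoustic frames to a fixed-size speaker embedding."""

    def __init__(self, n_mels: int = 80, hidden: int = 128, emb_dim: int = 64):
        super().__init__()
        self.rnn = nn.GRU(n_mels, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, emb_dim)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, n_mels) -> embedding: (batch, emb_dim)
        _, h = self.rnn(frames)      # final hidden state summarizes the sample
        return self.proj(h[-1])


def embedding_consistency_loss(pred_emb: torch.Tensor,
                               table_emb: torch.Tensor) -> torch.Tensor:
    """Pulls the audio-derived embedding toward the synthesizer's stored
    embedding for the same speaker (a simple L2 consistency term)."""
    return F.mse_loss(pred_emb, table_emb)


if __name__ == "__main__":
    batch, time, n_mels, n_speakers, emb_dim = 4, 200, 80, 10, 64
    encoder = SpeakerEncoder(n_mels=n_mels, emb_dim=emb_dim)
    speaker_table = nn.Embedding(n_speakers, emb_dim)  # dataset speakers' embeddings

    frames = torch.randn(batch, time, n_mels)          # stand-in for real audio features
    speaker_ids = torch.randint(0, n_speakers, (batch,))

    pred = encoder(frames)
    loss = embedding_consistency_loss(pred, speaker_table(speaker_ids))
    loss.backward()                                    # trained jointly with the TTS losses
    print(float(loss))
```

At fitting time, a new speaker's untranscribed sample would be passed through such an encoder to obtain an embedding directly, without updating the synthesizer's weights; the joint training with consistency losses is what makes that embedding usable by the rest of the system.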

Original language: English
Title of host publication: 35th International Conference on Machine Learning, ICML 2018
Editors: Jennifer Dy, Andreas Krause
Publisher: International Machine Learning Society (IMLS)
Pages: 5932-5940
Number of pages: 9
ISBN (Electronic): 9781510867963
State: Published - 1 Jan 2018
Externally published: Yes
Event: 35th International Conference on Machine Learning, ICML 2018 - Stockholm, Sweden
Duration: 10 Jul 2018 - 15 Jul 2018

Publication series

Name: 35th International Conference on Machine Learning, ICML 2018
Volume: 8

Conference

Conference: 35th International Conference on Machine Learning, ICML 2018
Country/Territory: Sweden
City: Stockholm
Period: 10/07/18 - 15/07/18

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software
