A study on data augmentation in voice anti-spoofing

Ariel Cohen, Inbal Rimon, Eran Aflalo, Haim H. Permuter

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

In this paper we perform an in-depth study of how data augmentation techniques improve the detection of synthetic or spoofed audio. Specifically, we propose methods to deal with channel variability, different audio compressions, different bandwidths and unseen spoofing attacks. These challenges have all been shown to significantly degrade the performance of audio-based systems and anti-spoofing systems. Our results are based on the ASVspoof 2021 challenge, in the Logical Access (LA) and Deep Fake (DF) categories. Our study is data-centric, meaning that the models are fixed and we significantly improve the results by manipulating the data. We introduce two forms of data augmentation: compression augmentation for the DF part, and compression and channel augmentation for the LA part. In addition, we introduce a double-sided log spectrogram feature design that improves the results significantly by centering the sub-bands of interest, where the discriminating spoofing artifacts can be localized. Furthermore, we introduce a new type of online data augmentation, SpecAverage, which masks the audio features with their average value in order to improve generalization. Our best single system and our best fusion scheme both achieve state-of-the-art performance in the DF category, with EERs of 15.46% and 14.27%, respectively. Our best system for the LA task reduced the best baseline EER by 50% and the min t-DCF by 16%. Our techniques for dealing with spoofed data from a wide variety of distributions can be replicated and can help anti-spoofing and speech-based systems improve their results.
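The abstract describes SpecAverage only as masking the audio features with their average value. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a 2-D feature matrix laid out as (frequency bins, time frames), one frequency mask and one time mask per call, and a fill value equal to the mean of the whole input. The function name `spec_average` and the parameters `max_freq_width` and `max_time_width` are hypothetical.

```python
import numpy as np


def spec_average(features, max_freq_width=8, max_time_width=16, rng=None):
    """Return a copy of `features` with one random frequency band and one
    random time span replaced by the mean of the input.

    Assumption: `features` is a 2-D float array shaped (freq_bins, frames).
    """
    rng = np.random.default_rng() if rng is None else rng
    out = features.copy()
    fill = features.mean()  # average of the unmasked input features
    n_freq, n_time = features.shape

    # Frequency mask: a band of random width at a random position.
    f_width = int(rng.integers(0, min(max_freq_width, n_freq) + 1))
    f_start = int(rng.integers(0, n_freq - f_width + 1))
    out[f_start:f_start + f_width, :] = fill

    # Time mask: a span of random width at a random position.
    t_width = int(rng.integers(0, min(max_time_width, n_time) + 1))
    t_start = int(rng.integers(0, n_time - t_width + 1))
    out[:, t_start:t_start + t_width] = fill

    return out
```

Because the augmentation is described as online, a function like this would typically be applied to each training example as it is loaded, so that every epoch sees different masks. Filling with the average rather than zero is the one detail taken from the abstract; the mask shapes and sizes here are illustrative choices.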

Original language: English
Pages (from-to): 56-67
Number of pages: 12
Journal: Speech Communication
Volume: 141
State: Published - 1 Jun 2022

Keywords

  • ASVspoof 2021
  • Audio data augmentation
  • Data-centric AI
  • SpecAugment
  • Voice anti spoofing
  • Voice deep fake

ASJC Scopus subject areas

  • Software
  • Modeling and Simulation
  • Communication
  • Language and Linguistics
  • Linguistics and Language
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
