ATMM-SAGA: Alternating Training for Multi-Module with Score-Aware Gated Attention SASV system

Research output: Contribution to journal > Conference article > peer-review

Abstract

Automatic speaker verification (ASV) systems determine whether a given test utterance was spoken by a claimed enrolled speaker. These systems have a wide range of applications, so ensuring their reliability is crucial. In this paper, we propose a spoofing-robust automatic speaker verification (SASV) system that uses a score-aware gated attention (SAGA) fusion scheme to integrate scores from a pre-trained countermeasure (CM) with speaker embeddings from a pre-trained ASV model. Specifically, we employ AASIST as the CM and ECAPA-TDNN as the ASV model. SAGA acts as an adaptive gating mechanism in which the CM score determines how strongly the ASV embeddings influence the final SASV decision. Experiments on the ASVspoof 2019 logical access dataset show that the proposed system achieves an SASV equal error rate (SASV-EER) of 2.31% and an agnostic detection cost function (a-DCF) of 0.0603 on the development set, and 2.18% and 0.0480, respectively, on the evaluation set.
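
The gating idea described in the abstract can be illustrated with a small sketch. This is not the paper's SAGA architecture, which the abstract does not specify; it is a toy example that assumes the CM produces a single logit, squashes it to a gate in (0, 1), and scales a vector of ASV-derived features before a linear scoring head. The function name `gated_sasv_logit`, the feature vector, the weights, and the bias are all hypothetical values chosen for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_sasv_logit(asv_features, cm_logit, weights, bias):
    """Toy score-gated fusion (hypothetical, not the paper's exact model).

    asv_features: ASV-derived features for an enrollment/test pair.
    cm_logit:     countermeasure output; larger is assumed to mean "bona fide".
    weights/bias: parameters of an assumed linear scoring head.
    """
    gate = sigmoid(cm_logit)                    # near 1 for bona fide, near 0 for spoof
    gated = [gate * f for f in asv_features]    # the CM score scales the ASV evidence
    return sum(w * g for w, g in zip(weights, gated)) + bias


# Made-up numbers: the same strong speaker match under two CM outcomes.
features = [0.8, 0.6, 0.9]
weights = [2.0, 2.0, 2.0]
bias = -2.0

bona_fide = gated_sasv_logit(features, cm_logit=6.0, weights=weights, bias=bias)
spoofed = gated_sasv_logit(features, cm_logit=-6.0, weights=weights, bias=bias)
print(f"bona fide logit: {bona_fide:.3f}, spoofed logit: {spoofed:.3f}")
```

With a confident bona fide CM score, the ASV evidence passes through almost unchanged and the pair is accepted. With a confident spoof score, the gate closes and the output falls back to the negative bias, so the trial is rejected even though the speaker match is strong.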

Original language: English
Pages (from-to): 3708-3712
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
State: Published - 1 Jan 2025
Event: 26th Interspeech Conference 2025 - Rotterdam, Netherlands
Duration: 17 Aug 2025 to 21 Aug 2025

Keywords

  • alternating training for multi-module (ATMM)
  • countermeasure
  • score-aware gated attention
  • spoofing-robust automatic speaker verification

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Language and Linguistics
  • Modeling and Simulation
  • Human-Computer Interaction
