TY - GEN
T1 - Robust Localization of Partially Fake Speech
T2 - 17th Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2025
AU - Luong, Hieu Thi
AU - Rimon, Inbal
AU - Permuter, Haim
AU - Lee, Kong Aik
AU - Chng, Eng Siong
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025/1/1
Y1 - 2025/1/1
N2 - Partial audio deepfake localization poses unique challenges and remain underexplored compared to full-utterance spoofing detection. While recent methods report strong indomain performance, their real-world utility remains unclear. In this analysis, we critically examine the limitations of current evaluation practices, particularly the widespread use of Equal Error Rate (EER), which often obscures generalization and deployment readiness. We propose reframing the localization task as a sequential anomaly detection problem and advocate for the use of threshold-dependent metrics such as accuracy, precision, recall, and F1-score, which better reflect real-world behavior. Specifically, we analyze the performance of the open-source Coarse-to-Fine Proposal Refinement Framework (CFPRF), which achieves a 20-ms EER of 7.61% on the in-domain PartialSpoof evaluation set, but 43.25% and 27.59% on the LlamaPartialSpoof and Half-Truth out-of-domain test sets. Interestingly, our reproduced version of the same model performs worse on in-domain data (9.84%) but better on the out-of-domain sets (41.72% and 14.98 %, respectively). This highlights the risks of over-optimizing for in-domain EER, which can lead to models that perform poorly in real-world scenarios. It also suggests that while deep learning models can be effective on in-domain data, they generalize poorly to out-of-domain scenarios, failing to detect novel synthetic samples and misclassifying unfamiliar bona fide audio. Finally, we observe that adding more bona fide or fully synthetic utterances to the training data often degrades performance, whereas adding partially fake utterances improves it.
AB - Partial audio deepfake localization poses unique challenges and remain underexplored compared to full-utterance spoofing detection. While recent methods report strong indomain performance, their real-world utility remains unclear. In this analysis, we critically examine the limitations of current evaluation practices, particularly the widespread use of Equal Error Rate (EER), which often obscures generalization and deployment readiness. We propose reframing the localization task as a sequential anomaly detection problem and advocate for the use of threshold-dependent metrics such as accuracy, precision, recall, and F1-score, which better reflect real-world behavior. Specifically, we analyze the performance of the open-source Coarse-to-Fine Proposal Refinement Framework (CFPRF), which achieves a 20-ms EER of 7.61% on the in-domain PartialSpoof evaluation set, but 43.25% and 27.59% on the LlamaPartialSpoof and Half-Truth out-of-domain test sets. Interestingly, our reproduced version of the same model performs worse on in-domain data (9.84%) but better on the out-of-domain sets (41.72% and 14.98 %, respectively). This highlights the risks of over-optimizing for in-domain EER, which can lead to models that perform poorly in real-world scenarios. It also suggests that while deep learning models can be effective on in-domain data, they generalize poorly to out-of-domain scenarios, failing to detect novel synthetic samples and misclassifying unfamiliar bona fide audio. Finally, we observe that adding more bona fide or fully synthetic utterances to the training data often degrades performance, whereas adding partially fake utterances improves it.
UR - https://www.scopus.com/pages/publications/105030484488
U2 - 10.1109/APSIPAASC65261.2025.11249067
DO - 10.1109/APSIPAASC65261.2025.11249067
M3 - Conference contribution
AN - SCOPUS:105030484488
T3 - 2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2025
SP - 2205
EP - 2210
BT - 2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2025
PB - Institute of Electrical and Electronics Engineers
Y2 - 22 October 2025 through 24 October 2025
ER -