Abstract
Our article introduces a novel defense mechanism against black-box attacks, where attackers exploit the victim model as an oracle to craft adversarial examples. Unlike traditional pre-processing defenses that rely on sanitizing input samples, our stateless strategy directly counters the attack process itself. For each query, we evaluate a counter-sample: an optimized version of the original sample, designed to thwart the attacker's objective. By responding to every black-box query with a targeted white-box optimization, our strategy introduces a strategic asymmetry that significantly advantages the defender.

Our approach proves to be highly effective against state-of-the-art black-box attacks, outperforming existing defenses on both the CIFAR-10 and ImageNet datasets. Specifically, our method achieves an average Attack Failure Rate (AFR) of 74.7% (up from 13%) on ImageNet and 67.7% (up from 3.5%) on CIFAR-10 when tested against 10 state-of-the-art query-based black-box attacks. Moreover, it preserves the model's performance on legitimate inputs, with accuracy (ACC) reduced by only 0.7% on ImageNet and 0.9% on CIFAR-10. This is in stark contrast to the other defenses tested, which can cause accuracy drops of up to 50%.

Furthermore, we demonstrate that our defense exhibits superior robustness across datasets and attack scenarios, including adaptive attacks designed specifically to bypass our method. This robustness highlights the strength and adaptability of our approach in countering adversarial threats.
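The counter-sample idea can be illustrated with a minimal sketch. The abstract does not state the article's optimization objective, so everything here is an assumption made for illustration: a toy linear softmax classifier stands in for the victim model, the counter-sample is obtained by a few signed-gradient steps that lower the loss on the model's own predicted label, and the names `predict`, `counter_sample`, and `defended_query`, as well as the `step` and `n_steps` values, are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(W, b, x):
    """Class probabilities of a toy linear model (a stand-in for the victim model)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(logits)


def sign(g):
    """Sign of a scalar: -1, 0, or 1."""
    return (g > 0) - (g < 0)


def counter_sample(W, b, x, step=0.05, n_steps=5):
    """Assumed objective: lower the cross-entropy on the originally predicted label.

    Uses white-box gradients of the toy model; inputs are clipped to [0, 1].
    """
    probs = predict(W, b, x)
    label = probs.index(max(probs))  # label the model assigns to the raw query
    x_cs = list(x)
    for _ in range(n_steps):
        p = predict(W, b, x_cs)
        # Gradient of cross-entropy w.r.t. the logits is (p - one_hot(label)).
        err = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
        # Chain rule through the linear layer gives the gradient w.r.t. the input.
        grad = [sum(W[k][i] * err[k] for k in range(len(W))) for i in range(len(x_cs))]
        # Signed-gradient descent step, then clip to the valid input range.
        x_cs = [min(1.0, max(0.0, xi - step * sign(g))) for xi, g in zip(x_cs, grad)]
    return x_cs


def defended_query(W, b, x):
    """Stateless defense: answer each query with the output on its counter-sample."""
    return predict(W, b, counter_sample(W, b, x))
```

In this sketch each query is handled independently, with no history kept between calls, which is what makes the response stateless. Calling `defended_query(W, b, x)` in place of `predict(W, b, x)` keeps the predicted label in this toy setting while returning scores computed on a point pushed away from the decision boundary, so the scores an attacker observes no longer track small perturbations of the raw query.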
| Original language | English |
|---|---|
| Article number | 94 |
| Journal | ACM Transactions on Intelligent Systems and Technology |
| Volume | 16 |
| Issue number | 4 |
| DOIs | |
| State | Published - 18 Aug 2025 |
Keywords
- Adversarial Examples
- Adversarial Machine Learning
- Attack Mitigation
- Model Robustness
- Query Based Attacks
ASJC Scopus subject areas
- Theoretical Computer Science
- Artificial Intelligence