Counter-Samples: A Stateless Strategy to Neutralize Black-Box Adversarial Attacks

    Research output: Contribution to journal › Article › peer-review

    Abstract

    Our article introduces a novel defense mechanism against black-box attacks, where attackers exploit the victim model as an oracle to craft adversarial examples. Unlike traditional pre-processing defenses that rely on sanitizing input samples, our stateless strategy directly counters the attack process itself. For each query, we evaluate a counter-sample, an optimized version of the original sample, designed to thwart the attacker's objective. By responding to every black-box query with a targeted white-box optimization, our strategy introduces a strategic asymmetry that significantly advantages the defender.

    Our approach proves to be highly effective against state-of-the-art black-box attacks, outperforming existing defenses on both CIFAR-10 and ImageNet datasets. Specifically, our method achieves an average Attack Failure Rate (AFR) of 74.7% (up from 13%) on ImageNet and 67.7% (up from 3.5%) on CIFAR-10 when tested against 10 state-of-the-art query-based black-box attacks. Moreover, it maintains the model's performance on legitimate inputs, with accuracy (ACC) reduced by only 0.7% on ImageNet and 0.9% on CIFAR-10. This is in stark contrast to other defenses tested, which can cause accuracy drops of up to 50%. Such a modest decrease ensures negligible performance degradation on legitimate tasks.

    Furthermore, we demonstrate that our defense exhibits superior robustness across datasets and attack scenarios, including adaptive attacks specifically designed to bypass our method. This robustness highlights the strength and adaptability of our approach in countering adversarial threats.
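
    The counter-sample idea can be sketched in a few lines. The example below is an illustrative assumption, not the paper's exact procedure: a toy linear classifier stands in for the victim model, and each query is answered with the model's output on a version of the input pushed away from the decision boundary (via gradient ascent on the logit margin). The predicted label on the raw query is preserved, so clean accuracy is largely unaffected, but the returned scores no longer describe the raw query, which frustrates boundary-probing attacks.

    ```python
    import numpy as np

    # Hypothetical linear classifier standing in for the victim model
    # (2 features, 2 classes). Any differentiable model would do.
    W = np.array([[1.0, -0.5], [-1.0, 0.5]])
    b = np.array([0.1, -0.1])

    def logits(x):
        return W @ x + b

    def counter_sample(x, step=0.1, iters=5):
        """Optimize the query away from the decision boundary.

        Illustrative objective (our assumption): gradient ascent on the
        margin logit_y - max_{j != y} logit_j, where y is the model's
        prediction on the raw query. The label is kept fixed, so
        legitimate inputs are still answered with the correct class.
        """
        x = x.copy()
        y = int(np.argmax(logits(x)))  # prediction on the raw query
        for _ in range(iters):
            z = logits(x)
            # Runner-up class j (exclude the predicted class y)
            j = int(np.argmax(np.where(np.arange(len(z)) == y, -np.inf, z)))
            grad = W[y] - W[j]  # gradient of the margin w.r.t. x
            x = x + step * grad / (np.linalg.norm(grad) + 1e-12)
        return x

    def defended_query(x):
        # Answer the black-box query with the counter-sample's output,
        # not the raw input's output.
        return logits(counter_sample(x))

    x = np.array([0.05, 0.0])  # query sitting near the decision boundary
    print(np.argmax(logits(x)), np.argmax(defended_query(x)))
    ```

    Because the defender has white-box access to its own model, each such per-query optimization is cheap, while the attacker's finite-difference or boundary estimates are computed against a moving, confidence-inflated target.
    
    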

    Original language: English
    Article number: 94
    Journal: ACM Transactions on Intelligent Systems and Technology
    Volume: 16
    Issue number: 4
    DOIs
    State: Published - 18 Aug 2025

    Keywords

    • Adversarial Examples
    • Adversarial Machine Learning
    • Attack Mitigation
    • Model Robustness
    • Query Based Attacks

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • Artificial Intelligence
