Abstract
Few-shot classifiers have shown promising results in use cases where user-provided labels are scarce. These models can learn to predict novel classes simply by training on a non-overlapping set of classes, which can largely be attributed to how their mechanisms differ from those of conventional deep networks. However, these same mechanisms also give attackers new opportunities to mount integrity attacks against such models, opportunities that are not present in other machine learning setups. In this work, we aim to close this gap in defenses by studying a conceptually simple approach to protecting few-shot classifiers against adversarial attacks. More specifically, we propose a simple, attack-agnostic detection method that uses the concept of self-similarity and filtering to flag adversarial support sets, i.e., support sets that destroy a victim classifier's understanding of a certain class. Our extensive evaluation on the miniImagenet (MI) and CUB datasets shows good attack detection performance across three different few-shot classifiers and across different attack strengths, beating baselines. These results establish our approach as a strong detection method for support set poisoning attacks. We also show that our approach constitutes a generalizable concept, as it can be paired with other filtering functions. Finally, we analyze how our results change when we vary two components of our detection approach.
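The abstract does not spell out the exact detection score, so the following is only a minimal sketch of one plausible reading of "self-similarity and filtering", not the authors' procedure. It assumes that clean support samples stay similar to their own filtered versions, while adversarial perturbations are disrupted by filtering. The embedding function (identity here), the 1D median filter, the `self_similarity_score` formula and the detection threshold are all hypothetical stand-ins; in practice the embedding would come from the victim few-shot classifier's feature extractor.

```python
import math
from statistics import mean, median
from typing import Callable, List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def median_filter_1d(x: Sequence[float], window: int = 3) -> Vector:
    """Hypothetical stand-in filter: sliding median, window shrinks at the edges."""
    half = window // 2
    n = len(x)
    return [median(x[max(0, i - half): min(n, i + half + 1)]) for i in range(n)]


def self_similarity_score(
    support: Sequence[Sequence[float]],
    embed: Callable[[Sequence[float]], Vector],
    filt: Callable[[Sequence[float]], Vector],
) -> float:
    """Hypothetical score: how well unfiltered samples agree with the filtered prototype."""
    # Prototype of the filtered support set (element-wise mean of filtered embeddings).
    filtered = [embed(filt(s)) for s in support]
    dim = len(filtered[0])
    prototype = [mean(e[d] for e in filtered) for d in range(dim)]
    # Average cosine similarity of each unfiltered embedding to that prototype.
    return mean(cosine(embed(s), prototype) for s in support)


def is_adversarial(support, embed, filt, threshold: float) -> bool:
    """Flag the support set when its self-similarity falls below a chosen threshold."""
    return self_similarity_score(support, embed, filt) < threshold


# Usage with toy 5-dimensional "samples" and an identity embedding (both hypothetical).
identity = lambda v: list(v)
clean = [[1, 1, 1, 1, 1], [1.1, 1.0, 0.9, 1.0, 1.1], [0.9, 1.0, 1.0, 1.1, 1.0]]
# Same base signal with a high-frequency perturbation that the median filter disrupts.
poisoned = [[1.9, 0.1, 1.9, 0.1, 1.9]] * 3

clean_score = self_similarity_score(clean, identity, median_filter_1d)
poisoned_score = self_similarity_score(poisoned, identity, median_filter_1d)
print(f"clean={clean_score:.3f} poisoned={poisoned_score:.3f}")
```

The median filter is used only because it is a simple, well-known filter that suppresses high-frequency perturbations; since the abstract notes that the approach can be paired with other filtering functions, `filt` is kept as a parameter so a different filter can be swapped in.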
Original language | English |
---|---|
State | Published - 24 Oct 2021 |