Detecting Adversarial Perturbations through Spatial Behavior in Activation Spaces

Ziv Katzir, Yuval Elovici

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

14 Scopus citations

Abstract

Although neural network-based classifiers outperform humans in a range of tasks, they are still prone to manipulation through adversarial perturbations. Prior research has identified effective defense mechanisms for many reported attack methods; however, both a defense against the Carlini and Wagner (CW) attack and a holistic defense mechanism capable of countering multiple different attack methods are still missing.

All attack methods reported so far share a common goal: they aim to avoid detection by limiting the perturbation magnitude while still triggering incorrect classification. As a result, small perturbations cause the classification to shift from one class to another.

We coined the term activation spaces to refer to the hyperspaces formed by the activation values of the different network layers. We then use activation spaces to capture the differences in spatial dynamics between normal and adversarial examples, and form a novel adversarial example detector. We induce a set of k-nearest neighbor (k-NN) classifiers, one per activation space, and use those classifiers to assign a sequence of class labels to each input of the neural network. We then calculate the likelihood of each observed label sequence and show that sequences associated with adversarial examples are far less likely than those of normal examples.

We demonstrate the effectiveness of our proposed detector against the CW attack on two image classification datasets (MNIST and CIFAR-10), achieving an AUC of 0.97 on CIFAR-10. We further show how our detector can easily be combined with previously suggested defense methods to form a holistic, multi-purpose defense mechanism.
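The detection pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation, and it rests on assumptions the abstract does not state: activations are plain vectors per layer, each activation space gets a Euclidean k-NN classifier, and the likelihood of a label sequence is modeled as a first-order Markov chain over consecutive layers (add-alpha smoothing), fitted on label sequences of normal examples. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence

Vector = Sequence[float]
# probs[i][a][b]: assumed P(label b at layer i+1 | label a at layer i)
TransitionTable = List[List[List[float]]]


def knn_label(train_x: List[Vector], train_y: List[int], x: Vector, k: int) -> int:
    """Majority label among the k training activations closest to x (Euclidean)."""
    nearest = sorted(range(len(train_x)), key=lambda j: math.dist(train_x[j], x))[:k]
    votes = Counter(train_y[j] for j in nearest)
    return votes.most_common(1)[0][0]


def label_sequence(layer_train: List[List[Vector]], train_y: List[int],
                   layer_x: List[Vector], k: int = 3) -> List[int]:
    """Apply one k-NN classifier per activation space; collect labels in layer order."""
    return [knn_label(train_x, train_y, x, k) for train_x, x in zip(layer_train, layer_x)]


def fit_transitions(sequences: List[List[int]], num_classes: int,
                    alpha: float = 1.0) -> TransitionTable:
    """Estimate per-layer label transition probabilities from normal label sequences.

    Assumption: a first-order Markov chain with add-alpha (Laplace) smoothing,
    so transitions never seen in normal data still get a small probability.
    """
    n_transitions = len(sequences[0]) - 1
    counts = [[[0] * num_classes for _ in range(num_classes)] for _ in range(n_transitions)]
    for seq in sequences:
        for i in range(n_transitions):
            counts[i][seq[i]][seq[i + 1]] += 1
    return [
        [[(c + alpha) / (sum(row) + alpha * num_classes) for c in row] for row in layer]
        for layer in counts
    ]


def log_likelihood(seq: List[int], probs: TransitionTable) -> float:
    """Log-probability of a label sequence under the fitted transition model."""
    return sum(math.log(probs[i][seq[i]][seq[i + 1]]) for i in range(len(seq) - 1))


if __name__ == "__main__":
    # Toy data: three "layers" sharing the same 2-D activations; class 0 lies
    # near the origin, class 1 near (5, 5). Real activations come from the network.
    points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    labels = [0, 0, 0, 1, 1, 1]
    layer_train = [points, points, points]

    # Label sequences of normal examples (the training set itself here; a
    # held-out set of normal inputs would be more realistic).
    normal_seqs = [label_sequence(layer_train, labels, [p, p, p]) for p in points]
    probs = fit_transitions(normal_seqs, num_classes=2)

    normal = label_sequence(layer_train, labels, [(0.2, 0.1), (0.1, 0.3), (0.3, 0.2)])
    # Hypothetical adversarial input: early layers resemble class 0, later ones class 1.
    adversarial = label_sequence(layer_train, labels, [(0.2, 0.1), (5.2, 5.1), (5.1, 5.3)])

    for name, seq in [("normal", normal), ("adversarial", adversarial)]:
        print(name, seq, round(log_likelihood(seq, probs), 3))
```

In the toy run, the input whose label flips between layers receives a lower log-likelihood than the consistently labeled one. Using the negative log-likelihood as a detection score and sweeping a threshold over it is what an AUC figure summarizes; for real activation spaces, a library k-NN such as scikit-learn's `KNeighborsClassifier` would replace the list-based search.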

Original language: English
Title of host publication: 2019 International Joint Conference on Neural Networks, IJCNN 2019
Publisher: Institute of Electrical and Electronics Engineers
ISBN (Electronic): 9781728119854
State: Published - 1 Jul 2019
Event: 2019 International Joint Conference on Neural Networks, IJCNN 2019 - Budapest, Hungary
Duration: 14 Jul 2019 to 19 Jul 2019

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks
Volume: 2019-July

Conference

Conference: 2019 International Joint Conference on Neural Networks, IJCNN 2019
Country/Territory: Hungary
City: Budapest
Period: 14/07/19 to 19/07/19

Keywords

  • Activation Spaces
  • Adversarial Perturbations
  • Detector

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
