Understanding adversarial training: Increasing local stability of supervised models through robust optimization

Uri Shaham, Yutaro Yamada, Sahand Negahban

Research output: Contribution to journal › Article › peer-review

173 Scopus citations

Abstract

We show that adversarial training of supervised learning models is in fact a robust optimization procedure. To do this, we establish a general framework for increasing the local stability of supervised learning models using robust optimization. The framework is broadly applicable to differentiable non-parametric models, e.g., Artificial Neural Networks (ANNs). Using an alternating minimization-maximization procedure, the loss of the model is minimized with respect to perturbed examples that are generated at each parameter update, rather than with respect to the original training data. Our proposed framework generalizes adversarial training, as well as previous approaches for increasing the local stability of ANNs. Experimental results reveal that our approach increases the robustness of the network to existing adversarial examples, while making it harder to generate new ones. Furthermore, our algorithm also improves the accuracy of the networks on the original test data.
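The alternating minimization-maximization procedure described above can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: it assumes a simple logistic-regression model, approximates the inner maximization with a single signed-gradient (FGSM-style) step on the inputs, and then takes the outer minimization step on the perturbed batch. All function names and hyperparameters (`adversarial_train`, `epsilon`, `lr`, `epochs`) are hypothetical choices for this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversarial_train(X, y, epsilon=0.1, lr=0.5, epochs=200, seed=0):
    """Sketch of alternating min-max training (assumed setup: logistic regression).

    At each parameter update:
      1. inner maximization: perturb each example to (approximately) increase
         the loss, here via one signed-gradient step of size epsilon;
      2. outer minimization: gradient step on the loss of the perturbed
         examples, rather than the original training data.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        # Inner max: gradient of the logistic loss w.r.t. the inputs is
        # (p - y) * w; move each example epsilon along its sign (FGSM-style).
        grad_x = (p - y)[:, None] * w[None, :]
        X_adv = X + epsilon * np.sign(grad_x)
        # Outer min: standard gradient step, but on the perturbed batch.
        p_adv = sigmoid(X_adv @ w + b)
        w -= lr * (X_adv.T @ (p_adv - y)) / len(y)
        b -= lr * np.mean(p_adv - y)
    return w, b
```

In practice the paper applies this idea to deep networks, where both gradients come from backpropagation; the structure of the loop, perturb first, then update, is the same.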

Original language: English
Pages (from-to): 195-204
Number of pages: 10
Journal: Neurocomputing
Volume: 307
DOIs:
State: Published - 13 Sep 2018
Externally published: Yes

Keywords

  • Adversarial examples
  • Deep learning
  • Non-parametric supervised models
  • Robust optimization

ASJC Scopus subject areas

  • Computer Science Applications
  • Cognitive Neuroscience
  • Artificial Intelligence
