69 — One pixel attack for fooling deep neural networks
Read on 29 October 2017

This paper got a lot of attention recently because its effects are so dramatic: the adversarial system presented in this work alters only one pixel of an image and completely changes the classification result of an ImageNet-trained network. Earlier "few-pixel attacks" typically had to alter a small cluster of pixels, usually fewer than ten, but generally more than one.
Because the human eye readily spots clusters of altered pixels, the goal of adversarial image generation should be to minimize noticeable artifacts while still making the image "unrecognizable" to a machine.
The method works best on images that already sit near the decision boundaries between several classes; in those cases, a few-pixel attack can shift the resulting classification much more easily.
Building on prior work (Szegedy et al., Goodfellow et al.), the authors can identify which pixels most dramatically affect the classifier output: the input space of a DNN is a high-dimensional cube, and moving along just a few of its axes (i.e., changing a few pixel values) can produce large swings in classification. This lets them pick pixel changes along axes corresponding to seven, five, three, or even just a single pixel. To find the most potent changes, the authors use differential evolution, an evolutionary algorithm in which each candidate's "DNA" is a modification instruction: which pixel to change and what color to give it.
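To make the search concrete, here is a minimal sketch (not the authors' code) of a one-pixel attack driven by differential evolution. The classifier here is a toy random linear softmax stand-in so the script runs end to end; in practice you would swap in your real model's prediction function, and the candidate vector would be evaluated with batched forward passes.

```python
# Minimal one-pixel attack sketch: evolve (x, y, r, g, b) with differential
# evolution to minimize the classifier's confidence in the true class.
import numpy as np
from scipy.optimize import differential_evolution

H, W, C, NUM_CLASSES = 32, 32, 3, 10
rng = np.random.default_rng(0)
weights = rng.normal(size=(H * W * C, NUM_CLASSES)) * 0.01  # toy "model"

def predict_probs(image):
    """Stand-in classifier: softmax over a random linear map of the pixels."""
    logits = image.reshape(-1) @ weights
    exps = np.exp(logits - logits.max())
    return exps / exps.sum()

def apply_pixel(image, candidate):
    """candidate = (x, y, r, g, b): overwrite one pixel with new channel values."""
    x, y = int(candidate[0]), int(candidate[1])
    perturbed = image.copy()
    perturbed[y, x] = candidate[2:5]
    return perturbed

def attack_loss(candidate, image, true_label):
    """Confidence in the true class after the one-pixel change; DE minimizes this."""
    return predict_probs(apply_pixel(image, candidate))[true_label]

image = rng.uniform(0.0, 1.0, size=(H, W, C))
true_label = int(np.argmax(predict_probs(image)))

# Search space: pixel coordinates plus the three replacement channel values.
bounds = [(0, W - 1), (0, H - 1), (0, 1), (0, 1), (0, 1)]
result = differential_evolution(
    attack_loss, bounds, args=(image, true_label),
    maxiter=30, popsize=20, seed=0)

adv = apply_pixel(image, result.x)
print("true-class confidence before:", predict_probs(image)[true_label])
print("true-class confidence after :", predict_probs(adv)[true_label])
print("new prediction:", int(np.argmax(predict_probs(adv))))
```

Note that the attack only needs the model's output probabilities, not its gradients, which is part of what makes it so easy to run against a black-box classifier.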
Most notable to me is the simplicity and speed of this adversarial algorithm; the pixel modifications usually look like coloring artifacts or dead pixels. That’s frighteningly close to something that could occur ‘in the wild’. It’s strange to think about how brittle these computer vision algorithms still are despite their incredible power, and how something as simple as a single bad pixel could convince, say, a self-driving car that a bird is actually another car.