17 — Adversarial Examples for Evaluating Reading Comprehension Systems
Read on 07 September 2017Recently, researchers have demonstrated susceptibility of facial recognition neural networks to basic perturbation. The authors of this paper reproduce this flavor of conclusion in text-parsing nets.
The authors present several tools to adversarially mutate inputs to a neural network in order to affect the accuracy of the network, focusing on text-comprehension nets. One such tool, AddSent
, is a “concatenative adversary”, appending text to the end of an input paragraph in order to test the susceptibility of a network to irrelevant information. For instance, to this paragraph the network might add: The author of *Harry Potter* developed many characters.
When asked “What did the authors of this paper develop?”, the network might incorrectly respond with “Many characters”, rather than the correct response, “AddSent”.
This paper demonstrates the importance of testing for these vulnerabilities in a neural network through the use of similar tools, such as AddAny
, which appends random english words instead of valid or similar sentences. This ‘noise’, which should have no affect on the accuracy of the network, instead dramatically affects the output, suggesting that many state-of-the-art nets suffer from overfitting, or instead seem to recognize a fundamental structure of “factual” writing (e.g. Wikipedia text) which is easily fooled by basic manipulation.
One important factor in AddSent
and its related tools is that human comprehension is not in any way fooled by the added sentences: They’re explicitly designed not to maliciously alter the meaning of a paragraph. (No one — at least no one reading carefully — would have thought that J. K. Rowling developed AddSent
in the earlier paragraph; but the net in question is fooled nonetheless.)
Rather than proposing a fix for current state-of-the-art, the authors explain that this sort of simple perturbation is important to test for when designing deep-learning algorithms, and release their perturbation code online for public use.