Title | ||
---|---|---|
Detecting Adversarial Examples and Other Misclassifications in Neural Networks by Introspection. |
Abstract | ||
---|---|---|
Despite having excellent performances for a wide variety of tasks, modern neural networks are unable to provide a reliable confidence value allowing to detect misclassifications. This limitation is at the heart of what is known as an adversarial example, where the network provides a wrong prediction associated with a strong confidence to a slightly modified image. Moreover, this overconfidence issue has also been observed for regular errors and out-of-distribution data. We tackle this problem by what we call introspection, i.e. using the information provided by the logits of an already pretrained neural network. We show that by training a simple 3-layers neural network on top of the logit activations, we are able to detect misclassifications at a competitive level. |
Year | Venue | DocType |
---|---|---|
2019 | arXiv: Learning | Journal |
Volume | Citations | PageRank |
abs/1905.09186 | 1 | 0.35 |
References | Authors | |
0 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jonathan Aigrain | 1 | 19 | 2.16 |
Marcin Detyniecki | 2 | 330 | 39.95 |