Title
Detecting Adversarial Examples and Other Misclassifications in Neural Networks by Introspection
Abstract
Despite achieving excellent performance on a wide variety of tasks, modern neural networks are unable to provide a reliable confidence value for detecting misclassifications. This limitation is at the heart of what is known as an adversarial example, where the network makes a wrong prediction with high confidence on a slightly modified image. Moreover, this overconfidence issue has also been observed for regular errors and for out-of-distribution data. We tackle this problem through what we call introspection, i.e. using the information provided by the logits of an already pretrained neural network. We show that by training a simple 3-layer neural network on top of the logit activations, we are able to detect misclassifications at a competitive level.
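The abstract describes the method only at a high level. A minimal sketch of that idea, assuming PyTorch, might look like the following; `IntrospectionNet`, `train_detector`, the hidden-layer width, and all hyperparameters are illustrative assumptions, not the authors' actual architecture or code.

```python
# Sketch of the "introspection" idea from the abstract: a small 3-layer MLP
# is trained on the logits of a frozen, pretrained classifier to predict
# whether that classifier's prediction is a misclassification.
# Layer sizes and hyperparameters below are assumptions for illustration.
import torch
import torch.nn as nn

class IntrospectionNet(nn.Module):
    """3-layer detector operating on the base model's logit vector."""
    def __init__(self, num_classes: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_classes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # raw score: higher = more likely misclassified
        )

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        return self.net(logits)

def train_detector(detector, base_model, loader, epochs=5, lr=1e-3):
    """Train the detector on a labelled set; the base model stays frozen.

    Detection targets are derived automatically: 1 where the pretrained
    model misclassifies an input, 0 where it is correct.
    """
    base_model.eval()
    opt = torch.optim.Adam(detector.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():
                logits = base_model(x)                    # introspect the pretrained net
            target = (logits.argmax(dim=1) != y).float()  # 1 = misclassified
            loss = loss_fn(detector(logits).squeeze(1), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return detector
```

The key design point this sketch captures is that the detector never sees the input image, only the pretrained network's logit activations, and the base network's weights are never updated.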
Year
2019
Venue
arXiv: Learning
DocType
Journal
Volume
abs/1905.09186
Citations
1
PageRank
0.35
References
0
Authors
2
Name                 Order   Citations   PageRank
Jonathan Aigrain     1       19          2.16
Marcin Detyniecki    2       330         39.95