Title
Assessing Neural Network Scene Classification from Degraded Images
Abstract
Scene recognition is an essential component of both machine and biological vision. Recent advances in computer vision using deep convolutional neural networks (CNNs) have demonstrated impressive sophistication in scene recognition through training on large datasets of labeled scene images (Zhou et al. 2014, 2018). One criticism of CNN-based approaches is that performance may not generalize well beyond the training image set (Torralba and Efros 2011) and may be hampered by minor image modifications, which in some cases are barely perceptible to the human eye (Goodfellow et al. 2015; Szegedy et al. 2013). While these “adversarial examples” may be unlikely in natural contexts, during many real-world visual tasks scene information can be degraded or limited due to defocus blur, camera motion, sensor noise, or occluding objects. Here, we quantify the impact of several image degradations (some common, and some more exotic) on indoor/outdoor scene classification using CNNs. For comparison, we use human observers as a benchmark, and also evaluate performance against classifiers using limited, manually selected descriptors. While the CNNs outperformed the other classifiers and rivaled human accuracy for intact images, our results show that their classification accuracy is more affected by image degradations than that of human observers. On a practical level, however, accuracy of the CNNs remained well above chance for a wide range of image manipulations that disrupted both local and global image statistics. We also examine the level of image-by-image agreement with human observers, and find that the CNNs’ agreement with observers varied as a function of the nature of the image manipulation. In many cases, this agreement was not substantially different from the level one would expect to observe for two independent classifiers. Together, these results suggest that CNN-based scene classification techniques are relatively robust to several image degradations. However, the pattern of classifications obtained for ambiguous images does not appear to closely reflect the strategies employed by human observers.
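The methodology summarized in the abstract (degrading images, re-classifying them with a CNN, and comparing image-by-image agreement against the level expected for two independent classifiers) can be sketched in a few lines. The following is a minimal, hypothetical illustration, not the authors' code: it assumes torchvision's ImageNet-pretrained ResNet-18 as a stand-in for a Places-trained scene CNN, a local test image named scene.jpg, and simple Gaussian blur and additive noise in place of the paper's full set of degradations.

```python
# Hypothetical sketch (not the authors' code): probe a pretrained CNN with
# degraded inputs, and compute the agreement expected between two
# *independent* binary (indoor/outdoor) classifiers as a baseline.
import torch
from torchvision import models, transforms as T
from PIL import Image

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def degrade(img, kind):
    """Stand-ins for two of the paper's degradations (defocus blur, sensor noise)."""
    if kind == "blur":
        return T.GaussianBlur(kernel_size=21, sigma=4.0)(img)
    if kind == "noise":
        x = T.ToTensor()(img)
        return T.ToPILImage()((x + 0.2 * torch.randn_like(x)).clamp(0, 1))
    return img  # intact image

img = Image.open("scene.jpg").convert("RGB")  # assumed local test image
with torch.no_grad():
    for kind in ("intact", "blur", "noise"):
        logits = model(preprocess(degrade(img, kind)).unsqueeze(0))
        print(kind, "-> top class index:", int(logits.argmax(dim=1)))

def expected_independent_agreement(p1, p2):
    """Chance that two independent binary classifiers give the same label,
    where p1 and p2 are their respective probabilities of answering 'indoor'."""
    return p1 * p2 + (1 - p1) * (1 - p2)

# E.g., two unbiased, independent classifiers agree on half of the images:
print(expected_independent_agreement(0.5, 0.5))  # 0.5
```

Observed per-image agreement between a CNN and human observers can then be compared with this independence baseline: agreement near the baseline suggests the two are not using closely related classification strategies.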
Year
2019
DOI
10.1145/3342349
Venue
ACM Transactions on Applied Perception (TAP)
Keywords
Human perception, human scene recognition
Field
Human eye, Image manipulation, Computer vision, Computer science, Convolutional neural network, Artificial intelligence, Artificial neural network, Perception
DocType
Journal
Volume
16
Issue
4
ISSN
1544-3558
Citations
1
PageRank
0.36
References
0
Authors
4
Name | Order | Citations | PageRank
Timothy Tadros | 1 | 1 | 0.36
Nicholas C. Cullen | 2 | 1 | 0.36
Michelle R Greene | 3 | 4 | 1.84
Emily A. Cooper | 4 | 1 | 0.70