Title
Do Deep-Learning Saliency Models Really Model Saliency?
Abstract
Visual attention allows the human visual system to deal effectively with the huge flow of visual information acquired by the retina. Since the early 2000s, the human visual system has been modelled in computer vision to predict abnormal, rare and surprising data. Attention is a product of the continuous interaction between bottom-up (mainly feature-based) and top-down (mainly learning-based) information. Deep learning (DNN) is now well established in visual attention modelling, with very effective models. The goal of this paper is to investigate the importance of bottom-up versus top-down attention. First, we enrich classical bottom-up models of attention with top-down information. Then, the results are compared with DNN-based models. Our provocative question is: "do deep-learning saliency models really predict saliency, or do they simply detect interesting objects?". We found that while DNN saliency models very accurately detect top-down features, they neglect much of the bottom-up information that is surprising and rare, and thus by definition difficult to learn.
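The abstract only sketches the approach, so the following is a minimal illustrative Python sketch (not the authors' implementation; the function names, weights and array shapes are assumptions) of the general idea it describes: fusing a classical bottom-up saliency map with top-down detection maps (e.g. face and text masks) and scoring the fused map against a ground-truth fixation density with the standard correlation coefficient (CC) metric.

import numpy as np

def normalize(smap):
    # Scale a saliency map to the [0, 1] range.
    smap = smap.astype(np.float64)
    rng = smap.max() - smap.min()
    return (smap - smap.min()) / rng if rng > 0 else np.zeros_like(smap)

def enrich_with_top_down(bottom_up, detection_maps, weights):
    # Fuse a bottom-up saliency map with top-down detection maps
    # (e.g. face, text, object masks) using a simple weighted sum.
    fused = normalize(bottom_up)
    for det, w in zip(detection_maps, weights):
        fused = fused + w * normalize(det)
    return normalize(fused)

def correlation_coefficient(pred, gt):
    # Pearson correlation (CC) between a predicted saliency map and
    # a ground-truth fixation density map.
    p = (pred - pred.mean()) / (pred.std() + 1e-12)
    g = (gt - gt.mean()) / (gt.std() + 1e-12)
    return float((p * g).mean())

# Example usage with random stand-ins for the real maps.
h, w = 240, 320
bottom_up = np.random.rand(h, w)        # classical bottom-up model output
face_map = np.zeros((h, w))
face_map[60:120, 100:160] = 1.0         # hypothetical face-detector mask
text_map = np.zeros((h, w))
text_map[200:220, 40:280] = 1.0         # hypothetical text-detector mask
fixation_density = np.random.rand(h, w) # ground-truth fixation density

enriched = enrich_with_top_down(bottom_up, [face_map, text_map], weights=[1.0, 0.5])
print("CC:", correlation_coefficient(enriched, fixation_density))

The weighted-sum fusion and the CC metric are stand-ins chosen for brevity; the paper's own fusion scheme and evaluation metrics may differ.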
Year
2018
Venue
2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP)
Keywords
attention, saliency, DNN, bottom-up, top-down, object detection, face detection, text detection
Field
Computer vision, Salience (neuroscience), Human visual system model, Visualization, Computer science, Visual attention, Artificial intelligence, Deep learning, Continuous interaction
DocType
Conference
ISSN
1522-4880
Citations
0
PageRank
0.34
References
0
Authors
5
Name               Order  Citations  PageRank
Phutphalla Kong    1      0          0.34
Matei Mancas       2      315        27.50
Nimol Thuon        3      0          0.34
Seng Kheang        4      0          0.34
Bernard Gosselin   5      198        12.88