Title
Exact Adversarial Attack to Image Captioning via Structured Output Learning with Latent Variables
Abstract
In this work, we study the robustness of a CNN+RNN based image captioning system when subjected to adversarial noise. We propose to fool an image captioning system into generating targeted partial captions for an image polluted by adversarial noise, even when the targeted captions are totally irrelevant to the image content. A partial caption is one in which the words at some locations are observed, while the words at the remaining locations are unrestricted. This is the first work to study exact adversarial attacks with targeted partial captions. Due to the sequential dependencies among words in a caption, we formulate the generation of adversarial noise for targeted partial captions as a structured output learning problem with latent variables. Both the generalized expectation-maximization algorithm and structural SVMs with latent variables are then adopted to optimize this problem. The proposed methods generate highly successful attacks on three popular CNN+RNN based image captioning models. Furthermore, the proposed attack methods are used to analyze the inner mechanisms of image captioning systems, providing guidance for further improving automatic image captioning towards human-level captioning.
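As a rough sketch of the formulation described above (the notation here is assumed for illustration and need not match the paper's exact symbols): let $x$ denote the input image, $\epsilon$ the adversarial noise with budget $\eta$, $s_O$ the observed (targeted) words of the partial caption, $s_L$ the words at the unrestricted positions, and $p_\theta$ the CNN+RNN captioning model. The attack can then be posed as maximizing the likelihood of the observed words while marginalizing over the latent ones:
$$ \max_{\epsilon:\,\|\epsilon\|_\infty \le \eta} \; \log \sum_{s_L} p_\theta\big(s_O, s_L \mid x + \epsilon\big), $$
where the generalized EM algorithm alternates between inferring a distribution over $s_L$ (E-step) and updating $\epsilon$ by gradient ascent (M-step), while the latent structural SVM variant replaces the marginal likelihood with a margin-based surrogate over the latent words.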
Year: 2019
DOI: 10.1109/CVPR.2019.00426
Venue: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019)
Field: Closed captioning, Expectation–maximization algorithm, Computer science, Support vector machine, Image content, Latent variable, Robustness (computer science), Artificial intelligence, Machine learning, Adversarial system
DocType: Journal
Volume: abs/1905.04016
ISSN: 1063-6919
Citations: 7
PageRank: 0.46
References: 0
Authors: 7
Name | Order | Citations | PageRank
Xing Xu | 1 | 199 | 27.30
Baoyuan Wu | 2 | 267 | 25.15
Fumin Shen | 3 | 1868 | 91.49
Yanbo Fan | 4 | 17 | 6.36
Yong Zhang | 5 | 16 | 3.97
Heng Tao Shen | 6 | 6020 | 267.19
Wei Liu | 7 | 4041 | 204.19