Title
Remote Sensing Image Captioning with Continuous Output Neural Models
Abstract
Remote sensing image captioning involves generating a concise textual description for an input aerial image. Most previous methods are based on neural encoder-decoder models trained to generate a sequence of discrete outputs with the standard token-level cross-entropy loss. This paper explores an alternative method based on continuous outputs, generating sequences of embedding vectors instead of directly predicting discrete word tokens. We argue that continuous outputs can facilitate the optimization of semantic similarity, as opposed to exact word-by-word matches. They also facilitate the use of loss functions that compare different views of the data. This includes comparing representations for individual tokens and for entire captions, and also comparing captions against intermediate image representations. We experimentally compared discrete and continuous output methods on the RSICD dataset, which is extensively used in the area. Results show that continuous outputs can indeed lead to better results, and our approach performs competitively with the state-of-the-art model in the area.
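The contrast between exact word matches and semantic similarity can be illustrated with a minimal sketch. This is not the paper's implementation: the toy embedding table, the use of cosine distance as the loss, and the nearest-neighbor decoding step are all assumptions made for illustration, since the abstract does not name a specific loss or decoding procedure.

```python
import math

# Hypothetical toy embedding table (an assumption for illustration);
# a real system would use pretrained word vectors of much higher dimension.
EMBEDDINGS = {
    "river": [0.9, 0.1, 0.0],
    "stream": [0.8, 0.2, 0.1],
    "airport": [0.0, 0.1, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def continuous_loss(predicted, target_word):
    """Cosine distance between a predicted vector and the target word's embedding."""
    return 1.0 - cosine_similarity(predicted, EMBEDDINGS[target_word])


def decode(predicted):
    """Map a predicted vector back to the nearest vocabulary word."""
    return max(EMBEDDINGS, key=lambda w: cosine_similarity(predicted, EMBEDDINGS[w]))


# Pretend decoder output for a single time step.
prediction = [0.9, 0.12, 0.02]

loss_river = continuous_loss(prediction, "river")
loss_stream = continuous_loss(prediction, "stream")
loss_airport = continuous_loss(prediction, "airport")
```

Under a token-level cross-entropy loss, predicting "stream" when the reference says "river" is as wrong as predicting "airport". In this sketch the loss against "stream" stays small while the loss against "airport" is large, which is the kind of graded, similarity-based signal the abstract argues for.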
Year
2021
DOI
10.1145/3474717.3483631
Venue
Geographic Information Systems
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
2
Name            Order  Citations  PageRank
Rita Ramos      1      0          0.34
Bruno Martins   2      441        34.58