Title
Video Captioning with Listwise Supervision.
Abstract
Automatically describing video content with natural language is a fundamental challenge that has received increasing attention. However, existing techniques restrict model learning to the pairs of each video and its own sentences, and thus fail to capture the more holistic semantic relationships among all sentences. In this paper, we propose to model the relative relationships of different video-sentence pairs and present a novel framework, named Long Short-Term Memory with Listwise Supervision (LSTM-LS), for video captioning. Given each video in the training data, we obtain a ranking list of sentences w.r.t. a given sentence associated with the video using nearest-neighbor search. The ranking information is represented by a set of rank triplets that can be used to assess the quality of the ranking list. The video captioning problem is then solved by learning an LSTM model for sentence generation that maximizes the ranking quality over all the sentences in the list. Experiments on the MSVD dataset show that the proposed LSTM-LS outperforms the state of the art in generating natural sentences, achieving 51.1% BLEU@4 and 32.6% METEOR. Superior performance is also reported on the M-VAD movie description dataset.
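The abstract combines a standard LSTM caption-generation objective with a ranking objective over triplets drawn from a sentence ranking list. The snippet below is a minimal sketch of one way such a combination could look in PyTorch; it is not the authors' code, and the class/function names, feature dimensions, margin value, and the specific hinge form of the ranking term are illustrative assumptions rather than the paper's exact listwise formulation.

```python
# Sketch (not the authors' implementation): an LSTM captioner whose sentence
# log-probability is also used as a video-sentence relevance score, trained
# with caption cross-entropy plus a triplet ranking hinge. Padding handling
# and batching details are omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CaptionLSTM(nn.Module):
    """LSTM decoder that scores a sentence given a video feature (assumed names/dims)."""
    def __init__(self, vocab_size, feat_dim=2048, embed_dim=512, hidden_dim=512):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, embed_dim)  # video feature -> first LSTM input
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def sentence_log_prob(self, video_feat, tokens):
        """Sum of per-token log-probabilities of `tokens` given the video.

        Used both as the generation objective and as the relevance score
        s(video, sentence) for ranking.
        """
        v = self.feat_proj(video_feat).unsqueeze(1)      # (B, 1, E): video as first step
        w = self.embed(tokens[:, :-1])                   # (B, T-1, E): teacher forcing
        h, _ = self.lstm(torch.cat([v, w], dim=1))       # (B, T, H)
        logp = F.log_softmax(self.out(h), dim=-1)        # (B, T, V)
        token_logp = logp.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)
        return token_logp.sum(dim=1)                     # (B,)

def lstm_ls_loss(model, video_feat, gt_tokens, pos_tokens, neg_tokens, margin=1.0):
    """Caption cross-entropy on the ground-truth sentence plus a triplet hinge
    asking higher-ranked sentences to score above lower-ranked ones."""
    caption_loss = -model.sentence_log_prob(video_feat, gt_tokens).mean()
    s_pos = model.sentence_log_prob(video_feat, pos_tokens)
    s_neg = model.sentence_log_prob(video_feat, neg_tokens)
    rank_loss = F.relu(margin - (s_pos - s_neg)).mean()
    return caption_loss + rank_loss
```

In this reading, each rank triplet from the nearest-neighbor ranking list supplies one (pos_tokens, neg_tokens) pair, and the hinge term pushes the model to score better-ranked sentences higher; the paper's actual listwise objective over the full list may differ in detail.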
Year
2017
Venue
THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE
Field
Closed captioning, Computer science, Artificial intelligence, Multimedia, Machine learning
DocType
Conference
Citations
3
PageRank
0.38
References
0
Authors
3
Name    Order    Citations    PageRank
Yuan Liu121511.43
Xue Li22196186.96
Shi Zhongchao3468.98