Title
Video Captioning with Listwise Supervision.
Abstract
Automatically describing video content with natural language is a fundamental challenge that has received increasing attention. However, existing techniques restrict model learning to the pairs of each video and its own sentences, and thus fail to capture the more holistic semantic relationships among all sentences. In this paper, we propose to model the relative relationships of different video-sentence pairs and present a novel framework, named Long Short-Term Memory with Listwise Supervision (LSTM-LS), for video captioning. Given each video in the training data, we obtain a ranking list of sentences w.r.t. a given sentence associated with the video using nearest-neighbor search. The ranking information is represented by a set of rank triplets that can be used to assess the quality of the ranking list. The video captioning problem is then solved by learning an LSTM model for sentence generation that maximizes the ranking quality over all the sentences in the list. Experiments on the MSVD dataset show that the proposed LSTM-LS outperforms the state of the art in generating natural sentences, achieving 51.1% BLEU@4 and 32.6% METEOR. Superior performance is also reported on the M-VAD movie description dataset.
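The abstract combines a standard LSTM caption-generation objective with a ranking objective over triplets drawn from a sentence ranking list. The snippet below is a minimal sketch of one way such a combination could look in PyTorch; it is not the authors' code, and the class/function names, feature dimensions, margin value, and the specific hinge form of the ranking term are illustrative assumptions rather than the paper's exact listwise formulation.

```python
# Sketch (not the authors' implementation): an LSTM captioner whose sentence
# log-probability is also used as a video-sentence relevance score, trained
# with caption cross-entropy plus a triplet ranking hinge. Padding handling
# and batching details are omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CaptionLSTM(nn.Module):
    """LSTM decoder that scores a sentence given a video feature (assumed names/dims)."""
    def __init__(self, vocab_size, feat_dim=2048, embed_dim=512, hidden_dim=512):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, embed_dim)  # video feature -> first LSTM input
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def sentence_log_prob(self, video_feat, tokens):
        """Sum of per-token log-probabilities of `tokens` given the video.

        Used both as the generation objective and as the relevance score
        s(video, sentence) for ranking.
        """
        v = self.feat_proj(video_feat).unsqueeze(1)      # (B, 1, E): video as first step
        w = self.embed(tokens[:, :-1])                   # (B, T-1, E): teacher forcing
        h, _ = self.lstm(torch.cat([v, w], dim=1))       # (B, T, H)
        logp = F.log_softmax(self.out(h), dim=-1)        # (B, T, V)
        token_logp = logp.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)
        return token_logp.sum(dim=1)                     # (B,)

def lstm_ls_loss(model, video_feat, gt_tokens, pos_tokens, neg_tokens, margin=1.0):
    """Caption cross-entropy on the ground-truth sentence plus a triplet hinge
    asking higher-ranked sentences to score above lower-ranked ones."""
    caption_loss = -model.sentence_log_prob(video_feat, gt_tokens).mean()
    s_pos = model.sentence_log_prob(video_feat, pos_tokens)
    s_neg = model.sentence_log_prob(video_feat, neg_tokens)
    rank_loss = F.relu(margin - (s_pos - s_neg)).mean()
    return caption_loss + rank_loss
```

In this reading, each rank triplet from the nearest-neighbor ranking list supplies one (pos_tokens, neg_tokens) pair, and the hinge term pushes the model to score better-ranked sentences higher; the paper's actual listwise objective over the full list may differ in detail.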
Year
2017
Venue
THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE
Field
Closed captioning, Computer science, Artificial intelligence, Multimedia, Machine learning
DocType
Conference
Citations
3
PageRank
0.38
References
0
Authors
3
Name    Order    Citations    PageRank
Yuan Liu121511.43
Xue Li22196186.96
Shi Zhongchao3468.98