Abstract | | |
---|---|---|
Video captioning is a challenging task because it requires accurately transforming visual understanding into a natural language description. To date, state-of-the-art methods inadequately model global-local vision representation for sentence generation, leaving plenty of room for improvement. In this work, we approach the video captioning task from a new perspective and propose GLR, a framework for global-local representation granularity. GLR offers three advantages over prior efforts. First, we propose a simple solution that exploits extensive vision representations from different video ranges to improve linguistic expression. Second, we devise a novel global-local encoder that encodes different video representations, including long-range, short-range, and local-keyframe, to produce a rich semantic vocabulary for describing video contents across frames at a descriptive granularity. Finally, we introduce a progressive training strategy that effectively organizes feature learning to elicit optimal captioning behavior. Evaluated on the MSR-VTT and MSVD datasets, GLR outperforms recent state-of-the-art methods, including a well-tuned SA-LSTM baseline, by a significant margin, with shorter training schedules. Because of its simplicity and efficacy, we hope that GLR can serve as a strong baseline for many video understanding tasks besides video captioning. Code will be available. |
Year | DOI | Venue |
---|---|---|
2022 | 10.1109/TCSVT.2022.3177320 | IEEE Transactions on Circuits and Systems for Video Technology |
Keywords | DocType | Volume |
Computer vision, video captioning, video representation, natural language processing, visual analysis | Journal | 32 |
Issue | ISSN | Citations |
10 | 1051-8215 | 0 |
PageRank | References | Authors |
0.34 | 24 | 7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Liqi Yan | 1 | 0 | 1.01 |
Siqi Ma | 2 | 0 | 0.34 |
Qifan Wang | 3 | 0 | 0.34 |
Victor Yingjie Chen | 4 | 52 | 27.37 |
Xiangyu Zhang | 5 | 2857 | 151.00 |
Andreas Savakis | 6 | 377 | 41.10 |
Dongfang Liu | 7 | 0 | 1.69 |