Title | ||
---|---|---|
Sports Video Captioning by Attentive Motion Representation based Hierarchical Recurrent Neural Networks |
Abstract | ||
---|---|---|
Sports video captioning is the task of automatically generating a textual description of sports events (e.g. football, basketball, or volleyball games). Although previous works have shown promising performance in producing coarse, general descriptions of a video, it remains challenging to caption a sports video that contains multiple fine-grained player actions and complex group relationships among players. In this paper, we present a novel hierarchical recurrent neural network (RNN) based framework with an attention mechanism for sports video captioning. A motion representation module is proposed to extract individual pose attributes and group-level trajectory cluster information. Moreover, we introduce a new dataset, the Sports Video Captioning Dataset-Volleyball, for evaluation. We evaluate the proposed model on two public datasets and on our new dataset, and the experimental results demonstrate that our method outperforms state-of-the-art methods.
|
Year | DOI | Venue |
---|---|---|
2018 | 10.1145/3265845.3265851 | MM '18: ACM Multimedia Conference, Seoul, Republic of Korea, October 2018 |
DocType | ISBN | Citations |
---|---|---|
Conference | 978-1-4503-5981-8 | 3 |
PageRank | References | Authors |
---|---|---|
0.38 | 16 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Mengshi Qi | 1 | 36 | 3.91 |
Yunhong Wang | 2 | 3816 | 278.50 |
Annan Li | 3 | 222 | 14.22 |
Jiebo Luo | 4 | 6314 | 374.00 |