Sports Video Captioning by Attentive Motion Representation based Hierarchical Recurrent Neural Networks. - Citegraph

Paper Info

Title
Sports Video Captioning by Attentive Motion Representation based Hierarchical Recurrent Neural Networks.

Abstract
Sports video captioning is the task of automatically generating a textual description of sports events (e.g., football, basketball, or volleyball games). Although previous works have shown promising performance in producing coarse, general descriptions of a video, it remains challenging to caption a sports video that contains multiple fine-grained player actions and complex group relationships among players. In this paper, we present a novel hierarchical recurrent neural network (RNN) based framework with an attention mechanism for sports video captioning. A motion representation module is proposed to extract individual pose attributes and group-level trajectory cluster information. Moreover, we introduce a new dataset, the Sports Video Captioning Dataset-Volleyball, for evaluation. We evaluate the proposed model on two public datasets and our new dataset, and the experimental results demonstrate that our method outperforms state-of-the-art methods.

Year	DOI	Venue
2018	10.1145/3265845.3265851	MM '18: ACM Multimedia Conference, Seoul, Republic of Korea, October 2018
DocType	ISBN	Citations
Conference	978-1-4503-5981-8	3
PageRank	References	Authors
0.38	16	4

Authors (4 rows)

Name	Order	Citations	PageRank
Mengshi Qi	1	36	3.91
Yunhong Wang	2	3816	278.50
Annan Li	3	222	14.22
Jiebo Luo	4	6314	374.00
