Abstract | ||
---|---|---|
This paper proposes an end-to-end temporal attention learning method to improve action quality assessment in sports videos. For temporally weighted training, an attention-learning module is built to simulate the attention mechanism and judgement preferences of human perception in action quality assessment. The weights are learned from the loss on the segment-level prediction errors and are used to balance the significance of the segment features. We evaluate the proposed method on the diving and gym-vault actions of the benchmark AQA-7 dataset. The experimental results show that the proposed attention-aware feature training method is more effective than temporal aggregation and existing temporal relationship learning methods. Furthermore, using only the distance loss between the predicted score and the ground-truth score for training, without a ranking loss across different videos, the method achieves state-of-the-art performance on both the Spearman rank correlation and the mean Euclidean distance between the predicted scores and the judges' scores. |
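
The abstract reports results on two metrics, the Spearman rank correlation and the mean Euclidean distance between predicted and judges' scores. The following is a minimal sketch of how such metrics could be computed, not the authors' evaluation code. It assumes one scalar score per video, so the per-video Euclidean distance reduces to an absolute difference; the function names and the example scores are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rank_correlation(predicted, ground_truth):
    """Spearman's rho: the Pearson correlation of the rank-transformed scores."""
    return pearson(average_ranks(predicted), average_ranks(ground_truth))


def mean_score_distance(predicted, ground_truth):
    """Mean per-video distance; for scalar scores this is |p - g| (assumption)."""
    return sum(abs(p - g) for p, g in zip(predicted, ground_truth)) / len(predicted)


# Hypothetical scores for four videos, for illustration only.
predicted_scores = [80.1, 65.3, 92.0, 70.5]
judge_scores = [78.0, 66.0, 95.5, 60.0]

rho = spearman_rank_correlation(predicted_scores, judge_scores)
distance = mean_score_distance(predicted_scores, judge_scores)
print(f"Spearman rho: {rho:.3f}, mean distance: {distance:.3f}")
```

A higher rank correlation means the predicted ordering of videos agrees more closely with the judges' ordering, while a lower mean distance means the predicted scores are numerically closer to the judges' scores; the two can disagree, which is presumably why the abstract reports both.
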
Year | DOI | Venue |
---|---|---|
2021 | 10.1007/s11760-021-01890-w | Signal, Image and Video Processing |
Keywords | DocType | Volume |
---|---|---|
Attention learning, Temporal weighted video representation, Deep neural network, Action quality assessment | Journal | 15 |
Issue | ISSN | Citations |
---|---|---|
7 | 1863-1703 | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 2 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Qing Lei | 1 | 0 | 0.34 |
Hongbo Zhang | 2 | 14 | 5.68 |
Ji-Xiang Du | 3 | 596 | 41.42 |