Abstract | ||
---|---|---|
In recent years, action recognition has become a popular and challenging task in computer vision. Two-stream networks, which combine an appearance stream and a motion stream to make a joint prediction, achieve excellent action classification results. However, many of these networks fuse features or scores in a simple way and do not effectively exploit the characteristics of the different streams. In addition, some networks do not fully exploit spatial context and temporal information. In this paper, we propose a novel three-stream network, the spatiotemporal attention enhanced features fusion network, for action recognition. First, a features fusion stream, which includes multi-level features fusion blocks, is designed to train the two streams jointly and to complement the two-stream network. Second, we model the channel features obtained from spatial context to enhance the ability to extract useful spatial semantic features at different levels. Third, a temporal attention module models temporal information to make the extracted temporal features more representative. Extensive experiments on the UCF101 and HMDB51 datasets verify the effectiveness of the proposed network for action recognition. |
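
The abstract does not say how the temporal attention module is built. As a rough illustration of the general idea only (weighting per-frame features before pooling them into one clip-level feature), here is a minimal sketch in plain Python. The function names, the dot-product scoring rule, and the `query` vector are all assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(frame_features, query):
    """Pool T per-frame feature vectors into one clip-level vector.

    frame_features: list of T vectors, each of length D.
    query: hypothetical scoring vector of length D (an assumption of
           this sketch; in a trained model it would be learned).
    Returns (pooled_vector, attention_weights).
    """
    # Score each frame by its dot product with the query vector.
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frame_features]
    # Normalize the scores into weights that sum to 1 over time.
    weights = softmax(scores)
    # Weighted sum of frame features along the temporal axis.
    dim = len(frame_features[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights
```

With an all-zero `query`, every frame gets the same weight and the result reduces to plain average pooling. A non-zero `query` shifts weight toward the frames that score higher.
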
Year | DOI | Venue |
---|---|---|
2021 | 10.1007/s13042-020-01204-5 | International Journal of Machine Learning and Cybernetics |

Keywords | DocType | Volume |
---|---|---|
Action recognition, Three-stream, Spatiotemporal attention, Features fusion | Journal | 12 |

Issue | ISSN | Citations |
---|---|---|
3 | 1868-8071 | 4 |

PageRank | References | Authors |
---|---|---|
0.42 | 0 | 4 |

Name | Order | Citations | PageRank |
---|---|---|---|
Danfeng Zhuang | 1 | 4 | 0.42 |
Min Jiang | 2 | 39 | 13.65 |
Jun Kong | 3 | 111 | 18.94 |
Tianshan Liu | 4 | 9 | 4.27 |