Title
A Two-Stage Spatiotemporal Attention Convolution Network for Continuous Dimensional Emotion Recognition From Facial Video
Abstract
Continuous dimensional emotion recognition from facial video sequences is a crucial and challenging task in Affective Computing and Human-Computer Intelligent Interaction. The key to this task is to extract and discriminate spatiotemporal features effectively and at a finer granularity. In this paper, a Two-Stage Spatiotemporal Attention Temporal Convolution Network (TS-SATCN) is designed for continuous dimensional emotion recognition from facial videos. The first stage generates an initial recognition result, which is then fed into the second stage for correction. In each stage, a spatiotemporal attention branch helps the network learn different attention levels and adaptively focus on informative spatiotemporal features. The network is trained with a proposed smooth loss function that further improves the quality of the predictions. Extensive experiments on two datasets, RECOLA and AFEW-VA, show that the proposed method achieves significant improvement over state-of-the-art methods.
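The pipeline described in the abstract can be sketched as a NumPy forward pass. This is a minimal sketch under stated assumptions, not the authors' TS-SATCN. Each frame is assumed to be already reduced to a feature vector (for example by a frame-level CNN), so the spatiotemporal attention branch is approximated by a per-frame, per-channel sigmoid gate. Stage 2 is assumed to predict a residual correction from the features concatenated with the stage-1 output. `smooth_loss` (MSE plus a squared first-difference penalty weighted by a hypothetical `lam`) is an assumed stand-in, since the exact form of the paper's smooth loss is not given here. All function names (`causal_dilated_conv1d`, `attention_gate`, `stage`, `two_stage_predict`, `smooth_loss`) are hypothetical.

```python
import numpy as np


def causal_dilated_conv1d(x, w, dilation):
    """Causal dilated 1-D convolution, the basic TCN building block.

    x: (T, C_in) per-frame features; w: (K, C_in, C_out) kernel.
    Output at frame t depends only on x[t], x[t-d], ..., x[t-(K-1)d].
    """
    T = x.shape[0]
    K, _, c_out = w.shape
    pad = (K - 1) * dilation
    xp = np.concatenate([np.zeros((pad, x.shape[1])), x], axis=0)  # left-pad only
    y = np.zeros((T, c_out))
    for k in range(K):
        # tap k looks back (K - 1 - k) * dilation frames
        y += xp[k * dilation : k * dilation + T] @ w[k]
    return y


def attention_gate(h, w_att):
    """Hypothetical attention branch: a sigmoid mask in (0, 1) per frame and
    per channel that rescales the features (simplified stand-in)."""
    mask = 1.0 / (1.0 + np.exp(-(h @ w_att)))
    return h * mask


def stage(x, p):
    """One stage: TCN layer + ReLU, attention gating, linear regression head."""
    h = np.maximum(causal_dilated_conv1d(x, p["conv"], dilation=2), 0.0)
    h = attention_gate(h, p["att"])
    return h @ p["head"]  # (T, 2), e.g. arousal and valence


def two_stage_predict(x, p1, p2):
    """Stage 1 gives an initial estimate; stage 2 sees the features plus that
    estimate and predicts a residual correction (assumed design)."""
    y1 = stage(x, p1)
    y2 = y1 + stage(np.concatenate([x, y1], axis=1), p2)
    return y1, y2


def smooth_loss(y_pred, y_true, lam=0.1):
    """MSE plus a penalty on frame-to-frame jumps of the prediction.
    Assumed form; not necessarily the paper's smooth loss."""
    mse = np.mean((y_pred - y_true) ** 2)
    jump = np.mean(np.diff(y_pred, axis=0) ** 2)
    return mse + lam * jump


# Usage with random, untrained weights (for shape checking only).
rng = np.random.default_rng(0)
T, C, H = 50, 8, 16


def init(c_in):
    return {"conv": rng.normal(0, 0.1, (3, c_in, H)),
            "att": rng.normal(0, 0.1, (H, H)),
            "head": rng.normal(0, 0.1, (H, 2))}


x = rng.normal(size=(T, C))           # stand-in for per-frame facial features
labels = rng.uniform(-1, 1, (T, 2))   # stand-in for continuous annotations
p1, p2 = init(C), init(C + 2)         # stage 2 also receives the 2 stage-1 outputs
y1, y2 = two_stage_predict(x, p1, p2)
total = smooth_loss(y1, labels) + smooth_loss(y2, labels)  # supervise both stages (assumed)
print(y1.shape, y2.shape)  # (50, 2) (50, 2)
```

Padding only on the left keeps the temporal convolution causal, so the prediction at frame t never uses future frames. The first-difference term penalizes jittery frame-to-frame outputs, which is one common way to encourage smooth continuous emotion traces.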
Year: 2021
DOI: 10.1109/LSP.2021.3063609
Venue: IEEE Signal Processing Letters
Keywords: Feature extraction, Convolution, Emotion recognition, Spatiotemporal phenomena, Faces, Task analysis, Stacking, Continuous emotion recognition, spatiotemporal attention, TCN
DocType: Journal
Volume: 28
ISSN: 1070-9908
Citations: 0
PageRank: 0.34
References: 0
Authors: 5
Name, Order, Citations, PageRank
Min Hu, 1, 31, 12.64
Qian Chu, 2, 0, 0.34
Xiaohua Wang, 3, 1, 2.12
Lei He, 4, 21, 4.75
Fuji Ren, 5, 803, 135.33