Title | ||
---|---|---|
A Two-Stage Spatiotemporal Attention Convolution Network For Continuous Dimensional Emotion Recognition From Facial Video |
Abstract | ||
---|---|---|
Continuous dimensional emotion recognition from facial video sequences is a crucial and challenging task in Affective Computing and Human-Computer Intelligent Interaction. The key to this task is to extract and discriminate spatial-temporal features effectively and at a fine-grained level. In this paper, a Two-Stage Spatiotemporal Attention Temporal Convolution Network (TS-SATCN) is designed for continuous dimensional emotion recognition from facial videos. The first stage generates an initial recognition result, which is then fed into the second stage for correction. In each stage, the introduced spatiotemporal attention branch helps the network learn different attention levels and focus adaptively on the informative spatial-temporal features. The network is trained with a proposed smooth loss function, which can further improve prediction quality. Extensive experiments on two datasets, RECOLA and AFEW-VA, show that the proposed method achieves significant improvement over state-of-the-art methods. |
Year | DOI | Venue |
---|---|---|
2021 | 10.1109/LSP.2021.3063609 | IEEE Signal Processing Letters |
Keywords | DocType | Volume |
Feature extraction, Convolution, Emotion recognition, Spatiotemporal phenomena, Faces, Task analysis, Stacking, Continuous emotion recognition, spatiotemporal attention, TCN | Journal | 28 |
ISSN | Citations | PageRank |
1070-9908 | 0 | 0.34 |
References | Authors | |
---|---|---|
0 | 5 | |