Title
A multi-scale multi-attention network for dynamic facial expression recognition
Abstract
Characterizing spatial information and modelling the temporal dynamics of facial images are key challenges for dynamic facial expression recognition (FER). In this paper, we propose an end-to-end multi-scale multi-attention network (MSMA-Net) for dynamic FER. In our model, spatio-temporal features are encoded at two scales, i.e. the entire face and local facial patches. At each scale, we adopt a 2D convolutional neural network (CNN) to capture frame-based spatial information and a 3D CNN to depict short-term dynamics in the temporal sequence. Moreover, we propose a multi-attention mechanism comprising both spatial and temporal attention models. Temporal attention is applied to the image sequence to highlight expressive frames, and spatial attention is applied at the patch level to learn salient facial features. Comprehensive experiments on publicly available datasets (Aff-Wild2, RML, and AFEW) show that the proposed MSMA-Net automatically highlights salient expressive frames, within which salient facial features are learned, achieving results better than or highly competitive with state-of-the-art methods.
Year
2022
DOI
10.1007/s00530-021-00849-8
Venue
Multimedia Systems
Keywords
Facial expression recognition, Multi-scale multi-attention network (MSMA-Net), Spatial attention, Temporal attention
DocType
Journal
Volume
28
Issue
2
ISSN
0942-4962
Citations
0
PageRank
0.34
References
15
Authors
5
Name           Order  Citations  PageRank
Xiaohan Xia    1      0          0.34
Le Yang        2      106        13.26
Xiaoyong Wei   3      0          0.34
Hichem Sahli   4      0          0.34
Jiang Dongmei  5      115        15.28