Title
Improving Speech Related Facial Action Unit Recognition by Audiovisual Information Fusion.
Abstract
It is challenging to recognize facial action units (AUs) from spontaneous facial displays, especially when they are accompanied by speech. The major reason is that, in current practice, the information is extracted from a single source, i.e., the visual channel. However, facial activity is highly correlated with voice in natural human communication. Instead of solely improving visual observations, this paper presents a novel audiovisual fusion framework that makes the best use of visual and acoustic cues in recognizing speech-related facial AUs. In particular, a dynamic Bayesian network is employed to explicitly model the semantic and dynamic physiological relationships between AUs and phonemes, as well as measurement uncertainty. Experiments on a pilot audiovisual AU-coded database demonstrate that the proposed framework significantly outperforms state-of-the-art visual-based methods in recognizing speech-related AUs, especially those AUs whose visual observations are impaired during speech. More importantly, by explicitly modeling and exploiting the physiological relationships between AUs and phonemes, it is also superior to audio-based methods and to feature-level fusion methods that employ low-level audio features.
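The abstract only sketches the fusion model at a high level. As a rough, self-contained illustration of the general idea (not the paper's actual model), the toy sketch below runs forward filtering in a two-chain dynamic Bayesian network that couples a hidden phoneme state with a hidden AU state and fuses noisy acoustic and visual evidence at each frame; all state spaces, probability tables, and likelihood stubs here are hypothetical placeholders.

```python
import numpy as np

# Toy state spaces (hypothetical; the paper covers many more phonemes and AUs).
N_PHONEME = 3  # e.g., /m/, /ah/, silence
N_AU = 2       # e.g., a single AU absent / present

rng = np.random.default_rng(0)

def normalize(x, axis=None):
    """Scale a nonnegative array to sum to 1 along `axis`."""
    return x / x.sum(axis=axis, keepdims=axis is not None)

# P(ph_t | ph_{t-1}): phoneme chain with a bias toward self-transitions.
A_ph = normalize(rng.random((N_PHONEME, N_PHONEME)) + 5 * np.eye(N_PHONEME), axis=1)

# P(au_t | au_{t-1}, ph_t): AU dynamics depend on the current phoneme as well
# as the previous AU state -- this coupling is how a DBN can encode
# phoneme-to-AU physiological relationships.
A_au = normalize(rng.random((N_AU, N_PHONEME, N_AU)), axis=2)

def audio_likelihood(t):
    """Stub for P(audio feature at frame t | phoneme); normally a classifier score."""
    return rng.random(N_PHONEME)

def visual_likelihood(t):
    """Stub for P(visual feature at frame t | AU state); normally a classifier score."""
    return rng.random(N_AU)

def forward_filter(T):
    """Exact forward filtering over the joint (phoneme, AU) state."""
    belief = np.full((N_PHONEME, N_AU), 1.0 / (N_PHONEME * N_AU))  # b[ph, au]
    au_marginals = []
    for t in range(T):
        # Predict: sum over previous (ph', au') under the transition models.
        pred = np.einsum("pq,aqb,pa->qb", A_ph, A_au, belief)
        # Update: fuse the acoustic and visual evidence for this frame,
        # which is where measurement uncertainty enters the model.
        belief = normalize(pred * audio_likelihood(t)[:, None]
                                * visual_likelihood(t)[None, :])
        au_marginals.append(belief.sum(axis=0))  # P(au_t | evidence so far)
    return np.array(au_marginals)

print(forward_filter(T=5))  # one row of AU posteriors per frame
```

The phoneme-conditioned AU transition table is one place such a DBN can encode the AU-phoneme relationships the abstract refers to; in the paper these structures would be learned from the AU-coded audiovisual data rather than drawn at random.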
Year
2019
DOI
10.1109/TCYB.2018.2840090
Venue
IEEE Transactions on Cybernetics
Keywords
Gold, Visualization, Face recognition, Speech recognition, Feature extraction, Physiology, Semantics
DocType
Journal
Volume
abs/1706.10197
Issue
9
ISSN
2168-2267
Citations
4
PageRank
0.42
References
38
Authors
4
Name           Order  Citations  PageRank
Zibo Meng      1      248        13.60
Shizhong Han   2      244        9.80
Ping Liu       3      359        16.70
Yan Tong       4      14         2.74