Title
Improving Speech Related Facial Action Unit Recognition by Audiovisual Information Fusion.
Abstract
It is challenging to recognize facial action units (AUs) from spontaneous facial displays, especially when they are accompanied by speech. The major reason is that, in current practice, the information is extracted from a single source, i.e., the visual channel. However, facial activity is highly correlated with voice in natural human communication. Instead of solely improving visual observations, this paper presents a novel audiovisual fusion framework that makes the best use of visual and acoustic cues in recognizing speech-related facial AUs. In particular, a dynamic Bayesian network is employed to explicitly model the semantic and dynamic physiological relationships between AUs and phonemes, as well as measurement uncertainty. Experiments on a pilot audiovisual AU-coded database demonstrate that the proposed framework significantly outperforms state-of-the-art visual-based methods in recognizing speech-related AUs, especially those AUs whose visual observations are impaired during speech. More importantly, by explicitly modeling and exploiting the physiological relationships between AUs and phonemes, it is also superior to audio-based methods and to feature-level fusion methods that employ low-level audio features.
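The abstract only sketches the fusion model at a high level. As a rough, self-contained illustration of the general idea (not the paper's actual model), the toy sketch below runs forward filtering in a two-chain dynamic Bayesian network that couples a hidden phoneme state with a hidden AU state and fuses noisy acoustic and visual evidence at each frame; all state spaces, probability tables, and likelihood stubs here are hypothetical placeholders.

```python
import numpy as np

# Toy state spaces (hypothetical; the paper covers many more phonemes and AUs).
N_PHONEME = 3  # e.g., /m/, /ah/, silence
N_AU = 2       # e.g., a single AU absent / present

rng = np.random.default_rng(0)

def normalize(x, axis=None):
    """Scale a nonnegative array to sum to 1 along `axis`."""
    return x / x.sum(axis=axis, keepdims=axis is not None)

# P(ph_t | ph_{t-1}): phoneme chain with a bias toward self-transitions.
A_ph = normalize(rng.random((N_PHONEME, N_PHONEME)) + 5 * np.eye(N_PHONEME), axis=1)

# P(au_t | au_{t-1}, ph_t): AU dynamics depend on the current phoneme as well
# as the previous AU state -- this coupling is how a DBN can encode
# phoneme-to-AU physiological relationships.
A_au = normalize(rng.random((N_AU, N_PHONEME, N_AU)), axis=2)

def audio_likelihood(t):
    """Stub for P(audio feature at frame t | phoneme); normally a classifier score."""
    return rng.random(N_PHONEME)

def visual_likelihood(t):
    """Stub for P(visual feature at frame t | AU state); normally a classifier score."""
    return rng.random(N_AU)

def forward_filter(T):
    """Exact forward filtering over the joint (phoneme, AU) state."""
    belief = np.full((N_PHONEME, N_AU), 1.0 / (N_PHONEME * N_AU))  # b[ph, au]
    au_marginals = []
    for t in range(T):
        # Predict: sum over previous (ph', au') under the transition models.
        pred = np.einsum("pq,aqb,pa->qb", A_ph, A_au, belief)
        # Update: fuse the acoustic and visual evidence for this frame,
        # which is where measurement uncertainty enters the model.
        belief = normalize(pred * audio_likelihood(t)[:, None]
                                * visual_likelihood(t)[None, :])
        au_marginals.append(belief.sum(axis=0))  # P(au_t | evidence so far)
    return np.array(au_marginals)

print(forward_filter(T=5))  # one row of AU posteriors per frame
```

The phoneme-conditioned AU transition table is one place such a DBN can encode the AU-phoneme relationships the abstract refers to; in the paper these structures would be learned from the AU-coded audiovisual data rather than drawn at random.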
Year
2019
DOI
10.1109/TCYB.2018.2840090
Venue
IEEE Transactions on Cybernetics
Keywords
Gold, Visualization, Face recognition, Speech recognition, Feature extraction, Physiology, Semantics
DocType
Journal
Volume
abs/1706.10197
Issue
9
ISSN
2168-2267
Citations
4
PageRank
0.42
References
38
Authors
4
Name           Order  Citations  PageRank
Zibo Meng      1      248        13.60
Shizhong Han   2      244        9.80
Ping Liu       3      359        16.70
Yan Tong       4      14         2.74