Title
Transformer-S2A: Robust and Efficient Speech-to-Animation
Abstract
We propose a novel, robust, and efficient Speech-to-Animation (S2A) approach for synchronized facial animation generation in human-computer interaction. Compared with conventional approaches, the proposed approach utilizes phonetic posteriorgrams (PPGs) of spoken phonemes as input to ensure cross-language and cross-speaker capability, and introduces the corresponding prosody features (i.e., pitch and energy) to further enhance the expressiveness of the generated animation. A mixture-of-experts (MOE)-based Transformer is employed to better model contextual information while significantly improving computational efficiency. Experiments demonstrate the effectiveness of the proposed approach in both objective and subjective evaluations, with a 17x inference speedup compared with the state-of-the-art approach.
Year: 2022
DOI: 10.1109/ICASSP43922.2022.9747495
Venue: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 6
Name | Order | Citations | PageRank
Liyang Chen | 1 | 0 | 0.34
Wu Zhiyong211936.98
Jun Ling | 3 | 0 | 0.34
Runnan Li | 4 | 0 | 0.34
Xu Tan58823.94
Sheng Zhao | 6 | 0 | 0.34