Title
Evaluation of Speech-to-Gesture Generation Using Bi-Directional LSTM Network
Abstract
We present a novel framework to automatically generate natural gesture motions accompanying speech from audio utterances. Based on a bi-directional LSTM network, our deep network learns speech-gesture relationships with both backward and forward consistency over a long period of time. Our network regresses a full 3D skeletal pose of a human from perceptual features extracted from the input audio at each time step. Then, we apply combined temporal filters to smooth out the generated pose sequences. We use a speech-gesture dataset recorded with a headset and marker-based motion capture to train our network. We validated our approach with a subjective evaluation, comparing our generated gestures against "original" human gestures and "mismatched" human gestures taken from a different utterance. The results show that our generated gestures are rated significantly better than the "mismatched" gestures with respect to time consistency, and marginally significantly better with respect to semantic consistency.
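The record gives no implementation details beyond the abstract, so the following is a minimal PyTorch sketch of a bi-directional LSTM regressor of the kind described: per-time-step audio features in, a full 3D skeletal pose out. The feature dimension (26), hidden size (256), joint count (20), and all names are illustrative assumptions, not the authors' code; the paper's additional combined temporal filtering of the output is not shown.

import torch
import torch.nn as nn

class SpeechToGesture(nn.Module):
    # Hypothetical dimensions: 26-dim perceptual audio features,
    # 20 joints x 3 coordinates = 60-dim pose vector per frame.
    def __init__(self, audio_dim=26, hidden_dim=256, pose_dim=3 * 20):
        super().__init__()
        # Bi-directional LSTM provides both forward and backward
        # temporal context over the speech feature sequence.
        self.lstm = nn.LSTM(audio_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # Linear readout regresses one skeletal pose per time step.
        self.out = nn.Linear(2 * hidden_dim, pose_dim)

    def forward(self, audio_feats):
        # audio_feats: (batch, time, audio_dim)
        h, _ = self.lstm(audio_feats)   # (batch, time, 2 * hidden_dim)
        return self.out(h)              # (batch, time, pose_dim)

model = SpeechToGesture()
feats = torch.randn(1, 100, 26)         # 100 frames of dummy features
poses = model(feats)                     # (1, 100, 60) pose sequence

In a setup like this, the generated pose sequence would still need smoothing (the paper applies combined temporal filters) before being retargeted to a character skeleton.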
Year
2018
DOI
10.1145/3267851.3267878
Venue
18th ACM International Conference on Intelligent Virtual Agents (IVA '18)
Keywords
gesture generation, deep learning, neural networks, long short-term memory
Field
Headset, Motion capture, Computer science, Gesture, Time consistency, Utterance, Speech recognition, Artificial intelligence, Deep learning, Artificial neural network, Perception, Multimedia
DocType
Conference
Citations
6
PageRank
0.44
References
14
Authors
5
Name                Order  Citations  PageRank
Dai Hasegawa        1      26         7.62
Naoshi Kaneko       2      12         2.23
Shinichi Shirakawa  3      83         11.70
Hiroshi Sakuta      4      18         6.18
Kazuhiko Sumi       5      192        24.84