Abstract |
---|
We present a novel framework to automatically generate natural gesture motions accompanying speech from audio utterances. Based on a Bi-Directional Long Short-Term Memory (Bi-LSTM) network, our deep network learns speech-gesture relationships with both backward and forward consistencies over a long period of time. Our network regresses a full 3D skeletal pose of a human from perceptual features extracted from the input audio at each time step. Then, we apply combined temporal filters to smooth out the generated pose sequences. We utilize a speech-gesture dataset recorded with a headset and marker-based motion capture to train our network. We validated our approach with a subjective evaluation and compared it against "original" human gestures and "mismatched" human gestures taken from a different utterance. The evaluation result shows that our generated gestures are significantly better than the "mismatched" gestures with respect to time consistency. The generated gestures also show a marginally significant improvement in semantic consistency compared to the "mismatched" gestures. |
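The abstract describes smoothing the per-frame pose regressions with "combined temporal filters", without specifying the filters used. As a minimal illustrative sketch (not the paper's actual post-processing), a moving-average filter over a `(frames, joints * 3)` pose sequence could look like this; the window size and edge-padding strategy here are assumptions:

```python
import numpy as np

def smooth_pose_sequence(poses, window=5):
    """Apply a moving-average temporal filter to a (T, D) pose sequence.

    Illustrative stand-in for the paper's unspecified "combined temporal
    filters". Edge frames are padded by repeating the boundary frames so
    the output keeps the same length as the input.
    """
    half = window // 2
    padded = np.pad(poses, ((half, half), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    smoothed = np.empty_like(poses, dtype=float)
    for d in range(poses.shape[1]):
        smoothed[:, d] = np.convolve(padded[:, d], kernel, mode="valid")
    return smoothed

# Example: a noisy 100-frame sequence of 57 values (e.g. 19 joints x 3 coords)
rng = np.random.default_rng(0)
seq = np.cumsum(rng.normal(size=(100, 57)), axis=0)
out = smooth_pose_sequence(seq)
```

Smoothing of this kind trades a small amount of motion detail for visibly less frame-to-frame jitter, which is the usual motivation for a temporal filtering pass after per-frame regression.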
Year | DOI | Venue
---|---|---
2018 | 10.1145/3267851.3267878 | 18th ACM International Conference on Intelligent Virtual Agents (IVA '18)
Keywords | Field | DocType
---|---|---
gesture generation, deep learning, neural networks, long short-term memory | Headset, Motion capture, Computer science, Gesture, Time consistency, Utterance, Speech recognition, Artificial intelligence, Deep learning, Artificial neural network, Perception, Multimedia | Conference

Citations | PageRank | References
---|---|---
6 | 0.44 | 14
Authors |
---|
5 |

Name | Order | Citations | PageRank
---|---|---|---
Dai Hasegawa | 1 | 26 | 7.62 |
Naoshi Kaneko | 2 | 12 | 2.23 |
Shinichi Shirakawa | 3 | 83 | 11.70 |
Hiroshi Sakuta | 4 | 18 | 6.18 |
Kazuhiko Sumi | 5 | 192 | 24.84 |