Multi-Modal Audio, Video and Physiological Sensor Learning for Continuous Emotion Prediction. - Citegraph

Paper Info

Title
Multi-Modal Audio, Video and Physiological Sensor Learning for Continuous Emotion Prediction.

Abstract
The automatic determination of emotional state from multimedia content is an inherently challenging problem with a broad range of applications including biomedical diagnostics, multimedia retrieval, and human-computer interfaces. The Audio Video Emotion Challenge (AVEC) 2016 provides a well-defined framework for developing and rigorously evaluating innovative approaches for estimating the arousal and valence states of emotion as a function of time. It presents the opportunity for investigating multimodal solutions that include audio, video, and physiological sensor signals. This paper provides an overview of our AVEC Emotion Challenge system, which uses multi-feature learning and fusion across all available modalities. It includes a number of technical contributions, including the development of novel high- and low-level features for modeling emotion in the audio, video, and physiological channels. Low-level features include modeling arousal in audio with minimal prosodic-based descriptors. High-level features are derived from supervised and unsupervised machine learning approaches based on sparse coding and deep learning. Finally, a state space estimation approach is applied for score fusion that demonstrates the importance of exploiting the time-series nature of the arousal and valence states. The resulting system outperforms the baseline systems [10] on the test evaluation set with an achieved Concordance Correlation Coefficient (CCC) for arousal of 0.770 vs 0.702 (baseline) and for valence of 0.687 vs 0.638 (baseline). Future work will focus on exploiting the time-varying nature of individual channels in the multi-modal framework.
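
The abstract reports results as CCC values. As a minimal sketch of how that metric is commonly computed (Lin's concordance correlation coefficient, using population variances), the function below compares a predicted arousal or valence trace with a gold-standard trace. The function name `concordance_cc` and the toy traces are illustrative assumptions, not taken from the paper or from the AVEC evaluation scripts.

```python
from statistics import fmean


def concordance_cc(pred, gold):
    """Lin's concordance correlation coefficient between two equal-length traces.

    Hypothetical helper for illustration; not the AVEC scoring code.
    """
    if len(pred) != len(gold) or len(pred) == 0:
        raise ValueError("pred and gold must be non-empty and of equal length")

    mean_p = fmean(pred)
    mean_g = fmean(gold)

    # Population (1/N) variances and covariance.
    var_p = fmean([(p - mean_p) ** 2 for p in pred])
    var_g = fmean([(g - mean_g) ** 2 for g in gold])
    cov = fmean([(p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)])

    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0:
        # Both traces are constant and identical; the ratio is undefined.
        raise ValueError("CCC is undefined for identical constant traces")

    # Unlike Pearson correlation, the squared mean difference in the
    # denominator penalises predictions that are offset from the gold trace.
    return 2 * cov / denom


# Toy traces (made up): a perfectly correlated but shifted prediction
# scores well below 1 because of the constant offset.
gold_trace = [1.0, 2.0, 3.0]
shifted_prediction = [2.0, 3.0, 4.0]
print(concordance_cc(shifted_prediction, gold_trace))
```

The offset penalty is the reason CCC is often preferred over plain correlation for continuous emotion traces: a prediction must match the gold trace in scale and location, not only in shape.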

Year	DOI	Venue
2016	10.1145/2988257.2988264	AVEC@ACM Multimedia
Keywords	Field	DocType
Affective Computing, Emotion Recognition, Speech, Deep Learning, CNN, Sparse Coding, Facial Expression, Challenge	Computer science, Unsupervised learning, Artificial intelligence, Deep learning, Arousal, Computer vision, Neural coding, Speech recognition, Facial expression, Affective computing, State space, Machine learning, Modal	Conference
Citations	PageRank	References
18	0.62	7
Authors
7

Authors (7 rows)

Name	Order	Citations	PageRank
Kevin Brady	1	156	21.40
Youngjune Gwon	2	279	27.58
Pooya Khorrami	3	118	6.27
Elizabeth Godoy	4	83	4.96
William M. Campbell	5	799	70.38
Charlie K. Dagli	6	140	8.44
Thomas S. Huang	7	27815	2618.42
