Title |
---|
Demonstration Of An Hmm-Based Photorealistic Expressive Audio-Visual Speech Synthesis System |
Abstract |
---|
The usage of conversational agents is rapidly increasing in everyday life (Cortana, Siri, etc.). It has been shown that the inclusion of a talking face increases the intelligibility of speech and the naturalness of human-computer interaction. Furthermore, an agent capable of expressing emotions has a stronger appeal to the human party and affects the interlocutor's emotional state. The proposed demonstration is a Hidden Markov Model (HMM) based photorealistic audio-visual speech synthesis system capable of expressing emotions [1, 2]. The system can generate a talking head speaking in three emotions: happiness, anger, and sadness, as well as in a neutral speaking style. Further capabilities of the system include 1) the use of HMM interpolation [3] to generate speech with mixtures of the original emotions (e.g., both anger and happiness) and speech with different levels of expressiveness (by mixing with the neutral emotion), and 2) the use of HMM adaptation [4] to adapt to a target emotion using only a small number of sentences. Equipment: To showcase our system we will use a laptop and speakers; the system will run fully on the laptop. Demonstration Experience: During the demonstration, viewers will have the opportunity to: 1. Watch videos of the talking head speaking in three different emotions (plus neutral) and see how the expressive talking head feels more natural than the talking head speaking in a neutral style. 2. Watch the talking head speaking in two or more emotions at the same time, and see how the weights assigned to each emotion affect the outcome. It will also be of great interest to see which emotion each viewer perceives. In addition, through interpolation with the neutral emotion, viewers will be able to watch the talking head speak at different expressiveness levels for each emotion. 3. See how the neutral talking head can be adapted to speak in another emotion using only a few sentences, and how the number of sentences used affects the expressiveness of the resulting talking head. |
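The emotion-mixing capability described above rests on HMM interpolation: the output distributions of emotion-specific models are combined with user-chosen weights. As a minimal illustrative sketch (not the paper's actual implementation, which interpolates full HMM parameter sets as in [3]), the following assumes each emotion's state output is a Gaussian and linearly interpolates the mean vectors; the emotion names, feature dimensionality, and `interpolate_means` helper are all hypothetical.

```python
import numpy as np

# Hypothetical per-emotion state output means over a 3-dimensional
# audio-visual feature vector; values are purely illustrative.
means = {
    "neutral": np.array([0.0, 0.0, 0.0]),
    "happy":   np.array([1.0, 2.0, 0.5]),
    "angry":   np.array([-1.0, 0.5, 2.0]),
}

def interpolate_means(weights):
    """Linearly interpolate Gaussian mean vectors across emotion models.

    `weights` maps emotion name -> interpolation weight; the weights are
    normalised to sum to 1 before mixing, so scaling them all by a
    constant leaves the result unchanged.
    """
    w = np.array([weights[e] for e in means], dtype=float)
    w = w / w.sum()
    stacked = np.stack([means[e] for e in means])
    return w @ stacked

# Mix happiness and anger equally, with no neutral contribution
# (analogous to the demo's "two or more emotions at the same time").
mix = interpolate_means({"neutral": 0.0, "happy": 0.5, "angry": 0.5})

# Blending an emotion with neutral lowers expressiveness: a 0.3/0.7
# happy/neutral mix pulls the mean 30% of the way toward "happy".
mild_happy = interpolate_means({"neutral": 0.7, "happy": 0.3, "angry": 0.0})
```

Varying the weight on the neutral model is what produces the different expressiveness levels shown in the demonstration: weight 1.0 on an emotion reproduces that emotion's model, and intermediate weights trace a continuum between it and neutral speech.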
Year | Venue | Field |
---|---|---|
2017 | 2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP) | Computer vision,Sadness,Everyday life,Speech synthesis,Computer science,Naturalness,Speech recognition,Anger,Happiness,Artificial intelligence,Hidden Markov model,Intelligibility (communication) |
DocType | ISSN | Citations |
---|---|---|
Conference | 1522-4880 | 0 |

PageRank | References | Authors |
---|---|---|
0.34 | 0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Panagiotis Paraskevas Filntisis | 1 | 4 | 3.92 |
Athanasios Katsamanis | 2 | 301 | 22.71 |
Petros Maragos | 3 | 3733 | 591.97 |