Title
Demonstration of an HMM-Based Photorealistic Expressive Audio-Visual Speech Synthesis System
Abstract
The usage of conversational agents is rapidly increasing in everyday life (Cortana, Siri, etc.). It has been shown that the inclusion of a talking face increases the intelligibility of speech and the naturalness of human-computer interaction. Furthermore, an agent capable of expressing emotions has a stronger appeal to the human party and affects the interlocutor's emotional state. The proposed demonstration is a Hidden Markov Model (HMM)-based photorealistic audio-visual speech synthesis system capable of expressing emotions [1, 2]. The system can generate a talking head speaking in three emotions: happiness, anger, and sadness, as well as in a neutral speaking style. Further capabilities of the system include 1) the use of HMM interpolation [3] to generate speech with mixtures of the original emotions (e.g., both anger and happiness) and speech with different levels of expressiveness (by mixing with the neutral emotion), and 2) the use of HMM adaptation [4] to adapt to a target emotion using only a small number of sentences.

Equipment
To showcase our system we will use a laptop and speakers. The system will run fully on the laptop.

Demonstration Experience
During the demonstration, viewers will have the opportunity to:
1. Watch videos of the talking head speaking in three different emotions (plus neutral) and see how the expressive talking head feels more natural compared to the talking head speaking in a neutral style.
2. Watch the talking head speaking in two or more emotions at the same time, and see how the weights assigned to each emotion affect the outcome (as sketched below). It will also be of great interest to see which emotion each viewer perceives. In addition, through interpolation with the neutral emotion, viewers will be able to watch the talking head speak at different expressiveness levels for each emotion.
3. See how the neutral talking head can be adapted to speak in another emotion using only a few sentences, and how the number of sentences used affects the expressiveness of the resulting talking head.
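To make the interpolation idea concrete, the following is a minimal sketch, not the authors' implementation, of how weighted HMM interpolation can be realized by linearly combining the Gaussian output parameters of corresponding states across emotion-dependent models. The emotion names, array shapes, and the `interpolate_models` helper are illustrative assumptions.

```python
import numpy as np

# Hypothetical per-emotion HMM state output distributions (Gaussian means and
# variances per state). In a trained HTS-style system these would come from
# emotion-dependent models; here they are placeholder arrays (5 states x 40 dims).
emotion_models = {
    "neutral":   {"means": np.zeros((5, 40)),      "vars": np.ones((5, 40))},
    "happiness": {"means": np.full((5, 40), 0.5),  "vars": np.ones((5, 40))},
    "anger":     {"means": np.full((5, 40), -0.3), "vars": np.ones((5, 40))},
}

def interpolate_models(models, weights):
    """Linearly interpolate the Gaussian output parameters of corresponding
    HMM states across emotion-dependent models, using weights normalized to
    sum to 1. Mixing an emotion with 'neutral' lowers its expressiveness level."""
    total = sum(weights.values())
    norm = {name: w / total for name, w in weights.items()}
    means = sum(norm[name] * models[name]["means"] for name in norm)
    variances = sum(norm[name] * models[name]["vars"] for name in norm)
    return {"means": means, "vars": variances}

# Example: a 70% happiness / 30% anger mixture. Synthesis would then proceed
# from the interpolated parameters exactly as with a single-emotion model.
mixed = interpolate_models(emotion_models, {"happiness": 0.7, "anger": 0.3})
print(mixed["means"][0, :3])
```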
Year
2017
Venue
2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP)
Field
Computer vision, Sadness, Everyday life, Speech synthesis, Computer science, Naturalness, Speech recognition, Anger, Happiness, Artificial intelligence, Hidden Markov model, Intelligibility (communication)
DocType
Conference
ISSN
1522-4880
Citations
0
PageRank
0.34
References
0
Authors
3
Name, Order, Citations, PageRank
Panagiotis Paraskevas Filntisis, 1, 4, 3.92
Athanasios Katsamanis, 2, 301, 22.71
Petros Maragos, 3, 3733, 591.97