Title
An Anthropomorphic Perspective For Audiovisual Speech Synthesis
Abstract
In speech communication, both the auditory and visual streams play an important role, providing a certain level of redundancy (e.g., lip movement) and conveying complementary information (e.g., to emphasize a word). The current common approach to audiovisual speech synthesis, generally based on data-driven methods, yields good results but relies on models controlled by parameters that do not relate to how humans produce speech; these parameters are hard to interpret and add little to our understanding of the human speech production apparatus. Modelling the actual system from an anthropomorphic perspective would open a myriad of novel research paths. This article proposes a conceptual framework to support research and development of an articulatory-based audiovisual speech synthesis system. The core idea is that the speech production system is modelled to produce articulatory parameters with anthropomorphic meaning (e.g., lip opening), which drive the synthesis of both the auditory and visual streams. A first instantiation of the framework for European Portuguese illustrates its viability and constitutes an important tool for research in speech production and for the deployment of audiovisual speech synthesis in multimodal interaction scenarios, which are of the utmost relevance for current and future complex services and applications.
Year: 2017
DOI: 10.5220/0006150201630172
Venue: PROCEEDINGS OF THE 10TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES, VOL 4: BIOSIGNALS
Keywords: Audiovisual Speech, Articulatory Synthesis, European Portuguese
Field: Speech synthesis, Computer science, Speech recognition
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 2
Name                    Order  Citations  PageRank
Samuel S. Silva         1      31         14.23
António J. S. Teixeira  2      152        35.26