Integrating articulatory features into HMM-based parametric speech synthesis - Citegraph

Paper Info

Title
Integrating articulatory features into HMM-based parametric speech synthesis

Abstract
This paper presents an investigation into ways of integrating articulatory features into hidden Markov model (HMM)-based parametric speech synthesis. In broad terms, this may be achieved by estimating the joint distribution of acoustic and articulatory features during training. This may in turn be used in conjunction with a maximum-likelihood criterion to produce acoustic synthesis parameters for generating speech. Within this broad approach, we explore several variations that are possible in the construction of an HMM-based synthesis system which allow articulatory features to influence acoustic modeling: model clustering, state synchrony and cross-stream feature dependency. Performance is evaluated using the RMS error of generated acoustic parameters as well as formal listening tests. Our results show that the accuracy of acoustic parameter prediction and the naturalness of synthesized speech can be improved when shared clustering and asynchronous-state model structures are adopted for combined acoustic and articulatory features. Most significantly, however, our experiments demonstrate that modeling the dependency between these two feature streams can make speech synthesis systems more flexible. The characteristics of synthetic speech can be easily controlled by modifying generated articulatory features as part of the process of producing acoustic synthesis parameters.

Year	2009
DOI	10.1109/TASL.2009.2014796
Venue	IEEE Transactions on Audio, Speech & Language Processing
Keywords	hmm-based parametric speech synthesis, speech synthesis system, synthesized speech, acoustic parameter prediction, integrating articulatory features, acoustic synthesis parameter, parametric speech synthesis, articulatory feature, acoustic modeling, hmm-based synthesis system, combined acoustic, acoustic parameter, maximum likelihood, speech processing, acoustics, process control, hidden markov model, speech production, feature extraction, speech synthesis, construction industry, hidden markov models, maximum likelihood estimation, predictive models, speech
Field	Speech processing, Speech synthesis, Pattern recognition, Computer science, Naturalness, Feature extraction, Speech recognition, Parametric statistics, Artificial intelligence, Cluster analysis, Hidden Markov model, Speech production
DocType	Journal
Volume	17
Issue	6
ISSN	1558-7916
Citations	46
PageRank	2.20
References	25
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Zhen-Hua Ling	1	850	83.08
Korin Richmond	2	531	46.14
Junichi Yamagishi	3	1906	145.51
Ren-Hua Wang	4	344	41.36
