Clustering Expressive Speech Styles In Audiobooks Using Glottal Source Parameters - Citegraph

Paper Info

Title
Clustering Expressive Speech Styles In Audiobooks Using Glottal Source Parameters

Abstract
A great challenge for text-to-speech synthesis is to produce expressive speech. The main problem is that it is difficult to synthesise high-quality speech using expressive corpora. With the increasing interest in audiobook corpora for speech synthesis, there is a demand to synthesise speech which is rich in prosody, emotions and voice styles. In this work, Self-Organising Feature Maps (SOFM) are used for clustering the speech data using voice quality parameters of the glottal source, in order to map out the variety of voice styles in the corpus. Subjective evaluation showed that this clustering method successfully separated the speech data into groups of utterances associated with different voice characteristics. This work can be applied in unit-selection synthesis by selecting appropriate data sets to synthesise utterances with specific voice styles. It can also be used in parametric speech synthesis to model different voice styles separately.
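As a rough illustration of the clustering step the abstract describes, the sketch below trains a small self-organising map with NumPy and assigns each utterance to its best-matching node. It is not the authors' implementation: the function names `train_som` and `assign_clusters`, the grid size, the decay schedules, and the random feature matrix standing in for per-utterance glottal source parameters are all assumptions made for the example.

```python
import numpy as np

def train_som(data, rows=3, cols=3, epochs=20, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small self-organising map; returns weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    # Initialise each node from a randomly chosen sample (an assumed, common choice).
    picks = rng.integers(0, n, size=rows * cols)
    weights = data[picks].reshape(rows, cols, dim).astype(float)
    # Grid coordinates of every node, used by the neighbourhood function.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    total_steps = epochs * n
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(n):
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3   # shrinking neighbourhood radius
            # Best-matching unit: the node whose weights are closest to x.
            dists = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            # Gaussian neighbourhood centred on the BMU, measured on the grid.
            grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights

def assign_clusters(data, weights):
    """Map each sample to the flat index of its best-matching node."""
    flat = weights.reshape(-1, weights.shape[-1])
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    return np.argmin(d, axis=1)

# Hypothetical stand-in for per-utterance voice quality features (synthetic data).
features = np.random.default_rng(42).normal(size=(200, 4))
# Z-score each feature so that no single parameter dominates the distance.
features = (features - features.mean(axis=0)) / features.std(axis=0)
som = train_som(features, rows=3, cols=3)
cluster_ids = assign_clusters(features, som)
```

Each value in `cluster_ids` identifies one map node, so utterances sharing an id form one candidate voice-style group. With real data, the feature columns would hold the glottal source parameters measured for each utterance.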

Year	Venue	Keywords
2011	12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5	expressive speech, voice quality, audiobook, speech synthesis
Field	DocType	Citations
Speech corpus,Prosody,Data set,Voice analysis,Speech synthesis,Computer science,Speech recognition,Parametric statistics,Artificial intelligence,Natural language processing,Cluster analysis	Conference	0
PageRank	References	Authors
0.34	1	4

Authors (4 rows)

Name	Order	Citations	PageRank
Éva Székely	1	19	4.96
João P. Cabral	2	103	12.77
Peter Cahill	3	8	3.90
Julie Carson-Berndsen	4	75	28.62
