Paper Info

Title
Prosody-aware subword embedding considering Japanese intonation systems and its application to DNN-based multi-dialect speech synthesis

Abstract
This paper presents prosody-aware subword embedding that considers Japanese intonation systems, and its application to DNN (deep neural network)-based multi-dialect speech synthesis. Following recent improvements in speech synthesis for resource-rich languages, the research trend is shifting to more challenging languages, such as Japanese dialects, whose prosodic contexts are still undefined. Conventional prosody-aware word embedding can extract these contexts in an unsupervised, data-driven manner from words and $F_{0}$ sequences. However, it is difficult to generate accurate contexts for unknown words. To solve this problem, we propose prosody-aware subword embedding that considers Japanese intonation systems. The unsupervised subword model, which is trained considering both linguistic and acoustic characteristics, can tokenize an unknown word into known subwords suitable for prosody-aware embedding. We also propose a modulation filtering method that considers intra-subword moras to improve embedding accuracy. We apply these methods not only to Japanese speech synthesis but also to Japanese multi-dialect speech synthesis. For the multi-dialect case, we propose subword models shared among dialects and embedding models conditioned on dialect information. The experimental evaluation demonstrates that the proposed multi-dialect methods can improve speech quality for some Japanese dialects.
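The abstract does not specify the subword model or the embedding architecture, so the following is only a rough illustration of the general idea of covering an unknown word with known subwords and conditioning the result on a dialect. It is a minimal sketch under stated assumptions, not the authors' method: greedy longest-match segmentation stands in for the paper's unsupervised subword model (which is trained on linguistic and acoustic characteristics), mean pooling of subword vectors stands in for the prosody-aware embedding model, and appending a one-hot dialect code is one simple form of dialect conditioning. The romanized vocabulary, the embedding values, the dialect labels, and the functions `tokenize` and `embed` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical subword vocabulary with toy 3-dimensional embeddings.
# In a real system these vectors would be learned (e.g. from words and F0);
# here the values are arbitrary placeholders.
SUBWORD_EMB: Dict[str, List[float]] = {
    "tabe": [0.2, -0.1, 0.5],
    "ru": [0.0, 0.3, -0.2],
    "ta": [0.1, 0.1, 0.1],
    "mono": [-0.4, 0.2, 0.0],
}
# Hypothetical dialect labels; the subword vocabulary above is shared by all of them.
DIALECTS = ["standard", "osaka", "kagoshima"]


def tokenize(word: str, vocab: Dict[str, List[float]]) -> List[str]:
    """Greedy longest-match segmentation of `word` into known subwords.

    Raises ValueError if the word is empty or some part cannot be covered.
    """
    if not word:
        raise ValueError("empty word")
    max_len = max(len(s) for s in vocab)
    pieces, i = [], 0
    while i < len(word):
        # Try the longest candidate starting at position i first.
        for j in range(min(len(word), i + max_len), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:  # no candidate matched at position i
            raise ValueError(f"cannot segment {word!r} at position {i}")
    return pieces


def embed(word: str, dialect: str) -> List[float]:
    """Mean-pool the subword vectors, then append a one-hot dialect code."""
    pieces = tokenize(word, SUBWORD_EMB)
    dim = len(next(iter(SUBWORD_EMB.values())))
    mean = [sum(SUBWORD_EMB[p][k] for p in pieces) / len(pieces) for k in range(dim)]
    one_hot = [1.0 if d == dialect else 0.0 for d in DIALECTS]
    return mean + one_hot
```

For example, `tokenize("tabemono", SUBWORD_EMB)` yields `["tabe", "mono"]`, so a word absent from a word-level vocabulary still receives a vector. Longest-match prefers `"tabe"` over the shorter `"ta"`; a learned subword model would instead choose segment boundaries from data.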

Year: 2018
DOI: 10.23919/APSIPA.2018.8659465
Venue: 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Keywords: Speech synthesis, Modulation, Training, Context modeling, Training data, Feature extraction, Data models
Field: Prosody, Data modeling, Speech synthesis, Embedding, Computer science, Feature extraction, Speech recognition, Context model, Word embedding, Artificial neural network
DocType: Conference
ISSN: 2309-9402
ISBN: 978-9-8814-7685-2
Citations: 0
PageRank: 0.34
References: 0
Authors: 3

Authors

Name	Order	Citations	PageRank
Takanori Akiyama	1	0	0.68
Shinnosuke Takamichi	2	75	22.08
Saruwatari, H.	3	652	90.81
