Title | ||
---|---|---|
Prosody-aware subword embedding considering Japanese intonation systems and its application to DNN-based multi-dialect speech synthesis |
Abstract | ||
---|---|---|
This paper presents prosody-aware subword embedding considering Japanese intonation systems and its application to DNN (deep neural network)-based multi-dialect speech synthesis. Following recent improvements in speech synthesis for rich-resourced languages, the research trend is shifting to more challenging languages, such as Japanese dialects, whose prosodic contexts are still undefined. Conventional prosody-aware word embedding can extract these contexts in an unsupervised, data-driven manner from words and $F_0$ sequences. However, it is difficult to generate accurate contexts for unknown words. To solve this problem, we propose prosody-aware subword embedding considering Japanese intonation systems. The unsupervised subword model, which is trained considering both language and acoustic characteristics, can tokenize an unknown word into known subwords suitable for prosody-aware embedding. We also propose a modulation filtering method considering intra-subword moras to improve embedding accuracy. We apply the methods not only to Japanese speech synthesis but also to Japanese multi-dialect speech synthesis. In the multi-dialect case, we propose subword models shared among dialects and embedding models conditioned on dialect information. The experimental evaluation demonstrates that the proposed multi-dialect methods can improve speech quality in some Japanese dialects. |
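The abstract does not say how the subword model splits an unknown word into known subwords. As a minimal sketch, assuming a unigram subword model with a fixed table of log-probabilities, Viterbi decoding picks the segmentation into known subwords with the highest total log-probability. The names `SUBWORD_LOGPROB` and `segment` and the kana entries are hypothetical and chosen only for illustration; the paper's model is learned from data and also considers acoustic characteristics, which this sketch does not model.

```python
import math
from typing import Dict, List

# Hypothetical unigram subword log-probabilities (illustrative only; the
# paper's subword model is trained on data, not hand-specified like this).
SUBWORD_LOGPROB: Dict[str, float] = {
    "おお": math.log(0.05),
    "さか": math.log(0.04),
    "べん": math.log(0.03),
    "お": math.log(0.02),
    "さ": math.log(0.02),
    "か": math.log(0.02),
    "べ": math.log(0.01),
    "ん": math.log(0.02),
}


def segment(word: str, vocab: Dict[str, float], max_len: int = 4) -> List[str]:
    """Split `word` into known subwords, maximizing the summed log-probability."""
    n = len(word)
    best = [-math.inf] * (n + 1)  # best[i]: best score for the prefix word[:i]
    back = [0] * (n + 1)          # back[i]: start of the last subword ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = word[start:end]
            if piece in vocab and best[start] > -math.inf:
                score = best[start] + vocab[piece]
                if score > best[end]:
                    best[end] = score
                    back[end] = start
    if best[n] == -math.inf:
        raise ValueError(f"cannot segment {word!r} with the given vocabulary")
    # Follow back-pointers from the end of the word to recover the pieces.
    pieces: List[str] = []
    end = n
    while end > 0:
        start = back[end]
        pieces.append(word[start:end])
        end = start
    return pieces[::-1]


print(segment("おおさかべん", SUBWORD_LOGPROB))  # ['おお', 'さか', 'べん']
```

With this toy table, a word absent from the vocabulary as a whole still decomposes into longer known subwords when they score higher than single-character pieces, and falls back to single characters otherwise (e.g. `"かさ"` becomes `["か", "さ"]`).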
Year | DOI | Venue |
---|---|---|
2018 | 10.23919/APSIPA.2018.8659465 | 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) |
Keywords | Field | DocType |
Speech synthesis, Modulation, Training, Context modeling, Training data, Feature extraction, Data models | Prosody, Data modeling, Speech synthesis, Embedding, Computer science, Feature extraction, Speech recognition, Context model, Word embedding, Artificial neural network | Conference |
ISSN | ISBN | Citations |
2309-9402 | 978-9-8814-7685-2 | 0 |
PageRank | References | Authors |
0.34 | 0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Takanori Akiyama | 1 | 0 | 0.68 |
Shinnosuke Takamichi | 2 | 75 | 22.08 |
Hiroshi Saruwatari | 3 | 652 | 90.81 |