A Controllable Multi-Lingual Multi-Speaker Multi-Style Text-to-Speech Synthesis With Multivariate Information Minimization - Citegraph

Paper Info

Title
A Controllable Multi-Lingual Multi-Speaker Multi-Style Text-to-Speech Synthesis With Multivariate Information Minimization

Abstract
In this letter, we propose a multivariate information minimization method that disentangles three or more latent representations. We show that control factors can be disentangled by minimizing interactive dependency, which can be expressed as a sum of mutual information upper-bound terms. Because the upper-bound estimate converges early in training, the auxiliary loss causes little performance degradation. The proposed technique is applied to train a text-to-speech synthesizer on multi-lingual, multi-speaker, and multi-style corpora. Subjective listening tests validate that the proposed method can improve the synthesizer in terms of both quality and controllability.
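
The abstract says that the dependency among three or more latent factors can be written as a sum of mutual information terms. The "total correlation" keyword suggests one reading, which is an assumption here: the interactive dependency is the total correlation TC(X_1, ..., X_n) = sum_i H(X_i) - H(X_1, ..., X_n). Under that reading, the standard chain-rule identity TC = sum_{i=2..n} I(X_1, ..., X_{i-1}; X_i) gives such a decomposition. The Python sketch below checks this identity numerically on small discrete joint distributions. The helper names (entropy, marginal, total_correlation, mutual_information, tc_chain_rule) are hypothetical and not taken from the paper. The code computes exact quantities only. The letter instead minimizes estimated upper bounds of each mutual information term for learned latent representations, and this sketch does not attempt that.

```python
import numpy as np


def entropy(p):
    """Shannon entropy in nats of a pmf stored as an array of any shape."""
    p = p[p > 0]  # drop zero-probability cells (0 * log 0 is taken as 0)
    return float(-(p * np.log(p)).sum())


def marginal(joint, keep):
    """Sum out every axis of `joint` that is not listed in `keep`."""
    drop = tuple(ax for ax in range(joint.ndim) if ax not in keep)
    return joint.sum(axis=drop)


def total_correlation(joint):
    """TC(X_1..X_n) = sum_i H(X_i) - H(X_1..X_n); one array axis per variable."""
    singles = sum(entropy(marginal(joint, (i,))) for i in range(joint.ndim))
    return singles - entropy(joint)


def mutual_information(joint, a, b):
    """I(X_a; X_b) = H(X_a) + H(X_b) - H(X_a, X_b) for disjoint axis tuples a, b."""
    return (entropy(marginal(joint, a)) + entropy(marginal(joint, b))
            - entropy(marginal(joint, a + b)))


def tc_chain_rule(joint):
    """Sum of I(X_1..X_{i-1}; X_i) for i = 2..n (0-based axes 1..n-1)."""
    return sum(mutual_information(joint, tuple(range(i)), (i,))
               for i in range(1, joint.ndim))


# Arbitrary joint pmf over three discrete variables with 2, 3 and 4 states.
rng = np.random.default_rng(0)
joint = rng.random((2, 3, 4))
joint /= joint.sum()
print(total_correlation(joint), tc_chain_rule(joint))  # agree up to float error

# Mutually independent variables: the joint is an outer product, so TC is ~0.
indep = np.einsum("i,j,k->ijk", np.array([0.5, 0.5]),
                  np.array([0.2, 0.3, 0.5]), np.full(4, 0.25))
print(total_correlation(indep))
```

With two variables the decomposition reduces to a single mutual information term. With three or more factors, such as hypothetical language, speaker, and style embeddings, each term could in principle be replaced by an upper-bound estimate and the terms summed into one auxiliary loss.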

Year	DOI	Venue
2022	10.1109/LSP.2021.3125259	IEEE Signal Processing Letters
Keywords	DocType	Volume
Disentanglement, mutual information, speech synthesis, style modeling, total correlation	Journal	29
ISSN	Citations	PageRank
1070-9908	0	0.34
References	Authors
0	5

Authors

Name	Order	Citations	PageRank
Sung Jun Cheon	1	0	1.01
Byoung Jin Choi	2	1	2.06
Minchan Kim	3	0	0.34
Hyeonseung Lee	4	0	0.34
Nam Soo Kim	5	3	4.11
