VoiceMixer: Adversarial Voice Style Mixup. - Citegraph

Paper Info

Title
VoiceMixer: Adversarial Voice Style Mixup.

Abstract
Although recent advances in voice conversion have shown significant improvement, a gap remains between the converted voice and the target voice. A key factor behind this gap is the insufficient decomposition of content and voice style in the source speech. As a result, the converted speech either retains the style of the source speech or loses source speech content. In this paper, we present VoiceMixer, which effectively decomposes and transfers voice style through a novel information bottleneck and adversarial feedback. Using self-supervised representation learning, the proposed information bottleneck decomposes content and style with only a small loss of content information. In addition, to provide adversarial feedback on each type of information, the discriminator is decomposed into a content discriminator and a style discriminator with self-supervision. This enables our model to generalize better to the voice style of the converted speech. The experimental results show the superiority of our model in disentanglement and transfer performance, and show that it improves audio quality by preserving content information.
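The abstract does not say how the information bottleneck is built. As a hedged illustration only, here is a minimal sketch of one generic way to form a temporal bottleneck over frame-level features, such as self-supervised representations. Adjacent frames that are nearly identical are averaged into one, which shortens the sequence. Some fine-grained timing detail is discarded, while the order of distinct content units is kept. The function names (`cosine`, `similarity_downsample`), the cosine-similarity merge rule, and the `threshold` default are all assumptions for illustration. They are not VoiceMixer's published design.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_downsample(frames: List[Vector], threshold: float = 0.9) -> List[Vector]:
    """Hypothetical temporal bottleneck (illustrative assumption, not the paper's method).

    Groups consecutive frames into segments: a frame joins the current segment
    when its cosine similarity to the previous frame is at least `threshold`,
    otherwise it starts a new segment. Each segment is then replaced by its
    element-wise mean, so runs of near-duplicate frames collapse to one frame.
    """
    if not frames:
        return []
    segments: List[List[Vector]] = [[frames[0]]]
    for prev, cur in zip(frames, frames[1:]):
        if cosine(prev, cur) >= threshold:
            segments[-1].append(cur)  # similar to previous frame: extend segment
        else:
            segments.append([cur])  # dissimilar: start a new segment
    # Average each segment into a single output frame.
    return [[sum(vals) / len(seg) for vals in zip(*seg)] for seg in segments]


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.0, 1.0], [1.0, 1.0]]
    print(similarity_downsample(frames))
```

With the example input, the five frames collapse to three. The first two nearly parallel frames are merged, the two identical `[0.0, 1.0]` frames are merged, and `[1.0, 1.0]` stays on its own because its similarity to the previous frame is only about 0.71. A real model would learn or tune such a rule rather than fix a threshold by hand.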

Year	Venue	DocType
2021	Annual Conference on Neural Information Processing Systems	Conference
Citations	PageRank	References
0	0.34	0
Authors
4

Authors (4 rows)

Name	Order	Citations	PageRank
Sang-Hoon Lee	1	0	0.68
Ji-Hoon Kim	2	0	0.34
Hyunseung Chung	3	0	0.34
Seong-Whan Lee	4	0	0.34