Voice Conversion Using Input-To-Output Highway Networks - Citegraph

Paper Info

Title
Voice Conversion Using Input-To-Output Highway Networks

Abstract
This paper proposes Deep Neural Network (DNN)-based Voice Conversion (VC) using input-to-output highway networks. VC is a speech synthesis technique that converts input speech features into output speech parameters, and DNN-based acoustic models for VC estimate the output speech parameters from the input speech parameters. Because the input and output often lie in the same domain (e.g., cepstrum) in VC, this paper proposes a VC method using highway networks connected from the input to the output. The acoustic models predict the weighted spectral differentials between the input and output spectral parameters. The architecture not only alleviates the over-smoothing effects that degrade speech quality, but also effectively represents the characteristics of spectral parameters. Experimental results demonstrate that the proposed architecture outperforms feed-forward neural networks in terms of the speech quality and speaker individuality of the converted speech.
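
The input-to-output connection described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the form assumed here (converted frame = input frame + gate * predicted differential, with a sigmoid gate computed from the input) is an assumption, as are the layer sizes, the tanh activation, the random weights, and all function names, which are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, x):
    """Return weights @ x + bias for a list-of-lists weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng, scale=0.1):
    """Hypothetical helper: small random weights and zero bias."""
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def highway_convert(x, diff_layers, gate_layer):
    """Assumed form: output = x + gate(x) * differential(x), element-wise."""
    # Feed-forward stack predicting the spectral differential
    # (tanh on hidden layers, linear output layer).
    h = x
    for i, (w, b) in enumerate(diff_layers):
        h = affine(w, b, h)
        if i < len(diff_layers) - 1:
            h = [math.tanh(v) for v in h]
    differential = h

    # Per-dimension gate in (0, 1) that weights the differential.
    gate_w, gate_b = gate_layer
    gate = [sigmoid(v) for v in affine(gate_w, gate_b, x)]

    # Highway connection: the input passes straight through to the output,
    # and only the weighted differential is added on top of it.
    return [xi + t * d for xi, t, d in zip(x, gate, differential)]


rng = random.Random(0)
dim, hidden = 4, 8  # toy sizes, not taken from the paper
diff_layers = [init_layer(hidden, dim, rng), init_layer(dim, hidden, rng)]
gate_layer = init_layer(dim, dim, rng)

source_frame = [0.5, -1.2, 0.3, 0.8]  # made-up cepstral-like values
converted_frame = highway_convert(source_frame, diff_layers, gate_layer)
```

Under this assumed form, a network that predicts a zero differential returns the input frame unchanged, so the model only has to learn how the output differs from the input rather than regenerate the whole spectrum.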

Year	2017
DOI	10.1587/transinf.2017EDL8034
Venue	IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS
Keywords	statistical parametric speech synthesis, DNN-based voice conversion, highway networks, over-smoothing
Field	Computer vision, Computer science, Speech recognition, Artificial intelligence
DocType	Journal
Volume	E100D
Issue	8
ISSN	1745-1361
Citations	4
PageRank	0.47
References	9
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Saito, Yuki	1	26	7.87
Shinnosuke Takamichi	2	75	22.08
Saruwatari, H.	3	652	90.81
