Title
Towards Fine-Grained Prosody Control for Voice Conversion
Abstract
In a typical voice conversion system, prior work uses various acoustic features of the source speech (such as pitch, voiced/unvoiced flag, and aperiodicity) to control the prosody of the converted speech. However, prosody depends on many factors, such as intonation, stress, and rhythm, and it is challenging to describe it fully through hand-crafted acoustic features. To address this difficulty, we propose to describe prosody with prosody embeddings, which are learned from the source speech in an unsupervised manner. To verify the effectiveness of the proposed method, we conduct experiments on our Mandarin corpus. Experimental results show that the proposed method can improve the speech quality and speaker similarity of the converted speech. Moreover, we observe that it can achieve promising results even in singing conditions.
Year
2021
DOI
10.1109/ISCSLP49672.2021.9362110
Venue
2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)
Keywords
voice conversion (VC), phonetic posteriorgrams (PPGs), prosody embeddings, LPCNet vocoder
DocType
Conference
ISBN
978-1-7281-6995-8
Citations
0
PageRank
0.34
References
0
Authors
5
Name            Order  Citations  PageRank
Zheng Lian      1      12         8.33
Rongxiu Zhong   2      0          0.34
Zhengqi Wen     3      4          1.44
Bin Liu         4      1599       161.90
Jianhua Tao     5      848        138.00