Title
EditSinger: Zero-Shot Text-Based Singing Voice Editing System with Diverse Prosody Modeling.
Abstract
Zero-shot text-based singing voice editing modifies a singing voice according to edited lyrics without any additional data from the target singer. However, because the two tasks have different demands, challenges arise when existing speech editing methods are applied to the singing voice editing task. The main ones are the lack of systematic consideration of prosody in insertion and deletion, and the trade-off between naturalness of pronunciation and preservation of prosody in replacement. In this paper, we propose EditSinger, a novel singing voice editing model with specially designed diverse prosody modules to overcome these challenges. Specifically, 1) we introduce a general masked variance adaptor for comprehensive prosody modeling of the inserted lyrics and of the transition at the deletion boundary; and 2) we design a fusion pitch predictor for replacement. By disentangling the reference pitch and fusing the predicted pronunciation, the edited pitch can be reconstructed, which can ensure natural pronunciation while preserving the prosody of the original audio. To the best of our knowledge, EditSinger is the first zero-shot text-based singing voice editing system. Experiments on the OpenSinger dataset show that EditSinger can synthesize high-quality edited singing voices with natural prosody for each editing operation.
Year
2022
DOI
10.24963/ijcai.2022/625
Venue
International Joint Conference on Artificial Intelligence (IJCAI)
Keywords
Natural Language Processing: Speech, Natural Language Processing: Applications
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
4
Name | Order | Citations | PageRank
Lichao Zhang | 1 | 0 | 0.34
Zhou Zhao | 2 | 773 | 90.87
Yi Ren | 3 | 5 | 7.55
Liqun Deng | 4 | 0 | 1.01