LITESING: TOWARDS FAST, LIGHTWEIGHT AND EXPRESSIVE SINGING VOICE SYNTHESIS - Citegraph

Paper Info

Title
LITESING: TOWARDS FAST, LIGHTWEIGHT AND EXPRESSIVE SINGING VOICE SYNTHESIS

Abstract
LiteSing, proposed in this paper, is a high-quality singing voice synthesis (SVS) system that is fast, lightweight and expressive. The model stacks several non-autoregressive WaveNet blocks in the encoder and decoder under a generative adversarial architecture, predicts full conditions from the musical score, and generates acoustic features from these conditions. The full conditions in this paper consist of dynamic spectrogram energy, the voiced/unvoiced (V/UV) decision and the dynamic pitch curve, which have been shown to relate to expressiveness. We predict the pitch and the timbre features separately, avoiding interdependence between the two. Instead of a neural network vocoder, a parametric WORLD vocoder is employed for pitch curve consistency. Experimental results show that, compared with a baseline model using a feed-forward Transformer, LiteSing is 1.386 times faster at inference, has 15 times fewer training parameters, and achieves a similar MOS for sound quality. In an A/B test, LiteSing achieves a 67.3% preference rate over the baseline in pitch curve and dynamic spectrogram energy prediction, which demonstrates the advantage of LiteSing over the other compared models.
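
As a rough illustration of how the three full conditions named in the abstract might fit together before vocoding, the sketch below combines a per-frame V/UV decision with a pitch curve to form an F0 contour. It is not the authors' implementation: the `FrameConditions` container, the use of MIDI note numbers for pitch, and the choice to write 0 Hz for unvoiced frames (a common convention in parametric vocoders such as WORLD) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameConditions:
    """Hypothetical per-frame container for the three 'full conditions'."""
    energy: float      # dynamic spectrogram energy (arbitrary units here)
    voiced: bool       # voiced/unvoiced (V/UV) decision
    pitch_midi: float  # dynamic pitch curve, assumed to be a MIDI note number


def midi_to_hz(note: float) -> float:
    """Equal-temperament conversion with A4 (MIDI note 69) at 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69.0) / 12.0)


def f0_contour(frames: List[FrameConditions]) -> List[float]:
    """Build an F0 track in Hz, writing 0.0 for frames marked unvoiced."""
    return [midi_to_hz(f.pitch_midi) if f.voiced else 0.0 for f in frames]


# Three illustrative frames: one unvoiced, then A4, then A5.
frames = [
    FrameConditions(energy=0.1, voiced=False, pitch_midi=69.0),
    FrameConditions(energy=0.8, voiced=True, pitch_midi=69.0),
    FrameConditions(energy=0.9, voiced=True, pitch_midi=81.0),
]
contour = f0_contour(frames)
```

Keeping the V/UV decision separate from the pitch value means the pitch curve can stay continuous through unvoiced regions, with the mask applied only when the F0 track is handed to the vocoder. Whether LiteSing does this in the same way is not stated in the abstract.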

Year	DOI	Venue
2021	10.1109/ICASSP39728.2021.9414043	2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021)
Keywords	DocType	Citations
singing voice synthesis, non-autoregressive model, generative adversarial network, lightweight, expressive	Conference	0
PageRank	References	Authors
0.34	0	6

Authors (6 rows)

Name	Order	Citations	PageRank
Xiaobin Zhuang	1	0	0.34
Tao Jiang	2	0	0.34
Szu-Yu Chou	3	49	6.82
Bin Wu	4	0	0.34
Peng Hu	5	0	0.34
Simon Lui	6	0	1.01
