Abstract
---
Recent speech synthesis systems based on sampling from autoregressive neural network models can generate speech almost indistinguishable from human recordings. However, these models require large amounts of data. This paper shows that a lack of data from one speaker can be compensated for with data from other speakers. The naturalness of Tacotron2-like models trained on a blend of 5k utterances from 7 speakers is better than or equivalent to that of speaker-dependent models trained on 15k utterances, and the multispeaker models are also consistently more stable. We further demonstrate that models mixing only 1250 utterances from a target speaker with 5k utterances from another 6 speakers can produce significantly better quality than state-of-the-art DNN-guided unit selection systems trained on more than 10 times the data from the target speaker.
Year | DOI | Venue
---|---|---
2018 | 10.1109/icassp.2019.8682168 | 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Keywords | Field | DocType
---|---|---
statistical parametric speech synthesis, autoregressive, neural vocoder, generative models, sequence-to-sequence | Autoregressive model, Speech synthesis, Computer science, Naturalness, Sampling (statistics), Natural language processing, Artificial intelligence, Artificial neural network, Data reduction | Journal

Volume | ISSN | Citations
---|---|---
abs/1811.06315 | 1520-6149 | 0

PageRank | References | Authors
---|---|---
0.34 | 11 | 7
Name | Order | Citations | PageRank
---|---|---|---
Javier Latorre | 1 | 61 | 5.09 |
Jakub Lachowicz | 2 | 0 | 0.34 |
Jaime Lorenzo-Trueba | 3 | 46 | 9.26 |
Thomas Merritt | 4 | 18 | 5.81 |
Thomas Drugman | 5 | 526 | 41.79 |
Srikanth Ronanki | 6 | 0 | 0.68 |
Viacheslav Klimkov | 7 | 5 | 3.19 |