Abstract
---
We present a meta-learning approach for adaptive text-to-speech (TTS) with little data. During training, we learn a multi-speaker model using a shared conditional WaveNet core and independent learned embeddings for each speaker. The aim of training is not to produce a neural network with fixed weights that is then deployed as a TTS system; instead, the aim is to produce a network that can rapidly adapt to new speakers from little data at deployment time. We introduce and benchmark three strategies: (i) learning the speaker embedding while keeping the WaveNet core fixed, (ii) fine-tuning the entire architecture with stochastic gradient descent, and (iii) predicting the speaker embedding with a trained neural network encoder. The experiments show that these approaches successfully adapt the multi-speaker neural network to new speakers, obtaining state-of-the-art results in both sample naturalness and voice similarity from just a few minutes of audio per new speaker.
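Strategy (i) above — adapting only the speaker embedding while the shared core stays frozen — can be illustrated with a toy sketch. Everything here is a stand-in: the "core" is a fixed random linear map rather than a conditional WaveNet, the targets are synthetic numbers rather than audio, and the dimensions and learning rate are arbitrary; only the optimization pattern (gradient descent on the embedding alone) mirrors the paper's description.

```python
# Toy sketch of adaptation strategy (i): freeze the shared multi-speaker
# "core" and fit only the new speaker's embedding by gradient descent.
# All quantities are illustrative stand-ins, not the paper's WaveNet.
import random

random.seed(0)

EMB_DIM = 4

# Frozen "core": a fixed linear map from the speaker embedding to one
# scalar prediction per frame (never updated during adaptation).
core = [[random.gauss(0, 1) for _ in range(EMB_DIM)] for _ in range(3)]

# A few synthetic target values standing in for the new speaker's data.
targets = [0.5, -1.0, 2.0]

def predict(emb):
    return [sum(w * e for w, e in zip(row, emb)) for row in core]

def loss(emb):
    return sum((p - t) ** 2 for p, t in zip(predict(emb), targets)) / len(targets)

# Gradient descent on the embedding only; the core weights stay fixed.
emb = [0.0] * EMB_DIM
lr = 0.05
for _ in range(500):
    # Central-difference numerical gradient, for brevity.
    grad = []
    for i in range(EMB_DIM):
        e_plus, e_minus = emb[:], emb[:]
        e_plus[i] += 1e-5
        e_minus[i] -= 1e-5
        grad.append((loss(e_plus) - loss(e_minus)) / 2e-5)
    emb = [e - lr * g for e, g in zip(emb, grad)]

print("adapted-embedding loss:", round(loss(emb), 6))
```

Because only `EMB_DIM` numbers are optimized per new speaker, this strategy needs far fewer updates (and far less data) than fine-tuning the entire core, which is the trade-off the paper benchmarks against strategies (ii) and (iii).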
Year | Venue | Field
---|---|---
2018 | International Conference on Learning Representations | Stochastic gradient descent, Architecture, Speech synthesis, Software deployment, Embedding, Naturalness, Speech recognition, Encoder, Artificial intelligence, Artificial neural network, Mathematics, Machine learning

DocType | Volume | Citations
---|---|---
Journal | abs/1809.10460 | 2

PageRank | References | Authors
---|---|---
0.41 | 31 | 14
Name | Order | Citations | PageRank |
---|---|---|---
Yutian Chen | 1 | 680 | 36.28 |
Yannis M. Assael | 2 | 129 | 6.51 |
Brendan Shillingford | 3 | 14 | 2.73 |
David Budden | 4 | 167 | 18.45 |
Scott Reed | 5 | 1750 | 80.25
Heiga Zen | 6 | 1922 | 103.73 |
Quan Wang | 7 | 115 | 20.15 |
Luis C. Cobo | 8 | 2 | 0.74 |
Andrew Trask | 9 | 26 | 2.54
Ben Laurie | 10 | 10 | 2.89 |
Çağlar Gülçehre | 11 | 3010 | 133.22
Aäron van den Oord | 12 | 1585 | 64.43
Oriol Vinyals | 13 | 9419 | 418.45 |
Nando de Freitas | 14 | 3284 | 273.68