Abstract |
---|
In order to efficiently transmit and store speech signals, speech codecs create a minimally redundant representation of the input signal which is then decoded at the receiver with the best possible perceptual quality. In this work we demonstrate that a neural network architecture based on VQ-VAE with a WaveNet decoder can be used to perform very low bit-rate speech coding with high reconstruction quality. A prosody-transparent and speaker-independent model trained on the LibriSpeech corpus coding audio at 1.6 kbps exhibits perceptual quality which is around halfway between the MELP codec at 2.4 kbps and the AMR-WB codec at 23.05 kbps. In addition, when training on high-quality recorded speech with the test speaker included in the training set, a model coding speech at 1.6 kbps produces output of similar perceptual quality to that generated by AMR-WB at 23.05 kbps. |
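The 1.6 kbps figure follows from the size of the VQ-VAE codebook and the rate at which the encoder emits latent vectors: only the codebook indices need to be transmitted, and the WaveNet decoder reconstructs the waveform from them. The sketch below illustrates this bottleneck and the bit-rate arithmetic; the codebook size, latent dimensionality, and latent frame rate are assumed illustrative values, not necessarily the configuration used in the paper.

```python
import numpy as np

# Minimal sketch of a VQ-VAE quantization bottleneck for speech coding.
# Each encoder output vector is replaced by the index of its nearest
# codebook entry, and only the index stream is transmitted.
# All sizes below are illustrative assumptions, not the paper's exact setup.

rng = np.random.default_rng(0)

codebook_size = 256        # assumed: 256 entries -> 8 bits per index
latent_dim = 64            # assumed dimensionality of each latent vector
frames_per_second = 200    # assumed latent frame rate after encoder downsampling

codebook = rng.normal(size=(codebook_size, latent_dim))

def quantize(latents: np.ndarray) -> np.ndarray:
    """Return the index of the nearest (L2) codebook entry for each latent vector."""
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

# One second of (stand-in) encoder output -> one second of transmitted indices.
latents = rng.normal(size=(frames_per_second, latent_dim))
indices = quantize(latents)

# The transmitted bit rate is bits-per-index times the latent frame rate.
bits_per_index = int(np.log2(codebook_size))
bitrate_bps = bits_per_index * frames_per_second
print(f"{bitrate_bps} bps ({bitrate_bps / 1000:.1f} kbps)")  # 1600 bps = 1.6 kbps

# At the receiver, the indices select codebook vectors, which would condition
# a WaveNet decoder to generate the output waveform.
reconstructed_latents = codebook[indices]
```

With these assumed values, 8 bits per index at 200 indices per second gives exactly 1.6 kbps, which is how a small discrete bottleneck yields a very low coding rate while the heavy lifting of reconstruction is left to the decoder.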
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/icassp.2019.8683277 | 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Keywords | Field | DocType
---|---|---|
Speech coding, low bit-rate, generative models, WaveNet, VQ-VAE | Training set, Low bit rate, Speech coding, Pattern recognition, Computer science, Neural network architecture, Coding (social sciences), Artificial intelligence, Codec | Conference
ISSN | Citations | PageRank
---|---|---|
1520-6149 | 0 | 0.34
References | Authors
---|---|
0 | 7
Name | Order | Citations | PageRank |
---|---|---|---|
Cristina Garbacea | 1 | 4 | 1.10 |
Aäron van den Oord | 2 | 1585 | 64.43
Yazhe Li | 3 | 40 | 1.65 |
Felicia Lim | 4 | 35 | 5.70 |
Alejandro Luebs | 5 | 2 | 2.05 |
Oriol Vinyals | 6 | 9419 | 418.45 |
Thomas C. Walters | 7 | 0 | 0.68