Abstract |
---|
We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Our architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, our system does not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learns a function that is robust to such effects. We do not need a phoneme dictionary, nor even the concept of a "phoneme." Key to our approach is a well-optimized RNN training system that uses multiple GPUs, as well as a set of novel data synthesis techniques that allow us to efficiently obtain a large amount of varied data for training. Our system, called Deep Speech, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set. Deep Speech also handles challenging noisy environments better than widely used, state-of-the-art commercial speech systems. |
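The abstract's claim that no phoneme dictionary is needed comes from training the RNN to emit characters directly and decoding with CTC. As an illustrative sketch only (the paper's actual decoder uses a language model and beam search; the blank symbol and function below are assumptions for illustration), greedy CTC decoding collapses repeated per-frame characters and strips the blank token:

```python
BLANK = "_"  # CTC blank token (symbol choice is an assumption)

def ctc_greedy_collapse(frames):
    """Collapse a per-frame best-path character sequence into text."""
    out = []
    prev = None
    for ch in frames:
        if ch != prev:        # drop consecutive repeats
            if ch != BLANK:   # drop blank symbols
                out.append(ch)
        prev = ch
    return "".join(out)

# Per-frame argmax characters spelling "cab":
print(ctc_greedy_collapse(["c", "c", "_", "a", "a", "_", "_", "b"]))  # cab
```

The blank token is what lets the model emit genuine double letters: a blank between two identical characters prevents them from being collapsed into one.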
Year | Venue | DocType
---|---|---
2014 | CoRR | Journal

Volume | Citations | PageRank
---|---|---
abs/1412.5567 | 185 | 8.06

References | Authors
---|---
21 | 11
Name | Order | Citations | PageRank |
---|---|---|---|
Awni Y. Hannun | 1 | 517 | 27.54 |
Carl Case | 2 | 437 | 16.75 |
Jared Casper | 3 | 824 | 34.12 |
Bryan C. Catanzaro | 4 | 1191 | 75.56 |
Gregory Frederick Diamos | 5 | 1117 | 51.07 |
Erich Elsen | 6 | 185 | 10.42 |
Ryan J. Prenger | 7 | 486 | 20.61 |
Sanjeev Satheesh | 8 | 5591 | 233.55 |
Shubho Sengupta | 9 | 505 | 19.84 |
Adam Coates | 10 | 2493 | 160.95 |
Andrew Y. Ng | 11 | 26065 | 1987.54 |