Title: Streaming End-To-End Speech Recognition For Mobile Devices
Abstract: End-to-end (E2E) models, which directly predict output character sequences given input speech, are good candidates for on-device speech recognition. E2E models, however, present numerous challenges: In order to be truly useful, such models must decode speech utterances in a streaming fashion, in real time; they must be robust to the long tail of use cases; they must be able to leverage user-specific context (e.g., contact lists); and above all, they must be extremely accurate. In this work, we describe our efforts at building an E2E speech recognizer using a recurrent neural network transducer. In experimental evaluations, we find that the proposed approach can outperform a conventional CTC-based model in terms of both latency and accuracy in a number of evaluation categories.
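The recurrent neural network transducer (RNN-T) named in the abstract is naturally streaming: at each audio frame it emits zero or more labels and a "blank" symbol that advances time. The sketch below is purely illustrative (not the paper's code): tiny random projections stand in for the encoder and prediction networks, and all sizes are invented for the example; only the greedy frame-by-frame decoding loop reflects the RNN-T decoding structure.

```python
# Illustrative sketch only: greedy streaming decoding with an RNN-T-style
# joint network. The "networks" here are toy random projections; in the paper
# these are trained LSTM encoder / prediction networks.
import numpy as np

BLANK = 0   # RNN-T blank symbol (advances to the next frame)
VOCAB = 5   # toy vocabulary size (assumption for the example)
DIM = 8     # toy hidden size (assumption for the example)

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(DIM, DIM))     # stands in for the audio encoder
W_pred = rng.normal(size=(DIM, DIM))    # stands in for the prediction network
W_joint = rng.normal(size=(DIM, VOCAB)) # stands in for the joint network

def encoder(frame):
    return np.tanh(W_enc @ frame)

def prediction(last_label):
    one_hot = np.zeros(DIM)
    one_hot[last_label % DIM] = 1.0
    return np.tanh(W_pred @ one_hot)

def joint(enc, pred):
    # Combine acoustic and label states into per-symbol logits.
    return W_joint.T @ np.tanh(enc + pred)

def greedy_decode(frames, max_symbols_per_frame=3):
    """Emit labels frame by frame -- the streaming property of RNN-T."""
    hyp = [BLANK]  # decoding starts from the blank/start symbol
    for frame in frames:
        enc = encoder(frame)
        for _ in range(max_symbols_per_frame):
            k = int(np.argmax(joint(enc, prediction(hyp[-1]))))
            if k == BLANK:
                break  # blank: move on to the next audio frame
            hyp.append(k)  # non-blank: emit and stay on this frame
    return hyp[1:]

frames = rng.normal(size=(4, DIM))  # four toy "audio frames"
print(greedy_decode(frames))
```

Because each frame is consumed once and labels are emitted immediately, this loop can run as audio arrives, which is what makes transducer models attractive for on-device, real-time recognition compared to full-utterance attention models.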
Year: 2018
DOI: 10.1109/icassp.2019.8682336
Venue: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Field: Transducer, Use case, End-to-end principle, Latency (engineering), Computer science, Recurrent neural network, Speech recognition, Mobile device, Artificial intelligence, Long tail, Machine learning
DocType: Journal
Volume: abs/1811.06621
ISSN: 1520-6149
Citations: 2
PageRank: 0.39
References: 13
Authors: 20
Name                  Order  Citations  PageRank
Yanzhang He           1      64         16.36
Tara N. Sainath       2      3497       232.43
Rohit Prabhavalkar    3      163        22.56
Ian McGraw            4      253        24.41
Raziel Álvarez        5      30         3.84
Ding Zhao             6      110        27.07
David Rybach          7      188        20.31
Anjuli Kannan         8      90         7.17
Yonghui Wu            9      1065       72.78
Ruoming Pang          10     1092       92.99
Qiao Liang            11     77         19.86
Deepti Bhatia         12     3          0.74
Yuan Shangguan        13     2          0.39
Bo Li                 14     206        42.46
Golan Pundak          15     36         3.91
Khe Chai Sim          16     300        31.13
Tom Bagby             17     2          2.08
Shuo-Yiin Chang       18     27         4.71
Kanishka Rao          19     189        11.94
Alexander Gruenstein  20     216        23.52