Abstract
---
End-to-end (E2E) models, which directly predict output character sequences given input speech, are good candidates for on-device speech recognition. E2E models, however, present numerous challenges: in order to be truly useful, such models must decode speech utterances in a streaming fashion, in real time; they must be robust to the long tail of use cases; they must be able to leverage user-specific context (e.g., contact lists); and above all, they must be extremely accurate. In this work, we describe our efforts at building an E2E speech recognizer using a recurrent neural network transducer. In experimental evaluations, we find that the proposed approach can outperform a conventional CTC-based model in terms of both latency and accuracy in a number of evaluation categories.
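The recurrent neural network transducer (RNN-T) named in the abstract combines a streaming acoustic encoder, a prediction network over previously emitted labels, and a joint network that fuses the two into per-label logits. Below is a minimal PyTorch sketch of that three-component structure; every layer type, size, and name is an illustrative assumption, not the paper's actual on-device configuration.

```python
# Minimal RNN-T sketch (illustrative assumptions throughout; not the
# paper's configuration). Shows the encoder / prediction-network / joint
# structure and the (T, U) lattice the transducer loss is computed over.
import torch
import torch.nn as nn

class RNNTransducer(nn.Module):
    def __init__(self, feat_dim=80, vocab_size=29, hidden=256, blank=0):
        super().__init__()
        self.blank = blank
        # Encoder: unidirectional LSTM, so frames can be consumed in a
        # streaming fashion without seeing future context.
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        # Prediction network: LSTM over previously emitted (non-blank) labels.
        self.embed = nn.Embedding(vocab_size, hidden)
        self.predictor = nn.LSTM(hidden, hidden, batch_first=True)
        # Joint network: combines encoder and predictor states into logits.
        self.joint = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, vocab_size))

    def forward(self, feats, labels):
        # feats:  (B, T, feat_dim) acoustic frames
        # labels: (B, U) target label ids; a full implementation prepends a
        # start/blank symbol so the loss lattice is (T, U+1) -- omitted here.
        enc, _ = self.encoder(feats)                  # (B, T, H)
        pred, _ = self.predictor(self.embed(labels))  # (B, U, H)
        # Broadcast both over the (T, U) lattice and fuse.
        t = enc.unsqueeze(2).expand(-1, -1, pred.size(1), -1)
        u = pred.unsqueeze(1).expand(-1, enc.size(1), -1, -1)
        return self.joint(torch.cat([t, u], dim=-1))  # (B, T, U, vocab)

model = RNNTransducer()
logits = model(torch.randn(2, 50, 80), torch.randint(1, 29, (2, 10)))
print(logits.shape)  # torch.Size([2, 50, 10, 29])
```

The unidirectional encoder is what permits the frame-by-frame, real-time decoding that the abstract identifies as a requirement for on-device recognition.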
Year | DOI | Venue
---|---|---
2018 | 10.1109/icassp.2019.8682336 | 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Field | DocType | Volume
---|---|---
Transducer, Use case, End-to-end principle, Latency (engineering), Computer science, Recurrent neural network, Speech recognition, Mobile device, Artificial intelligence, Long tail, Machine learning | Journal | abs/1811.06621
ISSN | Citations | PageRank
---|---|---
1520-6149 | 2 | 0.39
References | Authors
---|---
13 | 20
Name | Order | Citations | PageRank
---|---|---|---
Yanzhang He | 1 | 64 | 16.36 |
Tara N. Sainath | 2 | 3497 | 232.43 |
Rohit Prabhavalkar | 3 | 163 | 22.56 |
Ian McGraw | 4 | 253 | 24.41 |
Raziel Álvarez | 5 | 30 | 3.84 |
Ding Zhao | 6 | 110 | 27.07 |
David Rybach | 7 | 188 | 20.31 |
Anjuli Kannan | 8 | 90 | 7.17 |
Yonghui Wu | 9 | 1065 | 72.78 |
Ruoming Pang | 10 | 1092 | 92.99 |
Qiao Liang | 11 | 77 | 19.86 |
Deepti Bhatia | 12 | 3 | 0.74 |
Yuan Shangguan | 13 | 2 | 0.39 |
Bo Li | 14 | 206 | 42.46 |
Golan Pundak | 15 | 36 | 3.91 |
Khe Chai Sim | 16 | 300 | 31.13 |
Tom Bagby | 17 | 2 | 2.08 |
Shuo-Yiin Chang | 18 | 27 | 4.71 |
Kanishka Rao | 19 | 189 | 11.94 |
Alexander Gruenstein | 20 | 216 | 23.52 |