Abstract |
---|
In this paper, we conduct a detailed investigation of attention-based models for automatic speech recognition (ASR). First, we explore different types of attention, including "online" and "full-sequence" attention. Second, we explore different sub-word units to see how much of the end-to-end ASR process can reasonably be captured by an attention model. In experimental evaluations, we find that although attention is typically focused over a small region of the acoustics during each step of next-label prediction, "full-sequence" attention outperforms "online" attention, though this gap can be significantly reduced by increasing the length of the segments over which attention is computed. Furthermore, we find that context-independent phonemes are a reasonable sub-word unit for attention models. When used in a second pass to rescore N-best hypotheses, these models provide over a 10% relative improvement in word error rate. |
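The contrast the abstract draws between "full-sequence" and "online" attention can be sketched as attending over all encoder frames versus only a limited segment of them. The following is a minimal NumPy illustration under that reading; the dot-product scoring, the function names, and the fixed window around a `center` frame are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_context(query, keys, values, window=None, center=None):
    """Dot-product attention over encoder states.

    window=None -> "full-sequence" attention: attend to all T frames.
    window=w    -> "online"-style attention: attend only to frames within
                   w steps of `center` (a hypothetical segment position).
    """
    scores = keys @ query                      # (T,) one score per frame
    if window is not None:
        # Mask frames outside the segment with -inf so they get zero weight.
        mask = np.full_like(scores, -np.inf)
        lo = max(0, center - window)
        hi = min(len(scores), center + window + 1)
        mask[lo:hi] = 0.0
        scores = scores + mask
    weights = softmax(scores)                  # attention distribution over frames
    return weights @ values, weights           # context vector, weights

rng = np.random.default_rng(0)
T, d = 20, 4                                   # 20 encoder frames, dim-4 states
keys = rng.normal(size=(T, d))
values = rng.normal(size=(T, d))
query = rng.normal(size=d)                     # decoder state at one output step

ctx_full, w_full = attention_context(query, keys, values)
ctx_win, w_win = attention_context(query, keys, values, window=3, center=10)
```

Enlarging `window` moves the online variant toward the full-sequence case, which mirrors the abstract's observation that longer segments shrink the gap between the two.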
Year | DOI | Venue |
---|---|---|
2017 | 10.21437/Interspeech.2017-232 | 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION |
Field | DocType | ISSN
---|---|---|
Computer science, Speech recognition | Conference | 2308-457X
Citations | PageRank | References
---|---|---|
5 | 0.44 | 5
Authors |
---|
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Rohit Prabhavalkar | 1 | 163 | 22.56 |
Tara N. Sainath | 2 | 3497 | 232.43 |
Bo Li | 3 | 206 | 42.46 |
Kanishka Rao | 4 | 189 | 11.94 |
Navdeep Jaitly | 5 | 2988 | 166.08 |