Abstract | ||
---|---|---|
We propose a direct-to-word sequence model which uses a word network to learn word embeddings from letters. The word network can be integrated seamlessly with arbitrary sequence models including Connectionist Temporal Classification and encoder-decoder models with attention. We show our direct-to-word model can achieve word error rate gains over sub-word level models for speech recognition. We also show that our direct-to-word approach retains the ability to predict words not seen at training time without any retraining. Finally, we demonstrate that a word-level model can use a larger stride than a sub-word level model while maintaining accuracy. This makes the model more efficient both for training and inference. |
Year | Venue | DocType |
---|---|---|
2020 | ICML | Conference |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ronan Collobert | 1 | 4002 | 308.61 |
Awni Y. Hannun | 2 | 517 | 27.54 |
Gabriel Synnaeve | 3 | 27 | 7.73 |