Abstract
---
Whereas conventional spoken language understanding (SLU) systems map speech to text, and then text to intent, end-to-end SLU systems map speech directly to intent through a single trainable model. Achieving high accuracy with these end-to-end models without a large amount of training data is difficult. We propose a method to reduce the data requirements of end-to-end SLU in which the model is first pre-trained to predict words and phonemes, thus learning good features for SLU. We introduce a new SLU dataset, Fluent Speech Commands, and show that our method improves performance both when the full dataset is used for training and when only a small subset is used. We also describe preliminary experiments to gauge the model's ability to generalize to new phrases not heard during training.
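The two-stage recipe the abstract describes (pre-train on word/phoneme targets, then swap in an intent head and fine-tune on a small SLU set) can be sketched with a toy numpy model. Everything here is an illustrative assumption: the shapes, the synthetic targets, and the one-hidden-layer MLP stand in for the paper's actual deep speech encoder and ASR targets.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(X, y, W1, W2, lr=0.5, steps=300):
    """Train a one-hidden-layer MLP (tanh encoder + softmax head) by gradient descent."""
    Y = np.eye(W2.shape[1])[y]                 # one-hot targets
    for _ in range(steps):
        H = np.tanh(X @ W1)                    # encoder features
        P = softmax(H @ W2)                    # class probabilities
        dZ = (P - Y) / len(X)                  # softmax cross-entropy gradient
        dH = dZ @ W2.T * (1.0 - H ** 2)        # backprop through tanh
        W2 = W2 - lr * H.T @ dZ
        W1 = W1 - lr * X.T @ dH
    return W1, W2

rng = np.random.default_rng(0)
d_in, d_hid = 20, 32

# Stage 1: pre-train the encoder on a toy "phoneme" prediction task
# (a stand-in for the word/phoneme targets the paper pre-trains on).
X_pre = rng.normal(size=(400, d_in))
y_phone = (X_pre[:, :5].sum(axis=1) > 0).astype(int)
W1 = rng.normal(scale=0.2, size=(d_in, d_hid))
W1, _ = train(X_pre, y_phone, W1, rng.normal(scale=0.2, size=(d_hid, 2)))

# Stage 2: discard the phoneme head, keep the pre-trained encoder, and
# fine-tune with a fresh intent head on a small SLU set
# (a stand-in for Fluent Speech Commands).
X_slu = rng.normal(size=(40, d_in))
y_intent = (X_slu[:, :5].sum(axis=1) > 0).astype(int)
W1, W2_intent = train(X_slu, y_intent, W1,
                      rng.normal(scale=0.2, size=(d_hid, 2)), steps=100)

H = np.tanh(X_slu @ W1)
acc = float(np.mean(np.argmax(softmax(H @ W2_intent), axis=1) == y_intent))
print(f"toy intent accuracy after pre-training + fine-tuning: {acc:.2f}")
```

The sketch only shows the head-swap structure, the part of the method the abstract states directly; the actual model in the paper is a deep speech encoder, not this toy MLP.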
Year | DOI | Venue
---|---|---
2019 | 10.21437/interspeech.2019-2396 | Conference of the International Speech Communication Association

DocType | Volume | Citations
---|---|---
Journal | abs/1904.03670 | 2

PageRank | References | Authors
---|---|---
0.36 | 0 | 5

Name | Order | Citations | PageRank
---|---|---|---
Loren Lugosch | 1 | 2 | 0.69 |
Mirco Ravanelli | 2 | 185 | 17.87 |
Patrick Ignoto | 3 | 2 | 0.36 |
Vikrant Singh Tomar | 4 | 20 | 3.44 |
Yoshua Bengio | 5 | 42677 | 3039.83 |