Abstract |
---|
The problem of keyword spotting, i.e., identifying keywords in a real-time audio stream, is mainly solved by applying a neural network over successive sliding windows. Due to the difficulty of the task, baseline models are usually large, resulting in high computational cost and energy consumption. We propose a new method called SANAS (Stochastic Adaptive Neural Architecture Search), which adapts the architecture of the neural network on the fly at inference time, so that small architectures are used when the stream is easy to process (silence, low noise, ...) and bigger networks are used when the task becomes more difficult. We show that this adaptive model can be learned end-to-end by optimizing a trade-off between prediction performance and the average computational cost per unit of time. Experiments on the Speech Commands dataset [1] show that this approach achieves a high recognition level while being much faster (and/or more energy-efficient) than classical approaches in which the network architecture is static. |
Year | DOI | Venue |
---|---|---|
2018 | 10.1109/icassp.2019.8683305 | 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Keywords | Field | DocType |
Neural Architecture Search, Keyword Spotting, Deep Learning, Budgeted Learning | Architecture, Inference, Network architecture, Low noise, Keyword spotting, Artificial intelligence, Artificial neural network, Energy consumption, Machine learning, Mathematics | Journal |
Volume | ISSN | Citations |
abs/1811.06753 | 1520-6149 | 0 |
PageRank | References | Authors |
0.34 | 10 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Tom Véniat | 1 | 0 | 0.68 |
Olivier Schwander | 2 | 6 | 1.50 |
Ludovic Denoyer | 3 | 810 | 63.87 |
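
The trade-off mentioned in the abstract, between prediction performance and average computational cost per unit of time, can be illustrated with a minimal sketch. The abstract does not give the exact objective, so the linear weighted form below is an assumption, and the names `budgeted_objective` and `cost_weight` are hypothetical and not taken from the paper.

```python
from typing import Sequence


def budgeted_objective(
    prediction_losses: Sequence[float],
    compute_costs: Sequence[float],
    cost_weight: float,
) -> float:
    """Average prediction loss plus a weighted average compute cost.

    Assumption: a simple linear trade-off. The paper's actual objective
    may differ; this only illustrates the idea of penalizing cost.
    """
    if len(prediction_losses) != len(compute_costs):
        raise ValueError("need one loss and one cost per time step")
    if not prediction_losses:
        raise ValueError("need at least one time step")

    n = len(prediction_losses)
    mean_loss = sum(prediction_losses) / n  # prediction performance term
    mean_cost = sum(compute_costs) / n      # average cost per time step
    return mean_loss + cost_weight * mean_cost


# Toy stream of four time steps (made-up numbers): the third step is
# "hard", so a larger and more expensive architecture is used there.
losses = [0.1, 0.1, 0.8, 0.2]
costs = [1.0, 1.0, 8.0, 2.0]
print(budgeted_objective(losses, costs, cost_weight=0.05))
```

Under this assumed form, a larger `cost_weight` pushes the learned policy toward small architectures, while a value of zero reduces the objective to plain prediction loss.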