Title | ||
---|---|---|
9.8 A 25mm<sup>2</sup> SoC for IoT Devices with 18ms Noise-Robust Speech-to-Text Latency via Bayesian Speech Denoising and Attention-Based Sequence-to-Sequence DNN Speech Recognition in 16nm FinFET |
Abstract | ||
---|---|---|
Automatic speech recognition (ASR) using deep learning is essential for user interfaces on IoT devices. However, previously published ASR chips [4-7] do not consider realistic operating conditions, which are typically noisy and may include more than one speaker. Furthermore, several of these works have implemented only small-vocabulary tasks, such as keyword spotting (KWS), where context-blind deep neural network (DNN) algorithms are adequate. For large-vocabulary tasks (e.g., >100k words), the more complex bidirectional RNNs with an attention mechanism [1] provide context learning in long sequences, which improves ASR accuracy by up to 62% on the 200k-word LibriSpeech dataset compared to a simpler unidirectional RNN (Fig. 9.8.1). Attention-based networks emphasize the most relevant parts of the source sequence during each decoding time step. In doing so, the encoder sequence is treated as a soft-addressable memory whose positions are weighted based on the state of the decoder RNN. Bidirectional RNNs learn past and future temporal information by concatenating forward and backward time steps. |
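
The two mechanisms described in the abstract, attention over the encoder sequence and bidirectional concatenation, can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation: dot-product scoring is an assumed stand-in for the scoring function of [1], and the names `attend` and `bidirectional_concat` and all toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def bidirectional_concat(forward_states, backward_states):
    """Concatenate forward and backward hidden states per time step.

    Assumes backward_states has already been re-ordered so that index t
    refers to the same time step as forward_states[t].
    """
    return [f + b for f, b in zip(forward_states, backward_states)]


def attend(decoder_state, encoder_states):
    """Soft-address the encoder sequence with the decoder state.

    Scoring is a plain dot product (an assumption made for illustration).
    Returns the attention weights and the weighted sum (context vector).
    """
    scores = [dot(decoder_state, h) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three time steps, 2-dimensional states in each direction.
forward = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
backward = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
encoder_states = bidirectional_concat(forward, backward)

decoder_state = [1.0, 0.0, 1.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
```

In this toy input the first encoder position has the highest dot product with the decoder state, so it receives the largest weight. The weights sum to 1, which is what makes the encoder sequence behave like a softly addressed memory rather than a hard lookup.
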
Year | DOI | Venue |
---|---|---|
2021 | 10.1109/ISSCC42613.2021.9366062 | 2021 IEEE International Solid-State Circuits Conference (ISSCC) |
Keywords | DocType | Volume |
SoC,IoT devices,Bayesian speech denoising,sequence-to-sequence DNN speech recognition,FinFET,automatic speech recognition,deep learning,user interfaces,ASR chips,realistic operating conditions,small-vocabulary tasks,large-vocabulary tasks,complex bidirectional RNNs,attention mechanism,context learning,long sequences,ASR accuracy,200kwords LibriSpeech dataset,attention-based networks,source sequence,encoder sequence,context-blind deep neural network,noise-robust speech-to-text latency,bidirectional RNN,decoder RNN,soft-addressable memory,time 18.0 ms,size 16.0 nm | Conference | 64 |
ISSN | ISBN | Citations |
0193-6530 | 978-1-7281-9550-6 | 3 |
PageRank | References | Authors |
0.43 | 0 | 10 |
Name | Order | Citations | PageRank |
---|---|---|---|
Thierry Tambe | 1 | 18 | 3.43 |
En-Yu Yang | 2 | 10 | 2.31 |
Glenn G. Ko | 3 | 10 | 3.30 |
Yuji Chai | 4 | 5 | 2.16 |
Coleman Hooper | 5 | 7 | 1.17 |
Marco Donato | 6 | 31 | 5.83 |
Paul N. Whatmough | 7 | 147 | 20.59 |
Alexander M. Rush | 8 | 1499 | 67.53 |
David Brooks | 9 | 5518 | 422.08 |
Gu-Yeon Wei | 10 | 1927 | 214.15 |