Abstract |
---|
In this paper, we present a new on-device automatic speech recognition (ASR) system based on monotonic chunk-wise attention (MoChA) models trained with a large (> 10K hours) corpus. We attained a word recognition rate of around 90% for the general domain, mainly by using joint training with connectionist temporal classification (CTC) and cross-entropy (CE) losses, minimum word error rate (MWER) training, layer-wise pretraining, and data augmentation methods. In addition, we compressed our models by a factor of more than 3.4 using an iterative hyper low-rank approximation (LRA) method while minimizing the degradation in recognition accuracy. The memory footprint was further reduced with 8-bit quantization, bringing the final model size below 39 MB. For on-demand adaptation, we fused the MoChA models with statistical n-gram models, achieving an average relative improvement of 36% in word error rate (WER) for target domains, including the general domain. |
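The compression pipeline described in the abstract combines a low-rank approximation of the weight matrices with 8-bit quantization. As a minimal sketch (not the authors' implementation; the matrix shape, rank, and per-tensor quantization scheme below are illustrative assumptions), a truncated SVD replaces an m×n weight matrix with two factors holding r·(m+n) parameters, and int8 storage then shrinks each remaining parameter from 4 bytes to 1:

```python
import numpy as np

# Hypothetical 1024x1024 weight matrix of one layer (shape chosen for
# illustration only, not taken from the paper).
rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024)).astype(np.float32)

# Low-rank approximation via truncated SVD: W ~ A @ B with rank r.
# This replaces m*n parameters with r*(m + n).
r = 128
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * s[:r]          # shape (1024, r), singular values folded in
B = Vt[:r, :]                 # shape (r, 1024)
W_lr = A @ B

compression = W.size / (A.size + B.size)   # 1024*1024 / (2 * 128 * 1024)

# 8-bit quantization with a single per-tensor scale: each float32 weight
# becomes one int8 plus a shared scale, a further ~4x memory reduction.
scale = np.abs(W_lr).max() / 127.0
W_q = np.round(W_lr / scale).astype(np.int8)
W_deq = W_q.astype(np.float32) * scale     # dequantized view for inference
```

At rank 128 this particular factorization alone gives a 4x parameter reduction; the paper's iterative LRA procedure and its choice of ranks per layer are what control the accuracy/size trade-off reported above.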
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/ASRU46091.2019.9004027 | ASRU |
Field | DocType | Citations |
---|---|---|
Cross entropy, Speech corpus, Computer science, Word recognition, Word error rate, Speech recognition, Memory footprint, Quantization (signal processing), Classifier (linguistics), Connectionism | Conference | 1 |
PageRank | References | Authors |
---|---|---|
0.37 | 0 | 13 |
Name | Order | Citations | PageRank |
---|---|---|---|
Kwangyoun Kim | 1 | 2 | 4.11 |
Seokyeong Jung | 2 | 1 | 0.37 |
Jungin Lee | 3 | 1 | 0.37 |
Myoungji Han | 4 | 11 | 1.50 |
Chanwoo Kim | 5 | 1 | 0.37 |
Kyungmin Lee | 6 | 2 | 3.09 |
Dhananjaya Gowda | 7 | 3 | 5.47 |
Junmo Park | 8 | 1 | 0.37 |
Sungsoo Kim | 9 | 1 | 0.37 |
Sichen Jin | 10 | 1 | 0.37 |
Young-Yoon Lee | 11 | 1 | 0.37 |
Jinsu Yeo | 12 | 1 | 0.37 |
Daehyun Kim | 13 | 1 | 0.37 |