Abstract |
---|
In this paper, we present a new on-device automatic speech recognition (ASR) system based on monotonic chunk-wise attention (MoChA) models trained with a large (> 10K hours) corpus. We attained a word recognition rate of around 90% for the general domain, mainly by using joint training with connectionist temporal classification (CTC) and cross-entropy (CE) losses, minimum word error rate (MWER) training, layer-wise pretraining, and data augmentation methods. In addition, we compressed our models by a factor of more than 3.4 using an iterative hyper low-rank approximation (LRA) method while minimizing the degradation in recognition accuracy. The memory footprint was further reduced with 8-bit quantization, bringing the final model size below 39 MB. For on-demand adaptation, we fused the MoChA models with statistical n-gram models, achieving an average relative improvement of 36% in word error rate (WER) for target domains, including the general domain. |
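The compression pipeline described in the abstract combines a low-rank approximation of the weight matrices with 8-bit quantization. As a minimal sketch (not the authors' implementation; the matrix shape, rank, and per-tensor quantization scheme below are illustrative assumptions), a truncated SVD replaces an m×n weight matrix with two factors holding r·(m+n) parameters, and int8 storage then shrinks each remaining parameter from 4 bytes to 1:

```python
import numpy as np

# Hypothetical 1024x1024 weight matrix of one layer (shape chosen for
# illustration only, not taken from the paper).
rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024)).astype(np.float32)

# Low-rank approximation via truncated SVD: W ~ A @ B with rank r.
# This replaces m*n parameters with r*(m + n).
r = 128
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * s[:r]          # shape (1024, r), singular values folded in
B = Vt[:r, :]                 # shape (r, 1024)
W_lr = A @ B

compression = W.size / (A.size + B.size)   # 1024*1024 / (2 * 128 * 1024)

# 8-bit quantization with a single per-tensor scale: each float32 weight
# becomes one int8 plus a shared scale, a further ~4x memory reduction.
scale = np.abs(W_lr).max() / 127.0
W_q = np.round(W_lr / scale).astype(np.int8)
W_deq = W_q.astype(np.float32) * scale     # dequantized view for inference
```

At rank 128 this particular factorization alone gives a 4x parameter reduction; the paper's iterative LRA procedure and its choice of ranks per layer are what control the accuracy/size trade-off reported above.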
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/ASRU46091.2019.9004027 | ASRU |
Field | DocType | Citations |
---|---|---|
Cross entropy, Speech corpus, Computer science, Word recognition, Word error rate, Speech recognition, Memory footprint, Quantization (signal processing), Classifier (linguistics), Connectionism | Conference | 1 |
PageRank | References | Authors |
---|---|---|
0.37 | 0 | 13 |
Name | Order | Citations | PageRank |
---|---|---|---|
Kwangyoun Kim | 1 | 2 | 4.11 |
Seokyeong Jung | 2 | 1 | 0.37 |
Jungin Lee | 3 | 1 | 0.37 |
Myoungji Han | 4 | 11 | 1.50 |
Chanwoo Kim | 5 | 1 | 0.37 |
Kyungmin Lee | 6 | 2 | 3.09 |
Dhananjaya Gowda | 7 | 3 | 5.47 |
Junmo Park | 8 | 1 | 0.37 |
Sungsoo Kim | 9 | 1 | 0.37 |
Sichen Jin | 10 | 1 | 0.37 |
Young-Yoon Lee | 11 | 1 | 0.37 |
Jinsu Yeo | 12 | 1 | 0.37 |
Daehyun Kim | 13 | 1 | 0.37 |