Title
End-To-End Contextual Speech Recognition Using Class Language Models And A Token Passing Decoder
Abstract
End-to-end (E2E) modeling of automatic speech recognition (ASR) blends all the components of a traditional speech recognition system into a single, unified model. Although this simplifies ASR systems, the unified model is hard to adapt when the training and test data are mismatched. In this work, we focus on contextual speech recognition, which is particularly challenging for E2E models because contextual information is only available at inference time. To improve performance in the presence of contextual information during training, we propose to use class-based language models (CLM) that can populate context-dependent information during inference. To enable this approach to scale to a large number of class members and minimize search errors, we propose a token passing algorithm with efficient token recombination for E2E systems. We evaluate the proposed system on general and contextual ASR tasks, and achieve a 62% relative word error rate (WER) reduction on the contextual ASR task without hurting recognition performance on the general ASR task. We also show that the proposed method performs well without modification of the decoding hyper-parameters across tasks, making it a desirable solution for E2E ASR.
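As a rough illustration of the class-based language model idea mentioned in the abstract, the sketch below scores a word sequence with a toy bigram model whose vocabulary contains a class token, and fills in the class members only at scoring time. It uses the standard class-LM factorization P(w | h) = P(class | h) * P(w | class). The bigram table, the `@contact` token, the uniform member distribution, and the function name `class_lm_logprob` are all assumptions made for illustration; none of them comes from the paper, and the paper's token passing decoder is not shown.

```python
import math

# Hypothetical toy bigram probabilities (made up for illustration).
# "@contact" is a class token whose members are unknown at training time.
BIGRAM = {
    ("<s>", "call"): 0.5,
    ("call", "@contact"): 0.8,
    ("call", "home"): 0.2,
}


def class_lm_logprob(words, class_members, class_token="@contact"):
    """Return the log-probability of `words` under the toy class-based LM.

    `class_members` is the set of single-word class members supplied at
    inference time (for example, names from a user's contact list).
    """
    # Assumption: every member is equally likely given the class.
    member_prob = 1.0 / len(class_members) if class_members else 0.0
    logp = 0.0
    prev = "<s>"
    for w in words:
        if w in class_members:
            # P(w | prev) = P(class | prev) * P(w | class)
            p = BIGRAM.get((prev, class_token), 0.0) * member_prob
            state = class_token  # the LM history sees the class, not the member
        else:
            p = BIGRAM.get((prev, w), 0.0)
            state = w
        if p == 0.0:
            return float("-inf")  # unseen event in this unsmoothed toy model
        logp += math.log(p)
        prev = state
    return logp
```

In this sketch, swapping the contact set changes which utterances receive non-zero probability without touching the bigram table, which is the property that makes class-based models attractive when the context is only known at inference time.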
Year
2018
Venue
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)
Keywords
End-to-end Speech Recognition, Weighted Finite State Transducer, Token Passing, Class-based Language Model
Field
Token passing, Inference, End-to-end principle, Computer science, Word error rate, Speech recognition, Test data, Decoding methods, Security token, Language model
DocType
Journal
Volume
abs/1812.02142
ISSN
1520-6149
Citations
2
PageRank
0.38
References
0
Authors
5
Name | Order | Citations | PageRank
Zhehuai Chen | 1 | 2 | 0.72
Mahaveer Jain | 2 | 24 | 2.93
Yongqiang Wang | 3 | 175 | 13.32
Michael L. Seltzer | 4 | 1027 | 69.42
Christian Fuegen | 5 | 9 | 6.58