Title
Approaches to Improving Recognition of Underrepresented Named Entities in Hybrid ASR Systems
Abstract
In this paper, we present a series of complementary approaches to improve the recognition of underrepresented named entities (NEs) in hybrid ASR systems without compromising overall word error rate. The underrepresented words are rare or out-of-vocabulary (OOV) words in the training data and therefore cannot be modeled reliably. We begin with a graphemic lexicon, which removes the need for phonetic models in hybrid ASR. We study it under different settings and demonstrate its effectiveness in dealing with underrepresented NEs. Next, we study the impact of a neural language model (LM) with letter-based features designed to handle infrequent words. After that, we attempt to enrich the representations of underrepresented NEs in a pretrained neural LM by borrowing the embedding representations of richly represented words. This yields a significant improvement in underrepresented NE recognition. Finally, we boost the likelihood scores of utterances containing NEs in the word lattices rescored by neural LMs and gain a further improvement. The combination of these approaches improves NE recognition by up to 42% relative.
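As a rough illustration of the graphemic lexicon idea mentioned in the abstract, the sketch below spells each word out as a sequence of letters, so that any word (including a rare or unseen name) gets a lexicon entry without a phonetic dictionary. This is a minimal sketch under assumptions, not the authors' implementation: the function names `graphemic_lexicon` and `to_lexicon_lines`, the lowercasing, and the "word followed by space-separated units" line layout are all hypothetical choices made for the example.

```python
def graphemic_lexicon(words):
    """Map each word to its sequence of letters (graphemes).

    Hypothetical helper for illustration; the paper's actual unit
    inventory and normalization are not specified in the abstract.
    """
    lexicon = {}
    for word in words:
        # Assumption: keep alphabetic characters only, lowercased,
        # and treat each one as a separate modeling unit.
        units = [ch for ch in word.lower() if ch.isalpha()]
        if units:
            lexicon[word] = units
    return lexicon


def to_lexicon_lines(lexicon):
    """Render entries as 'word unit unit ...' lines, sorted by word."""
    return [f"{word} {' '.join(units)}" for word, units in sorted(lexicon.items())]


if __name__ == "__main__":
    lex = graphemic_lexicon(["Singapore", "Khassanov"])
    print("\n".join(to_lexicon_lines(lex)))
```

Because the entry is derived from the spelling alone, adding a new named entity to the vocabulary needs no pronunciation model; whether that works well in practice depends on the language and the acoustic model, which the abstract says the paper evaluates under different settings.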
Year
2021
DOI
10.1109/ISCSLP49672.2021.9362062
Venue
2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)
Keywords
speech recognition, named entity recognition, graphemic lexicon, word lattice, word embeddings
DocType
Conference
ISBN
978-1-7281-6995-8
Citations
0
PageRank
0.34
References
0
Authors
6
Name               Order  Citations  PageRank
Mao Tingzhi        1      0          0.34
Yerbolat Khassanov 2      3          3.79
Van Tung Pham      3      40         8.42
Haihua Xu          4      55         11.41
Hao Huang          5      589        104.49
Eng Siong Chng     6      970        106.33