Title
Modelling Semantic Context of OOV Words in Large Vocabulary Continuous Speech Recognition.
Abstract
The diachronic nature of broadcast news data leads to the problem of out-of-vocabulary (OOV) words in large vocabulary continuous speech recognition (LVCSR) systems. Analysis of OOV words reveals that a majority of them are proper names (PNs). These PNs are, however, important for automatic indexing of audio-video content and for obtaining reliable automatic transcriptions. In this paper, we focus on the problem of OOV PNs in diachronic audio documents. To enable the recovery of the PNs missed by the LVCSR system, relevant OOV PNs are retrieved by exploiting the semantic context of the LVCSR transcriptions. For the retrieval of OOV PNs, we explore topic and semantic context derived from latent Dirichlet allocation (LDA) topic models, continuous word vector representations, and the neural bag-of-words (NBOW) model, which is capable of learning task-specific word and context representations. We propose a neural bag-of-weighted-words (NBOW2) model, which learns to assign higher weights to words that are important for the retrieval of an OOV PN. With experiments on French broadcast news videos, we show that the NBOW and NBOW2 models outperform the methods based on raw embeddings from LDA and Skip-gram models. Combining the NBOW and NBOW2 models gives faster convergence during training. Second-pass speech recognition experiments, in which the LVCSR vocabulary and language model are updated with the retrieved OOV PNs, demonstrate the effectiveness of the proposed context models.
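To make the difference between an unweighted and a weighted bag of words concrete, the Python sketch below builds a context vector from the words of a transcription in two ways and ranks candidate OOV PNs against it. It is an illustration under stated assumptions, not the authors' implementation: the NBOW2-style weight is assumed to be `sigmoid(e_w . a)` for a learned vector `a`, the context is assumed to be the weighted mean of the embeddings, and PNs are assumed to be scored by a linear layer followed by a softmax. All function names (`nbow_context`, `nbow2_context`, `rank_oov_pns`) and all toy values are hypothetical, and training of the embeddings, `a`, and the output layer is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def known_vectors(words: Sequence[str], emb: Dict[str, Vector]) -> List[Vector]:
    """Embeddings of the context words present in the (hypothetical) vocabulary."""
    vecs = [emb[w] for w in words if w in emb]
    if not vecs:
        raise ValueError("no in-vocabulary context words")
    return vecs


def nbow_context(words: Sequence[str], emb: Dict[str, Vector]) -> Vector:
    """NBOW-style context: unweighted mean of the word embeddings."""
    vecs = known_vectors(words, emb)
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def nbow2_context(words: Sequence[str], emb: Dict[str, Vector], a: Vector) -> Vector:
    """NBOW2-style context (assumed form): each word gets the scalar weight
    sigmoid(e_w . a), with `a` a learned vector, and the context is the
    weighted mean of the embeddings."""
    vecs = known_vectors(words, emb)
    weights = [sigmoid(dot(v, a)) for v in vecs]
    total = sum(weights)  # > 0, since every sigmoid output is positive
    dim = len(vecs[0])
    return [sum(w * v[i] for w, v in zip(weights, vecs)) / total for i in range(dim)]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def rank_oov_pns(context: Vector, pn_weights: Dict[str, Vector],
                 pn_bias: Dict[str, float]) -> List[Tuple[str, float]]:
    """Score candidate OOV PNs with an (assumed) linear layer + softmax, best first."""
    names = list(pn_weights)
    logits = [dot(pn_weights[n], context) + pn_bias.get(n, 0.0) for n in names]
    return sorted(zip(names, softmax(logits)), key=lambda p: p[1], reverse=True)


if __name__ == "__main__":
    # Toy, hypothetical 2-d embeddings: dimension 0 ~ politics, dimension 1 ~ sport.
    emb = {"the": [0.1, 0.1], "minister": [0.8, 0.2],
           "goal": [0.0, 1.0], "election": [1.0, 0.0]}
    a = [3.0, -3.0]  # hypothetical importance vector favouring politics words
    pns = {"PN_POLITICIAN": [2.0, 0.0], "PN_FOOTBALLER": [0.0, 2.0]}
    words = ["the", "minister", "goal", "election"]
    print("NBOW :", rank_oov_pns(nbow_context(words, emb), pns, {}))
    print("NBOW2:", rank_oov_pns(nbow2_context(words, emb, a), pns, {}))
```

With these toy values, the learned weighting shrinks the contribution of "goal" and "the", so the NBOW2-style context vector, and hence the probability of the politics-related PN, leans further towards politics than the unweighted NBOW mean does. This is the intuition behind letting the model learn which context words matter for retrieving an OOV PN.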
Year: 2017
DOI: 10.1109/TASLP.2017.2651361
Venue: IEEE/ACM Trans. Audio, Speech & Language Processing
Keywords: Context, Vocabulary, Context modeling, Speech recognition, Semantics, Training, Computational modeling
Field: Latent Dirichlet allocation, Computer science, Speech recognition, Context model, Natural language processing, Artificial intelligence, Topic model, Vocabulary, Proper noun, Automatic indexing, Semantics, Language model
DocType: Journal
Volume: 25
Issue: 3
ISSN: 2329-9290
Citations: 1
PageRank: 0.35
References: 49
Authors: 4
Name               Order  Citations  PageRank
Imran Sheikh       1      15         4.01
Dominique Fohr     2      239        49.61
Irina Illina       3      85         20.05
Georges Linarès    4      136        29.55