Title
Topic-Sensitive Language Modelling
Abstract
The paper proposes a new framework for constructing topic-sensitive language models for large-vocabulary speech recognition. Once the domain of discourse has been identified, a language model appropriate for that domain can be built. In our experiments, the target domain was represented by a piece of text. Using appropriate features, a sub-corpus was extracted from a large collection of training text. Our feature selection process is especially suited to languages in which words are formed through many different inflectional affixes. All words with the same meaning (but different grammatical forms) were collected in one cluster and represented as one feature. We used the heuristic TFIDF (term frequency / inverse document frequency) word-weighting classifier to further shrink the feature vector. The final language model was built by interpolating topic-specific models with a general model. Experiments were carried out on English and Slovenian corpora.
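The abstract does not say how documents are scored against the target text or how the interpolation weights are chosen, so the following is only a minimal sketch of the pipeline under stated assumptions. A hypothetical lemma table (`LEMMAS`) stands in for the clustering of inflected forms; documents are compared with the target text by cosine similarity of TF-IDF vectors against an arbitrary threshold; and add-one smoothed unigram models with fixed weights stand in for the topic-specific and general n-gram models a real recogniser would use. All function names are illustrative, not taken from the paper.

```python
import math
from collections import Counter

# HYPOTHETICAL lemma table standing in for a morphological analyser: every
# inflected form maps to one shared cluster id, i.e. one feature.
LEMMAS = {
    "goal": "goal", "goals": "goal",
    "match": "match", "matches": "match",
    "vote": "vote", "votes": "vote", "voted": "vote",
    "party": "party", "parties": "party",
}

def to_features(tokens):
    """Collapse inflected forms into one feature; unknown words stay as-is."""
    return [LEMMAS.get(t, t) for t in tokens]

def document_frequencies(docs):
    """Number of documents in which each feature occurs."""
    df = Counter()
    for d in docs:
        df.update(set(d))
    return df

def tfidf(features, df, n_docs, top_k=None):
    """TF-IDF weights; features unseen in the collection are ignored.
    If top_k is given, only the k highest-weighted features are kept
    (an assumed way of shrinking the feature vector)."""
    tf = Counter(f for f in features if f in df)
    vec = {f: c * math.log(n_docs / df[f]) for f, c in tf.items()}
    if top_k is not None:
        vec = dict(sorted(vec.items(), key=lambda kv: -kv[1])[:top_k])
    return vec

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_subcorpus(target_tokens, corpus, threshold=0.1, top_k=None):
    """Indices of corpus documents whose TF-IDF vector is close to the
    target text's (ASSUMED similarity measure and threshold)."""
    feats = [to_features(d) for d in corpus]
    df = document_frequencies(feats)
    n = len(corpus)
    target = tfidf(to_features(target_tokens), df, n, top_k)
    return [i for i, f in enumerate(feats)
            if cosine(target, tfidf(f, df, n, top_k)) >= threshold]

def unigram(tokens, vocab):
    """Add-one smoothed unigram model over a fixed vocabulary."""
    c = Counter(t for t in tokens if t in vocab)
    total = sum(c.values()) + len(vocab)
    return {w: (c[w] + 1) / total for w in vocab}

def interpolate(models, weights):
    """Linear interpolation: P(w) = sum_i lambda_i * P_i(w), weights sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return {w: sum(l * m[w] for l, m in zip(weights, models)) for w in models[0]}

corpus = [
    "the team scored two goals in the match".split(),
    "the match ended after a late goal".split(),
    "the party voted against the bill".split(),
    "parties vote on the budget".split(),
]
target = "goals decide every match".split()

chosen = select_subcorpus(target, corpus)
print(chosen)  # [0, 1]

# Language models are estimated on the original word forms, not the clusters.
vocab = {t for d in corpus for t in d}
p_topic = unigram([t for i in chosen for t in corpus[i]], vocab)
p_general = unigram([t for d in corpus for t in d], vocab)
p = interpolate([p_topic, p_general], [0.7, 0.3])  # ASSUMED fixed weights
print(p["goal"] > p_general["goal"])  # True
```

In this sketch the lemma clustering is used only to build the selection features, so that "goals" in the target text matches "goal" in a training document; the language models themselves are estimated on surface word forms, which is what a recogniser has to output. In practice the interpolation weights would normally be tuned on held-out data rather than fixed by hand.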
Year
2000
DOI
10.1007/3-540-45323-7_43
Venue
Temporal Logic in Specification
Keywords
general model, topic-sensitive language model, target domain, feature selection process, topic-sensitive language modelling, topic specific model, different grammatical form, final language model, feature vector, current domain, appropriate feature, inverse document frequency, feature selection, speech recognition, language model, term frequency
Field
Feature vector, Feature selection, tf–idf, Computer science, Speech recognition, Natural language processing, Artificial intelligence, Domain of discourse, Topic model, Classifier (linguistics), Vocabulary, Language model
DocType
Conference
Volume
1902
ISSN
0302-9743
ISBN
3-540-41042-2
Citations
0
PageRank
0.34
References
2
Authors
2
Name | Order | Citations | PageRank
Mirjam Sepesy Maučec | 1 | 506 | 26.34
Zdravko Kacic | 2 | 240 | 36.22