Abstract | ||
---|---|---|
The paper proposes a new framework for constructing topic-sensitive language models for large-vocabulary speech recognition. Once the
domain of discourse has been identified, a model appropriate for the current domain can be built. In our experiments, the target domain was
represented by a piece of text. Using appropriate features, a sub-corpus was extracted from a large collection of training text.
Our feature selection process is especially suited to languages in which words are formed with many different inflectional affixes.
All words with the same meaning (but different grammatical forms) were collected in one cluster and represented as one feature.
We used the heuristic word-weighting classifier TFIDF (term frequency / inverse document frequency) to further shrink the feature vector. The final language model was built by interpolating
topic-specific models and a general model. Experiments were carried out on English and Slovenian corpora.
|
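
The interpolation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the n-gram order, the interpolation weights, or how those weights were estimated, so everything here is an assumption for illustration: unigram models, a fixed weight of 0.7 for the topic model, and the hypothetical helper names `unigram_model` and `interpolate`.

```python
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram probabilities (assumed model order)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(models, weights):
    """Linear interpolation: P(w) = sum_i lambda_i * P_i(w).

    The weights must sum to 1 so that the result is still a
    probability distribution over the union of the vocabularies.
    """
    if len(models) != len(weights):
        raise ValueError("one weight per model is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    vocabulary = set().union(*models)
    return {
        word: sum(lam * model.get(word, 0.0)
                  for lam, model in zip(weights, models))
        for word in vocabulary
    }


# Toy data (hypothetical, not from the paper's corpora).
general = unigram_model("the cat sat on the mat".split())
topic = unigram_model("the speech recognizer decoded the speech".split())

# Assumed weights: 0.7 for the topic model, 0.3 for the general model.
mixed = interpolate([topic, general], [0.7, 0.3])
```

In practice the weights would normally be tuned on held-out text from the target domain rather than fixed by hand; that choice is outside what the abstract specifies.
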
Year | DOI | Venue |
---|---|---|
2000 | 10.1007/3-540-45323-7_43 | Temporal Logic in Specification |
Keywords | Field | DocType |
general model,topic-sensitive language model,target domain,feature selection process,topic-sensitive language modelling,topic specific model,different grammatical form,final language model,feature vector,current domain,appropriate feature,inverse document frequency,feature selection,speech recognition,language model,term frequency | Feature vector,Feature selection,tf–idf,Computer science,Speech recognition,Natural language processing,Artificial intelligence,Domain of discourse,Topic model,Classifier (linguistics),Vocabulary,Language model | Conference |
Volume | ISSN | ISBN |
1902 | 0302-9743 | 3-540-41042-2 |
Citations | PageRank | References |
0 | 0.34 | 2 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Mirjam Sepesy Maučec | 1 | 506 | 26.34 |
Zdravko Kacic | 2 | 240 | 36.22 |