Automatic Estimation Of Language Model Parameters For Unseen Words Using Morpho-Syntactic Contextual Information - Citegraph

Paper Info

Title
Automatic Estimation Of Language Model Parameters For Unseen Words Using Morpho-Syntactic Contextual Information

Abstract
Various information sources naturally contain new words that appear on a daily basis and that are not present in the vocabulary of the speech recognition system but are important for applications such as closed-captioning or information dissemination. To be recognized, those words need to be included in the vocabulary and the language model (LM) parameters updated. In this context, we propose a new method that allows including new words in the vocabulary even if no well-suited training data is available, as is the case for archived documents, and without the need for LM retraining. It uses morpho-syntactic information about an in-domain corpus and part-of-speech word classes to define a new LM unigram distribution associated with the updated vocabulary. Experiments were carried out for a European Portuguese Broadcast News transcription system. Results showed a relative reduction of 4% in word error rate, with 78% of the occurrences of those newly included words being correctly recognized.
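
The abstract does not spell out how the new unigram distribution is built, so the following is only an illustrative sketch of the general idea of assigning unigram probabilities to unseen words through part-of-speech classes. It is not the authors' algorithm. The function name extend_unigram, the averaging rule (a new word receives the mean probability of the known words in its class), the final renormalization, and the toy Portuguese vocabulary are all assumptions made for this example.

```python
from collections import defaultdict


def extend_unigram(unigram, pos_of, new_words):
    """Add new words to a unigram distribution using POS classes.

    Assumed rule (hypothetical, not taken from the paper): a new word
    gets the average probability of the known words that share its POS
    class, and the extended distribution is then renormalized.
    """
    # Probability mass and number of known words per POS class.
    class_mass = defaultdict(float)
    class_size = defaultdict(int)
    for word, prob in unigram.items():
        class_mass[pos_of[word]] += prob
        class_size[pos_of[word]] += 1

    extended = dict(unigram)
    for word in new_words:
        if word in extended:
            continue  # already in the vocabulary, keep its estimate
        cls = pos_of[word]
        if class_size[cls] == 0:
            raise ValueError(f"no known words in class {cls!r}")
        extended[word] = class_mass[cls] / class_size[cls]

    # Renormalize so the probabilities sum to 1 again.
    total = sum(extended.values())
    return {word: prob / total for word, prob in extended.items()}


# Toy data, invented for illustration only.
unigram = {"o": 0.5, "governo": 0.2, "ministro": 0.1, "disse": 0.2}
pos_of = {
    "o": "DET",
    "governo": "NOUN",
    "ministro": "NOUN",
    "disse": "VERB",
    "tsunami": "NOUN",  # the new, previously unseen word
}
extended = extend_unigram(unigram, pos_of, ["tsunami"])
print(extended["tsunami"])
```

In this toy case the new noun starts from the mean probability of the two known nouns, and every entry is then scaled down so the distribution stays normalized. A real system would also have to update higher-order n-gram parameters, which this sketch does not attempt.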

Year	Venue	Keywords
2008	INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5	morpho-syntactic analysis, POS tags, class-based language models, broadcast news, transcription systems
Field	DocType	Citations
European Portuguese, Broadcasting, Computer science, Word error rate, Speech recognition, Natural language processing, Artificial intelligence, Information Dissemination, Syntax, Vocabulary, Language model, Retraining	Conference	3
PageRank	References	Authors
0.50	7	3

Authors (3 rows)

Name	Order	Citations	PageRank
Ciro Martins	1	100	11.90
António J. S. Teixeira	2	152	35.26
João Paulo Neto	3	291	32.69
