Title
A bilingual study on the prediction of morph-based improvement.
Abstract
Morph-based language modeling has been effectively applied to improve the accuracy of Large-Vocabulary Continuous Speech Recognition (LVCSR) systems – especially in morphologically rich languages. However, the rate of improvement varies greatly, and the underlying principles have been studied only superficially. A method that can predict the expected improvement prior to experimentation would be highly useful. In this paper, we introduce language-independent factors affecting morph-based improvement and show how they can be utilized in estimating the effectiveness of statistical morph-based language modeling. The task was broadcast news transcription in two less investigated languages, Hungarian and Romanian. It was found that under under-resourced conditions morph-based models can bring significant improvement – even for a morphologically less rich language like Romanian. In addition, it was shown that non-initial morph tagging can consistently outperform explicit modeling of word boundaries in terms of both letter and word accuracy.
Year: 2014
Venue: SLTU
Field: Broadcasting, Romanian, Computer science, Speech recognition, Natural language processing, Artificial intelligence, Language model
DocType: Conference
Citations: 2
PageRank: 0.40
References: 8
Authors: 3
Name              Order   Citations   PageRank
Balázs Tarján     1       21          4.92
Tibor Fegyó       2       61          10.46
Péter Mihajlik    3       58          10.15