Abstract |
---|
Morph-based language modeling has been effectively applied to improve the accuracy of Large-Vocabulary Continuous Speech Recognition (LVCSR) systems – especially in morphologically rich languages. However, the rate of improvement varies greatly, and the underlying principles have been studied only superficially. A method that can predict the expected improvement prior to experimentation would be highly useful. In this paper, we introduce language-independent factors affecting morph-based improvement and show how they can be utilized in estimating the effectiveness of statistical morph-based language modeling. The task was broadcast news transcription in two less investigated languages, Hungarian and Romanian. It was found that in under-resourced conditions morph-based models can bring significant improvement – even for a morphologically less rich language like Romanian. In addition, it was shown that non-initial morph tagging can consistently outperform explicit modeling of word boundaries in terms of both letter and word accuracy. |
Year | Venue | Field |
---|---|---|
2014 | SLTU | Broadcasting,Romanian,Computer science,Speech recognition,Natural language processing,Artificial intelligence,Language model |
DocType | Citations | PageRank |
Conference | 2 | 0.40 |
References | Authors | |
8 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Balázs Tarján | 1 | 21 | 4.92 |
Tibor Fegyó | 2 | 61 | 10.46 |
Péter Mihajlik | 3 | 58 | 10.15 |