Abstract |
---|
Data preparation and selection affect systems across a wide range of complexities. A system built for a resource-rich language may be so large that it includes words borrowed from other languages. A system built for a resource-scarce language may be strongly affected by how carefully the training data is selected and produced. Accuracy depends on the presence of enough samples of qualitatively relevant information. We propose a method using the Kullback-Leibler divergence to solve two problems related to data preparation: the ordering of alternate pronunciations in a lexicon, and the selection of transcription data. In both cases, we want to guarantee that a particular distribution of n-grams is achieved. In the case of lexicon design, we want to ensure that phones appear often enough. In the case of training data selection for scarcely resourced languages, we want to make sure that some n-grams are better represented than others. Our proposed technique yields encouraging results. |
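The abstract does not spell out the selection algorithm, but the idea of picking transcription data so that the selected set's n-gram distribution approaches a target can be sketched as a greedy KL-divergence minimizer. The following is a minimal illustration, not the authors' implementation; the function names (`kl_divergence`, `select_utterances`) and the greedy strategy are assumptions for the sketch.

```python
from collections import Counter
from math import log

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two count dictionaries over a shared support.
    eps smooths target n-grams that are absent from q."""
    support = set(p) | set(q)
    total_p = sum(p.values()) or 1
    total_q = sum(q.values()) or 1
    div = 0.0
    for g in support:
        pp = p.get(g, 0) / total_p
        qq = (q.get(g, 0) + eps) / (total_q + eps * len(support))
        if pp > 0:
            div += pp * log(pp / qq)
    return div

def ngrams(tokens, n=2):
    """All order-n n-grams of a token (e.g. phone) sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def select_utterances(candidates, target, k, n=2):
    """Greedily pick k utterances whose pooled n-gram counts
    minimize KL(selected || target) after each addition."""
    selected, counts = [], Counter()
    pool = list(candidates)
    for _ in range(min(k, len(pool))):
        best, best_div = None, float("inf")
        for utt in pool:
            trial = counts + Counter(ngrams(utt, n))
            d = kl_divergence(trial, target)
            if d < best_div:
                best, best_div = utt, d
        selected.append(best)
        counts += Counter(ngrams(best, n))
        pool.remove(best)
    return selected
```

Given a target bigram distribution over phones, utterances whose n-grams match the target are chosen before those that contribute unseen or over-represented n-grams; the same scoring could in principle rank alternate pronunciations in a lexicon.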
Year | Venue | Keywords
---|---|---
2011 | 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 | acoustic model training, lexical model, maximum entropy, Kullback-Leibler divergence, training data selection

Field | DocType | Citations
---|---|---
Training set, Computer science, Speech recognition, Principle of maximum entropy, Kullback–Leibler divergence | Conference | 3

PageRank | References | Authors
---|---|---
0.42 | 6 | 2
Name | Order | Citations | PageRank |
---|---|---|---|
Evandro Gouvêa | 1 | 20 | 10.47 |
Marelie H. Davel | 2 | 236 | 22.70 |