Title
Training Data Selection For Acoustic Modeling Via Submodular Optimization Of Joint Kullback-Leibler Divergence
Abstract
This paper presents a novel training data selection method for constructing acoustic models for automatic speech recognition (ASR). Many training data sets have been developed for acoustic modeling, each created for a specific ASR application so that its acoustic characteristics (e.g., speakers, noise, and recording devices) match those of the application. A mixture of such existing training sets (an out-of-domain set) forms a large utterance set covering varied acoustic characteristics. The proposed method selects the most appropriate subset of the out-of-domain set and uses it for supervised training of an acoustic model for a new ASR application. The subset whose acoustic characteristics are most similar to those of the target-domain set (i.e., untranscribed utterances recorded by the target application) is selected using the proposed joint Kullback-Leibler (KL) divergence of speech and non-speech characteristics. Furthermore, to select among the many candidate subsets within practical computation time, we also propose a selection algorithm based on submodular optimization that minimizes the joint KL divergence by greedy selection with guaranteed optimality. Experiments on real meeting utterances using deep neural network acoustic models show that the proposed method yields better acoustic models than random or likelihood-based selection.
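The greedy selection idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact objective, so the sketch assumes each utterance is summarized by frame counts over a few discrete speech and non-speech clusters, takes the "joint" divergence to be the sum of the two KL terms, and uses additive smoothing (`eps`). The names `kl_divergence`, `joint_kl`, `greedy_select`, and the toy data are all hypothetical, and the sketch makes no claim about the optimality guarantee mentioned above.

```python
import math

KINDS = ("speech", "nonspeech")

def kl_divergence(p, q_counts, eps=1e-6):
    """KL(p || q), where q is q_counts after additive smoothing and normalization."""
    total = sum(q_counts) + eps * len(q_counts)
    return sum(
        pi * math.log(pi / ((qc + eps) / total))
        for pi, qc in zip(p, q_counts)
        if pi > 0
    )

def joint_kl(target, counts):
    """Assumed form of the joint divergence: speech KL plus non-speech KL."""
    return sum(kl_divergence(target[k], counts[k]) for k in KINDS)

def add_counts(counts, utt):
    """Return new cluster counts with one utterance's counts added."""
    return {k: [c + x for c, x in zip(counts[k], utt[k])] for k in KINDS}

def greedy_select(target, pool, budget):
    """Repeatedly add the utterance that gives the lowest joint KL so far."""
    counts = {k: [0.0] * len(target[k]) for k in KINDS}
    remaining = list(range(len(pool)))
    selected = []
    for _ in range(min(budget, len(pool))):
        best = min(remaining, key=lambda i: joint_kl(target, add_counts(counts, pool[i])))
        selected.append(best)
        remaining.remove(best)
        counts = add_counts(counts, pool[best])
    return selected, joint_kl(target, counts)

# Toy target-domain distributions over two speech and two non-speech clusters.
target = {"speech": [0.5, 0.5], "nonspeech": [0.9, 0.1]}

# Toy out-of-domain pool: per-utterance frame counts for each cluster.
pool = [
    {"speech": [6, 4], "nonspeech": [9, 1]},
    {"speech": [0, 10], "nonspeech": [0, 10]},
    {"speech": [3, 7], "nonspeech": [8, 2]},
    {"speech": [10, 0], "nonspeech": [5, 5]},
]

selected, score = greedy_select(target, pool, budget=2)
print(selected, round(score, 4))
```

On this toy pool the sketch first takes the utterance closest to the target and then the one whose counts best complement it, so the pooled distribution of the selected pair is closer to the target than either utterance alone. Each greedy step rescans the remaining pool, which is affordable for a sketch but would need a more efficient evaluation on a realistically large utterance set.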
Year
2015
Venue
16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), Vols 1-5
Keywords
speech recognition, acoustic model, training data selection, Kullback-Leibler divergence, submodular optimization
Field
Training set, Pattern recognition, Computer science, Submodular set function, Speech recognition, Artificial intelligence, Kullback–Leibler divergence
DocType
Conference
Citations
0
PageRank
0.34
References
5
Authors
5
Name
Order
Citations
PageRank
Taichi Asami12210.49
Ryo Masumura22528.24
Hirokazu Masataki3189.21
Manabu Okamoto401.01
Sumitaka Sakauchi5368.30