Unsupervised HMM Posteriograms for Language Independent Acoustic Modeling in Zero Resource Conditions - Citegraph

Paper Info

Title
Unsupervised HMM Posteriograms for Language Independent Acoustic Modeling in Zero Resource Conditions

Abstract
The task of language independent acoustic unit modeling in unlabeled raw speech (the zero-resource setting) has gained significant interest in recent years. The main challenge is to extract acoustic representations that yield high similarity between instances of the same word or linguistic token spoken by different speakers, and to derive these representations in a language independent manner. In this paper, we explore the use of Hidden Markov Model (HMM) based posteriograms for unsupervised acoustic unit modeling. The states of the HMM (which represent the language independent acoustic units) are initialized using a Gaussian mixture model (GMM) - Universal Background Model (UBM). The trained HMM is subsequently used to generate a temporally contiguous state alignment, which is then modeled in a hybrid deep neural network (DNN) model. For testing, we use the frame level HMM state posteriors obtained from the DNN as features for the ZeroSpeech challenge task. The minimal pair ABX error rate is measured for both within speaker and across speaker pairs. With several experiments on multiple languages in the ZeroSpeech corpus, we show that the proposed HMM based posterior features provide significant improvements over the baseline system using MFCC features (average relative improvements of 25% for within speaker pairs and 40% for across speaker pairs). Furthermore, the experiments in which the target language is not seen in training illustrate that the proposed modeling approach is capable of learning global language independent representations.
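
The minimal pair ABX error rate mentioned in the abstract can be illustrated with a small sketch. It is not the authors' evaluation code or the official ZeroSpeech toolkit: it assumes that each token is a list of per-frame posterior vectors, that frames are compared with cosine distance, and that tokens are aligned with plain dynamic time warping (DTW). The function names `cosine_distance`, `dtw_distance`, and `abx_error_rate`, as well as the toy posteriors, are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two non-zero frame vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dtw_distance(seq_a, seq_b):
    """DTW cost between two frame sequences, normalized by their total length."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def abx_error_rate(triples):
    """Fraction of (A, B, X) triples in which X is not closer to A than to B.

    X is assumed to belong to the same category as A; ties count as half an error.
    """
    errors = 0.0
    for a, b, x in triples:
        d_ax = dtw_distance(a, x)
        d_bx = dtw_distance(b, x)
        if d_ax > d_bx:
            errors += 1.0
        elif d_ax == d_bx:
            errors += 0.5
    return errors / len(triples)


# Toy posteriors over three hypothetical HMM states (each frame sums to 1).
token_a = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]]
token_b = [[0.05, 0.05, 0.9], [0.1, 0.8, 0.1]]
token_x = [[0.8, 0.1, 0.1], [0.8, 0.1, 0.1], [0.2, 0.7, 0.1]]

error = abx_error_rate([(token_a, token_b, token_x)])
```

In this toy case `token_x` starts in the same state as `token_a`, so the triple is scored as correct. In a within speaker test A, B, and X would come from one speaker, while in an across speaker test X would come from a different speaker than A and B.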

Year	DOI	Venue
2017	10.1109/asru.2017.8269014	2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
Keywords	Field	DocType
Unsupervised learning, Hidden Markov Model (HMM) posteriograms, Multilingual Modeling, Zero resource speech	Mel-frequency cepstrum, Minimal pair, Computer science, Word error rate, ABX test, Speech recognition, Unsupervised learning, Hidden Markov model, Artificial neural network, Mixture model	Conference
Citations	PageRank	References
0	0.34	0
Authors
5

Authors (5 rows)

Name	Order	Citations	PageRank
T. K. Ansari	1	0	0.68
Rajath Kumar	2	0	0.68
Sonali Singh	3	0	0.68
Sriram Ganapathy	4	252	39.62
Susheela Devi	5	0	0.34
