Title
Automatic language identification using discrete hidden Markov model
Abstract
In recent automatic language identification (LID) research, the phonotactic approach has been studied, in which all training utterances are passed through a tokenizer to obtain phonetic sequences for training the language models of the different languages. The true transcription of the utterances is ignored entirely. However, information in the transcription may carry important discriminating power for language identification. In this paper, we propose to use a discrete hidden Markov model (DHMM) that takes account of the potential error patterns of the acoustic tokenizer and incorporates the transcription of the utterances in language model training. Furthermore, with the DHMM approach, LID using multiple phonetic tokenizers can simply be treated as supplying multi-dimensional features to the DHMM, allowing a joint decision to be made earlier in the process. A system employing this approach achieves 59.00% and 68.33% accuracy on 10-sec and 45-sec speech, respectively, in recognizing a closed set of six languages in the OGI telephone speech corpus, while the phonotactic approach gives 57.00% and 77.50% identification accuracy on 10-sec and 45-sec speech when the phone recognizer uses three-state, three-mixture HMMs.
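The abstract does not give the model equations, so the following is only a minimal sketch of one plausible reading of the idea: hidden states stand for the true phones (whose transitions could be estimated from transcriptions), observations are the tokenizer's output symbols, and the emission matrix plays the role of the tokenizer's confusion pattern. Each language gets its own DHMM, and the language with the highest forward log-likelihood is chosen. The two toy languages "A" and "B", their probabilities, and the function names are hypothetical and are not taken from the paper.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def _log(p):
    return math.log(p) if p > 0 else -math.inf

def forward_log_likelihood(obs, init, trans, emit):
    """Log P(obs | model) for a discrete HMM, computed with the forward algorithm.

    init[i]     : probability of starting in hidden state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability that state i produces observed symbol k
    """
    n = len(init)
    alpha = [_log(init[i]) + _log(emit[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[i] + _log(trans[i][j]) for i in range(n)])
            + _log(emit[j][o])
            for j in range(n)
        ]
    return log_sum_exp(alpha)

def identify(obs, models):
    """Return the name of the model that gives obs the highest log-likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))

# Hypothetical toy setup: two hidden phones, two tokenizer symbols.
# The emission matrix is an assumed tokenizer confusion pattern shared by both languages.
CONFUSION = [[0.8, 0.2], [0.2, 0.8]]
MODELS = {
    "A": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], CONFUSION),  # phones tend to repeat
    "B": ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], CONFUSION),  # phones tend to alternate
}

if __name__ == "__main__":
    print(identify([0, 0, 0, 0], MODELS))
    print(identify([0, 1, 0, 1], MODELS))
```

Under this reading, output from several tokenizers could be handled by making each observation a tuple of symbols with a joint emission table, but that extension is likewise an assumption rather than something the abstract specifies.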
Year
2004
Venue
INTERSPEECH
Keywords
language model, hidden Markov model, language identification
Field
Speech corpus, Maximum-entropy Markov model, Pattern recognition, Computer science, Markov model, Speech recognition, Variable-order Markov model, Language identification, Artificial intelligence, Hidden Markov model, Language model, Markov algorithm
DocType
Conference
Citations
3
PageRank
0.40
References
6
Authors
2
Name            Order  Citations  PageRank
Ka-keung Wong   1      3          0.40
Manhung Siu     2      464        61.40