Automatic language identification using discrete hidden Markov model - Citegraph

Paper Info

Title
Automatic language identification using discrete hidden Markov model

Abstract
In the recent automatic language identification research, phonotactic approach has been studied in which all training utterances are passed through a tokenizer in order to get phonetic sequences to train the language model of different languages. The true transcription of the utterances was totally ignored. However, information in the transcription may possess important discriminating power for language identification. In this paper, we propose to use discrete hidden Markov model that takes account of the potential error patterns of the acoustic tokenizer and incorporates the transcription of the utterances in the language model training. Furthermore, with the DHMM approach, LID using multiple phonetic tokenizers can simply be considered as using a multi-dimensional features to the DHMM allowing the making of joint decision earlier in the process. A system employing this approach produces 59.00% and 68.33% accuracy on 10-sec and 45-sec speech respectively on recognizing a close set of six languages in the OGI telephone speech corpus while the phonotactic approach gives 57.00% and 77.50% identification accuracy on 10-sec and 45-sec speech when the phone recognizer uses three-state and three-mixture HMM.
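
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: each language has its own discrete HMM, the hidden states stand for the true phones (as a transcription would supply them), the observations are the tokenizer's output symbols, and the emission matrix plays the role of the tokenizer's error pattern. The function names, the model names `lang_a` and `lang_b`, and all probabilities below are hypothetical toy values.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs | discrete HMM)."""
    n_states = len(start)
    # Start distribution combined with the first emission.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik


def identify_language(obs, models):
    """Pick the language whose DHMM gives the token sequence the highest likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


# Hypothetical toy models: (start, transition, emission) per language.
# The shared emission matrix is an assumed tokenizer confusion pattern:
# each true phone is tokenized correctly 80% of the time.
CONFUSION = [[0.8, 0.2], [0.2, 0.8]]
MODELS = {
    "lang_a": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], CONFUSION),  # phones tend to repeat
    "lang_b": ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], CONFUSION),  # phones tend to alternate
}

print(identify_language([0, 0, 0, 0, 1, 1, 1, 1], MODELS))
print(identify_language([0, 1, 0, 1, 0, 1], MODELS))
```

In this sketch the two languages differ only in their phone transition statistics, so the decision comes entirely from the phonotactics seen through the assumed confusion matrix. A multi-tokenizer variant would, under the same assumptions, replace each observation symbol with a tuple of symbols.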

Year	Venue	Keywords
2004	INTERSPEECH	language model,hidden markov model,language identification
Field	DocType	Citations
Speech corpus,Maximum-entropy Markov model,Pattern recognition,Computer science,Markov model,Speech recognition,Variable-order Markov model,Language identification,Artificial intelligence,Hidden Markov model,Language model,Markov algorithm	Conference	3
PageRank	References	Authors
0.40	6	2

Authors (2 rows)

Name	Order	Citations	PageRank
Ka-keung Wong	1	3	0.40
Manhung Siu	2	464	61.40
