Abstract | ||
---|---|---|
This paper presents a strategy to optimize the phonotactic front-end for spoken language recognition. This is achieved by selecting a subset of phones from an existing phone recognizer's phone inventory such that only the phones that best discriminate each of the target languages are selected. Each such phone subset will be used to construct a target-oriented phone tokenizer (TOPT). In this study, we examine different approaches to construct such phone tokenizers for the front-end of a Parallel Phone Recognizers followed by Vector Space Modeling (PPR-VSM) system. We show that the target-oriented phone tokenizers derived from language-specific phone recognizers are more effective than the original parallel phone recognizers. Our experimental results also show that the target-oriented phone tokenizers derived from universal phone recognizers achieve better performance than those derived from language-specific phone recognizers. Using the proposed target-oriented phone tokenizers as the phonotactic front-end, the language recognition system performance is significantly improved without the need for additional training samples. We achieve an equal error rate (EER) of 1.27%, 1.42% and 2.73% on the NIST 1996, 2003 and 2007 LRE databases respectively for 30-s closed-set tests. This system is one of the subsystems in IIR's submission to NIST 2007 LRE. |
Year | DOI | Venue |
---|---|---|
2009 | 10.1109/TASL.2009.2016731 | IEEE Transactions on Audio, Speech & Language Processing |
Keywords | Field | DocType |
target-oriented phonotactic front-end,spoken language recognition,existing phone recognizer,phone subset,target-oriented phone,language recognition,parallel phone,phone tokenizers,parallel phone recognizer,phonotactic feature,phone inventory,original parallel phone recognizers,proposed target-oriented phone tokenizers,universal phone recognizers,universal phone recognizer,language-specific phone recognizers,target-oriented phone tokenizer,system performance,vector space model,speech processing,feature selection,vectors,speech recognition,indexing terms,front end,databases | Speech processing,Feature selection,Computer science,Word error rate,Speech recognition,Feature extraction,Phone,NIST,Artificial intelligence,Natural language processing,Lexical analysis,Spoken language | Journal |
Volume | Issue | ISSN |
17 | 7 | 1558-7916 |
Citations | PageRank | References |
5 | 0.44 | 34 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Rong Tong | 1 | 108 | 11.33 |
Bin Ma | 2 | 600 | 47.26 |
Haizhou Li | 3 | 3678 | 334.61 |
Eng Siong Chng | 4 | 970 | 106.33 |