Abstract | ||
---|---|---|
We describe a new approach to automatic dialect and accent recognition which exceeds state-of-the-art performance in three recognition tasks. This approach improves the accuracy and substantially lowers the time complexity of our earlier phonetic-based kernel approach for dialect recognition. In contrast to state-of-the-art acoustic-based systems, our approach employs phone labels and segmentation to constrain the acoustic models. Given a speaker's utterance, we first obtain phone hypotheses using a phone recognizer and then extract GMM-supervectors for each phone type, effectively summarizing the speaker's phonetic characteristics in a single vector of phone-type supervectors. Using these vectors, we design a kernel function that computes the phonetic similarities between pairs of utterances to train SVM classifiers to identify dialects. Compared with the state of the art, this approach yields a 12.9% relative improvement in EER on Arabic dialects and a 17.9% relative improvement for American vs. Indian English dialects. We also see a 53.5% relative improvement over a GMM-UBM on American Southern vs. Non-Southern English. |
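
The pipeline in the abstract (phone segmentation, one GMM-supervector per phone type, then a kernel over pairs of utterances) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the phone segmentation is already available, uses a tiny diagonal-covariance GMM per phone type, adapts only the means with a relevance factor of 16, and applies the common `sqrt(weight) / sigma` supervector normalization with a linear kernel summed over phone types. The function names (`adapt_means`, `phone_supervectors`, `phonetic_kernel`) and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def adapt_means(frames, gmm, relevance=16.0):
    """MAP-adapt only the component means of gmm to the given frames.

    gmm is (weights, means, variances); relevance=16 is an assumed value.
    With no frames, the adapted means equal the original means.
    """
    weights, means, variances = gmm
    dim = len(means[0])
    counts = [0.0] * len(weights)
    sums = [[0.0] * dim for _ in weights]
    for x in frames:
        logs = [math.log(w) + log_gauss(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)  # subtract the max for numerical stability
        post = [math.exp(l - top) for l in logs]
        total = sum(post)
        for c, p in enumerate(post):
            gamma = p / total  # posterior of component c for this frame
            counts[c] += gamma
            for d in range(dim):
                sums[c][d] += gamma * x[d]
    return [[(sums[c][d] + relevance * means[c][d]) / (counts[c] + relevance)
             for d in range(dim)] for c in range(len(weights))]

def phone_supervectors(segments, ubms):
    """Map each phone type to a normalized supervector for one utterance.

    segments: list of (phone_label, frames) from a phone recognizer.
    ubms: dict phone_label -> (weights, means, variances).
    """
    frames_by_phone = defaultdict(list)
    for phone, frames in segments:
        frames_by_phone[phone].extend(frames)
    vectors = {}
    for phone, gmm in ubms.items():
        weights, _, variances = gmm
        adapted = adapt_means(frames_by_phone[phone], gmm)
        vectors[phone] = [math.sqrt(w) * m / math.sqrt(v)
                          for w, mean, var in zip(weights, adapted, variances)
                          for m, v in zip(mean, var)]
    return vectors

def phonetic_kernel(a, b):
    """Sum of per-phone inner products between two utterances' supervectors."""
    return sum(sum(x * y for x, y in zip(a[p], b[p])) for p in a if p in b)

# Toy phone-specific background models: 2 components, 2-dimensional features.
ubms = {
    "aa": ([0.5, 0.5], [[0.0, 0.0], [2.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "s": ([0.5, 0.5], [[-1.0, 1.0], [1.0, -1.0]], [[1.0, 1.0], [1.0, 1.0]]),
}
# Toy utterances: phone-labelled segments with their acoustic frames.
utt_a = [("aa", [[0.1, -0.2], [1.9, 2.2]]), ("s", [[-0.8, 1.1]])]
utt_b = [("aa", [[0.3, 0.1]]), ("s", [[1.2, -0.9], [0.9, -1.1]])]

vec_a = phone_supervectors(utt_a, ubms)
vec_b = phone_supervectors(utt_b, ubms)
similarity = phonetic_kernel(vec_a, vec_b)
```

Evaluating `phonetic_kernel` on every pair of training utterances gives a Gram matrix, which could then be handed to any SVM trainer that accepts a precomputed kernel. Because the kernel is a plain inner product of fixed-length vectors, the cost of comparing two utterances does not depend on their durations once the supervectors are extracted.
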
Year | Venue | Keywords |
---|---|---|
2011 | 12th Annual Conference of the International Speech Communication Association 2011 (INTERSPEECH 2011), Vols 1-5 | linguistics, information technology, computer science |
Field | DocType | Citations |
Kernel (linear algebra),Indian English,Segmentation,Computer science,Support vector machine,Utterance,Speech recognition,Phone,Natural language processing,Artificial intelligence,Time complexity,Kernel (statistics) | Conference | 13 |
PageRank | References | Authors |
0.65 | 14 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Fadi Biadsy | 1 | 207 | 15.14 |
Julia Hirschberg | 2 | 2982 | 448.62 |
Daniel P. W. Ellis | 3 | 4198 | 356.08 |