Abstract |
---|
Recent studies show that phonetic sequences from multiple languages can provide effective features for speaker recognition. So far, only pronunciation dynamics in the time dimension, i.e., n-gram modeling on each of the phone sequences, have been examined. In the JHU 2002 Summer Workshop, we explored modeling the statistical pronunciation dynamics across streams in multiple languages (cross-stream dimension) as an additional component to the time dimension. We found that bigram modeling in the cross-stream dimension achieves improved performance over that in the time dimension on the NIST 2001 Speaker Recognition Evaluation Extended Data Task. Moreover, a linear combination of information from both dimensions at the score level further improves the performance, showing that the two dimensions contain complementary information. |
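
The two dimensions described in the abstract, and their score-level combination, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the phone streams from two languages are already aligned frame by frame, and it swaps in a simple add-alpha smoothed log-likelihood ratio against a background model as the scoring rule. All function names, the `alpha` and `vocab_size` parameters, and the fusion weight are hypothetical.

```python
from collections import Counter
from math import log


def time_bigrams(stream):
    """Time dimension: pairs of consecutive phones within one stream."""
    return list(zip(stream, stream[1:]))


def cross_stream_bigrams(stream_a, stream_b):
    """Cross-stream dimension: pairs of phones from two language streams
    at the same frame. Assumes the streams are frame-aligned."""
    if len(stream_a) != len(stream_b):
        raise ValueError("streams must be frame-aligned to the same length")
    return list(zip(stream_a, stream_b))


def llr_score(test_tokens, speaker_counts, background_counts,
              vocab_size, alpha=1.0):
    """Average log-likelihood ratio of the test tokens under a speaker
    model versus a background model, with add-alpha smoothing.
    (The scoring rule is an assumption, not taken from the paper.)"""
    if not test_tokens:
        raise ValueError("test_tokens must not be empty")
    spk_total = sum(speaker_counts.values())
    bkg_total = sum(background_counts.values())
    total = 0.0
    for tok in test_tokens:
        p_spk = (speaker_counts.get(tok, 0) + alpha) / (spk_total + alpha * vocab_size)
        p_bkg = (background_counts.get(tok, 0) + alpha) / (bkg_total + alpha * vocab_size)
        total += log(p_spk / p_bkg)
    return total / len(test_tokens)


def fuse(score_time, score_cross, weight=0.5):
    """Linear combination of the two scores; weight is a free parameter."""
    return weight * score_time + (1.0 - weight) * score_cross


# Toy frame-aligned phone streams from two hypothetical recognizers.
english = ["a", "b", "a"]
czech = ["x", "y", "x"]
speaker_model = Counter(cross_stream_bigrams(english, czech))
```

In a setup like this, the same `llr_score` function would be applied once to time-dimension bigrams and once to cross-stream bigrams, and the fusion weight would typically be tuned on held-out data.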
Year | DOI | Venue |
---|---|---|
2003 | 10.1109/ICASSP.2003.1202764 | 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. IV, Proceedings: Signal Processing for Communications Special Sessions |
Keywords | Field | DocType |
feature extraction, acoustics, scanning probe microscopy, speaker recognition, two dimensions, statistical analysis, loudspeakers, speech processing, natural languages, acoustic noise, data mining, China, linguistics, testing, phonetics, time dimension, NIST, speech recognition | Pronunciation, Speech processing, Computer science, Phonetics, Speech recognition, Feature extraction, Natural language, Speaker recognition, Natural language processing, Bigram, Artificial intelligence, Multiple time dimensions | Conference |
ISSN | Citations | PageRank |
1520-6149 | 17 | 1.95 |
References | Authors |
11 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Qin Jin | 1 | 639 | 66.86 |
Jiri Navratil | 2 | 314 | 31.36 |
D. A. Reynolds | 3 | 7176 | 641.65 |
Joseph P. Campbell | 4 | 814 | 85.36 |
Walter D. Andrews | 5 | 138 | 12.65 |
Joy S. Abramson | 6 | 17 | 1.95 |