Title
Combining Cross-Stream And Time Dimensions In Phonetic Speaker Recognition
Abstract
Recent studies show that phonetic sequences from multiple languages can provide effective features for speaker recognition. So far, only pronunciation dynamics in the time dimension, i.e., n-gram modeling on each of the phone sequences, have been examined. In the JHU 2002 Summer Workshop, we explored modeling the statistical pronunciation dynamics across streams in multiple languages (cross-stream dimension) as an additional component to the time dimension. We found that bigram modeling in the cross-stream dimension achieves improved performance over that in the time dimension on the NIST 2001 Speaker Recognition Evaluation Extended Data Task. Moreover, a linear combination of information from both dimensions at the score level further improves the performance, showing that the two dimensions contain complementary information.
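The two dimensions described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that the phone streams from the different language recognizers have already been aligned to common time steps, that scoring is an average log-likelihood ratio against a background model with add-one smoothing, and that the two scores are fused with a hypothetical weight. All function names and parameters here are illustrative assumptions.

```python
from collections import Counter
from math import log


def time_bigrams(stream):
    """Bigrams along the time axis of a single phone stream."""
    return list(zip(stream, stream[1:]))


def cross_stream_bigrams(stream_a, stream_b):
    """Phone pairs taken at the same time step from two streams.

    Assumption: both streams are already aligned to common time steps.
    """
    return list(zip(stream_a, stream_b))


def smoothed_log_prob(counts, bigram, num_bigram_types):
    """Add-one smoothed log probability of a bigram (assumed smoothing)."""
    total = sum(counts.values())
    return log((counts[bigram] + 1) / (total + num_bigram_types))


def llr_score(test_bigrams, speaker_counts, background_counts, num_bigram_types):
    """Average log-likelihood ratio of speaker model vs. background model."""
    ratios = [
        smoothed_log_prob(speaker_counts, bg, num_bigram_types)
        - smoothed_log_prob(background_counts, bg, num_bigram_types)
        for bg in test_bigrams
    ]
    return sum(ratios) / len(ratios)


def fuse(score_time, score_cross, weight):
    """Linear score-level combination; `weight` is a hypothetical tuning value."""
    return weight * score_time + (1.0 - weight) * score_cross


# Toy usage with made-up phone labels from two hypothetical recognizers.
english = ["a", "b", "a", "b"]
german = ["x", "y", "x", "y"]
speaker_time_model = Counter(time_bigrams(english))
speaker_cross_model = Counter(cross_stream_bigrams(english, german))
```

In this sketch the same scoring function serves both dimensions; only the way the bigrams are extracted differs, which is what allows the two scores to be combined linearly.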
Year
2003
DOI
10.1109/ICASSP.2003.1202764
Venue
2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PROCEEDINGS: SIGNAL PROCESSING FOR COMMUNICATIONS SPECIAL SESSIONS
Keywords
feature extraction, acoustics, scanning probe microscopy, speaker recognition, two dimensions, statistical analysis, loudspeakers, speech processing, natural languages, acoustic noise, data mining, china, linguistics, testing, phonetics, time dimension, nist, speech recognition
Field
Pronunciation, Speech processing, Computer science, Phonetics, Speech recognition, Feature extraction, Natural language, Speaker recognition, Natural language processing, Bigram, Artificial intelligence, Multiple time dimensions
DocType
Conference
ISSN
1520-6149
Citations
17
PageRank
1.95
References
11
Authors
6
Name                Order  Citations  PageRank
Qin Jin             1      639        66.86
Jiri Navratil       2      314        31.36
D. A. Reynolds      3      7176       641.65
Joseph P. Campbell  4      814        85.36
Walter D. Andrews   5      138        12.65
Joy S. Abramson     6      17         1.95