Abstract | ||
---|---|---|
The aims of the study described in this paper are (1) to assess the relative speaker-discriminant properties of phonemes and (2) to investigate the importance of temporal frame-to-frame information for speaker modelling in the framework of a text-prompted speaker verification system using hidden Markov models (HMMs) and multilayer perceptrons (MLPs). It is shown that, under similar experimental conditions, nasals, fricatives and vowels convey more speaker-specific information than plosives and liquids. Regarding the influence of frame-to-frame temporal information, significant improvements are reported from the inclusion of several acoustic frames at the input of the MLPs. The results also tend to show that each phoneme has its own optimal MLP context size, giving the best equal error rate (EER). |
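
The abstract refers to two ideas that a short sketch can make concrete: feeding several neighbouring acoustic frames to an MLP, and scoring a verification system by its equal error rate. The Python below is only an illustration under stated assumptions and is not the authors' implementation. The function names `stack_context` and `equal_error_rate` are hypothetical, and the symmetric window, the edge padding by frame repetition, and the threshold sweep over observed scores are choices made here, not details taken from the paper.

```python
from typing import List, Sequence, Tuple


def stack_context(frames: Sequence[Sequence[float]], context: int) -> List[List[float]]:
    """Concatenate each frame with `context` neighbours on each side.

    Assumption: a symmetric window, with the first/last frame repeated
    at the edges. The paper's actual windowing is not stated in the abstract.
    """
    if context < 0:
        raise ValueError("context must be non-negative")
    n = len(frames)
    stacked: List[List[float]] = []
    for t in range(n):
        window: List[float] = []
        for offset in range(-context, context + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp to valid frame indices
            window.extend(frames[idx])
        stacked.append(window)
    return stacked


def equal_error_rate(target_scores: Sequence[float],
                     impostor_scores: Sequence[float]) -> Tuple[float, float]:
    """Approximate the EER by sweeping every observed score as a threshold.

    A trial is accepted when score >= threshold. Returns (eer, threshold),
    where eer is the mean of false acceptance and false rejection rates at
    the threshold where the two rates are closest.
    """
    if not target_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")
    best_gap = None
    best_eer = 0.0
    best_thr = 0.0
    for thr in sorted(set(target_scores) | set(impostor_scores)):
        far = sum(s >= thr for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(stack_context(frames, context=1))
    print(equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1]))
```

With a context of `c` frames on each side, each MLP input vector has `(2c + 1)` times the dimension of a single frame. Under these assumptions, a per-phoneme optimal context size would be found by repeating the EER measurement for several values of `c`.
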
Year | DOI | Venue |
---|---|---|
1998 | 10.1109/ICASSP.1998.675380 | ICASSP |
Keywords | Field | DocType |
acoustic signal processing,error statistics,hidden Markov models,multilayer perceptrons,speaker recognition,speech processing,HMM,acoustic frames,equal error rate,fricatives,liquids,nasals,optimal MLP context size,phoneme specific MLP,plosives,speaker discriminant properties,speaker modelling,temporal frame-to-frame information,text-prompted speaker verification experiments,vowels | Speech processing,Pattern recognition,Discriminant,Computer science,Word error rate,Speech recognition,Speaker recognition,Specific-information,Speaker diarisation,Artificial intelligence,Hidden Markov model,Perceptron | Conference |
Volume | ISSN | ISBN |
2 | 1520-6149 | 0-7803-4428-6 |
Citations | PageRank | References |
12 | 0.68 | 8 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Dijana Petrovska-Delacretaz | 1 | 57 | 6.98 |
Jean Hennebert | 2 | 417 | 38.70 |