Title | ||
---|---|---|
From Speech Signals to Semantics — Tagging Performance at Acoustic, Phonetic and Word Levels |
Abstract | ||
---|---|---|
Spoken language understanding (SLU) aims to decode the semantic information embedded in speech input. SLU decoding can be significantly degraded by a mismatch in acoustic or language models between the training and testing of a decoder. In this paper, we investigate the semantic tagging performance of a bidirectional LSTM RNN (BLSTM-RNN) with input at the acoustic, phonetic, and word levels. It is tested on a crowdsourced spoken dialog corpus recorded by non-native speakers in a job interview task. Tagging performance improves successively from low-level acoustic MFCCs, to mid-level stochastic senone posteriorgrams, to high-level ASR-recognized word strings, with corresponding tagging accuracies of 70.6%, 82.1%, and 85.1%, respectively. Score fusion of the three individual RNNs further improves the accuracy to 87.0%. |
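
The abstract does not say how the scores of the three RNNs are fused. As a minimal sketch of one common approach, assuming each model outputs a per-tag score vector over the same tag set, the scores can be combined by a weighted average and the highest-scoring tag selected. The function names, model names, tag labels, weights, and score values below are hypothetical illustrations, not taken from the paper.

```python
from typing import Dict, List


def fuse_scores(scores: Dict[str, List[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-tag scores from several models.

    Assumption: every model scores the same tags in the same order.
    Weights are normalized so they sum to 1 over the given models.
    """
    total_weight = sum(weights[name] for name in scores)
    n_tags = len(next(iter(scores.values())))
    fused = [0.0] * n_tags
    for name, model_scores in scores.items():
        w = weights[name] / total_weight
        for i, s in enumerate(model_scores):
            fused[i] += w * s
    return fused


def predict_tag(fused: List[float], tags: List[str]) -> str:
    """Return the tag with the highest fused score."""
    best = max(range(len(fused)), key=lambda i: fused[i])
    return tags[best]


# Hypothetical tag set and per-model posteriors for one utterance.
tags = ["tag_a", "tag_b", "tag_c"]
scores = {
    "mfcc_rnn": [0.5, 0.3, 0.2],    # acoustic-level input
    "senone_rnn": [0.2, 0.6, 0.2],  # phonetic-level input
    "word_rnn": [0.1, 0.7, 0.2],    # word-level input
}
weights = {"mfcc_rnn": 1.0, "senone_rnn": 1.0, "word_rnn": 1.0}

fused = fuse_scores(scores, weights)
predicted = predict_tag(fused, tags)
```

In this example the acoustic model alone would pick `tag_a`, but the fused scores favor `tag_b`. In practice the weights would typically be tuned on held-out data rather than set equal.
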
Year | DOI | Venue |
---|---|---|
2018 | 10.1109/ISCSLP.2018.8706581 | 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP) |
Keywords | Field | DocType |
---|---|---|
Semantics, Acoustics, Training, Feature extraction, Tagging, Interviews, Hidden Markov models | Speech corpus, Mel-frequency cepstrum, Computer science, Speech recognition, Feature extraction, Decoding methods, Hidden Markov model, Language model, Spoken language, Semantics | Conference |
ISBN | Citations | PageRank |
---|---|---|
978-1-5386-5627-3 | 0 | 0.34 |
References | Authors | |
---|---|---|
0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Qian Yao | 1 | 527 | 51.55 |
Rutuja Ubale | 2 | 2 | 3.17 |
Patrick Lange | 3 | 9 | 8.42 |
Keelan Evanini | 4 | 1 | 4.42 |
Frank K. Soong | 5 | 1395 | 268.29 |