Title
From Speech Signals to Semantics — Tagging Performance at Acoustic, Phonetic and Word Levels
Abstract
Spoken language understanding (SLU) aims to decode the semantic information embedded in speech input. SLU decoding can be significantly degraded when the acoustic or language models of a decoder are mismatched between training and testing. In this paper we investigate the semantic tagging performance of a bidirectional LSTM RNN (BLSTM-RNN) with input at the acoustic, phonetic and word levels. The tagger is tested on a crowdsourced spoken dialog corpus collected from non-native speakers in a job interview task. Tagging accuracy improves successively from low-level acoustic MFCC features (70.6%), through mid-level stochastic senone posteriorgrams (82.1%), to high-level ASR-recognized word strings (85.1%). Score fusion of the three individual RNNs further improves the accuracy to 87.0%.
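The abstract does not say how the scores of the three RNNs are combined. As a minimal sketch, assuming each model outputs one score per semantic tag and that fusion is a weighted average of those scores, the idea could look like the following. The function names, tag labels and score values are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence


def fuse_scores(per_model_scores: Sequence[Sequence[float]],
                weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of per-tag scores from several models.

    Assumption: every model scores the same tags in the same order.
    """
    n_models = len(per_model_scores)
    if n_models == 0:
        raise ValueError("need at least one model")
    n_tags = len(per_model_scores[0])
    if any(len(scores) != n_tags for scores in per_model_scores):
        raise ValueError("all models must score the same number of tags")
    if weights is None:
        # Equal weights when nothing else is known about the models.
        weights = [1.0 / n_models] * n_models
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    total = sum(weights)
    return [
        sum(w * scores[t] for w, scores in zip(weights, per_model_scores)) / total
        for t in range(n_tags)
    ]


def predict_tag(fused: Sequence[float], tags: Sequence[str]) -> str:
    """Return the tag with the highest fused score."""
    best = max(range(len(fused)), key=lambda t: fused[t])
    return tags[best]


# Hypothetical scores for one utterance from the three input levels.
tags = ["tag_a", "tag_b", "tag_c"]
mfcc_scores = [0.5, 0.3, 0.2]      # acoustic-level model
senone_scores = [0.3, 0.5, 0.2]    # phonetic-level model
word_scores = [0.2, 0.6, 0.2]      # word-level model

fused = fuse_scores([mfcc_scores, senone_scores, word_scores])
decision = predict_tag(fused, tags)
```

In this toy case the acoustic-level model alone would pick tag_a, while the fused decision is tag_b. Unequal weights (for example, favoring the word-level model) can be passed through the `weights` argument.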
Year
2018
DOI
10.1109/ISCSLP.2018.8706581
Venue
2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP)
Keywords
Semantics, Acoustics, Training, Feature extraction, Tagging, Interviews, Hidden Markov models
Field
Speech corpus, Mel-frequency cepstrum, Computer science, Speech recognition, Feature extraction, Decoding methods, Hidden Markov model, Language model, Spoken language, Semantics
DocType
Conference
ISBN
978-1-5386-5627-3
Citations
0
PageRank
0.34
References
0
Authors
5
Name (Order)
Qian Yao (1)
Rutuja Ubale (2)
Patrick Lange (3)
Keelan Evanini (4)
Frank K. Soong (5)