Paper Info

Title
From Speech Signals to Semantics — Tagging Performance at Acoustic, Phonetic and Word Levels

Abstract
Spoken language understanding (SLU) aims to decode the semantic information embedded in speech input. SLU decoding can be significantly degraded by mismatches in the acoustic and language models between the training and testing of a decoder. In this paper, we investigate the semantic tagging performance of a bidirectional long short-term memory recurrent neural network (BLSTM-RNN) with input at the acoustic, phonetic and word levels. The model is evaluated on a crowdsourced spoken dialog corpus of non-native speakers performing a job interview task. Tagging performance improves successively as the input moves from the low-level acoustic MFCCs, through the mid-level stochastic senone posteriorgrams, to the high-level ASR-recognized word strings, with tagging accuracies of 70.6%, 82.1% and 85.1%, respectively. Fusing the scores of the three individual RNNs further improves the accuracy to 87.0%.
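The abstract reports that fusing the scores of the three RNNs raises accuracy to 87.0%, but it does not say how the scores are combined. The Python sketch below assumes one common choice, linear score fusion: a weighted sum of each model's per-item tag scores, followed by an argmax. The function names (`fuse_scores`, `argmax_tags`, `accuracy`), the equal weights and the toy scores are hypothetical and for illustration only; they are not the authors' configuration or data. The sketch also assumes that all three models score the same items over the same tag set, which the abstract does not specify.

```python
from typing import List, Sequence

# Scores are indexed as scores[item][tag]. An "item" is whatever unit is being
# tagged; the abstract does not say whether this is a word, segment or utterance.
Scores = Sequence[Sequence[float]]


def fuse_scores(model_scores: Sequence[Scores], weights: Sequence[float]) -> List[List[float]]:
    """Linear score fusion: weighted sum of each model's per-item tag scores."""
    if len(model_scores) != len(weights):
        raise ValueError("expected one weight per model")
    n_items = len(model_scores[0])
    n_tags = len(model_scores[0][0])
    fused = [[0.0] * n_tags for _ in range(n_items)]
    for scores, weight in zip(model_scores, weights):
        # All models must cover the same items and the same tag set.
        if len(scores) != n_items or any(len(row) != n_tags for row in scores):
            raise ValueError("all models must score the same items and tags")
        for i, row in enumerate(scores):
            for t, score in enumerate(row):
                fused[i][t] += weight * score
    return fused


def argmax_tags(scores: Scores) -> List[int]:
    """Pick the highest-scoring tag index for each item (first index wins ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def accuracy(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Fraction of items whose predicted tag matches the reference tag."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need equal-length, non-empty tag sequences")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


if __name__ == "__main__":
    # Made-up toy scores over two tags for three items (not data from the paper).
    mfcc = [[0.6, 0.4], [0.7, 0.3], [0.4, 0.6]]
    senone = [[0.3, 0.7], [0.1, 0.9], [0.45, 0.55]]
    word = [[0.8, 0.2], [0.55, 0.45], [0.2, 0.8]]
    reference = [0, 1, 1]

    for name, scores in [("mfcc", mfcc), ("senone", senone), ("word", word)]:
        print(name, accuracy(argmax_tags(scores), reference))
    fused = fuse_scores([mfcc, senone, word], weights=[1 / 3, 1 / 3, 1 / 3])
    print("fused", accuracy(argmax_tags(fused), reference))
```

On this toy input, each single model gets two of the three items right, while the fused scores get all three right. The numbers are chosen only to show how fusion can correct individual errors and say nothing about the paper's results. In practice the fusion weights would typically be tuned on held-out data; log-linear fusion (summing weighted log-posteriors) is an equally plausible alternative that the abstract does not rule out.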

Year: 2018
DOI: 10.1109/ISCSLP.2018.8706581
Venue: 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP)
Keywords: Semantics, Acoustics, Training, Feature extraction, Tagging, Interviews, Hidden Markov models
Field: Speech corpus, Mel-frequency cepstrum, Computer science, Speech recognition, Feature extraction, Decoding methods, Hidden Markov model, Language model, Spoken language, Semantics
DocType: Conference
ISBN: 978-1-5386-5627-3
Citations: 0
PageRank: 0.34
References: 0
Authors: 5

Authors

Name	Order	Citations	PageRank
Qian Yao	1	527	51.55
Rutuja Ubale	2	2	3.17
Patrick Lange	3	9	8.42
Keelan Evanini	4	1	4.42
Frank K. Soong	5	1395	268.29
