Title
Disfluency Detection Based on Speech-Aware Token-by-Token Sequence Labeling with BLSTM-CRFs and Attention Mechanisms
Abstract
This paper presents a new method for token-by-token sequence labeling that can leverage not only lexical information but also speech information, without requiring any alignment between the two. Our motivation is to detect disfluencies such as fillers and word fragments robustly in spontaneous speech. Disfluency detection is often modeled as token-by-token sequence labeling of text transcribed by automatic speech recognition. However, lexical information alone is not sufficient, because disfluencies also change the characteristics of the speech itself. The difficulty is that using speech and lexical information simultaneously requires the speech to be aligned with the transcribed text, which hinders the introduction of speech information into disfluency detection. To solve this problem, we propose a token-by-token sequence labeling method that uses lexical and speech information simultaneously without requiring any alignment. To this end, we introduce attention mechanisms into neural sequence labeling based on bidirectional long short-term memory recurrent neural network conditional random fields (BLSTM-CRFs). The attention mechanisms enable the model to automatically find the regions of speech that correspond to disfluencies. Our experimental results show that the proposed method using acoustic and prosodic features achieves higher labeling accuracy than a method using lexical features alone.
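The core idea of the abstract (each token attends over all speech frames of the utterance, so no token-to-frame alignment is needed, and labels are then decoded with a CRF) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the BLSTM encoders are omitted (token and frame vectors are given directly and assumed to share one dimension), attention is assumed to be plain dot-product attention, and the functions `attend`, `emissions`, and `viterbi`, the weights `W`, the transition matrix `trans`, and the label indices (0 = fluent, 1 = disfluent) are all hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(xs: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, frames: Matrix) -> Tuple[Vector, Vector]:
    """Dot-product attention of one token query over every speech frame.

    No alignment is used: the weights decide which frames matter.
    Returns (weights, context), where context is the weighted frame sum.
    """
    weights = softmax([dot(query, f) for f in frames])
    dim = len(frames[0])
    context = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return weights, context


def emissions(tokens: Matrix, frames: Matrix, W: Matrix) -> Matrix:
    """Per-token label scores from the concatenation [token ; speech context].

    Hypothetical linear layer: W has one row per label, each row of length
    len(token) + len(context).
    """
    scores = []
    for h in tokens:
        _, c = attend(h, frames)
        feat = h + c  # list concatenation of lexical and speech vectors
        scores.append([dot(row, feat) for row in W])
    return scores


def viterbi(emit: Matrix, trans: Matrix) -> List[int]:
    """Highest-scoring label sequence under a linear-chain CRF.

    emit[t][y] is the emission score of label y at token t;
    trans[p][y] is the transition score from label p to label y.
    """
    n_labels = len(emit[0])
    score = list(emit[0])
    back: List[List[int]] = []
    for t in range(1, len(emit)):
        new_score, ptr = [], []
        for y in range(n_labels):
            best_prev = max(range(n_labels), key=lambda p: score[p] + trans[p][y])
            ptr.append(best_prev)
            new_score.append(score[best_prev] + trans[best_prev][y] + emit[t][y])
        score = new_score
        back.append(ptr)
    # Backtrack from the best final label.
    y = max(range(n_labels), key=lambda k: score[k])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return path[::-1]


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0]]                  # hypothetical token states
    frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]      # hypothetical frame features
    W = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]]  # 2 labels x 4 features
    trans = [[0.0, -1.0], [-1.0, 0.0]]
    print(viterbi(emissions(tokens, frames, W), trans))
```

The design choice this illustrates is that the speech context for each token is computed from the whole frame sequence, so the number of frames never has to match the number of tokens; the CRF then scores label transitions jointly rather than labeling each token independently.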
Year
2019
DOI
10.1109/APSIPAASC47483.2019.9023119
Venue
2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Keywords
disfluency detection, sequence labeling, BLSTM-CRFs
DocType
Conference
ISSN
2640-009X
ISBN
978-1-7281-3249-5
Citations
0
PageRank
0.34
References
1
Authors
5
Author order
1. Tomohiro Tanaka
2. Ryo Masumura
3. Takafumi Moriya
4. Takanobu Oba
5. Yushi Aono