Title
Improved End-Of-Query Detection For Streaming Speech Recognition
Abstract
In many streaming speech recognition applications such as voice search, it is important to determine quickly and accurately when the user has finished speaking their query. A conventional approach to this task is to declare end-of-query whenever a fixed interval of silence is detected by a voice activity detector (VAD) trained to classify each frame as speech or silence. However, silence detection and end-of-query detection are fundamentally different tasks, and the criterion used during VAD training may not be optimal. In particular, the conventional approach ignores potential acoustic cues such as filler sounds and past speaking rate which may indicate whether a given pause is temporary or query-final. In this paper we present a simple modification to make the conventional VAD training criterion more closely related to end-of-query detection. A unidirectional long short-term memory architecture allows the system to remember past acoustic events, and the training criterion incentivizes the system to learn to use any acoustic cues relevant to predicting future user intent. We show experimentally that this approach improves latency at a given accuracy by around 100 ms for end-of-query detection for voice search.
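The core idea in the abstract — relabeling training targets so that query-final silence is distinguished from mid-query pauses — can be illustrated with a minimal sketch. The class labels here (0 = silence, 1 = speech, 2 = query-final silence) and the helper names are illustrative assumptions, not the paper's exact formulation:

```python
def vad_targets(is_speech):
    """Conventional VAD targets: 1 = speech, 0 = silence, per frame."""
    return list(is_speech)

def end_of_query_targets(is_speech):
    """Sketch of the modified criterion: silence frames after the final
    speech frame are relabeled with a distinct query-final-silence class
    (2). A unidirectional LSTM trained on these targets must learn to
    distinguish query-final silence from temporary mid-query pauses."""
    targets = list(is_speech)
    last_speech = max((i for i, s in enumerate(targets) if s == 1),
                      default=None)
    if last_speech is not None:
        for i in range(last_speech + 1, len(targets)):
            targets[i] = 2  # silence after the last speech frame
    return targets

# Example utterance: two phrases separated by a pause, then final silence.
frames = [1, 1, 0, 0, 1, 1, 0, 0, 0]
print(vad_targets(frames))           # [1, 1, 0, 0, 1, 1, 0, 0, 0]
print(end_of_query_targets(frames))  # [1, 1, 0, 0, 1, 1, 2, 2, 2]
```

Under this relabeling, the mid-query pause (frames 2-3) keeps the ordinary silence label, while only the trailing silence is marked query-final, so the model is rewarded for exploiting cues that predict whether the user will speak again.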
Year
2017
DOI
10.21437/Interspeech.2017-496
Venue
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION
Keywords
endpointing, voice activity detection, end-of-query detection
Field
Pattern recognition, Voice activity detection, Computer science, Speech recognition, Artificial intelligence
DocType
Conference
ISSN
2308-457X
Citations
2
PageRank
0.39
References
0
Authors
4
Name             Order  Citations  PageRank
Matt Shannon     1      79         6.31
Gabor Simko      2      42         7.06
Shuo-Yiin Chang  3      27         4.71
Carolina Parada  4      242        13.11