Title
Improved End-Of-Query Detection For Streaming Speech Recognition
Abstract
In many streaming speech recognition applications such as voice search, it is important to determine quickly and accurately when the user has finished speaking their query. A conventional approach to this task is to declare end-of-query whenever a fixed interval of silence is detected by a voice activity detector (VAD) trained to classify each frame as speech or silence. However, silence detection and end-of-query detection are fundamentally different tasks, and the criterion used during VAD training may not be optimal. In particular, the conventional approach ignores potential acoustic cues such as filler sounds and past speaking rate which may indicate whether a given pause is temporary or query-final. In this paper we present a simple modification to make the conventional VAD training criterion more closely related to end-of-query detection. A unidirectional long short-term memory architecture allows the system to remember past acoustic events, and the training criterion incentivizes the system to learn to use any acoustic cues relevant to predicting future user intent. We show experimentally that this approach improves latency at a given accuracy by around 100 ms for end-of-query detection for voice search.
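The core idea in the abstract — relabeling training targets so that query-final silence is distinguished from mid-query pauses — can be illustrated with a minimal sketch. The class labels here (0 = silence, 1 = speech, 2 = query-final silence) and the helper names are illustrative assumptions, not the paper's exact formulation:

```python
def vad_targets(is_speech):
    """Conventional VAD targets: 1 = speech, 0 = silence, per frame."""
    return list(is_speech)

def end_of_query_targets(is_speech):
    """Sketch of the modified criterion: silence frames after the final
    speech frame are relabeled with a distinct query-final-silence class
    (2). A unidirectional LSTM trained on these targets must learn to
    distinguish query-final silence from temporary mid-query pauses."""
    targets = list(is_speech)
    last_speech = max((i for i, s in enumerate(targets) if s == 1),
                      default=None)
    if last_speech is not None:
        for i in range(last_speech + 1, len(targets)):
            targets[i] = 2  # silence after the last speech frame
    return targets

# Example utterance: two phrases separated by a pause, then final silence.
frames = [1, 1, 0, 0, 1, 1, 0, 0, 0]
print(vad_targets(frames))           # [1, 1, 0, 0, 1, 1, 0, 0, 0]
print(end_of_query_targets(frames))  # [1, 1, 0, 0, 1, 1, 2, 2, 2]
```

Under this relabeling, the mid-query pause (frames 2-3) keeps the ordinary silence label, while only the trailing silence is marked query-final, so the model is rewarded for exploiting cues that predict whether the user will speak again.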
Year
2017
DOI
10.21437/Interspeech.2017-496
Venue
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION
Keywords
endpointing, voice activity detection, end-of-query detection
Field
Pattern recognition, Voice activity detection, Computer science, Speech recognition, Artificial intelligence
DocType
Conference
ISSN
2308-457X
Citations
2
PageRank
0.39
References
0
Authors
4
Name             Order  Citations  PageRank
Matt Shannon     1      79         6.31
Gabor Simko      2      42         7.06
Shuo-Yiin Chang  3      27         4.71
Carolina Parada  4      242        13.11