Abstract |
---|
User interactions with mobile devices increasingly depend on voice as a primary input modality. Due to the disadvantages of sending audio across potentially spotty network connections for speech recognition, in recent years there has been growing attention to performing recognition on-device. The limited computational resources, however, typically require additional model constraints. In this work, we explore the task of on-device utterance verification, wherein the recognizer must transcribe an utterance if it is in a target set or reject it as being out of domain. We present a data-driven methodology for mining tens of thousands of target phrases from an existing corpus. We then compare two common garbage-modeling approaches to utterance verification: a sub-word rejection model and a white-listed n-gram model. We examine a deficiency of the sub-word modeling approach and introduce a novel modification that makes use of common prefixes between targeted phrases and non-targeted phrases. We show good performance in the trade-off between recall and word error rate using both the prefix and white-listed n-gram approaches. Finally, we evaluate the prefix-based approach in a hybrid setting where rejected instances are sent to a server-side recognizer. |

Year | Venue | Keywords |
---|---|---|
2015 | 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | automatic speech recognition, language modeling, utterance verification, OOV rejection, garbage modeling |

Field | DocType | Citations |
---|---|---|
Garbage, Computer science, Word error rate, Utterance, Speech recognition, Prefix, Mobile device, Artificial intelligence, Natural language processing, Recall | Conference | 1 |

PageRank | References | Authors |
---|---|---|
0.35 | 12 | 4 |

Name | Order | Citations | PageRank |
---|---|---|---|
Christophe Van Gysel | 1 | 53 | 5.27 |
Leonid Velikovich | 2 | 162 | 8.07 |
Ian McGraw | 3 | 253 | 24.41 |
Françoise Beaufays | 4 | 341 | 27.76 |