Paper Info

Title
Enhancing Low Resource Keyword Spotting With Automatically Retrieved Web Documents

Abstract
Keyword Spotting (KWS) systems developed for low resource languages with very little transcribed audio suffer due to a small vocabulary (high out-of-vocabulary (OOV) rate) and a weak language model. In this paper, we propose to augment such systems using automatically retrieved web documents. Our procedure can find large volumes of web documents similar to a small pool of training transcriptions within a few hours, by querying a search engine with automatically generated query terms. We then use simple language identification to extract high-confidence text for lexicon expansion and language modeling. Experiments using six very limited language packs (VLLP) from the IARPA-Babel program show web documents can cut the OOV rate by half on the development set, and on average improve keyword spotting performance by 2.8 points absolute measured by the Actual Term Weighted Value (ATWV). In particular, we find most of the gains (8.7 points on average) are from keywords that were OOV in the baseline system, and are converted into in-vocabulary (IV) through lexicon expansion. These gains are obtained even after using subword units (unsupervised syllable-like units and sequences of phones), which are known to greatly enhance OOV keyword search performance.

Year	Venue	Keywords
2015	16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), Vols 1-5	web document retrieval, keyword spotting, language modeling
Field	DocType	Citations
Keyword density, Information retrieval, Computer science, Speech recognition, Keyword spotting	Conference	7
PageRank	References	Authors
0.47	6	6

Authors (6 rows)

Name	Order	Citations	PageRank
Le Zhang	1	268	32.16
Damianos Karakos	2	221	19.35
William Hartmann	3	64	10.66
Roger Hsiao	4	57	3.32
Richard M. Schwartz	5	2839	717.76
Stavros Tsakalidis	6	213	13.83
