Improving Speech Recognition And Keyword Search For Low Resource Languages Using Web Data - Citegraph

Paper Info

Title
Improving Speech Recognition And Keyword Search For Low Resource Languages Using Web Data

Abstract
We describe the use of text data scraped from the web to augment language models for Automatic Speech Recognition (ASR) and Keyword Search (KWS) in low-resource languages. We scrape text from multiple genres, including blogs, online news, translated TED talks, and subtitles. Using linearly interpolated language models, we find that blogs and movie subtitles are more relevant than the other genres for language modeling of conversational telephone speech, and we obtain large reductions in out-of-vocabulary (OOV) keywords. Furthermore, we show that the web data can improve Term Error Rate performance by 3.8% absolute and Maximum Term-Weighted Value (MTWV) in Keyword Search by 0.0076 to 0.1059 absolute points. Much of the gain comes from the reduction in OOV items.
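The abstract mentions linearly interpolated language models without detailing how the mixture is built. As a minimal sketch, assuming each component model (for example, one trained on in-domain conversational transcripts and one on a scraped web genre) has already assigned a probability to every token of a held-out set, the interpolated model is P(w | h) = Σ_i λ_i P_i(w | h) with λ_i ≥ 0 and Σ_i λ_i = 1. A common way to choose the λ_i is expectation-maximization (EM) on held-out text, as done by tools such as SRILM's compute-best-mix; this is not necessarily the authors' procedure, and the function names and toy probabilities below are hypothetical.

```python
import math
from typing import List, Sequence


def interpolate(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Linear interpolation for one token: sum_i lambda_i * P_i(w | h)."""
    return sum(lam * p for lam, p in zip(weights, probs))


def em_weights(token_probs: List[Sequence[float]], iters: int = 50) -> List[float]:
    """Estimate interpolation weights by EM on held-out data.

    token_probs[t][i] is P_i(w_t | h_t), the probability component model i
    assigns to held-out token t. Every token must get nonzero probability
    from at least one component, otherwise the mixture probability is zero.
    """
    k = len(token_probs[0])
    weights = [1.0 / k] * k  # start from a uniform mixture
    for _ in range(iters):
        counts = [0.0] * k
        for probs in token_probs:
            total = interpolate(probs, weights)
            for i in range(k):
                # E-step: posterior probability that component i produced token t
                counts[i] += weights[i] * probs[i] / total
        # M-step: new weight is the average posterior responsibility
        weights = [c / len(token_probs) for c in counts]
    return weights


def perplexity(token_probs: List[Sequence[float]], weights: Sequence[float]) -> float:
    """Held-out perplexity of the interpolated model."""
    log_sum = sum(math.log(interpolate(p, weights)) for p in token_probs)
    return math.exp(-log_sum / len(token_probs))


if __name__ == "__main__":
    # Hypothetical held-out probabilities: (in-domain LM, web LM) per token.
    # The 0.0 entry mimics a word that is OOV for the in-domain model only.
    held_out = [(0.2, 0.05), (0.01, 0.1), (0.3, 0.2), (0.0, 0.02)]
    lambdas = em_weights(held_out)
    print("weights:", lambdas)
    print("uniform perplexity:", perplexity(held_out, [0.5, 0.5]))
    print("tuned perplexity:", perplexity(held_out, lambdas))
```

Because each EM iteration never decreases held-out likelihood, the tuned weights give a perplexity no worse than the uniform starting point.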

Year: 2015
Venue: 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5
Keywords: web resources, web scraping, keyword search, low-resource languages
Field: Computer science, Word error rate, Keyword search, Speech recognition, Natural language processing, Artificial intelligence, Augment, Language model
DocType: Conference
Citations: 7
PageRank: 0.49
References: 15
Authors: 8

Authors

Name	Order	Citations	PageRank
Gideon Mendels	1	11	1.65
Erica Cooper	2	51	4.19
Victor Soto	3	8	1.55
Julia Hirschberg	4	2982	448.62
Mark J. F. Gales	5	3905	367.45
Kate Knill	6	249	28.02
Anton Ragni	7	98	9.06
Haipeng Wang	8	40	4.25
