Web resources for language modeling in conversational speech recognition - Citegraph

Paper Info

Title
Web resources for language modeling in conversational speech recognition

Abstract
This article describes a methodology for collecting text from the Web to match a target sublanguage both in style (register) and topic. Unlike other work that estimates n-gram statistics from page counts, the approach here is to select and filter documents, which provides more control over the type of material contributing to the n-gram counts. The data can be used in a variety of ways; here, the different sources are combined in two types of mixture models. Focusing on conversational speech where data collection can be quite costly, experiments demonstrate the positive impact of Web collections on several tasks with varying amounts of data, including Mandarin and English telephone conversations and English meetings and lectures.
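
As a rough illustration of combining sources in a mixture, the sketch below linearly interpolates two language models and estimates the interpolation weights by EM on held-out text. Several things here are assumptions rather than details from the abstract, which does not say which two types of mixture model the article uses: linear interpolation stands in for them, add-one smoothed unigram models stand in for n-gram models to keep the example short, and the toy sentences and all function names (`train_unigram`, `em_mixture_weights`, `mixture_prob`, `perplexity`) are hypothetical.

```python
from collections import Counter
import math


def train_unigram(tokens, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def em_mixture_weights(models, heldout, iters=50):
    """Estimate linear interpolation weights by EM on held-out tokens."""
    k = len(models)
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: accumulate each component's posterior share of every token.
        resp = [0.0] * k
        for w in heldout:
            probs = [weights[i] * models[i][w] for i in range(k)]
            z = sum(probs)
            for i in range(k):
                resp[i] += probs[i] / z
        # M-step: new weights are the average posterior shares.
        weights = [r / len(heldout) for r in resp]
    return weights


def mixture_prob(models, weights, word):
    """Probability of a word under the interpolated model."""
    return sum(lam * m[word] for lam, m in zip(weights, models))


def perplexity(models, weights, tokens):
    """Per-token perplexity of the interpolated model on a token list."""
    log_sum = sum(math.log(mixture_prob(models, weights, w)) for w in tokens)
    return math.exp(-log_sum / len(tokens))


# Toy data (invented for illustration): a small in-domain conversational
# sample, a Web-style sample, and held-out conversational text.
in_domain = "uh yeah i think so you know".split()
web = "the report describes the methodology you know".split()
heldout = "yeah you know i think so".split()
vocab = set(in_domain) | set(web) | set(heldout)

models = [train_unigram(in_domain, vocab), train_unigram(web, vocab)]
weights = em_mixture_weights(models, heldout)

print("weights:", weights)
print("mixture perplexity:", perplexity(models, weights, heldout))
print("web-only perplexity:", perplexity(models, [0.0, 1.0], heldout))
```

Because the held-out text here is conversational, EM shifts most of the weight to the in-domain component. With real data, the Web component would typically be much larger than the in-domain one, and the weights would be tuned on held-out data from the target task.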

Year	DOI	Venue
2007	10.1145/1322391.1322392	TSLP
Keywords	Field	DocType
data collection,web collection,web data,english telephone conversation,conversational speech recognition,page count,different source,mixture model,language modeling,n-gram statistic,english meeting,web resource,n-gram count,conversational speech,language model,speech recognition	Web resource,Data collection,Computer science,Speech recognition,Natural language processing,Artificial intelligence,Mandarin Chinese,Mixture model,Language model,Sublanguage	Journal
Volume	Issue	ISSN
5	1	1550-4875
Citations	PageRank	References
27	1.38	41
Authors
6

Authors (6 rows)

Name	Order	Citations	PageRank
Ivan Bulyko	1	249	22.40
Mari Ostendorf	2	2462	348.75
Manhung Siu	3	464	61.40
Tim Ng	4	122	9.38
Andreas Stolcke	5	6690	712.46
Özgür Çetin	6	154	14.41
