Title
Web-Based Language Model Domain Adaptation for Real World Voice Retrieval
Abstract
This paper presents our recent work on a real-world voice retrieval system that automatically updates the language models for a specific domain with the latest web data. It tackles two of the main difficulties in building such a system. First, users of voice retrieval systems often enter newly created "hot words" as keywords, so improving the recognition of these hot words is important for the quality of the user experience. Second, in our applications the retrieval domain is given, which raises the problem of automatically selecting in-domain data from web data and updating the domain-specific language models. To address these issues, the system obtains the latest text training data by searching the web for data related to the top-ranking hot words. Based on these data, a block-based language modeling method is proposed to automatically build and update domain-specific language models. In addition, in-domain high-frequency words and phrases are added to the lexicon. Experiments on a dataset of real-world users' voice retrieval queries showed that updating with our system yields consistent improvements in in-domain voice retrieval recognition.
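The lexicon-updating step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the selection criteria, so everything here is an assumption made for illustration: the function name `update_lexicon`, the whitespace tokenization, the `min_count` threshold, and the toy data. It covers single words only, not phrases, and is not the authors' implementation.

```python
from collections import Counter


def update_lexicon(lexicon, in_domain_texts, min_count=2):
    """Return a new lexicon with frequent in-domain words added.

    Hypothetical helper: a word is added when it occurs at least
    `min_count` times in the in-domain texts and is not yet in the lexicon.
    """
    # Count lowercase, whitespace-separated tokens across all texts.
    counts = Counter(
        word
        for text in in_domain_texts
        for word in text.lower().split()
    )
    new_words = {
        word
        for word, count in counts.items()
        if count >= min_count and word not in lexicon
    }
    return set(lexicon) | new_words


# Toy data (assumed): an existing lexicon and freshly collected web text.
lexicon = {"play", "the", "movie"}
web_texts = [
    "play the movie gravity",
    "gravity movie trailer",
    "watch gravity online",
]

updated = update_lexicon(lexicon, web_texts, min_count=3)
added = sorted(updated - lexicon)
print(added)
```

A frequency threshold is only one possible criterion; a real system would also need pronunciations for the new entries before the recognizer can use them.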
Year: 2013
DOI: 10.1109/CIS.2013.28
Venue: CIS
Keywords: voice retrieval dataset, hot word recognition, recognition performance, voice retrieval system, real world voice retrieval, hot word, in-domain high-frequency phrases, voice retrieval, web data, information retrieval, retrieval domain, in-domain high-frequency words, web-based language model domain, web-based language model domain adaptation, domain-specific language models, domain data, latest web data, domain-specific language model, in-domain voice retrieval recognition, automatic language model update, text training data, internet, web data search, block-based language model, block-based language modeling method, natural language processing, lexicon updating, text analysis, latest text training data
Field: Cache language model, Question answering, Information retrieval, Data retrieval, Data control language, Computer science, Universal Networking Language, Language identification, Natural language processing, Artificial intelligence, Language model, Visual Word
DocType: Conference
ISBN: 978-1-4799-2548-3
Citations: 0
PageRank: 0.34
References: 8
Authors: 5
Name | Order | Citations | PageRank
Mengzhe Chen | 1 | 1 | 1.03
Qingqing Zhang | 2 | 102 | 14.76
Zhichao Wang | 3 | 6 | 5.56
Jielin Pan | 4 | 44 | 18.04
Yonghong Yan | 5 | 656 | 114.13