Title
Robust spoken document retrieval methods for misrecognition and out-of-vocabulary keywords
Abstract
This paper describes a Japanese spoken document retrieval system that is robust to out-of-vocabulary (OOV) words and recognition errors. A standard approach to spoken document retrieval is to automatically transcribe spoken documents into word sequences, which can then be matched directly against queries. With this approach, documents containing OOV words, or words misrecognized as other words, cannot be retrieved. To avoid this problem, we propose a spoken document retrieval method that takes misrecognition and OOV keywords into account. It combines two techniques. The first creates the index from the outputs of multiple recognizers, to cope with transcriptions that contain misrecognized words; the index improves when the recognizers have different characteristics from one another. The second uses word-based indexing for in-vocabulary keywords and syllable-based indexing for OOV keywords, and switches between them according to whether each query keyword is in-vocabulary or OOV. Evaluation results show that this approach benefits from the advantages of both indexing methods and that the proposed technique is effective in robustly retrieving spoken documents.
© 2004 Wiley Periodicals, Inc. Syst Comp Jpn, 35(14): 44–53, 2004; Published online in Wiley InterScience. DOI 10.1002/scj.10697
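The two ideas in the abstract, indexing the outputs of several recognizers and switching between a word index and a syllable index per query keyword, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the toy documents, the use of syllable bigrams, and the rule that a document must match every keyword are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import defaultdict


def build_word_index(transcripts_by_doc):
    """Map each word to the documents in which ANY recognizer output it."""
    index = defaultdict(set)
    for doc_id, transcripts in transcripts_by_doc.items():
        for words in transcripts:  # one word list per recognizer
            for word in words:
                index[word].add(doc_id)
    return index


def build_syllable_index(syllables_by_doc, n=2):
    """Map each syllable n-gram to the documents that contain it."""
    index = defaultdict(set)
    for doc_id, syllables in syllables_by_doc.items():
        for i in range(len(syllables) - n + 1):
            index[tuple(syllables[i:i + n])].add(doc_id)
    return index


def retrieve(keywords, vocabulary, word_index, syllable_index, n=2):
    """Return documents matching every keyword (assumed AND semantics).

    Each keyword is a (word, syllables) pair. In-vocabulary keywords use
    the word index; OOV keywords fall back to the syllable index.
    """
    result = None
    for word, syllables in keywords:
        if word in vocabulary:
            hits = set(word_index.get(word, set()))
        else:
            # Approximation: require every n-gram of the keyword to occur
            # somewhere in the document, without checking their order.
            grams = [tuple(syllables[i:i + n])
                     for i in range(len(syllables) - n + 1)]
            hit_sets = [syllable_index.get(g, set()) for g in grams]
            hits = set.intersection(*hit_sets) if hit_sets else set()
        result = hits if result is None else result & hits
    return result or set()


# Hypothetical toy data: two recognizers per document, one syllable transcript.
transcripts_by_doc = {
    "d1": [["tokyo", "weather"], ["tokyo", "whether"]],
    "d2": [["osaka", "news"], ["osaka", "news"]],
}
syllables_by_doc = {
    "d1": ["to", "kyo", "we", "za"],
    "d2": ["o", "sa", "ka", "nyu", "su", "shi", "n", "ka", "n", "se", "n"],
}
vocabulary = {"tokyo", "weather", "whether", "osaka", "news"}

word_index = build_word_index(transcripts_by_doc)
syllable_index = build_syllable_index(syllables_by_doc)

in_vocab_query = [("weather", ["we", "za"])]
oov_query = [("shinkansen", ["shi", "n", "ka", "n", "se", "n"])]
```

In this toy setup, "weather" is found in d1 because one of the two recognizers produced it even though the other output "whether", and the OOV keyword "shinkansen" is found in d2 only through the syllable index.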
Year
2004
DOI
10.1002/scj.v35:14
Venue
Systems and Computers in Japan
Field
Information retrieval, Computer science, Search engine indexing, Natural language processing, Artificial intelligence, Document retrieval, Out of vocabulary
DocType
Journal
Volume
35
Issue
14
Citations
1
PageRank
0.36
References
15
Authors
2
Name                  Order  Citations  PageRank
Hiromitsu Nishizaki   1      163        29.49
Seiichi Nakagawa      2      598        104.03