Robust spoken document retrieval methods for misrecognition and out-of-vocabulary keywords - Citegraph

Paper Info

Title
Robust spoken document retrieval methods for misrecognition and out-of-vocabulary keywords

Abstract
This paper describes a Japanese spoken document retrieval system that is robust to out-of-vocabulary (OOV) words. A standard approach to spoken document retrieval is to automatically transcribe spoken documents into word sequences, which can be matched directly against queries. With this approach, documents containing OOV words, or words misrecognized as other words, cannot be retrieved. To avoid this problem, we propose a novel spoken document retrieval method that takes OOV keywords into account. One technique we use is to create an index from the outputs of multiple recognizers, to cope with transcriptions that contain misrecognized words. The index improves when the recognizers have different characteristics from one another. The other is to use word-based indexing for in-vocabulary keywords and syllable-based indexing for OOV keywords, switching between them according to whether each query keyword is in-vocabulary or OOV. Evaluation results show that this approach benefits from the advantages of both indexing methods and that the proposed technique is effective in robustly retrieving spoken documents. © 2004 Wiley Periodicals, Inc. Syst Comp Jpn, 35(14): 44–53, 2004; Published online in Wiley InterScience. DOI 10.1002/scj.10697
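
The two ideas in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy romanized data, the lookup-table syllabifier, and the exact contiguous syllable match are all assumptions made for illustration, and the paper's actual matching may well be more tolerant of recognition errors.

```python
from collections import defaultdict


def build_word_index(recognizer_outputs):
    """Merge the word hypotheses of several recognizers into one inverted index.

    recognizer_outputs: list of {doc_id: [word, ...]}, one dict per recognizer.
    A word found by any recognizer makes the document retrievable.
    """
    index = defaultdict(set)
    for output in recognizer_outputs:
        for doc_id, words in output.items():
            for word in words:
                index[word].add(doc_id)
    return index


def contains_run(sequence, run):
    """True if `run` occurs as a contiguous sub-list of `sequence`."""
    n = len(run)
    return n > 0 and any(
        sequence[i:i + n] == run for i in range(len(sequence) - n + 1)
    )


def search(keyword, vocabulary, word_index, syllable_docs, to_syllables):
    """Use the word index for in-vocabulary keywords, syllables for OOV ones."""
    if keyword in vocabulary:
        return set(word_index.get(keyword, set()))
    target = to_syllables(keyword)
    return {d for d, syl in syllable_docs.items() if contains_run(syl, target)}


# Toy data (hypothetical): two recognizers with different errors.
vocabulary = {"kyoto", "tenki", "nyusu"}
rec_a = {"d1": ["kyoto", "tenki"], "d2": ["nyusu"]}
rec_b = {"d1": ["kyoto"], "d2": ["nyusu", "tenki"]}
word_index = build_word_index([rec_a, rec_b])

# Syllable-level transcriptions of the same documents (hypothetical).
syllable_docs = {
    "d1": ["kyo", "to", "te", "n", "ki"],
    "d2": ["nyu", "su", "te", "n", "ki", "pi", "ka", "chu"],
}

# Stand-in syllabifier: a lookup table instead of a real Japanese one.
SYLLABLES = {"pikachu": ["pi", "ka", "chu"]}


def to_syllables(keyword):
    return SYLLABLES.get(keyword, [])


in_vocab_hits = search("tenki", vocabulary, word_index, syllable_docs, to_syllables)
oov_hits = search("pikachu", vocabulary, word_index, syllable_docs, to_syllables)
```

In this toy setup, "tenki" is found in d2 only because the second recognizer's output was merged into the index, and the OOV keyword "pikachu" is found only through the syllable transcription.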

Year	2004
DOI	10.1002/scj.v35:14
Venue	Systems and Computers in Japan
Field	Information retrieval, Computer science, Search engine indexing, Natural language processing, Artificial intelligence, Document retrieval, Out of vocabulary
DocType	Journal
Volume	35
Issue	14
Citations	1
PageRank	0.36
References	15
Authors	2

Authors (2 rows)

Name	Order	Citations	PageRank
Hiromitsu Nishizaki	1	163	29.49
Seiichi Nakagawa	2	598	104.03
