Title
Noise-tolerance feasibility for restricted-domain Information Retrieval systems.
Abstract
Information Retrieval systems normally have to work with rather heterogeneous sources, such as Web sites or documents from Optical Character Recognition tools. The correct conversion of these sources into flat text files is not a trivial task, since noise may easily be introduced as a result of spelling or typesetting errors. Interestingly, this is not a great drawback when the corpus is sufficiently large, since redundancy helps to overcome noise problems. However, noise becomes a serious problem in restricted-domain Information Retrieval, especially when the corpus is small and has little or no redundancy. This paper devises an approach which adds noise-tolerance to Information Retrieval systems. A set of experiments carried out in the agricultural domain proves the effectiveness of the approach presented.
Year: 2013
DOI: 10.1016/j.datak.2013.02.002
Venue: Data & Knowledge Engineering
Keywords: Information Retrieval, Noise-tolerance, Restricted domain, Edit distance
Field: Edit distance, Drawback, Data mining, Human–computer information retrieval, Information retrieval, Computer science, Optical character recognition, Redundancy (engineering), Noise tolerance, Database, Visual Word
DocType: Journal
Volume: 86
Issue: 1
ISSN: 0169-023X
Citations: 0
PageRank: 0.34
References: 33
Authors: 5
Name                      Order  Citations  PageRank
Katia Vila                1      23         4.56
Antonio Fernández Orquín  2      11         1.75
José M. Gómez             3      32         8.02
Antonio Ferrández         4      226        25.57
Josval Díaz               5      6          1.14