Title
Effects of OCR errors on ranking and feedback using the vector space model
Abstract
We report on the performance of the vector space model in the presence of OCR errors. We show that average precision and recall are not affected for our full-text document collection when the OCR version is compared to its corresponding corrected set. We do, however, see divergence between the relevant document rankings of the OCR and corrected collections with different weighting combinations. In particular, we observed that cosine normalization plays a considerable role in the disparity seen between the collections. Furthermore, we show that even though feedback improves retrieval for both collections, it cannot be used to compensate for OCR errors caused by badly degraded documents.
Year
1996
DOI
10.1016/0306-4573(95)00058-5
Venue
Information Processing and Management: an International Journal
Keywords
vector space model, OCR error, relevance information retrieval, information retrieval, analysis of variance, feedback
Field
Weighting, Divergence, Normalization (statistics), Information retrieval, Ranking, Computer science, Precision and recall, Speech recognition, Relevance (information retrieval), Vector space model, Document retrieval
DocType
Journal
Volume
32
Issue
3
ISSN
0306-4573
Citations
34
PageRank
2.81
References
8
Authors
3
Name            Order  Citations  PageRank
Kazem Taghva    1      350        43.51
Julie Borsack   2      208        22.53
Allen Condit    3      210        22.95