Effects of OCR errors on ranking and feedback using the vector space model - Citegraph

Paper Info

Title
Effects of OCR errors on ranking and feedback using the vector space model

Abstract
We report on the performance of the vector space model in the presence of OCR errors. We show that, for our full-text document collection, average precision and recall are not affected when the OCR version is compared with its corresponding corrected set. We do, however, see divergence between the rankings of relevant documents in the OCR and corrected collections under different weighting combinations. In particular, we observed that cosine normalization plays a considerable role in the disparity seen between the collections. Furthermore, we show that even though feedback improves retrieval for both collections, it cannot be used to compensate for OCR errors caused by badly degraded documents.
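Cosine normalization divides each term weight by the Euclidean length of the document vector. The sketch below illustrates that general mechanism only. It is not the authors' experimental setup: it assumes raw term frequency with no idf, toy token lists, a binary query, and hypothetical helper names (`cosine_normalized_weights`, `cosine_score`). It shows one plausible way OCR errors can interact with normalization. Spurious tokens produced by recognition errors lengthen the document vector, which lowers the normalized weight of the correctly recognized terms.

```python
import math
from collections import Counter


def cosine_normalized_weights(tokens):
    """Return raw term frequencies divided by the document vector's Euclidean length.

    Hypothetical helper: tf only, no idf, purely for illustration.
    """
    tf = Counter(tokens)
    length = math.sqrt(sum(f * f for f in tf.values()))
    return {term: f / length for term, f in tf.items()}


def cosine_score(query_terms, doc_weights):
    """Cosine similarity between a binary query vector and a normalized document vector."""
    query = set(query_terms)
    query_length = math.sqrt(len(query))
    return sum(doc_weights.get(t, 0.0) for t in query) / query_length


# Toy example (hypothetical data): the "OCR" version carries three junk tokens
# of the kind that recognition errors on a degraded page can produce.
clean = ["vector", "space", "model", "ranking", "ranking"]
ocr = clean + ["~", "ii", "l1"]

clean_w = cosine_normalized_weights(clean)
ocr_w = cosine_normalized_weights(ocr)

print(round(clean_w["ranking"], 4), round(ocr_w["ranking"], 4))  # 0.7559 0.6325
query = ["ranking", "model"]
print(round(cosine_score(query, clean_w), 4), round(cosine_score(query, ocr_w), 4))  # 0.8018 0.6708
```

The junk tokens never match a query term, so they add nothing to the dot product. They still inflate the document length, however, so the same document scores lower in its OCR form. Weighting schemes that also apply idf would tend to give such rare junk strings large weights and strengthen this effect. That extension is not shown here.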

Year: 1996
DOI: 10.1016/0306-4573(95)00058-5
Venue: Information Processing and Management: an International Journal
Keywords: vector space model, OCR error, relevance information retrieval, information retrieval, analysis of variance, feedback
Field: Weighting, Divergence, Normalization (statistics), Information retrieval, Ranking, Computer science, Precision and recall, Speech recognition, Relevance (information retrieval), Vector space model, Document retrieval
DocType: Journal
Volume: 32
Issue: 3
ISSN: 0306-4573
Citations: 34
PageRank: 2.81
References: 8
Authors: 3

Authors

Name	Order	Citations	PageRank
Kazem Taghva	1	350	43.51
Julie Borsack	2	208	22.53
Allen Condit	3	210	22.95
