Abstract | ||
---|---|---|
We report on the performance of the vector space model in the presence of OCR errors. We show that average precision and recall are not affected for our full-text document collection when the OCR version is compared to its corresponding corrected set. We do, however, see divergence between the relevant-document rankings of the OCR and corrected collections under different weighting combinations. In particular, we observed that cosine normalization plays a considerable role in the disparity seen between the collections. Furthermore, we show that even though feedback improves retrieval for both collections, it cannot be used to compensate for OCR errors caused by badly degraded documents. |
Year | DOI | Venue |
---|---|---|
1996 | 10.1016/0306-4573(95)00058-5 | Information Processing and Management: an International Journal |
Keywords | Field | DocType |
vector space model,ocr error,relevance information retrieval,information retrieval,analysis of variance,feedback | Weighting,Divergence,Normalization (statistics),Information retrieval,Ranking,Computer science,Precision and recall,Speech recognition,Relevance (information retrieval),Vector space model,Document retrieval | Journal |
Volume | Issue | ISSN |
32 | 3 | 0306-4573 |
Citations | PageRank | References |
34 | 2.81 | 8 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Kazem Taghva | 1 | 350 | 43.51 |
Julie Borsack | 2 | 208 | 22.53 |
Allen Condit | 3 | 210 | 22.95 |