The effects of OCR error on the extraction of private information - Citegraph

Paper Info

Title
The effects of OCR error on the extraction of private information

Abstract
OCR error has been shown not to affect the average accuracy of text retrieval or text categorization. Recent studies, however, have indicated that information extraction is significantly degraded by OCR error. We experimented with information extraction software on two collections, one with OCR-ed documents and another with manually corrected versions of the former. We discovered a significant reduction in accuracy on the OCR text versus the corrected text. The majority of errors were attributable to zoning problems rather than OCR classification errors.

Year	DOI	Venue
2006	10.1007/11669487_31	Document Analysis Systems
Keywords	Field	DocType
information extraction software,text retrieval,corrected text,ocr text,average accuracy,ocr-ed document,ocr classification error,text categorization,private information,ocr error,information extraction	Character recognition,Computer science,Document Structure Description,Optical character recognition,Image processing,Speech recognition,Software,Information extraction,Private information retrieval,Text retrieval	Conference
Volume	ISSN	ISBN
3872	0302-9743	3-540-32140-3
Citations	PageRank	References
10	0.91	8
Authors
3

Authors (3 rows)

Name	Order	Citations	PageRank
Kazem Taghva	1	350	43.51
Russell Beckley	2	39	4.38
Jeffrey Coombs	3	89	7.73