Title
Post-Editing Through Approximation And Global Correction
Abstract
This paper describes a new automatic spelling correction program to deal with OCR-generated errors. The method used here is based on three principles:
1. Approximate string matching between the misspellings and the terms occurring in the database, as opposed to the entire dictionary
2. Local information obtained from the individual documents
3. The use of a confusion matrix, which contains information inherently specific to the nature of errors caused by the particular OCR device
This system is then utilized to process approximately 10,000 pages of OCR-generated documents. Among the misspellings discovered by this algorithm, about 87% were corrected.
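The three principles in the abstract can be illustrated with a small sketch. This is not the authors' program, whose details are not given in this record. It assumes a plain weighted edit distance as the approximate matcher, a tiny hand-written confusion table (`CONFUSION_COST`, hypothetical values), and per-document term counts as the "local information". All function and variable names are illustrative.

```python
from collections import Counter

# Hypothetical confusion costs: (ocr_output_char, intended_char) -> cost.
# A lower cost stands for a confusion the OCR device is assumed to make often.
CONFUSION_COST = {
    ("1", "l"): 0.2,
    ("0", "o"): 0.2,
    ("c", "e"): 0.4,
}


def weighted_distance(ocr_word, candidate, confusion=CONFUSION_COST):
    """Edit distance in which substitutions listed in `confusion` are cheaper."""
    m, n = len(ocr_word), len(candidate)
    dist = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = float(i)  # delete all characters of ocr_word[:i]
    for j in range(1, n + 1):
        dist[0][j] = float(j)  # insert all characters of candidate[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            a, b = ocr_word[i - 1], candidate[j - 1]
            sub = 0.0 if a == b else confusion.get((a, b), 1.0)
            dist[i][j] = min(
                dist[i - 1][j] + 1.0,      # deletion
                dist[i][j - 1] + 1.0,      # insertion
                dist[i - 1][j - 1] + sub,  # match or substitution
            )
    return dist[m][n]


def correct(ocr_word, collection_terms, doc_counts, max_cost=1.0):
    """Return the closest collection term, or None if none is within max_cost.

    Candidates come only from `collection_terms` (terms seen in the database),
    and ties are broken by how often a term occurs in the same document.
    """
    best = None
    for term in collection_terms:
        cost = weighted_distance(ocr_word, term)
        if cost > max_cost:
            continue
        key = (cost, -doc_counts.get(term, 0), term)
        if best is None or key < best:
            best = key
    return best[2] if best else None
```

With `collection_terms = {"local", "focal", "vocal"}` and `doc_counts = Counter(["local", "local", "focal"])`, calling `correct("1ocal", collection_terms, doc_counts)` returns `"local"`, because the assumed `1`/`l` confusion makes that candidate much cheaper than the others.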
Year
1995
DOI
10.1142/S0218001495000377
Venue
International Journal of Pattern Recognition and Artificial Intelligence
Keywords
optical character recognition, confusion matrix
Field
Confusion matrix, Matrix calculus, Pattern recognition, Computer science, Image processing, Optical character recognition, Error detection and correction, Spelling, Approximate string matching, Artificial intelligence
DocType
Journal
Volume
9
Issue
6
ISSN
0218-0014
Citations
5
PageRank
1.30
References
10
Authors
4
Name            Order  Citations  PageRank
Kazem Taghva    1      350        43.51
Julie Borsack   2      208        22.53
Bryan Bullard   3      5          1.30
Allen Condit    4      210        22.95