Title
VisualDiff: Document Image Verification and Change Detection
Abstract
This paper explores the related problems of verification and change detection in document images. The goal is to determine whether two document images differ and, if so, to determine precisely what content has been added, deleted, or otherwise modified. This problem has many potential applications, especially for important legal documents such as contractual agreements. These agreements are often edited, shared, and stored as scanned or hardcopy documents, where small, undetected changes between edits could create major differences in the contractual language and thus have severe repercussions. One can view the problem of change detection as tracing the revision history of a set of documents. To validate the performance of our approach, we therefore created the "Enron Revisions" dataset. This dataset contains realistic revisions obtained from attachments in the Enron Corpus, along with a series of before-and-after snapshots of the revisions as images with varying levels of noise from resolution, binarization, and blur. The approach taken in this paper uses the SIFT descriptor to align two document images without the benefit of OCR and, once they are aligned, compares dense descriptors to determine what changes have occurred within the image. This "VisualDiff" is compared against a baseline UNIX diff-like approach applied to text extracted through OCR, and the results demonstrate the effectiveness of the approach.
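The OCR-based baseline mentioned in the abstract can be illustrated with a minimal sketch. It assumes the OCR text of both document versions is already available as plain strings, and it uses Python's standard `difflib` module as a stand-in for a UNIX diff-like comparison. The function name `word_diff` and the sample sentences are hypothetical and are not taken from the paper.

```python
import difflib


def word_diff(before_text, after_text):
    """Return (operation, before_words, after_words) for each changed span.

    Illustrative stand-in for a diff-like baseline on OCR text;
    not the authors' implementation.
    """
    before_words = before_text.split()
    after_words = after_text.split()
    # autojunk=False disables the popularity heuristic so that common
    # words are never silently ignored on longer documents.
    matcher = difflib.SequenceMatcher(a=before_words, b=after_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only replace / delete / insert spans
            changes.append((tag, before_words[i1:i2], after_words[j1:j2]))
    return changes


# Hypothetical OCR output for two revisions of a contract clause.
before = "The buyer shall pay within 30 days of delivery"
after = "The buyer shall pay within 90 days of final delivery"

for change in word_diff(before, after):
    print(change)
```

Comparing at the word level rather than the line level keeps small edits, such as a changed number, visible as individual changes. A baseline of this kind depends on OCR quality, since recognition errors in either version show up as spurious differences.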
Year
2013
DOI
10.1109/ICDAR.2013.17
Venue
Document Analysis and Recognition
Keywords
document image processing,image resolution,OCR,SIFT descriptor,UNIX diff-like approach,VisualDiff,change detection,contractual agreements,contractual language,document image verification,enron revision dataset,image binarization,image blur,important legal documents,realistic revisions,Change Detection,Document Image,Document Verification
Field
Computer vision,Scale-invariant feature transform,Change detection,Feature detection (computer vision),Pattern recognition,Computer science,Document image processing,Unix,Artificial intelligence,Image resolution,Snapshot (computer storage),Tracing
DocType
Conference
ISSN
1520-5363
Citations
2
PageRank
0.39
References
8
Authors
2
Name
Order
Citations
PageRank
Rajiv Jain145.16
David Doermann24313312.70