The Detection of Duplicates in Document Image Databases - Citegraph

Paper Info

Title
The Detection of Duplicates in Document Image Databases

Abstract
In this paper we propose and implement a method for detecting duplicate documents in very large image databases. The method is based on a robust "signature" extracted from each document image, which is used to index into a table of previously processed documents. The approach has a number of advantages over OCR or other recognition-based methods, including speed and robustness to imaging distortions. To justify the approach and test its scalability, we have developed a simulator which allows us to change parameters of the system and examine performance for millions of document signatures. A complete system is implemented and tested on a collection of technical articles and memos.
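
The indexing idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the signature is computed, so the `signature` function below is a hypothetical stand-in (a coarse, quantized ink-density grid), not the authors' method; the `DuplicateIndex` class name and the `grid` parameter are likewise assumptions made for illustration. It only shows the general pattern of reducing each image to a distortion-tolerant key and looking that key up in a table of previously processed documents.

```python
from collections import defaultdict


def signature(image, grid=4):
    """Hypothetical stand-in signature (NOT the paper's): a coarse ink-density pattern.

    `image` is a list of rows of 0/1 pixels (1 = ink). The page is split into
    grid x grid cells and each cell is quantized to a single bit, so small
    pixel-level distortions usually leave the signature unchanged.
    """
    rows, cols = len(image), len(image[0])
    cells = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * rows // grid, (gy + 1) * rows // grid
            x0, x1 = gx * cols // grid, (gx + 1) * cols // grid
            ink = sum(image[y][x] for y in range(y0, y1) for x in range(x0, x1))
            area = (y1 - y0) * (x1 - x0)
            # Bit is 1 when more than a quarter of the cell is ink.
            cells.append(1 if area and ink * 4 > area else 0)
    return tuple(cells)


class DuplicateIndex:
    """Table mapping signatures to the ids of documents already processed."""

    def __init__(self):
        self._table = defaultdict(list)

    def add(self, doc_id, image):
        """Index a document and return ids of earlier documents with the same signature."""
        sig = signature(image)
        earlier = list(self._table[sig])
        self._table[sig].append(doc_id)
        return earlier


# Usage: a page, a copy with one noise pixel, and an unrelated page.
page = [[1] * 4 + [0] * 4 for _ in range(8)]
noisy = [row[:] for row in page]
noisy[0][7] = 1
other = [[1] * 8 if y < 4 else [0] * 8 for y in range(8)]

index = DuplicateIndex()
first = index.add("doc-a", page)
second = index.add("doc-b", noisy)
third = index.add("doc-c", other)
```

An exact-match table like this is only as robust as the signature itself; a real system would presumably need a signature and lookup scheme that tolerate larger distortions than this toy quantization does.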

Year	DOI	Venue
1998	10.1016/S0262-8856(98)00054-7	Image and Vision Computing
Keywords	Field	DocType
robustness,database systems,system testing,indexation,coding,data bases,systems approach,queueing theory,distortion,feature extraction,distributed environment,indexes,collection,image retrieval,image recognition,simulators	Data mining,Document imaging,Distributed Computing Environment,Computer science,Search engine indexing,Image quality,Image processing,Robustness (computer science),Shape coding,Database,Scalability	Journal
Volume	Issue	ISSN
16	12-13	0262-8856
ISBN	Citations	PageRank
0-8186-7898-4	34	4.23
References	Authors
4	3

Authors (3 rows)

Name	Order	Citations	PageRank
David Doermann	1	4313	312.70
Huiping Li	2	98	12.58
Omid E. Kia	3	66	11.12
