Abstract | ||
---|---|---|
In this paper we propose and implement a method for detecting duplicate documents in very large image databases. The method is based on a robust "signature" extracted from each document image, which is used to index into a table of previously processed documents. The approach has a number of advantages over OCR or other recognition-based methods, including speed and robustness to imaging distortions. To justify the approach and test its scalability, we have developed a simulator that allows us to change system parameters and examine performance for millions of document signatures. A complete system is implemented and tested on a collection of technical articles and memos. |
Year | DOI | Venue |
---|---|---|
1998 | 10.1016/S0262-8856(98)00054-7 | Image and Vision Computing |
Keywords | Field | DocType |
robustness, database systems, system testing, indexation, coding, data bases, systems approach, queueing theory, distortion, feature extraction, distributed environment, indexes, collection, image retrieval, image recognition, simulators | Data mining, Document imaging, Distributed Computing Environment, Computer science, Search engine indexing, Image quality, Image processing, Robustness (computer science), Shape coding, Database, Scalability | Journal |
Volume | Issue | ISSN |
16 | 12-13 | 0262-8856 |
ISBN | Citations | PageRank |
0-8186-7898-4 | 34 | 4.23 |
References | Authors | |
4 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
David Doermann | 1 | 4313 | 312.70 |
Huiping Li | 2 | 98 | 12.58 |
Omid E. Kia | 3 | 66 | 11.12 |