Finding Groups Of Duplicate Images In Very Large Datasets - Citegraph

Paper Info

Title
Finding Groups Of Duplicate Images In Very Large Datasets

Abstract
This paper addresses the problem of detecting groups of duplicates in large-scale, unstructured image datasets such as those found on the Internet. Leveraging recent progress in data mining, we propose an efficient approach based on the search for closed patterns. Moreover, we present a novel way to encode the bag-of-words image representation into data mining transactions. We validate our approach on a new dataset of one million Internet images obtained through random searches on Google Image Search. Using the proposed method, we find more than 80,000 groups of duplicates among the one million images in less than three minutes while using only 150 megabytes of memory. Unlike other existing approaches, our method scales gracefully to larger datasets because its time and space (memory) complexities are linear. Furthermore, the approach does not need to build or use any precomputed indexing structure.
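The abstract does not spell out the encoding or the mining algorithm, so the following is only a minimal Python sketch of the general idea under stated assumptions. Each image's bag-of-words histogram becomes one transaction holding the IDs of its `k` most frequent visual words (a hypothetical encoding, not necessarily the one proposed in the paper), and every closed pattern that is long enough and shared by at least two images marks a candidate duplicate group. Closed patterns are enumerated naively by intersecting transactions, which can blow up exponentially; this stands in for a dedicated closed-pattern miner and is not the linear-time method the paper reports. The function names `encode_transaction`, `closed_patterns`, `duplicate_groups` and the defaults `k=5`, `min_len=4`, `min_support=2` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def encode_transaction(histogram: Sequence[int], k: int) -> FrozenSet[int]:
    """Hypothetical encoding: one transaction = IDs of the k most frequent visual words.

    Ties are broken by the smaller word ID so the result is deterministic;
    words with a zero count are never included.
    """
    present = [word for word, count in enumerate(histogram) if count > 0]
    ranked = sorted(present, key=lambda word: (-histogram[word], word))
    return frozenset(ranked[:k])


def closed_patterns(transactions: List[FrozenSet[int]]) -> Set[FrozenSet[int]]:
    """Enumerate all non-empty closed itemsets by closing the transactions under intersection.

    An itemset is closed iff it equals the intersection of every transaction
    containing it, so the closed itemsets are exactly the intersections of
    non-empty groups of transactions. Worst case exponential: illustration only.
    """
    closed: Set[FrozenSet[int]] = set()
    for t in transactions:
        # Intersect the new transaction with every closed set found so far.
        closed |= {c & t for c in closed} | {t}
    closed.discard(frozenset())
    return closed


def duplicate_groups(
    histograms: Sequence[Sequence[int]],
    k: int = 5,
    min_len: int = 4,
    min_support: int = 2,
) -> Dict[FrozenSet[int], Tuple[int, ...]]:
    """Map each long, shared closed pattern to the indices of the images containing it."""
    transactions = [encode_transaction(h, k) for h in histograms]
    groups: Dict[FrozenSet[int], Tuple[int, ...]] = {}
    for pattern in closed_patterns(transactions):
        if len(pattern) < min_len:
            continue  # too few shared words to call the images duplicates
        support = tuple(i for i, t in enumerate(transactions) if pattern <= t)
        if len(support) >= min_support:
            groups[pattern] = support
    return groups


if __name__ == "__main__":
    # Toy bag-of-words histograms over a vocabulary of 8 visual words.
    histograms = [
        [5, 4, 3, 2, 1, 0, 0, 0],  # image 0
        [6, 4, 3, 2, 1, 0, 0, 0],  # image 1: near-duplicate of image 0
        [0, 0, 0, 1, 2, 3, 4, 5],  # image 2
        [2, 2, 0, 0, 0, 2, 2, 2],  # image 3
    ]
    for pattern, images in duplicate_groups(histograms).items():
        print(sorted(pattern), images)  # prints: [0, 1, 2, 3, 4] (0, 1)
```

In the toy data, images 2 and 3 also share a closed pattern ({5, 6, 7}), but with only three words it falls below `min_len`, so only the near-duplicate pair (0, 1) is reported. Requiring a long shared pattern is what separates near-identical images from images that merely have a few common visual words.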

Year	2012
DOI	10.5244/C.26.105
Venue	Proceedings of the British Machine Vision Conference 2012
Field	Data mining, ENCODE, Information retrieval, Megabyte, Computer science, Image representation, Search engine indexing, Time complexity, The Internet
DocType	Conference
Citations	1
PageRank	0.35
References	19
Authors	3

Authors

Name	Author order	Citations (author total)	PageRank (author)
Winn Voravuthikunchai	1	17	1.57
Bruno Crémilleux	2	373	34.98
Frédéric Jurie	3	3924	235.82