Title
Comparative Analysis of Approximate Blocking Techniques for Entity Resolution.
Abstract
Entity Resolution is a core task for merging data collections. Due to its quadratic complexity, it typically scales to large volumes of data through blocking: similar entities are clustered into blocks and pair-wise comparisons are executed only between co-occurring entities, at the cost of some missed matches. There are numerous blocking methods, and the aim of this work is to offer a comprehensive empirical survey, extending the dimensions of comparison beyond what is commonly available in the literature. We consider 17 state-of-the-art blocking methods and use 6 popular real datasets to examine the robustness of their internal configurations and their relative balance between effectiveness and time efficiency. We also investigate their scalability over a corpus of 7 established synthetic datasets that range from 10,000 to 2 million entities.
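The blocking idea described in the abstract can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it uses a simple token-based scheme chosen here for clarity, and the function names `token_blocking` and `candidate_pairs` as well as the sample records are hypothetical. Entities that share a token land in the same block, and only co-occurring entities are compared.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(entities):
    """Group entity ids into blocks keyed by each lowercase token they contain.

    `entities` maps an entity id to a free-text description (an assumed,
    simplified record format used only for this illustration).
    """
    blocks = defaultdict(set)
    for entity_id, text in entities.items():
        for token in text.lower().split():
            blocks[token].add(entity_id)
    # A block with a single entity produces no comparisons, so drop it.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Collect the distinct pairs of entities that co-occur in at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical sample records.
entities = {
    1: "Apple iPhone 6",
    2: "iphone 6 by apple",
    3: "Samsung Galaxy S5",
    4: "galaxy s5 samsung",
    5: "Nokia Lumia",
}

blocks = token_blocking(entities)
pairs = candidate_pairs(blocks)

# Brute-force Entity Resolution would compare every pair: n * (n - 1) / 2.
total = len(entities) * (len(entities) - 1) // 2
print(f"{len(pairs)} candidate pairs instead of {total}")
```

In this toy input, blocking keeps only the pairs that share at least one token, which is where the savings over the quadratic all-pairs comparison come from. It also shows the trade-off the abstract mentions: two matching records that share no token would never be compared.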
Year: 2016
Venue: PROCEEDINGS OF THE VLDB ENDOWMENT
Field: Data mining, Quadratic complexity, Name resolution, Computer science, Robustness (computer science), Blocking techniques, Artificial intelligence, Empirical survey, Merge (version control), Machine learning, Scalability
DocType: Journal
Volume: 9
Issue: 9
ISSN: 2150-8097
Citations: 21
PageRank: 0.71
References: 24
Authors: 4
Name                Order   Citations   PageRank
George Papadakis    1       58          4.27
Jonathan Svirsky    2       21          0.71
Avigdor Gal         3       23          2.45
Themis Palpanas     4       1136        91.61