Title
Multi-core Meta-blocking for Big Linked Data.
Abstract
Discovering matching entities in different Knowledge Bases constitutes a core task in the Linked Data paradigm. Due to its quadratic time complexity, Entity Resolution (ER) typically scales to large datasets through blocking, which restricts comparisons to similar entities. For Big Linked Data, Meta-blocking is also needed to restructure the blocks in a way that boosts precision while maintaining high recall. Based on blocking and Meta-blocking, the JedAI Toolkit implements an end-to-end ER workflow for both relational and RDF data. However, its bottleneck is the time-consuming procedure of Meta-blocking, which iterates over all comparisons in each block. To accelerate it, we present a suite of parallelization techniques that are suitable for multi-core processors. The suite consists of two categories of parallelization strategies, each comprising four different approaches that are orthogonal to Meta-blocking algorithms. We perform extensive experiments over a real dataset with 3.4 million entities and 13 billion comparisons, demonstrating that our methods can process it within a few minutes, achieving high speedup.
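The blocking and Meta-blocking steps named in the abstract can be illustrated with a minimal sketch. It is not the paper's method or the JedAI Toolkit API: the function names, the token-based blocking, the common-blocks weighting, the mean-weight pruning threshold, and the sample entities are all assumptions chosen for illustration. The blocks are split into chunks that are weighted independently and then merged, which is one simple way to spread the work over several workers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def token_blocking(entities):
    """Group entity ids by every token that appears in their descriptions."""
    blocks = defaultdict(set)
    for entity_id, text in entities.items():
        for token in set(text.lower().split()):
            blocks[token].add(entity_id)
    # A block with a single entity yields no comparisons, so drop it.
    return {t: sorted(ids) for t, ids in blocks.items() if len(ids) > 1}


def count_common_blocks(block_chunk):
    """Weight each candidate pair by the number of blocks it shares in this chunk."""
    weights = defaultdict(int)
    for ids in block_chunk:
        for pair in combinations(ids, 2):
            weights[pair] += 1
    return weights


def meta_blocking(blocks, workers=4):
    """Weight pairs chunk by chunk, merge the partial weights, then prune."""
    block_list = list(blocks.values())
    chunks = [block_list[i::workers] for i in range(workers)]
    merged = defaultdict(int)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_common_blocks, chunks):
            for pair, weight in partial.items():
                merged[pair] += weight
    if not merged:
        return {}
    # Assumed pruning rule: keep pairs whose weight reaches the mean weight.
    mean_weight = sum(merged.values()) / len(merged)
    return {pair: w for pair, w in merged.items() if w >= mean_weight}


# Hypothetical sample data: entities 1 and 2 match, as do 3 and 4.
entities = {
    1: "jack miller seller",
    2: "jack lloyd miller",
    3: "erick green seller",
    4: "erick lloyd green",
}
kept = meta_blocking(token_blocking(entities))
print(sorted(kept.items()))  # [((1, 2), 2), ((3, 4), 2)]
```

Blocking alone proposes four candidate pairs here; pruning keeps the two that share more than one block. In CPython, threads mainly illustrate the partition-and-merge structure, so a process pool or a language with true shared-memory threads would likely be needed to observe real speedup.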
Year
2017
Venue
SEMANTICS
Field
Data mining, Bottleneck, Suite, Computer science, Linked data, Time complexity, Workflow, Multi-core processor, RDF, Speedup
DocType
Conference
Citations
1
PageRank
0.36
References
18
Authors
4
Name                  Order  Citations  PageRank
George Papadakis      1      58         4.27
Konstantina Bereta    2      79         12.13
Themis Palpanas       3      1136       91.61
Manolis Koubarakis    4      2790       322.32