Title
Incremental entity resolution process over query results for data integration systems.
Abstract
Entity Resolution (ER) in data integration systems is the problem of identifying groups of tuples, from one or multiple data sources, that represent the same real-world entity. It is a crucial stage of data integration processes, which often need to integrate data at query time. The task becomes even more challenging with dynamic data sources or when a large volume of data needs to be integrated, and new ER solutions have been proposed to deal with such volumes. One possible approach is to perform the ER process over query results rather than over the whole set of tuples being integrated. Additionally, the results of previous ER tasks can be reused to reduce the number of comparisons between pairs of tuples at query time. Similarly, indexing techniques can help identify equivalent tuples and further reduce the number of pairwise comparisons. In this context, this work proposes an incremental ER process over query results. Its contributions are the specification, the implementation, and the evaluation of the proposed incremental process. Based on the experiments performed, we conclude that incremental ER at query time is more efficient than traditional ER processes.
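The three ideas named in the abstract (resolving only the tuples returned by a query, reusing earlier ER results, and using an index to limit comparisons) can be illustrated with a minimal sketch. This is not the authors' process: the class `IncrementalResolver`, the first-character blocking key, the `difflib` similarity measure, and the 0.85 threshold are all assumptions chosen for illustration, and the greedy assignment to the first matching candidate is a simplification.

```python
from difflib import SequenceMatcher


class IncrementalResolver:
    """Toy incremental ER: clusters and a blocking index persist across queries."""

    def __init__(self, threshold=0.85):
        self.threshold = threshold  # assumed similarity cut-off
        self.cluster_of = {}        # tuple id -> cluster id, reused across queries
        self.records = {}           # tuple id -> name value already resolved
        self.blocks = {}            # blocking key -> ids of resolved tuples
        self.comparisons = 0        # pairwise comparisons performed so far

    @staticmethod
    def blocking_key(name):
        # Assumed blocking scheme: first character of the normalized name.
        return name.strip().lower()[:1]

    def _similar(self, a, b):
        self.comparisons += 1
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        return ratio >= self.threshold

    def resolve(self, query_result):
        """Resolve a list of (tuple_id, name) pairs returned by one query.

        Returns {cluster id: sorted tuple ids from this query result}.
        """
        for tid, name in query_result:
            if tid in self.cluster_of:
                continue  # resolved by an earlier query: reuse, no comparisons
            candidates = self.blocks.setdefault(self.blocking_key(name), [])
            # Compare only against tuples in the same block; stop at first match.
            match = next(
                (c for c in candidates if self._similar(name, self.records[c])),
                None,
            )
            self.cluster_of[tid] = tid if match is None else self.cluster_of[match]
            self.records[tid] = name
            candidates.append(tid)

        groups = {}
        for tid, _ in query_result:
            groups.setdefault(self.cluster_of[tid], []).append(tid)
        return {cid: sorted(ids) for cid, ids in groups.items()}
```

Calling `resolve` once per query result keeps the state between calls, so a tuple that appears in a later query is not compared again, and a new tuple is compared only with previously resolved tuples that share its blocking key. The `comparisons` counter makes the saving observable against an all-pairs baseline.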
Year
2019
DOI
10.1007/s10844-019-00544-1
Venue
Journal of Intelligent Information Systems
Keywords
Data integration, Entity resolution, Record linkage, Duplicate detection, Incremental entity resolution
Field
Data integration, Record linkage, Data mining, Duplicate detection, Multiple data, Name resolution, Tuple, Computer science, Search engine indexing, Dynamic data
DocType
Journal
Volume
52
Issue
2
ISSN
0925-9902
Citations
0
PageRank
0.34
References
25
Authors
3