SDLER: stacked dedupe learning for entity resolution in big data era - Citegraph

Paper Info

Title
SDLER: stacked dedupe learning for entity resolution in big data era

Abstract
In the Big Data era, Entity Resolution (ER) faces many challenges, such as the need for high scalability, the coexistence of complex similarity metrics, tautonymy and synonymy, and the requirement for data quality evaluation. Moreover, despite more than seventy years of development effort, there is still high demand for democratizing ER by reducing human participation in parameter tuning, data labeling, defining blocking functions, and feature engineering. This study aimed to explore a novel Stacked Dedupe Learning (SDL) ER system with high accuracy and efficiency. The study evaluated sophisticated composition methods, such as Bidirectional Recurrent Neural Networks (BiRNNs) with Long Short-Term Memory (LSTM) hidden units, to convert each tuple into a Word Representation Distribution (WRD) that captures similarities among tuples. For cases where pre-trained word embeddings were not available, ways to learn and tune a WRD customized for ER tasks under different scenarios were also considered. In addition, a Locality Sensitive Hashing (LSH) based blocking approach was assessed; it considers all attributes of a tuple and produces smaller blocks than traditional methods that use only a few attributes. The algorithm was tested on multiple datasets, namely benchmark and multilingual data. The experimental results showed that Stacked Dedupe Learning achieves high quality and good performance, and scales well compared to existing solutions.
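
To make the blocking idea concrete, the following is a minimal sketch of LSH-based blocking over all attributes of a tuple. It is not the paper's implementation: the abstract does not say which LSH family is used, so MinHash with banding over character 3-grams is assumed here, and every function name and parameter (`shingles`, `minhash_signature`, `lsh_blocks`, `candidate_pairs`, `num_perm`, `bands`) is hypothetical.

```python
import hashlib
from collections import defaultdict


def shingles(record, k=3):
    """Character k-grams built from ALL attribute values of a tuple."""
    text = " ".join(str(v).lower() for v in record.values())
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_perm=32):
    """MinHash signature: the minimum hash value per seeded hash function."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_perm)]


def lsh_blocks(records, num_perm=32, bands=16):
    """Group record ids whose signatures agree on at least one band."""
    rows = num_perm // bands
    buckets = defaultdict(set)
    for rid, rec in records.items():
        sig = minhash_signature(shingles(rec), num_perm)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(rid)
    # Only buckets holding more than one record yield candidate matches.
    return [ids for ids in buckets.values() if len(ids) > 1]


def candidate_pairs(blocks):
    """Expand blocks into a deduplicated set of ordered id pairs."""
    pairs = set()
    for block in blocks:
        ids = sorted(block)
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                pairs.add((ids[i], ids[j]))
    return pairs
```

Only the pairs returned by `candidate_pairs` would then be passed to the more expensive similarity model, which is where blocking saves work. In this sketch, raising `bands` (fewer rows per band) admits more candidate pairs, while lowering it produces smaller, stricter blocks.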

Field	Value
Year	2021
DOI	10.1007/s11227-021-03710-x
Venue	The Journal of Supercomputing
Keywords	Bidirectional RNN, Big data, Data quality, Entity resolution, Stacked Dedupe Learning (SDL), Word Representation Distribution (WRD)
DocType	Journal
Volume	77
Issue	10
ISSN	0920-8542
Citations	1
PageRank	0.36
References	16
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Alladoumbaye Ngueilbaye	1	2	1.10
Hongzhi Wang	2	421	73.72
Daouda Ahmat Mahamat	3	1	0.36
Elgendy Ibrahim	4	39	5.42
