Title
Learning Explainable Entity Resolution Algorithms for Small Business Data using SystemER
Abstract
The 2019 FEIII CALI data challenge aims at linking different representations of the same real-world entities across multiple public datasets that collect identification and activity data about small to medium enterprises (SMEs) in California. We formalize this challenge as a learning-based entity resolution (ER) task, the goal of which is to learn a high-precision and high-recall pair-wise ER model that classifies small business entity pairs into matches and non-matches. Realistic ER tasks usually involve a pipeline of labor-intensive and error-prone tasks, such as data preprocessing, gathering of training data, feature engineering, and model tuning. In this task, we apply an advanced human-in-the-loop system, named SystemER, to learn ER algorithms for SME entities. Powered by active learning and via a carefully designed user interface, SystemER can learn high-quality explainable ER algorithms with low human effort, while achieving high accuracy on the datasets provided by the FEIII CALI data challenge.
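As a rough illustration of what an explainable pair-wise ER model can look like, the sketch below classifies two business records as a match or non-match with a single human-readable rule. It is not SystemER's learned algorithm: the `Business` schema, the `is_match` rule, and the 0.5 threshold are all hypothetical choices made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Business:
    # Hypothetical record schema, not the FEIII CALI schema.
    name: str
    city: str


def tokens(text: str) -> set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(text.lower().replace(",", " ").replace(".", " ").split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_match(left: Business, right: Business, threshold: float = 0.5) -> bool:
    """Illustrative rule: same city AND name token overlap >= threshold."""
    same_city = left.city.strip().lower() == right.city.strip().lower()
    name_sim = jaccard(tokens(left.name), tokens(right.name))
    return same_city and name_sim >= threshold
```

A rule of this shape is easy to inspect: a reviewer can see exactly which attribute comparisons led to a match decision, which is the property the abstract refers to as explainability.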
Year
2019
DOI
10.1145/3336499.3338010
Venue
Proceedings of the 5th Workshop on Data Science for Macro-modeling with Financial and Economic Datasets
Keywords
Entity resolution, SystemER, human-in-the-loop, small business
DocType
Conference
ISBN
978-1-4503-6823-0
Citations
0
PageRank
0.34
References
0
Authors
4
Name              Order  Citations  PageRank
Kun Qian          1      4          1.08
Douglas Burdick   2      226        18.54
Sairam Gurajada   3      118        7.83
Ling-ling Yan     4      1273       70.78