Title | ||
---|---|---|
Learning Explainable Entity Resolution Algorithms for Small Business Data using SystemER |
Abstract | ||
---|---|---|
The 2019 FEIII CALI data challenge aims at linking different representations of the same real-world entities across multiple public datasets that collect identification and activity data about small to medium enterprises (SMEs) in California. We formalize this challenge as a learning-based entity resolution (ER) task, the goal of which is to learn a high-precision and high-recall pair-wise ER model that classifies small business entity pairs into matches and non-matches. Realistic ER tasks usually involve a pipeline of labor-intensive and error-prone tasks, such as data preprocessing, gathering of training data, feature engineering, and model tuning. In this task, we apply an advanced human-in-the-loop system, named SystemER, to learn ER algorithms for SME entities. Powered by active learning and via a carefully designed user interface, SystemER can learn high-quality explainable ER algorithms with low human effort, while achieving high accuracy on the datasets provided by the FEIII CALI data challenge.
|
Year | DOI | Venue |
---|---|---|
2019 | 10.1145/3336499.3338010 | Proceedings of the 5th Workshop on Data Science for Macro-modeling with Financial and Economic Datasets |
Keywords | DocType | ISBN |
Entity resolution, SystemER, human-in-the-loop, small business | Conference | 978-1-4503-6823-0 |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Kun Qian | 1 | 4 | 1.08 |
Douglas Burdick | 2 | 226 | 18.54 |
Sairam Gurajada | 3 | 118 | 7.83 |
Ling-ling Yan | 4 | 1273 | 70.78 |