Title
Multi-Source Uncertain Entity Resolution at Yad Vashem: Transforming Holocaust Victim Reports into People.
Abstract
In this work we describe an entity resolution project performed at Yad Vashem, the central repository of Holocaust-era information. The Yad Vashem dataset is unique with respect to classic entity resolution in that it is both massively multi-source and requires multi-level entity resolution. With today's abundance of information sources, this project sets an example for multi-source resolution on a big-data scale. We discuss a set of requirements that led us to choose the MFIBlocks entity resolution algorithm to achieve the goals of the application. We also provide a machine learning approach, based upon decision trees, to transform soft clusters into a ranked clustering of records, representing possible entities. An extensive empirical evaluation demonstrates the unique properties of this dataset, highlighting the shortcomings of current methods and proposing avenues for future research in this realm.
Year
2016
DOI
10.1145/2882903.2903737
Venue
SIGMOD/PODS'16: International Conference on Management of Data, San Francisco, California, USA, June 2016
Field
Decision tree, Data mining, Name resolution, Ranking, Realm, Computer science, The Holocaust, Cluster analysis, Multi-source, Database
DocType
Conference
ISBN
978-1-4503-3531-7
Citations
0
PageRank
0.34
References
14
Authors
5
Name
Order
Citations
PageRank
Tomer Sagi1615.50
Avigdor Gal2232.45
Omer Barkol31027.78
Ruth Bergman4457.05
Alexander Avram500.68