Title
A framework for entity resolution with efficient blocking
Abstract
In Web data integration applications, we frequently need to determine whether data objects in different data sources represent the same real-world entity. This problem is known as entity resolution. In this paper, we propose BARM, a generic framework for entity resolution on relational data sets, consisting of a Blocker, Attribute matchers, and a Record Matcher. BARM is designed so that different blocking and matching algorithms can be plugged into it. For the blocker, we apply SPectrAl Neighborhood (SPAN), a state-of-the-art blocking algorithm, to our data sets and show that it is effective and efficient. For the attribute matchers, we propose the Context Sensitive Value Matching Library (CSVML) for matching attribute values, together with an approach for evaluating the goodness of matching functions. CSVML takes the meaning and context of attribute values into account and therefore performs well, as the experimental results show. We adopt a Bayesian network as the record matcher and propose an inference method based on the Markov blanket of the network. For comparison, we also apply three other classifiers to our data sets: Decision Tree, Support Vector Machines, and Naive Bayes. Experiments show that the Bayesian network is advantageous in the book domain.
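The three-stage structure described in the abstract (blocker, attribute matchers, record matcher) can be illustrated with a minimal sketch. This is not the authors' implementation. Every name here is hypothetical, and each stage uses a deliberately simple stand-in: first-token key blocking instead of SPAN, `difflib` string similarity instead of CSVML, and an averaged-score threshold instead of a Bayesian network. The sample book records are invented for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_by_first_token(records, field="title"):
    """Stand-in blocker (NOT SPAN): group record ids by the first token of `field`."""
    blocks = defaultdict(list)
    for rid, rec in records.items():
        tokens = rec[field].lower().split()
        blocks[tokens[0] if tokens else ""].append(rid)
    return blocks


def string_similarity(a, b):
    """Stand-in attribute matcher (NOT CSVML): case-insensitive similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def exact_match(a, b):
    """Attribute matcher for values that must agree exactly."""
    return 1.0 if a == b else 0.0


# One matcher per attribute; the choice of attributes is an assumption.
ATTRIBUTE_MATCHERS = {
    "title": string_similarity,
    "author": string_similarity,
    "year": exact_match,
}


def match_records(rec_a, rec_b, threshold=0.8):
    """Stand-in record matcher (NOT a Bayesian network): average the attribute scores."""
    scores = [m(rec_a[f], rec_b[f]) for f, m in ATTRIBUTE_MATCHERS.items()]
    return sum(scores) / len(scores) >= threshold


def resolve(records):
    """Compare only pairs that share a block and return the pairs judged to match."""
    matches = []
    for ids in block_by_first_token(records).values():
        for a, b in combinations(sorted(ids), 2):
            if match_records(records[a], records[b]):
                matches.append((a, b))
    return matches


# Invented sample records in the book domain.
records = {
    1: {"title": "Database System Concepts", "author": "A. Silberschatz", "year": "2010"},
    2: {"title": "Database System Concepts", "author": "Abraham Silberschatz", "year": "2010"},
    3: {"title": "Operating System Concepts", "author": "A. Silberschatz", "year": "2012"},
}

print(resolve(records))
```

In this sketch, records 1 and 2 share a block and are paired, while record 3 falls into a different block and is never compared with them. Because each stage is an ordinary function, any one of them can be swapped out without touching the others, which is the pluggability the framework is described as providing.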
Year: 2012
DOI: 10.1109/IRI.2012.6303041
Venue: IRI
Keywords: relational databases, web data integration, entity resolution, barm, inference method, bayes methods, record matcher, csvml, pattern matching, spectral neighborhood, internet, relational data set, attribute matcher, markov blanket, blocker, context sensitive value matching library, markov processes, bayesian network, data integration, attribute value matching, bayesian methods, databases, vectors, erbium, sparse matrices
Field: Data integration, Decision tree, Data mining, Naive Bayes classifier, Computer science, Support vector machine, Bayesian network, Markov blanket, Artificial intelligence, Pattern matching, Machine learning, Bayesian probability
DocType: Conference
ISBN: 978-1-4673-2283-6
Citations: 1
PageRank: 0.38
References: 18
Authors: 6
Name                Order  Citations  PageRank
Liangcai Shu        1      82         4.35
Can Lin             2      4          0.87
Weiyi Meng          3      2722       514.77
Yue Han             4      36         1.88
Clement T. Yu       5      3171       1419.96
Neil R. Smalheiser  6      658        57.50