Title
Data Integration Using Data Mining Techniques
Abstract
Database integration provides integrated access to multiple data sources. It has two main activities: schema integration (forming a global view of the data contents available in the sources) and data integration (transforming source data into a uniform format). This paper focuses on using data mining techniques to automate the aspect of data integration known as entity identification. Once a global database has been formed from all the transformed source data, there may be multiple instances of the same entity, with different values for the global attributes and no global identifier to simplify entity identification. We implement decision trees and k-NN as classification techniques, and we introduce a preprocessing step that clusters the data using conceptual hierarchies. We conduct a performance study on a small testbed, varying parameters such as training set size and number of unique entities, to study the tradeoffs between processing speed and accuracy. We find that clustering is a promising technique for improving processing speed, and that decision trees generally have faster processing times than k-NN but, in some cases, lower accuracy.
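As an illustration only, the classification step described in the abstract can be sketched in Python as k-NN entity identification preceded by a clustering step. This is not the authors' implementation. The city-to-region lookup (CITY_TO_REGION) standing in for a conceptual hierarchy, the attribute names, the mismatch-count distance, the sample records, and the helpers cluster_key, build_clusters, distance, and knn_identify are all hypothetical. Decision trees, the paper's other classifier, are not shown.

```python
"""Hypothetical sketch: k-NN entity identification with a clustering pre-step.

Assumptions (not from the paper): records are dicts of string attributes,
the conceptual hierarchy is a city -> region lookup, and distance is the
number of attributes whose values differ.
"""
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, str]
Labeled = Tuple[Record, str]  # (record, entity ID)

# Hypothetical conceptual hierarchy: generalize a city to its region.
CITY_TO_REGION: Dict[str, str] = {
    "Cincinnati": "Midwest",
    "Dayton": "Midwest",
    "Boston": "Northeast",
    "Albany": "Northeast",
}


def cluster_key(rec: Record) -> str:
    """Climb the hierarchy one level; cities missing from the lookup map to 'Other'."""
    return CITY_TO_REGION.get(rec["city"], "Other")


def build_clusters(training: List[Labeled]) -> Dict[str, List[Labeled]]:
    """Preprocessing step: group labeled training records by their generalized key."""
    clusters: Dict[str, List[Labeled]] = defaultdict(list)
    for rec, entity in training:
        clusters[cluster_key(rec)].append((rec, entity))
    return dict(clusters)


def distance(a: Record, b: Record, attrs: List[str]) -> int:
    """Count the attributes whose values differ, ignoring case."""
    return sum(a[attr].lower() != b[attr].lower() for attr in attrs)


def knn_identify(clusters: Dict[str, List[Labeled]], query: Record,
                 attrs: List[str], k: int = 3) -> str:
    """Return the majority entity ID among the k nearest training records,
    searching only the query's cluster (or all records if that cluster is empty)."""
    candidates = clusters.get(cluster_key(query), [])
    if not candidates:
        candidates = [pair for group in clusters.values() for pair in group]
    nearest = sorted(candidates, key=lambda pair: distance(pair[0], query, attrs))[:k]
    votes = Counter(entity for _, entity in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical usage: two source records per entity, with differing values.
ATTRS = ["name", "city", "phone"]
training: List[Labeled] = [
    ({"name": "J. Smith", "city": "Cincinnati", "phone": "555-0101"}, "E1"),
    ({"name": "John Smith", "city": "Dayton", "phone": "555-0101"}, "E1"),
    ({"name": "Mary Jones", "city": "Boston", "phone": "555-0202"}, "E2"),
    ({"name": "M. Jones", "city": "Albany", "phone": "555-0202"}, "E2"),
    ({"name": "Ann Lee", "city": "Cincinnati", "phone": "555-0303"}, "E3"),
]
clusters = build_clusters(training)
query = {"name": "John Smith", "city": "Cincinnati", "phone": "555-0101"}
print(knn_identify(clusters, query, ATTRS))  # E1
```

Restricting the neighbour search to the query's cluster is where the time saving comes from: the Cincinnati query is compared with the three Midwest records rather than all five. In this sketch the cost is that a duplicate stored under a different region can no longer be matched, which is one way a clustering step can trade accuracy for speed.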
Year: 2002
Venue: FLAIRS Conference
Keywords: data integration, data mining techniques, data mining, database integration, data integrity, decision tree
Field: Data warehouse, Data integration, Data mining, Ontology-based data integration, Data modeling, Identifier, Source data, Computer science, Database design, Artificial intelligence, Cluster analysis, Machine learning
DocType: Conference
ISBN: 1-57735-141-X
Citations: 0
PageRank: 0.34
References: 10
Authors: 4
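Order. Name: citations, PageRank
1. Karen C. Davis: 247 citations, PageRank 39.78
2. Krishnamoorthy Janakiraman: 0 citations, PageRank 0.34
3. Ali Minai: 0 citations, PageRank 0.68
4. Robert B. Davis: 60 citations, PageRank 6.16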