Title
Learning domain-independent string transformation weights for high accuracy object identification
Abstract
Object identification is required when integrating information from multiple websites. The same data object can appear in inconsistent text formats across sites, so matching objects cannot be identified by exact text match. Previous object identification methods required either manual construction of domain-specific string transformations or manual setting of the parameter weights of general transformations to recognize format inconsistencies. This manual process can be time-consuming and error-prone. We have developed an object identification system called Active Atlas [18], which applies a set of domain-independent string transformations to compare the objects' shared attributes in order to identify matching objects. In this paper, we discuss extensions to Active Atlas that allow it to learn, through limited user input, to tailor the weights of a set of general transformations to a specific application domain. Experimental results across several application domains show that this approach achieves higher accuracy and requires less user involvement than previous methods.
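The abstract does not spell out how general transformations are applied or how their weights are learned from user input. The following is a minimal sketch under stated assumptions: a small hypothetical set of token-level transformations (`equality`, `initial`, `prefix`, `abbreviation`), a greedy token alignment, and a Laplace-smoothed estimate of how often each transformation fires in pairs a user labeled as matches. The helper names (`align`, `learn_weights`, `score`) and the labeled pairs are illustrative inventions; this is not the Active Atlas implementation or its exact weighting formula.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical token-level transformations (illustrative only; not the
# exact Active Atlas definitions). Each returns True if token a can be
# linked to token b.

def equality(a: str, b: str) -> bool:
    """Tokens are identical."""
    return a == b

def initial(a: str, b: str) -> bool:
    """One token is a single letter equal to the other's first letter."""
    short, long_ = sorted((a, b), key=len)
    return len(short) == 1 and len(long_) > 1 and long_[0] == short

def prefix(a: str, b: str) -> bool:
    """The shorter token (2+ chars) is a proper prefix of the longer one."""
    short, long_ = sorted((a, b), key=len)
    return 2 <= len(short) < len(long_) and long_.startswith(short)

def abbreviation(a: str, b: str) -> bool:
    """Same first letter, and the shorter token is a subsequence of the longer."""
    short, long_ = sorted((a, b), key=len)
    if len(short) < 2 or len(short) >= len(long_) or short[0] != long_[0]:
        return False
    rest = iter(long_)
    # `ch in rest` consumes the iterator, so letters must appear in order.
    return all(ch in rest for ch in short)

# Checked in priority order: the first transformation that applies wins.
TRANSFORMS: List[Tuple[str, Callable[[str, str], bool]]] = [
    ("equality", equality),
    ("initial", initial),
    ("prefix", prefix),
    ("abbreviation", abbreviation),
]

def align(x: str, y: str) -> Tuple[List[str], int]:
    """Greedily link each token of x to an unused token of y.

    Returns the name of the transformation used for every linked token and
    the larger of the two token counts (used to normalize scores).
    """
    xs, ys = x.lower().split(), y.lower().split()
    unused = list(ys)
    used: List[str] = []
    for tok in xs:
        for name, fn in TRANSFORMS:
            hit = next((u for u in unused if fn(tok, u)), None)
            if hit is not None:
                unused.remove(hit)
                used.append(name)
                break
    return used, max(len(xs), len(ys), 1)

def learn_weights(labeled: List[Tuple[str, str, bool]]) -> Dict[str, float]:
    """Laplace-smoothed estimate of P(match | transformation fired)."""
    fired = {name: 0 for name, _ in TRANSFORMS}
    fired_in_match = {name: 0 for name, _ in TRANSFORMS}
    for x, y, is_match in labeled:
        for name in set(align(x, y)[0]):
            fired[name] += 1
            if is_match:
                fired_in_match[name] += 1
    return {n: (fired_in_match[n] + 1) / (fired[n] + 2) for n in fired}

def score(x: str, y: str, weights: Dict[str, float]) -> float:
    """Weighted fraction of tokens linked by some transformation."""
    names, n = align(x, y)
    return sum(weights[name] for name in names) / n

# A few hypothetical pairs labeled by a user (True = same object).
LABELED = [
    ("Art's Deli", "Art's Delicatessen", True),
    ("John Smith", "J Smith", True),
    ("Main St", "Main Street", True),
    ("Main St", "Maple Street", False),
    ("Cafe Roma", "Cafe Milano", False),
]

if __name__ == "__main__":
    w = learn_weights(LABELED)
    print(w)
    print(score("Main St", "Main Street", w), score("Main St", "Maple Street", w))
```

On this toy data, `prefix` also fires on "st"/"street" in the non-matching "Main St" / "Maple Street" pair, so it ends up with a lower weight (0.6) than `equality` (about 0.67), while `abbreviation`, which never fires, stays at the 0.5 prior. The point of the sketch is only that a handful of user labels can shift general-purpose transformation weights toward a particular domain.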
Year: 2002
DOI: 10.1145/775047.775099
Venue: KDD
Keywords: domain-independent string transformation, manual process, high accuracy object identification, active atlas system, manual setting, data object, previous method, object identification, active atlas, manual construction, domain-independent string transformation weight, object identification system, lifetime value
Field: Data mining, Customer lifetime value, Computer science, Identification system, Application domain, Artificial intelligence, Data objects, Machine learning
DocType: Conference
ISBN: 1-58113-567-X
Citations: 140
PageRank: 10.71
References: 15
Authors: 3
Author list
Sheila Tejada (order 1): citations 704, PageRank 85.55
Craig A. Knoblock (order 2): citations 5229, PageRank 680.57
Steven Minton (order 3): citations 3473, PageRank 536.74