Abstract |
---|
Entity resolution (ER) is a problem that arises in many information integration scenarios: we have two or more sources containing records on the same set of real-world entities (e.g., customers). However, there are no unique identifiers that tell us which records from one source correspond to those in the other sources. Furthermore, the records representing the same entity may contain differing information, e.g., one record may have the address misspelled, while another may be missing some fields. An ER algorithm attempts to identify the matching records from multiple sources (i.e., those corresponding to the same real-world entity) and merges the matching records as best it can. In many ER applications, the input data has data quality or uncertainty values associated with it. Furthermore, the ER process itself introduces additional uncertainties, e.g., we may be only 90% confident that two given records actually correspond to the same real-world entity. In this talk, Hector Garcia-Molina will discuss the challenges in representing quality, uncertainty, and confidences in a way that is useful for the ER process. He will also present some preliminary ideas on how to perform ER with uncertain data. (This work is joint with Omar Benjelloun, David Menestrina, Qi Su, and Jennifer Widom.) |
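
The match-then-merge idea described in the abstract can be illustrated with a minimal Python sketch. It is not the method presented in the talk: the function names (`match_confidence`, `merge`, `resolve`), the `_confidence` field, the 0.85 threshold, and the use of average string similarity as a confidence score are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def match_confidence(a: dict, b: dict) -> float:
    """Hypothetical confidence: mean string similarity over fields both records fill in."""
    shared = [k for k in a if a.get(k) and b.get(k)]
    if not shared:
        return 0.0
    scores = [
        SequenceMatcher(None, a[k].lower(), b[k].lower()).ratio() for k in shared
    ]
    return sum(scores) / len(scores)


def merge(a: dict, b: dict, confidence: float) -> dict:
    """Combine two records, preferring a's value and filling gaps from b."""
    merged = {k: a.get(k) or b.get(k) for k in {**a, **b}}
    # Keep the match confidence so later steps can see how uncertain the merge is.
    merged["_confidence"] = confidence
    return merged


def resolve(source_a: list, source_b: list, threshold: float = 0.85) -> list:
    """Pairwise comparison of two sources; merge pairs at or above the threshold."""
    results = []
    for a in source_a:
        for b in source_b:
            confidence = match_confidence(a, b)
            if confidence >= threshold:
                results.append(merge(a, b, confidence))
    return results


if __name__ == "__main__":
    crm = [{"name": "Jane Smith", "address": "12 Main Street"}]
    billing = [{"name": "Jane Smith", "address": "12 Main Stret", "phone": "555-0100"}]
    for record in resolve(crm, billing):
        print(record)
```

Here a record with a misspelled address and a record missing the phone field are still matched, and the merged record carries its confidence along rather than discarding it. A pairwise loop is quadratic in the number of records, so it is suitable only for small inputs.
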
Year | DOI | Venue |
---|---|---|
2005 | 10.1145/1077501.1077503 | IQIS |
Keywords | Field | DocType |
er algorithm attempt,input data,entity resolution,uncertain data,real-world entity,matching record,data quality,er application,additional uncertainty,er process,information integration | Information integration,Data mining,Name resolution,Data quality,Information retrieval,Computer science,Uncertain data,Unique identifier | Conference |
ISBN | Citations | PageRank |
1-59593-160-0 | 1 | 0.35 |
References | Authors | |
1 | 1 |
Name | Order | Citations | PageRank |
---|---|---|---|
Héctor García-Molina | 1 | 24359 | 5652.13 |