Abstract | ||
---|---|---|
Record Linkage (RL) is an important component of data cleaning and integration, and of data processing in general. For years, many efforts have focused on improving the performance of the RL process, either by reducing the number of record comparisons or by reducing the number of attribute comparisons, which reduces the computational time but increases the amount of error. However, the real bottleneck of RL is the post-processing step, where the results have to be reviewed by experts who decide which pairs or groups of records are real links and which are false hits. In this paper we show that exploiting the semantic relationships (e.g., foreign keys) established between one or more data sources makes it possible to devise a new kind of semantic blocking method that increases the number of hits and reduces the review effort. |
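
The abstract does not spell out the blocking method, so the following is only a minimal sketch of one possible reading: records are grouped by a foreign key they share, and only records inside the same group are proposed as candidate pairs. The record layout, the `dept_id` field, and the helper names `block_by_foreign_key` and `candidate_pairs` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records; "dept_id" stands in for a foreign key
# pointing at some other table. None of this data comes from the paper.
records = [
    {"id": 1, "name": "A. Smith", "dept_id": 10},
    {"id": 2, "name": "Anna Smith", "dept_id": 10},
    {"id": 3, "name": "B. Jones", "dept_id": 11},
    {"id": 4, "name": "Bob Jones", "dept_id": 11},
    {"id": 5, "name": "C. Lee", "dept_id": 10},
]


def block_by_foreign_key(rows, fk_field):
    """Group record ids by the value of the given foreign-key field."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[row[fk_field]].append(row["id"])
    return dict(blocks)


def candidate_pairs(blocks):
    """Return every unordered pair of ids that share a block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


blocks = block_by_foreign_key(records, "dept_id")
pairs = candidate_pairs(blocks)

# Number of comparisons without any blocking: n * (n - 1) / 2.
all_pairs = len(records) * (len(records) - 1) // 2
print(f"{len(pairs)} candidate pairs instead of {all_pairs}")
```

In this sketch the reviewer would only see pairs whose records reference the same related entity, which is the intuition behind reducing review effort. Whether the paper builds its blocks this way is an assumption.
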
Year | Venue | Keywords |
---|---|---|
2007 | CCIA | foreign key,rl process,real link,false hit,real bottleneck,computational time,attribute comparison,semantic relationship,record linkage,data source,data integration,data cleansing,data processing |
Field | DocType | Volume |
Data integration,Data mining,Bottleneck,Record linkage,Data cleansing,Data processing,Information retrieval,Computer science,sort,Foreign key,Semantic information | Conference | 163 |
ISSN | Citations | PageRank |
0922-6389 | 0 | 0.34 |
References | Authors | |
4 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jordi Nin | 1 | 311 | 26.53 |
Victor Muntes-Mulero | 2 | 38 | 4.06 |
Norbert Martínez-Bazan | 3 | 106 | 5.55 |
Josep-Lluis Larriba-Pey | 4 | 51 | 3.37 |