Abstract |
---|
Set similarity computation has become an increasingly important tool in many real-world applications, such as near-duplicate detection, data cleaning, and record linkage, in which sets are often uncertain due to missing data, imprecision, and noise. The main challenge of evaluating similarity between probabilistic sets stems from the exponential blowup in the number of possible worlds induced by uncertainty. In this paper, we define the probability threshold set similarity (PTSS) between two probabilistic sets based on possible world semantics and propose an exact solution that computes PTSS via dynamic programming. To speed up the probability threshold set query (PTSQ), we derive an efficient and effective pruning rule for PTSQ. Finally, we conduct extensive experiments on both real and synthetic datasets to verify the effectiveness and efficiency of our algorithms. |

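
The abstract does not spell out the PTSS formula or the dynamic program, so the following is only a minimal Python sketch under stated assumptions, not the authors' algorithm. It assumes (hypothetically) that each element of a probabilistic set exists independently with a given probability, that the two sets are independent of each other, that the similarity measure is Jaccard, and that PTSS is the probability, summed over all possible worlds, that the Jaccard similarity reaches a threshold `tau`. It contrasts brute-force enumeration of possible worlds, which is exponential in the number of uncertain elements, with a dynamic program over the joint distribution of intersection size and union size. All names (`ptss_bruteforce`, `ptss_dp`, `threshold_query`, `alpha`) are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List

# Hypothetical model: a probabilistic set maps each element to its
# independent existence probability.
ProbSet = Dict[str, float]


def meets(inter: int, union: int, tau: float) -> bool:
    """True if the Jaccard similarity inter/union is at least tau."""
    if union == 0:
        # Assumed convention: two empty sets are identical (similarity 1).
        return 1.0 >= tau
    return inter / union >= tau


def ptss_bruteforce(pa: ProbSet, pb: ProbSet, tau: float) -> float:
    """Pr[Jaccard(A, B) >= tau] by enumerating all 2**(|pa| + |pb|) possible worlds."""
    ea, eb = list(pa), list(pb)
    total = 0.0
    for bits_a in product([False, True], repeat=len(ea)):
        for bits_b in product([False, True], repeat=len(eb)):
            # Probability of this particular world (independent elements).
            prob = 1.0
            for e, present in zip(ea, bits_a):
                prob *= pa[e] if present else 1.0 - pa[e]
            for e, present in zip(eb, bits_b):
                prob *= pb[e] if present else 1.0 - pb[e]
            a = {e for e, present in zip(ea, bits_a) if present}
            b = {e for e, present in zip(eb, bits_b) if present}
            if meets(len(a & b), len(a | b), tau):
                total += prob
    return total


def ptss_dp(pa: ProbSet, pb: ProbSet, tau: float) -> float:
    """Same probability via dynamic programming over (intersection, union) sizes."""
    # dist[(i, u)]: probability that the elements processed so far yield
    # intersection size i and union size u.
    dist = {(0, 0): 1.0}
    for e in sorted(set(pa) | set(pb)):
        qa, qb = pa.get(e, 0.0), pb.get(e, 0.0)
        p_both = qa * qb                       # e in A and in B
        p_one = qa * (1 - qb) + (1 - qa) * qb  # e in exactly one set
        p_none = (1 - qa) * (1 - qb)           # e in neither set
        nxt = defaultdict(float)
        for (i, u), p in dist.items():
            nxt[(i + 1, u + 1)] += p * p_both
            nxt[(i, u + 1)] += p * p_one
            nxt[(i, u)] += p * p_none
        dist = nxt
    return sum(p for (i, u), p in dist.items() if meets(i, u, tau))


def threshold_query(q: ProbSet, candidates: Dict[str, ProbSet],
                    tau: float, alpha: float) -> List[str]:
    """Naive threshold query without pruning: keep candidates with Pr[J >= tau] >= alpha."""
    return [name for name, s in candidates.items() if ptss_dp(q, s, tau) >= alpha]


if __name__ == "__main__":
    a = {"x": 0.9, "y": 0.5}
    b = {"x": 0.8, "z": 0.4}
    print(ptss_dp(a, b, 0.5), ptss_bruteforce(a, b, 0.5))
```

With n distinct elements, the dynamic program keeps at most O(n^2) (intersection, union) states and updates each one per element, so it runs in O(n^3) time instead of enumerating 2^(|A|+|B|) worlds. `threshold_query` evaluates every candidate exactly; a pruning rule such as the one the paper derives for PTSQ (not reproduced here) is meant to avoid some of these evaluations.
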
Year | DOI | Venue |
---|---|---|
2015 | 10.1007/978-3-319-21042-1_30 | WEB-AGE INFORMATION MANAGEMENT (WAIM 2015) |

Field | DocType | Volume |
---|---|---|
Edit distance, Data mining, Dynamic programming, Computer science, Range query (data structures), Uncertain data, Artificial intelligence, Probabilistic logic, Machine learning, Speedup, Computation, Possible world | Conference | 9098 |

ISSN | Citations | PageRank |
---|---|---|
0302-9743 | 0 | 0.34 |

References | Authors |
---|---|
19 | 5 |

Name | Order | Citations | PageRank |
---|---|---|---|
Lei Wang | 1 | 333 | 76.40 |
Ming Gao | 2 | 76 | 9.41 |
Rong Zhang | 3 | 39 | 6.77 |
Cheqing Jin | 4 | 379 | 35.44 |
Aoying Zhou | 5 | 2632 | 238.85 |