Parallel Top-K Query Processing On Uncertain Strings Using Mapreduce - Citegraph

Paper Info

Title
Parallel Top-K Query Processing On Uncertain Strings Using Mapreduce

Abstract
Top-k query is an important and essential operator for data analysis over string collections. However, when uncertainty comes into big data, it calls for new parallel algorithms for efficient query processing on large scale uncertain strings. In this paper, we proposed a MapReduce-based parallel algorithm, called MUSK, for answering top-k queries over large scale uncertain strings. We used the probabilistic n-grams to generate key-value pairs. To improve the performance, a novel lower bound for expected edit distance was derived to prune strings based on a new defined function gram mapping distance. By integrating the bound with TA, the filtering power in the Map stage was optimized effectively to decrease the transmission cost. Comprehensive experimental results on both real-world and synthetic datasets showed that MUSK outperformed the baseline approach with speeds up to 6 times in the best case, which indicated good scalability over large datasets.

Year	DOI	Venue
2015	10.1007/978-3-319-18123-3_6	DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2015, PT II
Field	DocType	Volume
Edit distance,Computer science,Upper and lower bounds,Parallel algorithm,Filter (signal processing),Theoretical computer science,Operator (computer programming),Probabilistic logic,Big data,Database,Scalability	Conference	9050
ISSN	Citations	PageRank
0302-9743	1	0.35
References	Authors
13	4

Authors (4 rows)

Cited by (1 rows)

References (13 rows)

Name	Order	Citations	PageRank
Hui Xu	1	212	29.73
Xiao Feng Ding	2	149	11.52
Hai Jin	3	6544	644.63
Wenbin Jiang	4	355	36.55

1