CrowdOLA: Online Aggregation on Duplicate Data Powered by Crowdsourcing. - Citegraph

Paper Info

Title
CrowdOLA: Online Aggregation on Duplicate Data Powered by Crowdsourcing.

Abstract
Recently, there has been an increasing need for interactive, human-driven analysis of large volumes of data. Online aggregation (OLA), which provides a quick sketch of massive data ahead of the long wait for the final accurate query result, has drawn significant research attention. However, applying OLA directly to duplicate data leads to incorrect query answers, since sampling from duplicate records overrepresents the duplicated data in the sample. This violates the uniform-distribution prerequisite of most statistical theories. In this paper, we propose CrowdOLA, a novel framework for integrating online aggregation processing with deduplication. Instead of cleaning the whole dataset, CrowdOLA continuously retrieves block-level samples from the dataset and employs a crowd-based entity resolution approach to detect duplicates in the sample in a pay-as-you-go fashion. After the sample is cleaned, an unbiased estimator is provided to correct the bias introduced by duplication. We evaluate CrowdOLA on both real-world and synthetic workloads. Experimental results show that CrowdOLA provides a good balance between efficiency and accuracy.
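
The overrepresentation problem described in the abstract can be illustrated with a small sketch. This is not the estimator from the paper, whose details the abstract does not give. It is a generic inverse-duplicate-count weighting and it assumes the number of copies of each entity (`dup_count`) is already known, whereas a real system would have to estimate it from the resolved sample. The toy records, the function names, and the record-level sampling with replacement are all illustrative assumptions.

```python
import random
from collections import Counter


def naive_sum_estimate(sample, total_records):
    """Scale the sample mean up to the dataset size, ignoring duplicates."""
    return total_records * sum(value for _, value in sample) / len(sample)


def dedup_sum_estimate(sample, total_records, dup_count):
    """Weight each sampled record by 1 / (copies of its entity in the dataset).

    Assumption: dup_count is known exactly. Under uniform record sampling,
    the expected value of this estimate is the sum over distinct entities.
    """
    weighted = sum(value / dup_count[entity] for entity, value in sample)
    return total_records * weighted / len(sample)


# Hypothetical toy dataset of (entity_id, value) records; "a" appears 3 times.
records = [("a", 10.0), ("a", 10.0), ("a", 10.0), ("b", 20.0), ("c", 30.0)]
dup_count = Counter(entity for entity, _ in records)

# Ground truth: each distinct entity counted once.
true_sum = sum({entity: value for entity, value in records}.values())

# Uniform sampling of records with replacement (fixed seed for repeatability).
rng = random.Random(0)
sample = [rng.choice(records) for _ in range(10_000)]

naive = naive_sum_estimate(sample, len(records))
corrected = dedup_sum_estimate(sample, len(records), dup_count)

print(f"true sum of distinct entities: {true_sum}")
print(f"naive estimate:                {naive:.2f}")
print(f"duplicate-weighted estimate:   {corrected:.2f}")
```

In this toy setting the naive estimate drifts toward the sum over all records, duplicates included, while the weighted estimate stays close to the sum over distinct entities.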

Year	2018
DOI	10.1007/s11390-018-1824-5
Venue	J. Comput. Sci. Technol.
Keywords	online aggregation, entity resolution, crowdsourcing, cloud computing
Field	Data deduplication, Data mining, Name resolution, Crowdsourcing, Computer science, Bias of an estimator, Sampling (statistics), Online aggregation, Cloud computing, Distributed computing, Sketch
DocType	Journal
Volume	33
Issue	2
ISSN	1000-9000
Citations	1
PageRank	0.36
References	23
Authors	6

Authors (6 rows)

Name	Order	Citations	PageRank
Anzhen Zhang	1	2	1.39
Jianzhong Li	2	63	24.23
Hong Gao	3	1086	120.07
Yu-Biao Chen	4	1	0.36
Heng-Zhao Ma	5	1	1.71
Mohamed Jaward Bah	6	7	0.77
