Title
Parrot: A Progressive Analysis System On Large Text Collections
Abstract
The volume of textual data continues to grow, along with the need for timely and cost-effective analysis, while growth in computing power cannot keep pace. Delays in processing very large text collections can negatively impact user activity and insight, which calls for a paradigm shift from blocking execution to progressive processing. In this paper, we propose a sample-based progressive processing model that focuses on term frequency calculation over text. The model is built on an incremental execution engine and computes a series of approximate results for a single query progressively, providing a smooth trade-off between accuracy and latency. As part of the model, we propose a new variant of the bootstrap technique to quantify result error progressively. We implemented this method in a system called Parrot on top of Apache Spark and evaluated its performance on real-world data. Experiments demonstrate that our method is 2.4x-19.7x faster at reaching a result within 1% error, while the confidence interval always covers the exact result.
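The idea of sample-based progressive term frequency estimation can be illustrated with a small sketch. This is not Parrot's implementation: it runs on a plain Python list instead of Apache Spark, and it uses the standard percentile bootstrap rather than the bootstrap variant proposed in the paper. The function name `progressive_term_frequency` and its parameters (`batch_size`, `n_boot`, `seed`) are hypothetical and chosen only for illustration.

```python
import random


def progressive_term_frequency(docs, term, batch_size=100, n_boot=200, seed=0):
    """Yield (docs_seen, estimated_total, ci_low, ci_high) after each batch.

    Illustrative sketch only: `term` is assumed to be lowercase, and tokens
    are obtained by naive whitespace splitting.
    """
    rng = random.Random(seed)
    total_docs = len(docs)
    order = list(range(total_docs))
    rng.shuffle(order)  # a random order makes every prefix a uniform sample

    counts = []  # per-document occurrence counts of `term` seen so far
    for start in range(0, total_docs, batch_size):
        for i in order[start:start + batch_size]:
            counts.append(docs[i].lower().split().count(term))

        n = len(counts)
        # Scale the sample total up to the whole collection.
        estimate = sum(counts) * total_docs / n

        # Standard percentile bootstrap (assumption: NOT the paper's variant):
        # resample the per-document counts with replacement and re-estimate.
        boot = sorted(
            sum(rng.choices(counts, k=n)) * total_docs / n
            for _ in range(n_boot)
        )
        ci_low = boot[int(0.025 * (n_boot - 1))]
        ci_high = boot[int(0.975 * (n_boot - 1))]
        yield n, estimate, ci_low, ci_high


if __name__ == "__main__":
    # Tiny synthetic collection, used only to exercise the sketch.
    docs = ["spark text spark"] * 30 + ["text only"] * 70
    for seen, est, low, high in progressive_term_frequency(docs, "spark", batch_size=25):
        print(seen, est, low, high)
```

Each yielded tuple is one approximate answer to the same query, so a caller can stop as soon as the interval is narrow enough. Because this sketch applies no finite-population correction, the interval does not collapse to zero width even after every document has been processed, although the final estimate then equals the exact count.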
Year: 2021
DOI: 10.1007/s41019-020-00144-y
Venue: DATA SCIENCE AND ENGINEERING
Keywords: Approximate query processing, Text data analytics, Term frequency, Bootstrap
DocType: Journal
Volume: 6
Issue: 1
ISSN: 2364-1185
Citations: 0
PageRank: 0.34
References: 0
Authors: 6
Name            Order  Citations  PageRank
Yazhong Zhang   1      0          0.68
Hanbing Zhang   2      0          0.68
Zhenying He     3      158        16.03
Yinan Jing      4      0          3.04
Kai Zhang       5      73         5.97
X. Sean WANG    6      116        8.68