Title
Efficient Level-Based Top-Down Data Cube Computation Using MapReduce
Abstract
A data cube is an essential part of OLAP (On-Line Analytical Processing), supporting efficient multidimensional analysis of large volumes of data. Computing a data cube is time-consuming, because a data cube with d dimensions consists of 2^d cuboids (i.e., exponential in d). To build ROLAP (Relational OLAP) data cubes efficiently, many algorithms (e.g., GBLP, PipeSort, PipeHash, BUC) have been developed that share sort costs and input data scans and/or reduce computation time. Several parallel processing algorithms have also been proposed. Meanwhile, MapReduce has recently emerged as a framework for processing huge volumes of data, such as web-scale data, in a distributed and parallel manner using a large number of computers (e.g., several hundred or several thousand). In the MapReduce framework, the degree of parallelism matters more for reducing total execution time than elaborate strategies such as sort sharing and computation reduction, which existing ROLAP algorithms use. In this paper, we propose two distributed parallel processing algorithms. The first, MRLevel, takes advantage of the MapReduce framework. The second, MRPipeLevel, is based on the existing PipeSort algorithm, one of the most efficient algorithms for top-down cube computation. (The top-down approach handles big data more effectively than alternatives such as bottom-up approaches and special data structures, which depend on main-memory size.) MRLevel parallelizes cube computation and, at the same time, reduces the number of data scans by processing cuboids level by level. MRPipeLevel builds on the advantages of MRLevel and further reduces the number of data scans by pipelining. We implemented this algorithm and evaluated its performance under the MapReduce framework. Through the experiments, we also identify the factors that improve MapReduce performance when processing very large data.
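To make the cuboid count and the idea of level-wise scanning concrete, here is a minimal single-machine sketch in Python. It is not the MRLevel or MRPipeLevel algorithm from the paper. The function names, the sample rows, and the choice of SUM as the aggregate are all assumptions made for illustration. It enumerates the 2^d cuboids of a d-dimensional cube grouped by level (number of grouping dimensions), then computes every cuboid of one level in a single pass over the input, in a map-then-reduce style.

```python
from collections import defaultdict
from itertools import combinations


def cuboids_by_level(dimensions):
    """Group every subset of the dimensions (one cuboid each) by its size."""
    d = len(dimensions)
    return {k: list(combinations(dimensions, k)) for k in range(d, -1, -1)}


def aggregate_level(rows, cuboids, measure):
    """Compute SUM(measure) for all given cuboids in one pass over rows."""
    totals = defaultdict(int)
    for row in rows:  # "map" side: a single scan feeds every cuboid
        for cuboid in cuboids:
            key = (cuboid, tuple(row[dim] for dim in cuboid))
            totals[key] += row[measure]  # "reduce" side: sum per key
    return dict(totals)


# Hypothetical sample data: three dimensions and one measure.
rows = [
    {"city": "Seoul", "item": "pen", "year": 2014, "sales": 3},
    {"city": "Seoul", "item": "ink", "year": 2014, "sales": 5},
    {"city": "Busan", "item": "pen", "year": 2015, "sales": 2},
]

levels = cuboids_by_level(("city", "item", "year"))
total_cuboids = sum(len(group) for group in levels.values())  # 2**3
level2 = aggregate_level(rows, levels[2], "sales")
grand_total = aggregate_level(rows, levels[0], "sales")
```

In this sketch, all three two-dimensional cuboids are produced from one scan of `rows`, which is the sense in which grouping work by level can cut the number of data scans. A real MapReduce job would distribute the emitted keys across reducers rather than summing in one dictionary.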
Year: 2015
DOI: 10.1007/978-3-662-47804-2_1
Venue: Lecture Notes in Computer Science
Keywords: Data cube, ROLAP, MapReduce, Hadoop, Distributed parallel computing
Field: Data structure, Computer science, Parallel computing, Multidimensional analysis, ROLAP, Online analytical processing, Big data, Data cube, Cube, Computation
DocType:
Journal:
Volume: 9260
ISSN: 0302-9743
Citations: 4
PageRank: 0.50
References: 5
Authors: 4
Name            Order  Citations  PageRank
Suan Lee        1      39         7.38
Jinho Kim       2      7          1.22
Yang-sae Moon   3      489        45.58
Wookey Lee      4      196        29.22