Abstract | ||
---|---|---|
In-memory OLAP systems require a space-efficient representation of sparse data cubes in order to accommodate large data sets.
On the other hand, most efficient online aggregation techniques, such as prefix sums, are built on dense array-based representations.
These are often not applicable to real-world data because of the size of the arrays, which usually cannot be compressed well, as
most sparsity is removed during pre-processing. A possible solution is to identify dense regions in a sparse cube and only
represent those using arrays, while storing sparse data separately, e.g. in a spatial index structure. Previous dense-region-based
approaches have concentrated mainly on the effectiveness of the dense-region detection (i.e. on the space-efficiency of the result). However, especially in higher-dimensional cubes,
data is usually more cluttered, resulting in a potentially large number of small dense regions, which negatively affects query
performance on such a structure. In this paper, our focus is not only on space-efficiency but also on time-efficiency, both
for the initial dense-region extraction and for queries carried out in the resulting hybrid data structure. We describe two
methods to trade available memory for increased aggregate query performance. In addition, optimizations in our approach significantly
reduce the time to build the initial data structure compared to earlier systems. Also, we present a straightforward adaptation
of our approach to support multi-core or multi-processor architectures, which can further enhance query performance. Experiments
with different real-world data sets show how various parameter settings can be used to adjust the efficiency and effectiveness
of our algorithms.
|
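To make the hybrid idea in the abstract concrete, the following is a minimal two-dimensional sketch, not the authors' implementation. It assumes dense regions are stored as prefix-sum arrays and that the remaining sparse cells sit in a plain dictionary, which stands in for the spatial index structure mentioned above. All names (`build_prefix_sums`, `DenseRegion`, `hybrid_range_sum`) are hypothetical and chosen for illustration only.

```python
def build_prefix_sums(grid):
    """Return P where P[i][j] is the sum of grid[0..i-1][0..j-1].

    P has one extra row and column of zeros so lookups need no bounds checks.
    """
    rows, cols = len(grid), len(grid[0])
    P = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            P[i + 1][j + 1] = grid[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P


def range_sum(P, r0, c0, r1, c1):
    """Sum over the inclusive rectangle [r0..r1] x [c0..c1] with four lookups."""
    return P[r1 + 1][c1 + 1] - P[r0][c1 + 1] - P[r1 + 1][c0] + P[r0][c0]


class DenseRegion:
    """Hypothetical dense region: an origin in cube coordinates plus prefix sums."""

    def __init__(self, origin, grid):
        self.r, self.c = origin
        self.rows, self.cols = len(grid), len(grid[0])
        self.P = build_prefix_sums(grid)

    def query(self, r0, c0, r1, c1):
        # Clip the query rectangle to the extent of this region.
        lo_r, lo_c = max(r0, self.r), max(c0, self.c)
        hi_r = min(r1, self.r + self.rows - 1)
        hi_c = min(c1, self.c + self.cols - 1)
        if lo_r > hi_r or lo_c > hi_c:
            return 0  # no overlap with the query
        # Translate to region-local coordinates before the lookup.
        return range_sum(self.P, lo_r - self.r, lo_c - self.c,
                         hi_r - self.r, hi_c - self.c)


def hybrid_range_sum(regions, sparse_points, r0, c0, r1, c1):
    """Aggregate over dense regions, then add the sparse cells inside the query.

    sparse_points maps (row, col) -> value; the linear scan here is an
    assumption standing in for a real spatial index lookup.
    """
    total = sum(region.query(r0, c0, r1, c1) for region in regions)
    total += sum(v for (r, c), v in sparse_points.items()
                 if r0 <= r <= r1 and c0 <= c <= c1)
    return total


# Example: one 2x2 dense region at (2, 3) and two isolated sparse cells.
regions = [DenseRegion((2, 3), [[1, 2], [3, 4]])]
sparse_points = {(0, 0): 5, (9, 9): 7}
print(hybrid_range_sum(regions, sparse_points, 0, 0, 3, 4))
print(hybrid_range_sum(regions, sparse_points, 3, 3, 9, 9))
```

In this sketch each dense region answers its part of a query in constant time, so query cost grows with the number of regions touched. This is the effect the abstract attributes to many small dense regions in higher-dimensional cubes.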
Year | DOI | Venue |
---|---|---|
2010 | 10.1007/978-3-642-03730-6_15 | Transactions on Large-Scale Data- and Knowledge-Centered Systems II |
Keywords | Field | DocType |
query performance,different real-world data set,hybrid data structure,initial data structure,large data set,sparse data,sparse data cube,spatial index structure,dense array-based representation,dense region,Dense-Region-Based Data Cube Representations,Efficient Online Aggregates | Data structure,Data mining,Data set,Computer science,Range query (data structures),Online aggregation,Online analytical processing,Database,Data cube,Sparse matrix,Spatial database | Journal |
Volume | ISSN | ISBN |
5691 | 0302-9743 | 3-642-16174-X |
Citations | PageRank | References |
1 | 0.38 | 15 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Kais Haddadin | 1 | 1 | 0.38 |
Tobias Lauer | 2 | 37 | 8.65 |