Abstract | ||
---|---|---|
An active warehouse is refreshed on-line and thus achieves a higher consistency between the stored information and the latest data updates. The need for on-line warehouse refreshment introduces several challenges in the implementation of data warehouse transformations. In this article, we focus on a frequently encountered operation in this context, namely, the join of a fast stream $S$ of source updates with a disk-based relation $R$, under the constraint of limited memory. This operation lies at the core of several common transformations, such as surrogate key assignment, duplicate detection, or identification of newly inserted tuples. We propose a specialized join algorithm, termed MeshJoin, that compensates for the difference in the access cost of the two join inputs by (a) relying entirely on fast sequential scans of $R$, and (b) sharing the I/O cost of accessing $R$ across multiple tuples of $S$. We detail the MeshJoin algorithm and develop a systematic cost model that enables tuning MeshJoin based on the available memory and the desired throughput. We present an experimental study that validates the performance of MeshJoin on synthetic and real-life data. Our results verify the effectiveness of MeshJoin and demonstrate its advantages over existing join algorithms. |
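
The two ideas named in the abstract, a cyclic sequential scan of $R$ and sharing each page read across many buffered tuples of $S$, can be illustrated with a small in-memory sketch. This is not the paper's pseudocode: the function name `mesh_join_sketch`, its parameters, and the representation of $R$ as a list of "pages" are assumptions made for illustration, and the cost model and memory tuning mentioned in the abstract are left out.

```python
from collections import defaultdict, deque
from itertools import islice


def mesh_join_sketch(stream, relation_pages, batch_size, key_s, key_r):
    """Yield (s, r) pairs joining a stream S with a paged relation R.

    Hypothetical sketch: `relation_pages` stands in for the disk-based
    relation, one list of tuples per page, scanned cyclically.
    """
    num_pages = len(relation_pages)
    if num_pages == 0 or batch_size <= 0:
        raise ValueError("need at least one page of R and a positive batch size")

    window = deque()           # batches of stream tuples, oldest first
    table = defaultdict(list)  # join key -> stream tuples currently buffered
    stream = iter(stream)
    page_idx = 0
    pending = 0                # number of stream tuples held in the window

    while True:
        # A batch that has stayed for num_pages iterations has met every
        # page of R exactly once, so it can be expired.
        if len(window) == num_pages:
            for s in window.popleft():
                table[key_s(s)].remove(s)
                pending -= 1

        batch = list(islice(stream, batch_size))
        if not batch and pending == 0:
            return             # stream exhausted and window drained
        window.append(batch)   # may be empty while the window drains
        for s in batch:
            table[key_s(s)].append(s)
            pending += 1

        # One sequential page read of R is shared by all buffered S tuples.
        for r in relation_pages[page_idx]:
            for s in table.get(key_r(r), ()):
                yield (s, r)
        page_idx = (page_idx + 1) % num_pages


# Usage: join stream tuples (id, key) with relation tuples (key, value).
pages = [[(1, "a"), (2, "b")], [(3, "c")]]
updates = [(10, 1), (11, 3), (12, 2), (13, 4), (14, 3)]
pairs = list(mesh_join_sketch(updates, pages, 2,
                              key_s=lambda s: s[1], key_r=lambda r: r[0]))
```

In this sketch every stream batch stays buffered for exactly as many iterations as $R$ has pages, so results come out in scan order rather than in stream arrival order.
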
Year | DOI | Venue |
---|---|---|
2008 | 10.1109/TKDE.2008.27 | IEEE Trans. Knowl. Data Eng. |
Keywords | Field | DocType |
meshjoin algorithm,real-life data,active data warehouse,active warehouse,available memory,I/O cost,latest data updates,join,persistent data,systematic cost model,meshjoin,data warehouse transformation,relations,streaming updates,streams,on-line warehouse refreshment,access cost,data warehousing,memory management,warehousing,data warehouses,data mining,data warehouse,production systems,indexing terms,scalability,throughput,data analysis | Data warehouse,Data mining,Persistent data structure,Tuple,Computer science,Memory management,Throughput,Data link,Surrogate key,Scalability | Journal |
Volume | Issue | ISSN |
20 | 7 | 1041-4347 |
Citations | PageRank | References |
42 | 1.43 | 35 |
Authors | ||
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Neoklis Polyzotis | 1 | 2078 | 138.76 |
Spiros Skiadopoulos | 2 | 1139 | 65.60 |
Panos Vassiliadis | 3 | 1821 | 134.74 |
Alkis Simitsis | 4 | 1665 | 94.62 |
Nils Frantzell | 5 | 42 | 1.43 |