Title
A Multi-Way Semi-Stream Join For A Near-Real-Time Data Warehouse
Abstract
Semi-stream processing, the operation of joining a stream of data with non-stream disk-based master data, is a crucial component of near real-time data warehousing. The requirements for semi-stream joins are fast, accurate processing and the ability to function well with limited memory. Currently, semi-stream algorithms presented in the literature such as MeshJoin, Semi-Stream Index Join and CacheJoin can join only one foreign key in the stream data with one table in the master data. However, it is quite likely that stream data have multiple foreign keys that need to join with multiple tables in the master data. We extend CacheJoin to form three new possibilities for multi-way semi-stream joins, namely Sequential, Semi-concurrent, and Concurrent joins. Initially, the new algorithms can join two foreign keys in the stream data with two tables in the master data. However, these algorithms can be easily generalized to join with any number of tables in the master data. We evaluated the performance of all three algorithms, and our results show that the semi-concurrent architecture performs best under the same scenario.
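To make the idea in the abstract concrete, the following is a minimal illustrative sketch of a sequential two-way semi-stream join. It is not the authors' algorithm and does not reproduce CacheJoin's internals. It assumes that plain dictionaries stand in for the disk-based master tables and that a small least-recently-used cache stands in for the in-memory component. The names `CachedLookup`, `sequential_join`, `product_id`, and `customer_id` are hypothetical and chosen only for this example.

```python
from collections import OrderedDict


class CachedLookup:
    """Hypothetical helper: looks up master records by key and keeps
    recently used records in a bounded in-memory cache."""

    def __init__(self, master_table, capacity):
        self.master_table = master_table  # assumption: dict stands in for a disk table
        self.capacity = capacity
        self.cache = OrderedDict()
        self.disk_reads = 0  # counts lookups that missed the cache

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        self.disk_reads += 1  # simulated disk access
        record = self.master_table.get(key)
        if record is not None:
            self.cache[key] = record
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
        return record


def sequential_join(stream, first_lookup, second_lookup):
    """Join each stream tuple on its first foreign key, then on its second.
    Tuples without a match in either master table are dropped."""
    for tup in stream:
        first = first_lookup.get(tup["product_id"])
        if first is None:
            continue
        second = second_lookup.get(tup["customer_id"])
        if second is None:
            continue
        yield {**tup, **first, **second}


# Made-up sample data, for illustration only.
products = {1: {"product_name": "pen"}, 2: {"product_name": "ink"}}
customers = {10: {"customer_name": "Ann"}, 11: {"customer_name": "Bob"}}
stream = [
    {"product_id": 1, "customer_id": 10, "qty": 3},
    {"product_id": 1, "customer_id": 11, "qty": 1},
    {"product_id": 3, "customer_id": 10, "qty": 2},  # no matching product
    {"product_id": 2, "customer_id": 10, "qty": 5},
]

product_lookup = CachedLookup(products, capacity=2)
customer_lookup = CachedLookup(customers, capacity=2)
results = list(sequential_join(stream, product_lookup, customer_lookup))
```

In this sketch the second lookup runs only after the first has succeeded, which is one plausible reading of "sequential". The semi-concurrent and concurrent variants named in the abstract would arrange the two lookups differently and are not modeled here.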
Year: 2017
DOI: 10.1007/978-3-319-68155-9_5
Venue: DATABASES THEORY AND APPLICATIONS, ADC 2017
Keywords: Multi-way stream processing, Join operator, Near-real-time data warehouse
Field: Data warehouse, Data mining, Joins, Architecture, Computer science, Stream data, Master data, Sort-merge join, Foreign key, Database
DocType: Conference
Volume: 10538
ISSN: 0302-9743
Citations: 1
PageRank: 0.34
References: 9
Authors: 3
Name | Order | Citations | PageRank
M. Asif Naeem | 1 | 102 | 19.73
Kim Tung Nguyen | 2 | 1 | 0.34
Gerald Weber | 3 | 79 | 9.13