Title
A Memory-Optimal Many-To-Many Semi-Stream Join
Abstract
Semi-stream join algorithms join a fast stream input with a disk-based master data relation. A common class of these algorithms is derived from hash joins: they use the stream as build input for a main hash table, and also include a cache for frequent master data. The composition of the cache is very important for performance; however, the decision of which master data to cache has so far been solely based on heuristics. We present the first formal criterion, a cache inequality that leads to a provably optimal composition of the cache in a semi-stream many-to-many equijoin algorithm. We propose a novel algorithm, Semi-Stream Balanced Join (SSBJ), which exploits this cache inequality to achieve a given service rate with a provably minimal amount of memory for all stream distributions. We present a cost model for SSBJ and compare its service rate empirically and analytically with other related approaches.
Year
DOI
Venue
2019
10.1007/s10619-018-7247-z
Distributed and Parallel Databases
Keywords
Field
DocType
Many-to-many semi-stream join, Cache optimization, Performance evaluation
Joins,Cache,Computer science,Master data,Exploit,Theoretical computer science,Heuristics,Hash function,Many-to-many (data model),Distributed computing,Hash table
Journal
Volume
Issue
ISSN
37.0
4
1573-7578
Citations 
PageRank 
References 
1
0.39
29
Authors
3
Name
Order
Citations
PageRank
M. Asif Naeem110219.73
Gerald Weber2799.13
Christof Lutteroth333646.62