Abstract | ||
---|---|---|
Semi-stream join algorithms join a fast stream input with a disk-based master data relation. A common class of these algorithms is derived from hash joins: they use the stream as build input for a main hash table, and also include a cache for frequent master data. The composition of the cache is very important for performance; however, the decision of which master data to cache has so far been solely based on heuristics. We present the first formal criterion, a cache inequality that leads to a provably optimal composition of the cache in a semi-stream many-to-many equijoin algorithm. We propose a novel algorithm, Semi-Stream Balanced Join (SSBJ), which exploits this cache inequality to achieve a given service rate with a provably minimal amount of memory for all stream distributions. We present a cost model for SSBJ and compare its service rate empirically and analytically with other related approaches. |
Year | DOI | Venue |
---|---|---|
2019 | 10.1007/s10619-018-7247-z | Distributed and Parallel Databases |
Keywords | Field | DocType |
Many-to-many semi-stream join, Cache optimization, Performance evaluation | Joins,Cache,Computer science,Master data,Exploit,Theoretical computer science,Heuristics,Hash function,Many-to-many (data model),Distributed computing,Hash table | Journal |
Volume | Issue | ISSN |
37.0 | 4 | 1573-7578 |
Citations | PageRank | References |
1 | 0.39 | 29 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
M. Asif Naeem | 1 | 102 | 19.73 |
Gerald Weber | 2 | 79 | 9.13 |
Christof Lutteroth | 3 | 336 | 46.62 |