Utilizing the column imprints to accelerate no‐partitioning hash joins in large‐scale edge systems

Paper Info

Title
Utilizing the column imprints to accelerate no‐partitioning hash joins in large‐scale edge systems

Abstract
With the increasing number of edge devices in large‐scale edge systems, more and more data are collected to be processed. In such big data scenarios, there is a resurgence of interest in main‐memory analytic databases because of the large RAM capacity of modern servers and the increasing demand for real‐time analytic platforms. In such databases, join is at the heart of almost every query plan. Join also stays as a time‐consuming operation when the denormalization overhead is too large to be applicable. However, the current implementations of these operations have not fully leveraged the new features (eg, SIMD, multi‐core) provided by the modern hardware. The goal of this article is to design efficient algorithms for joins by judiciously exploiting every bit of RAM and all the available parallelisms in each processing unit. For join operations, hash joins have been studied, improved, and reexamined over decades. In this article, we propose to utilize a secondary index to improve hash joins without the physical partitioning. Specifically, in the build phase, the hash values are scattered evenly into the logical partitions of the hash table; in the probe phase, the secondary index is used as the hints to re‐order the probing sequence, such that the locality of the hash probing is increased. We benchmark the performance of the proposed techniques in our column‐store research prototype. Extensive experiments on the synthetic data and the real data show that our methods offer significant performance improvement over their counterparts.
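The build and probe phases described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the index layout or the hash function, so the sketch assumes integer keys, uses `key % NUM_PARTITIONS` as a stand-in hash, and replaces the real column imprints structure with one bitmask per fixed-size block of the probe column. All names (`NUM_PARTITIONS`, `BLOCK_SIZE`, `make_imprints`, `imprint_hash_join`) are hypothetical.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # logical partitions of the hash table (assumed value)
BLOCK_SIZE = 4      # probe-column values summarised per bitmask (assumed value)


def partition_of(key):
    """Map an integer key to a logical partition (stand-in hash function)."""
    return key % NUM_PARTITIONS


def build(build_keys):
    """Build phase: one table, split logically rather than physically."""
    table = [defaultdict(list) for _ in range(NUM_PARTITIONS)]
    for row_id, key in enumerate(build_keys):
        table[partition_of(key)][key].append(row_id)
    return table


def make_imprints(probe_keys):
    """One bitmask per block: bit p is set if the block holds a key of partition p."""
    imprints = []
    for start in range(0, len(probe_keys), BLOCK_SIZE):
        mask = 0
        for key in probe_keys[start:start + BLOCK_SIZE]:
            mask |= 1 << partition_of(key)
        imprints.append(mask)
    return imprints


def probe(table, probe_keys, imprints):
    """Probe phase: handle one partition at a time, skipping blocks the bitmask rules out."""
    matches = []
    for p in range(NUM_PARTITIONS):
        part = table[p]
        for block_no, mask in enumerate(imprints):
            if not mask & (1 << p):
                continue  # no key in this block can hit partition p
            start = block_no * BLOCK_SIZE
            for offset, key in enumerate(probe_keys[start:start + BLOCK_SIZE]):
                if partition_of(key) == p:
                    for build_row in part.get(key, ()):
                        matches.append((build_row, start + offset))
    return matches


def imprint_hash_join(build_keys, probe_keys):
    """Return (build_row, probe_row) pairs whose keys are equal."""
    table = build(build_keys)
    return probe(table, probe_keys, make_imprints(probe_keys))
```

Because all probes for one logical partition are issued together, consecutive lookups touch the same region of the hash table, which is the locality effect the abstract refers to. The probe column itself is never physically reshuffled; only the order in which its values are visited changes.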

Field	Value
Year	2021
DOI	10.1002/ett.4084
Venue	Periodicals
DocType	Journal
Volume	32
Issue	6
ISSN	2161-3915
Citations	0
PageRank	0.34
References	0
Authors	2

Authors (2 rows)

Name	Order	Citations	PageRank
Yu Li	1	0	0.34
Wenjian Xu	2	2	1.74
