Abstract | ||
---|---|---|
Text analytics directly on compression (TADOC) has proven to be a promising technology for big data analytics. GPUs are extremely popular accelerators for data analytics systems. Unfortunately, no work so far shows how to utilize GPUs to accelerate TADOC. We describe G-TADOC, the first framework that provides GPU-based text analytics directly on compression, effectively enabling efficient text analytics on GPUs without decompressing the input data. G-TADOC solves three major challenges. First, TADOC involves a large number of dependencies, which makes it difficult to exploit massive parallelism on a GPU. We develop a novel fine-grained thread-level workload scheduling strategy for GPU threads, which partitions heavily-dependent loads adaptively in a fine-grained manner. Second, in developing G-TADOC, thousands of GPU threads writing to the same result buffer leads to inconsistency, while directly using locks and atomic operations leads to large synchronization overheads. We develop a memory pool with thread-safe data structures on GPUs to handle such difficulties. Third, maintaining the sequence information among words is essential for lossless compression. We design a sequence-support strategy, which maintains high GPU parallelism while ensuring sequence information. Our experimental evaluations show that G-TADOC provides a 31.1× average speedup compared to state-of-the-art TADOC. |
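
To make "analytics without decompressing" concrete, here is a minimal sequential sketch of word counting over a grammar-style compressed text. It is an illustration only, not the G-TADOC implementation: the abstract does not specify the compressed format, so the `RULES` dictionary (integers reference other rules, strings are literal words, rule 0 is the root) and the function names `word_count` and `expand` are assumptions made for this example. The rule-to-rule references are the kind of dependencies that make a parallel GPU version harder than this single-threaded one.

```python
from collections import Counter, deque

# Hypothetical compressed input: rule id -> list of symbols.
# An int refers to another rule; a str is a literal word. Rule 0 is the root.
RULES = {
    0: [1, "c", 1, 2],
    1: ["a", "b", 2],
    2: ["d", "d"],
}

def word_count(rules, root=0):
    """Count words by propagating rule weights, never expanding the text."""
    # How many times each rule directly references each sub-rule.
    refs = {r: Counter(s for s in body if isinstance(s, int))
            for r, body in rules.items()}
    # Number of distinct parent rules, used for a topological traversal.
    indeg = {r: 0 for r in rules}
    for children in refs.values():
        for child in children:
            indeg[child] += 1
    # weight[r] = number of times rule r occurs in the fully expanded text.
    weight = {r: 0 for r in rules}
    weight[root] = 1
    queue = deque(r for r in rules if indeg[r] == 0)
    counts = Counter()
    while queue:
        r = queue.popleft()
        w = weight[r]
        for s in rules[r]:
            if isinstance(s, str):
                counts[s] += w          # each literal word counts w times
        for child, n in refs[r].items():
            weight[child] += w * n      # a parent must finish before its child
            indeg[child] -= 1
            if indeg[child] == 0:
                queue.append(child)
    return counts

def expand(rules, r=0):
    """Reference decompression, used only to check the result."""
    out = []
    for s in rules[r]:
        out.extend(expand(rules, s) if isinstance(s, int) else [s])
    return out

result = word_count(RULES)
```

Each rule is visited once regardless of how often it repeats in the original text, which is where the savings over decompress-then-count come from. Comparing `word_count(RULES)` with `Counter(expand(RULES))` is a quick way to check the sketch on small inputs.
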
Year | DOI | Venue |
---|---|---|
2021 | 10.1109/ICDE51399.2021.00148 | 2021 IEEE 37th International Conference on Data Engineering (ICDE 2021) |
Keywords | DocType | ISSN |
---|---|---|
TADOC, GPU, parallelism, analytics on compressed data | Conference | 1084-4627 |
Citations | PageRank | References |
---|---|---|
1 | 0.35 | 0 |
Authors | ||
---|---|---|
7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Feng Zhang | 1 | 79 | 14.36 |
Zaifeng Pan | 2 | 1 | 0.35 |
Yanliang Zhou | 3 | 1 | 0.35 |
Jidong Zhai | 4 | 340 | 36.27 |
Xipeng Shen | 5 | 2025 | 118.55 |
Onur Mutlu | 6 | 9446 | 357.40 |
Xiaoyong Du | 7 | 882 | 123.29 |