Title
Identifying Hierarchical Structures in Sequences on GPU
Abstract
Identifying hierarchical structures in sequences is an important problem with applications in lossless data-compression to program profiling. A popular algorithm for identifying hierarchical structures in sequences is the Sequitur algorithm developed by Nevill-Manning and Witten. Sequitur is not just a compression algorithm, it attempts to learn the hierarchical structure of the input sequence as a context-free grammar. However, Sequitur is difficult to parallelize. Inspired by Sequitur, we have developed a new GPU algorithm, that reveals the hierarchical structure in sequences and is also concurrency-friendly. Our algorithm, Pequitur, is built as a series of fast kernels (for intermittent synchronization), where each kernel attempts to minimize inter-thread communication and achieve a good load balance among the GPU threads. As opposed to Sequitur, Pequitur follows a greedy strategy to find good productions, that are productions formed by long and frequent substrings. We have implemented and evaluated our algorithm on the NVIDIA K20c card on random strings drawn from multiple distributions. On our benchmarks, Pequitur achieves an average speedup of more than 3X over an optimized Sequitur implementation with similar compression ratios.
Year
DOI
Venue
2015
10.1109/Trustcom.2015.609
TrustCom/BigDataSE/ISPA
Keywords
Field
DocType
hierarchical structure,GPU,lossless data-compression,program profiling,context-free grammar,load balance,GPU thread,NVIDIA K20c card,Sequitur algorithm
Kernel (linear algebra),Substring,Computer science,Grammar-based code,Load balancing (computing),Parallel computing,Sequitur algorithm,Theoretical computer science,Data compression,Speedup,Lossless compression
Conference
Volume
ISSN
Citations 
3
2324-9013
1
PageRank 
References 
Authors
0.37
11
3
Name
Order
Citations
PageRank
Prashant Jalan110.70
Arihant Kumar Jain210.70
Subhajit Roy34510.84