Title |
---|
Variable-to-Fixed-Length Encoding for Large Texts Using Re-Pair Algorithm with Shared Dictionaries |
Abstract |
---|
The Re-Pair algorithm, proposed by Larsson and Moffat in 1999, is a simple grammar-based compression method that achieves an extremely high compression ratio. However, Re-Pair is an offline and very space-consuming algorithm; thus, to apply it to a very large text, we need to divide the text into smaller blocks. If we share a part of the dictionary among all blocks, we expect that both the compression speed and the compression ratio of the algorithm will improve. In this paper, we implement our method by exploiting variable-to-fixed-length codes, and we empirically show how the compression speed and ratio vary as we adjust three parameters: block size, dictionary size, and shared-dictionary size. Finally, we discuss the tendencies of compression speed and ratio with respect to these three parameters. |
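The Re-Pair core referenced in the abstract repeatedly replaces the most frequent adjacent symbol pair with a fresh nonterminal until no pair repeats. The following is a minimal, naive sketch of that pair-replacement loop only; the paper's block division, shared dictionaries, and variable-to-fixed-length encoding are not reproduced here:

```python
from collections import Counter

def repair(seq):
    """Naive Re-Pair sketch: replace the most frequent adjacent pair
    with a new nonterminal symbol, repeating until every pair is unique.
    Returns the compressed sequence and the grammar (dictionary) rules."""
    seq = list(seq)
    rules = {}        # nonterminal id -> (left symbol, right symbol)
    next_sym = 256    # fresh nonterminal ids start above byte values
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:  # no pair occurs twice; grammar is complete
            break
        rules[next_sym] = pair
        out, i = [], 0
        while i < len(seq):
            # greedily replace non-overlapping occurrences of the pair
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_sym)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_sym += 1
    return seq, rules
```

This quadratic-time sketch is for illustration only; Larsson and Moffat's original implementation achieves linear time using priority queues and doubly linked pair occurrence lists.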
Year | DOI | Venue |
---|---|---|
2013 | 10.1109/DCC.2013.97 | Data Compression Conference |
Keywords | Field | DocType
---|---|---|
empirically show,re-pair algorithm,shared dictionaries,simple grammar-based compression method,space consuming algorithm,compression speed,shared dictionary,large text,block size,variable-to-fixed-length encoding,high compression ratio,dictionary size,large texts,grammar,data compression,encoding,grammars,information science,workstations,dictionaries,compression ratio | Block size,Rule-based machine translation,Incremental encoding,Grammar-based code,Computer science,Algorithm,Theoretical computer science,Compression ratio,Data compression,Lossless compression,Encoding (memory) | Conference
ISSN | ISBN | Citations
---|---|---|
1068-0314 | 978-1-4673-6037-1 | 1
PageRank | References | Authors
---|---|---|
0.39 | 0 | 4
Name | Order | Citations | PageRank |
---|---|---|---|
Kei Sekine | 1 | 3 | 0.78 |
Hirohito Sasakawa | 2 | 7 | 1.85 |
Satoshi Yoshida | 3 | 3 | 0.78 |
Takuya Kida | 4 | 271 | 23.56 |