Title
Variable-to-Fixed-Length Encoding for Large Texts Using Re-Pair Algorithm with Shared Dictionaries
Abstract
The Re-Pair algorithm, proposed by Larsson and Moffat in 1999, is a simple grammar-based compression method that achieves an extremely high compression ratio. However, Re-Pair is an offline algorithm that consumes a great deal of memory, so applying it to a very large text requires dividing the text into smaller blocks. If part of the dictionary is shared among all blocks, we expect both the compression speed and the compression ratio to improve. In this paper, we implement this method using variable-to-fixed-length codes and show empirically how its compression speed and ratio vary with three parameters: block size, dictionary size, and shared-dictionary size. Finally, we discuss the trends in compression speed and ratio with respect to these three parameters.
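The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice to learn the shared rules from the first block only, and the naive quadratic pair counting are all assumptions made for illustration. The parameters block_size, dict_size, and shared_size stand in for the three parameters the abstract names.

```python
from collections import Counter
from math import ceil, log2

def replace_pair(seq, pair, symbol):
    """Replace non-overlapping occurrences of `pair`, left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def repair(seq, first_symbol, max_rules):
    """Naive Re-Pair: repeatedly replace the most frequent adjacent pair.
    Recounts all pairs each round; the original algorithm is far faster."""
    seq, rules = list(seq), {}
    while len(rules) < max_rules:
        counts = Counter(zip(seq, seq[1:]))
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:
            break
        symbol = first_symbol + len(rules)
        rules[symbol] = pair
        seq = replace_pair(seq, pair, symbol)
    return seq, rules

def expand(seq, rules):
    """Decode by expanding rule symbols back into terminal bytes."""
    out, stack = [], list(reversed(seq))
    while stack:
        s = stack.pop()
        if s in rules:
            left, right = rules[s]
            stack.append(right)
            stack.append(left)
        else:
            out.append(s)
    return out

def compress_blocks(data, block_size, dict_size, shared_size):
    blocks = [list(data[i:i + block_size])
              for i in range(0, len(data), block_size)]
    # Assumption: the shared rules are learned from the first block only.
    shared = repair(blocks[0], 256, shared_size)[1] if blocks else {}
    encoded = []
    for block in blocks:
        seq = block
        for symbol, pair in shared.items():  # apply shared rules in creation order
            seq = replace_pair(seq, pair, symbol)
        # Local rules use symbol numbers above the range reserved for shared rules.
        seq, local = repair(seq, 256 + shared_size, dict_size - shared_size)
        encoded.append((seq, local))
    # Fixed-length codewords: every symbol costs the same number of bits.
    code_bits = ceil(log2(256 + dict_size))
    return shared, encoded, code_bits

def decompress_blocks(shared, encoded):
    out = []
    for seq, local in encoded:
        out.extend(expand(seq, {**shared, **local}))
    return bytes(out)
```

In this sketch each block stores only its local rules, while the shared rules are stored once. Each output symbol is written with the same number of bits but stands for a variable-length substring of the input, which is the sense in which the code is variable-to-fixed-length.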
Year
2013
DOI
10.1109/DCC.2013.97
Venue
Data Compression Conference
Keywords
empirically show, re-pair algorithm, shared dictionaries, simple grammar-based compression method, space consuming algorithm, compression speed, shared dictionary, large text, block size, variable-to-fixed-length encoding, high compression ratio, dictionary size, large texts, grammar, data compression, encoding, grammars, information science, workstations, dictionaries, compression ratio
Field
Block size, Rule-based machine translation, Incremental encoding, Grammar-based code, Computer science, Algorithm, Theoretical computer science, Compression ratio, Data compression, Lossless compression, Encoding (memory)
DocType
Conference
ISSN
1068-0314
ISBN
978-1-4673-6037-1
Citations
1
PageRank
0.39
References
0
Authors
4
Name               Order  Citations  PageRank
Kei Sekine         1      3          0.78
Hirohito Sasakawa  2      7          1.85
Satoshi Yoshida    3      3          0.78
Takuya Kida        4      271        23.56