Title
Simple Linear-Time Off-Line Text Compression by Longest-First Substitution
Abstract
We consider grammar based text compression with longest first substitution, where non-overlapping occurrences of a longest repeating substring of the input text are replaced by a new non-terminal symbol. We present a new text compression algorithm by simplifying the algorithm presented in [dl. We give a new formulation of the correctness proof introducing the sparse lazy sufix tree data structure. We also present another type of longest first substitution strategy that allows better compression. W e show results of preliminary experiments comparing grammar sizes of the two versions of the longest first strategy and the most frequent strategy.
Year
DOI
Venue
2007
10.1109/DCC.2007.70
DCC
Keywords
Field
DocType
new text compression algorithm,new formulation,input text,better compression,grammar size,longest-first substitution,simple linear-time off-line text,substitution strategy,new non-terminal symbol,longest repeating substring,text compression,frequent strategy,grammars,linear time,tree data structures,informatics,compression algorithms,data compression,computer science,production,shape,data structure
Rule-based machine translation,Data structure,Substring,Computer science,Grammar-based code,Tree (data structure),Algorithm,Theoretical computer science,Suffix tree,Time complexity,Data compression
Conference
ISSN
ISBN
Citations 
1068-0314
0-7695-2791-4
2
PageRank 
References 
Authors
0.40
7
4
Name
Order
Citations
PageRank
Ryosuke Nakamura16821.87
Hideo Bannai262079.87
Shunsuke Inenaga359579.02
Masayuki Takeda490279.24