Title
A data compression scheme for Chinese text files using Huffman coding and a two-level dictionary
Abstract
This paper presents a data compression scheme for Chinese text files. Because the frequency distribution of Chinese ideograms is highly skewed, the Huffman coding method is adopted. By storing the frequencies of the encoding symbols rather than their Huffman codes in a dictionary, applying differential coding where it saves space, and organizing the Huffman dictionary as a two-level structure, the algorithm significantly improves the compression results. The proposed method is evaluated by comparing its performance with that of three well-known compression algorithms. The algorithm should also be applicable to other ideogram-based or oriental-language texts. It also has the potential to reduce the dictionary size in a bigram- or trigram-based semi-adaptive compression scheme for English texts.
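The abstract's idea of storing symbol frequencies instead of the Huffman codes themselves can be illustrated with a minimal sketch. It assumes only that the encoder and the decoder rebuild the same code table from a shared frequency table using a deterministic tie-breaking rule. The function names (`build_codes`, `encode`, `decode`), the tie-breaking rule, and the sample text are hypothetical and are not taken from the paper; the sketch does not attempt the paper's differential coding or two-level dictionary, whose details are not given here.

```python
import heapq
from collections import Counter
from itertools import count


def build_codes(freqs):
    """Derive a Huffman code table from a {symbol: frequency} mapping.

    Symbols are visited in sorted order and ties are broken by a counter,
    so any party holding the same frequency table derives the same codes.
    (This tie-breaking rule is an assumption of the sketch.)
    """
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A lone symbol still needs a one-bit code.
        return {next(iter(freqs)): "0"}
    tiebreak = count()
    # Heap entries are (frequency, tiebreak, tree); a tree is either a
    # symbol (leaf) or a (left, right) tuple (internal node).
    heap = [(f, next(tiebreak), sym) for sym, f in sorted(freqs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:
            codes[node] = prefix
    return codes


def encode(text, codes):
    """Concatenate the bit strings of each symbol in the text."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Decode a bit string; works because Huffman codes are prefix-free."""
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


# Hypothetical sample with a skewed symbol distribution.
text = "的的的的一一是不"
freqs = Counter(text)

# The encoder stores only `freqs`; the decoder rebuilds the codes from it.
bits = encode(text, build_codes(freqs))
restored = decode(bits, build_codes(dict(freqs)))
```

In this sketch the dictionary that travels with the compressed data is the frequency table, and the code table is recomputed on both sides, which is one plausible reading of the trade-off the abstract describes.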
Year
1995
DOI
10.1016/0020-0255(94)00108-N
Venue
Inf. Sci.
Keywords
two-level dictionary, Huffman coding, Chinese text, data compression scheme, Huffman codes, compression algorithm, data compression
Field
Modified Huffman coding, Tunstall coding, Incremental encoding, Dictionary coder, Computer science, Algorithm, Speech recognition, Huffman coding, Shannon–Fano coding, DEFLATE, Canonical Huffman code
DocType
Journal
Volume
84
Issue
1-2
ISSN
0020-0255
Citations
2
PageRank
0.38
References
8
Authors
2
Name
Order
Citations
PageRank
Ghim-Hwee Ong 1143.89
Shell Ying Huang 216119.52