Title
Application of a Word-Based Text Compression Method to Japanese and Chinese Texts
Abstract
Text in 16-bit Asian-language character codes cannot be compressed well by conventional text compression schemes that sample 8 bits at a time. Previously, we reported the application of a word-based text compression method that uses 16-bit sampling to the compression of Japanese texts. This paper describes our further work on applying a word-based method with a static canonical Huffman encoder to both Japanese and Chinese texts. The method is designed to support a multilingual environment: the word dictionary and the canonical Huffman code table are replaced with those appropriate to the respective language. A computer simulation showed that the method is effective for both languages. The compression ratio obtained was slightly below 0.5 when the Markov context was not taken into account, and around 0.4 when the first-order Markov context was used.
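As an illustration of the kind of coder the abstract names, the following is a minimal sketch of a word-based static canonical Huffman code, not the authors' implementation. It assumes the text has already been segmented into dictionary words (the token list here is a made-up toy example), that each original character occupies 16 bits, and that "compression ratio" means compressed size divided by original size. All function names are hypothetical, and the sketch ignores the Markov context and the cost of storing the dictionary and code table.

```python
import heapq
from collections import Counter
from itertools import count


def code_lengths(freqs):
    """Huffman code length per word, derived from word frequencies."""
    if len(freqs) == 1:
        return {word: 1 for word in freqs}
    tie = count()  # tie-breaker so the heap never compares the word lists
    heap = [(f, next(tie), [w]) for w, f in freqs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    while len(heap) > 1:
        f1, _, group1 = heapq.heappop(heap)
        f2, _, group2 = heapq.heappop(heap)
        for w in group1 + group2:
            lengths[w] += 1  # every merge pushes these words one level deeper
        heapq.heappush(heap, (f1 + f2, next(tie), group1 + group2))
    return lengths


def canonical_codes(lengths):
    """Assign canonical codes: sort by (length, word), then count upward."""
    codes, code, prev_len = {}, 0, 0
    for word, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len
        codes[word] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes


def encode(tokens, codes):
    """Concatenate the code of each word into one bit string."""
    return "".join(codes[t] for t in tokens)


def decode(bits, codes):
    """Invert encode(); works because the code is prefix-free."""
    inverse = {c: w for w, c in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


def compression_ratio(tokens, bits):
    """Compressed bits over original bits, assuming 16 bits per character."""
    original_bits = 16 * sum(len(t) for t in tokens)
    return len(bits) / original_bits


if __name__ == "__main__":
    # Toy, pre-segmented input (an assumption; real segmentation needs a dictionary).
    tokens = ["日本", "語", "の", "テキスト", "の", "圧縮", "の", "例"]
    codes = canonical_codes(code_lengths(Counter(tokens)))
    bits = encode(tokens, codes)
    assert decode(bits, codes) == tokens
    print(codes)
    print(compression_ratio(tokens, bits))
```

Because a canonical code is fully determined by the code lengths, a static table for each language can be stored compactly and swapped together with the word dictionary, which fits the multilingual design the abstract describes. The ratio printed for such a tiny input is not comparable to the figures reported in the abstract.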
Year
2002
Venue
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Keywords
lossless, text compression, language, word-based
DocType
Journal
Volume
E85A
Issue
12
ISSN
0916-8508
Citations
0
PageRank
0.34
References
0
Authors
4
Name | Order | Citations | PageRank
Shigeru Yoshida | 1 | 0 | 0.34
Takashi Morihara | 2 | 1 | 1.36
Hironori Yahagi | 3 | 3 | 0.88
Noriko Itani | 4 | 0 | 0.34