Application Of A Word-Based Text Compression Method To Japanese And Chinese Texts - Citegraph

Paper Info

Title
Application Of A Word-Based Text Compression Method To Japanese And Chinese Texts

Abstract
16-bit Asian language codes cannot be compressed well by conventional text compression schemes that sample in 8-bit units. Previously, we reported applying a word-based text compression method that uses 16-bit sampling to Japanese texts. This paper describes our further efforts to apply a word-based method with a static canonical Huffman encoder to both Japanese and Chinese texts. The method is intended to support a multilingual environment: each language is handled by substituting the appropriate word dictionary and canonical Huffman code table. A computer simulation showed that the method is effective for both languages. The compression ratio obtained was slightly below 0.5 when the Markov context was ignored, and around 0.4 when the first-order Markov context was taken into account.
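To make the pipeline in the abstract more concrete, here is a minimal Python sketch of word-based compression with a canonical Huffman code. It is not the authors' implementation. The word dictionary `WORDS`, the greedy longest-match segmentation in `tokenize`, and building the code table from the input itself are all illustrative assumptions; a static scheme would instead build the table in advance from a training corpus and ship it with the decoder. Each character is assumed to occupy a fixed 16 bits, and the ratio is taken as compressed size over original size. The sketch is order-0, so it does not model the first-order Markov context behind the paper's figure of around 0.4.

```python
import heapq
from collections import Counter
from itertools import count


def tokenize(text, dictionary, max_len):
    """Greedy longest-match segmentation (an assumption, not the paper's method).
    Characters not covered by a dictionary word become 1-character tokens."""
    tokens, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in dictionary:
                tokens.append(piece)
                i += n
                break
    return tokens


def huffman_lengths(freqs):
    """Return {symbol: code length} from a frequency table (standard Huffman)."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    tie = count()  # tie-breaker so heapq never has to compare the dicts
    heap = [(f, next(tie), {s: 0}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        # Every symbol under the merged node moves one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def canonical_codes(lengths):
    """Assign canonical codes: sort by (length, symbol) and count upward,
    shifting left whenever the code length increases."""
    codes, code, prev_len = {}, 0, 0
    for sym, n in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= n - prev_len
        codes[sym] = format(code, f"0{n}b")
        code += 1
        prev_len = n
    return codes


def decode(bits, codes):
    """Decode a bit string with a prefix-free code table."""
    inverse, out, buf = {c: s for s, c in codes.items()}, [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)


# Hypothetical word dictionary and sample text, for illustration only.
WORDS = {"圧縮", "方法", "日本語", "中国語", "テキスト"}
text = "日本語テキストと中国語テキストの圧縮方法"

tokens = tokenize(text, WORDS, max_len=max(map(len, WORDS)))
codes = canonical_codes(huffman_lengths(Counter(tokens)))
bits = "".join(codes[t] for t in tokens)

original_bits = 16 * len(text)  # assumed fixed 16-bit code per character
ratio = len(bits) / original_bits  # ignores the size of the code table
assert decode(bits, codes) == text
print(f"{len(tokens)} tokens, ratio = {ratio:.3f}")
```

Canonical codes suit a static, per-language table because the decoder needs only each symbol's code length, not the tree, to rebuild the codes. That keeps the table compact when it is swapped for another language.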

Year: 2002
Venue: IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Keywords: lossless, text compression, language, word-based
DocType: Journal
Volume: E85A
Issue: 12
ISSN: 0916-8508
Citations: 0
PageRank: 0.34
References: 0
Authors: 4

Authors

Name	Order	Citations	PageRank
Shigeru Yoshida	1	0	0.34
Takashi Morihara	2	1	1.36
Hironori Yahagi	3	3	0.88
Noriko Itani	4	0	0.34
