Title
An Effective Grammar-Based Compression Algorithm for Tree Structured Data
Abstract
Many semistructured data such as HTML/XML files are represented by rooted trees t such that all children of each internal vertex of t axe ordered and all edges of t have labels. Such data is called tree structured data. Analyzing large tree structured data is a time-consuming process in data mining. If we can reduce the size of input data without loss of information, we can speed up such a heavy process. In this paper, we consider a problem of effective compression of an ordered rooted tree, which represents given tree structured data, without loss of information. Firstly, in order to define this problem in a grammar-based compression scheme, we present a variable replacement grammar (VRG for short) over ordered rooted trees. The grammar-based compression problem for an ordered rooted tree T is defined as a problem of finding a VRG which generates only T and whose size is minimum. For the grammar-based compression problem for an ordered rooted tree, we show that there is no polynomial time algorithm with approximation ratio less than 8593/8592 unless P=NP. Secondly, based on this theoretical result, we present an effective compression algorithm for finding a VRG which generates only a given ordered rooted tree and whose size is as small as possible. Finally, in order to evaluate the performance of our grammar-based compression algorithm, we report some experimental results.
Year
DOI
Venue
2003
10.1007/978-3-540-39917-9_25
LECTURE NOTES IN ARTIFICIAL INTELLIGENCE
Keywords
Field
DocType
tree structure,data mining,compression algorithm
Range tree,Grammar-based code,Computer science,K-ary tree,Theoretical computer science,Link/cut tree,Artificial intelligence,Interval tree,Discrete mathematics,Combinatorics,Binary tree,Segment tree,Machine learning,Search tree
Conference
Volume
ISSN
Citations 
2835
0302-9743
5
PageRank 
References 
Authors
0.50
14
4
Name
Order
Citations
PageRank
Kazunori Yamagata150.50
Tomoyuki Uchida225535.06
Takayoshi Shoudai326931.89
Yasuaki Nakamura4105140.45