Abstract | ||
---|---|---|
Many semistructured data such as HTML/XML files are represented by rooted trees t such that all children of each internal vertex of t axe ordered and all edges of t have labels. Such data is called tree structured data. Analyzing large tree structured data is a time-consuming process in data mining. If we can reduce the size of input data without loss of information, we can speed up such a heavy process. In this paper, we consider a problem of effective compression of an ordered rooted tree, which represents given tree structured data, without loss of information. Firstly, in order to define this problem in a grammar-based compression scheme, we present a variable replacement grammar (VRG for short) over ordered rooted trees. The grammar-based compression problem for an ordered rooted tree T is defined as a problem of finding a VRG which generates only T and whose size is minimum. For the grammar-based compression problem for an ordered rooted tree, we show that there is no polynomial time algorithm with approximation ratio less than 8593/8592 unless P=NP. Secondly, based on this theoretical result, we present an effective compression algorithm for finding a VRG which generates only a given ordered rooted tree and whose size is as small as possible. Finally, in order to evaluate the performance of our grammar-based compression algorithm, we report some experimental results. |
Year | DOI | Venue |
---|---|---|
2003 | 10.1007/978-3-540-39917-9_25 | LECTURE NOTES IN ARTIFICIAL INTELLIGENCE |
Keywords | Field | DocType |
tree structure,data mining,compression algorithm | Range tree,Grammar-based code,Computer science,K-ary tree,Theoretical computer science,Link/cut tree,Artificial intelligence,Interval tree,Discrete mathematics,Combinatorics,Binary tree,Segment tree,Machine learning,Search tree | Conference |
Volume | ISSN | Citations |
2835 | 0302-9743 | 5 |
PageRank | References | Authors |
0.50 | 14 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Kazunori Yamagata | 1 | 5 | 0.50 |
Tomoyuki Uchida | 2 | 255 | 35.06 |
Takayoshi Shoudai | 3 | 269 | 31.89 |
Yasuaki Nakamura | 4 | 105 | 140.45 |