Finding Frequent Patterns from Compressed Tree-Structured Data - Citegraph

Paper Info

Title
Finding Frequent Patterns from Compressed Tree-Structured Data

Abstract
In this paper we present a new method for finding frequent patterns in tree-structured data, where a frequent pattern is a subgraph that occurs frequently in the given data. We make use of a compression method for tree-structured data called TGCA. Compressing large-scale data to make it easier to manipulate has been investigated in previous studies, for tasks such as keyword search in plain text and frequent itemset mining from transaction data, but to the best of our knowledge it has not been applied to finding frequent patterns in tree-structured data. The TGCA algorithm is obtained by modifying the SEQUITUR algorithm for plain text so that it can compress tree-structured data, and we show that occurrences of patterns in the original data can be counted directly on the data compressed by TGCA, without expanding it. This is why our method makes finding frequent patterns more efficient. Experiments show the advantage of our method when the data can be compressed at a good compression ratio.
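The abstract does not give TGCA's details, so the following is only a minimal sketch of the underlying idea on plain strings, not the paper's algorithm. It assumes a SEQUITUR-style grammar (each rule expands to a non-empty sequence of symbols) and counts occurrences of a two-character pattern by combining per-rule summaries, without building the expanded text. The grammar `GRAMMAR` and the helpers `count_digram` and `expand` are hypothetical names introduced for illustration.

```python
from functools import lru_cache

# Hypothetical grammar in the style of SEQUITUR output (an assumption, not
# taken from the paper): keys are nonterminals, any other symbol is a terminal.
GRAMMAR = {
    "S": ["A", "A", "c", "A"],
    "A": ["B", "B"],
    "B": ["a", "b"],
}


def count_digram(grammar, start, first_char, second_char):
    """Count occurrences of first_char + second_char in the expansion of
    `start` without ever building the expanded string."""

    @lru_cache(maxsize=None)
    def summarize(symbol):
        # Returns (first terminal, last terminal, digram count) of the expansion.
        if symbol not in grammar:  # terminal symbol
            return symbol, symbol, 0
        parts = [summarize(s) for s in grammar[symbol]]
        # Occurrences that lie entirely inside one child's expansion.
        total = sum(count for _, _, count in parts)
        # Occurrences that straddle the boundary between two adjacent children.
        for (_, left_last, _), (right_first, _, _) in zip(parts, parts[1:]):
            if left_last == first_char and right_first == second_char:
                total += 1
        return parts[0][0], parts[-1][1], total

    return summarize(start)[2]


def expand(grammar, symbol):
    """Full expansion, used here only to cross-check the compressed count."""
    if symbol not in grammar:
        return symbol
    return "".join(expand(grammar, s) for s in grammar[symbol])


if __name__ == "__main__":
    print(count_digram(GRAMMAR, "S", "a", "b"))
    print(expand(GRAMMAR, "S").count("ab"))
```

Because each rule is summarized once and reused wherever it appears, the work is proportional to the size of the grammar rather than the size of the expanded data, which is the kind of saving the abstract attributes to counting on compressed data.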

Year	2008
DOI	10.1007/978-3-540-88411-8_27
Venue	Discovery Science
Keywords	compressed tree-structured data,original data,frequent itemset mining,plain text,new method,transaction data,frequent pattern,tree-structured data,sequitur algorithm,frequent patterns,tgca algorithm,data compression method,compression ratio,data compression,tree structure
Field	Data mining,Pattern recognition,Computer science,Keyword search,Sequitur algorithm,Compression ratio,Tree structured data,Artificial intelligence,Data compression,Transaction data,Fold (higher-order function)
DocType	Conference
Volume	5255
ISSN	0302-9743
Citations	1
PageRank	0.35
References	13
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Seiji Murakami	1	1	0.35
Koichiro Doi	2	31	7.59
Akihiro Yamamoto	3	135	26.84
