XML Document Clustering Using Common XPath - Citegraph

Paper Info

Title
XML Document Clustering Using Common XPath

Abstract
XML is becoming a common way of storing data. The elements and their arrangement in the document's hierarchy not only describe the document structure but also imply the data's semantic meaning, and hence provide valuable information to develop tools for manipulating XML documents. In this paper, we pursue a data mining approach to the problem of XML document clustering. We introduce a novel XML structural representation called common XPath (CXP), which encodes the frequently occurring elements with the hierarchical information, and propose to take the CXPs mined to form the feature vectors for XML document clustering. In other words, data mining acts as a feature extractor in the clustering process. Based on this idea, we devise a path-based XML document clustering algorithm called PBClustering which groups the documents according to their CXPs, i.e. their frequent structures. Encouraging simulation results are observed and reported.
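
The pipeline the abstract describes (mine frequent paths, turn them into feature vectors, cluster on those vectors) can be illustrated with a minimal Python sketch. This is not the paper's CXP mining or PBClustering algorithm, whose details are not given on this page. It assumes that a "path" is a root-to-element tag path, that a path is "frequent" when it occurs in at least a `min_support` fraction of documents, and it uses a simple greedy Jaccard grouping as a stand-in for the clustering step. All function names and thresholds are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_paths(xml_text):
    """Return the set of root-to-element tag paths found in one document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def frequent_paths(docs_paths, min_support):
    """Keep paths occurring in at least a min_support fraction of documents.

    Assumption: support is counted per document, not per occurrence.
    """
    counts = Counter(p for paths in docs_paths for p in paths)
    threshold = min_support * len(docs_paths)
    return sorted(p for p, c in counts.items() if c >= threshold)


def feature_vectors(docs_paths, vocabulary):
    """Encode each document as a binary vector over the frequent paths."""
    return [[1 if p in paths else 0 for p in vocabulary] for paths in docs_paths]


def jaccard(a, b):
    """Jaccard similarity of two binary vectors of equal length."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


def greedy_cluster(vectors, min_similarity):
    """Stand-in clustering: join the first cluster whose seed is similar enough."""
    clusters = []
    for i, v in enumerate(vectors):
        for cluster in clusters:
            if jaccard(vectors[cluster[0]], v) >= min_similarity:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Toy documents, invented for illustration only.
docs = [
    "<book><title/><author><name/></author></book>",
    "<book><title/><author><name/></author><year/></book>",
    "<article><title/><journal/></article>",
    "<article><title/><journal/><pages/></article>",
]
docs_paths = [element_paths(d) for d in docs]
vocabulary = frequent_paths(docs_paths, min_support=0.5)
vectors = feature_vectors(docs_paths, vocabulary)
clusters = greedy_cluster(vectors, min_similarity=0.5)
print(clusters)
```

On the toy input, paths such as `/book/year` and `/article/pages` fall below the support threshold and are dropped, so documents that share the same frequent structure receive identical feature vectors.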

Year	2005
DOI	10.1109/WIRI.2005.39
Venue	WIRI
DocType	Conference
ISBN	0-7695-2414-1
Citations	17
PageRank	0.81
References	7
Authors	4
Keywords	data mining act, novel xml, xml document, clustering process, common xpath, xml document clustering, storing data, data mining approach, document structure, path-based xml document, feature extraction, data mining, tree data structures, information retrieval, xml, clustering algorithms, html, indexing
Field	Streaming XML, Well-formed document, Information retrieval, XML validation, Computer science, Document Structure Description, XML database, XPath, XML schema, Simple API for XML

Authors (4 rows)

Name	Order	Citations	PageRank
Ho-Pong Leung	1	39	2.19
Fu Lai Chung	2	1534	86.72
Stephen C. F. Chan	3	168	15.78
Robert Luk	4	97	5.88
