Feature Matrix Extraction and Classification of XML Pages - Citegraph

Paper Info

Title
Feature Matrix Extraction and Classification of XML Pages

Abstract
As the volume of data on the Web grows, the limitations of HTML become increasingly evident. A method is needed that separates data from its display, and XML (eXtensible Markup Language) arose to meet that need. XML can serve as the main form for expressing and exchanging data. How to store, manage, and use this data effectively remains a problem to be solved in the Internet field, and automatic text classification is an important part of it. In this article, we propose a data model that analyzes documents using their hierarchical structure and keyword information. Experiments show that the model not only achieves high accuracy but also incurs less time cost.

Year	2008
DOI	10.1007/978-3-540-89376-9_21
Venue	APWeb Workshops
Keywords	keywords information, feature matrix extraction, time cost, automatic text classification, hierarchical structure, xml pages, main form, data model, extensible markup language, high accuracy, feature extraction, vector space model, rough set
Field	Data mining, Efficient XML Interchange, XML framework, Streaming XML, Information retrieval, XML, Computer science, XML validation, Document Structure Description, XML database, Database, XML Signature
DocType	Conference
Volume	4977
ISSN	0302-9743
Citations	2
PageRank	0.41
References	5
Authors	5

Authors (5 rows)

Name	Order	Citations	PageRank
Hongcan Yan	1	2	1.76
Dianchuan Jin	2	2	1.09
Lihong Li	3	2	0.75
Baoxiang Liu	4	298	45.05
Yanan Hao	5	83	4.54
