A negative category based approach for Wikipedia document classification - Citegraph

Paper Info

Title
A negative category based approach for Wikipedia document classification

Abstract
Profile-based methods have been used successfully to classify unstructured text. This paper presents a profile-based method for classifying Wikipedia XML documents. We use profiles built from negative category information, and our approach exploits the structure of Wikipedia documents to build them. Two class profiles are built: one based on the whole content and the other based on the initial description of each Wikipedia document. In addition, we explore the option of using the terms in section and subsection titles. The effectiveness of the cosine and fractional similarity measures in classifying XML documents is compared. Experiments show that combining the two profile-based classifiers works better than either individual classifier.
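The abstract does not define its profile weighting or the fractional similarity measure, so the following is only a minimal sketch under stated assumptions. It is not the authors' method. It assumes Rocchio-style profiles, where each class centroid is reduced by `beta` times the mean centroid of all other classes as a stand-in for "negative category information". It scores with cosine similarity only and combines the full-content and initial-description classifiers through a weighted sum controlled by `alpha`. The function names and the `beta`/`alpha` parameters are hypothetical, and tokenization is plain whitespace splitting.

```python
import math
from collections import Counter

def term_vector(text):
    # Hypothetical preprocessing: lowercase + whitespace tokenization.
    return Counter(text.lower().split())

def build_profiles(docs_by_class, beta=0.5):
    # Assumption: profile = class centroid - beta * mean centroid of the other
    # classes (Rocchio-style use of negative categories).
    centroids = {}
    for label, docs in docs_by_class.items():
        total = Counter()
        for d in docs:
            total.update(term_vector(d))
        centroids[label] = {t: c / len(docs) for t, c in total.items()}
    profiles = {}
    for label, pos in centroids.items():
        others = [c for other, c in centroids.items() if other != label]
        neg = Counter()
        for c in others:
            for t, w in c.items():
                neg[t] += w / len(others)
        terms = set(pos) | set(neg)
        weights = {t: pos.get(t, 0.0) - beta * neg.get(t, 0.0) for t in terms}
        # Keep only terms that remain positively associated with the class.
        profiles[label] = {t: w for t, w in weights.items() if w > 0}
    return profiles

def cosine(doc_vec, profile):
    # Cosine similarity between a sparse document vector and a class profile.
    dot = sum(w * profile.get(t, 0.0) for t, w in doc_vec.items())
    n1 = math.sqrt(sum(w * w for w in doc_vec.values()))
    n2 = math.sqrt(sum(w * w for w in profile.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def classify(full_text, description, full_profiles, desc_profiles, alpha=0.5):
    # Hypothetical combination: weighted sum of the two classifiers' scores.
    full_vec, desc_vec = term_vector(full_text), term_vector(description)
    scores = {
        label: alpha * cosine(full_vec, full_profiles[label])
        + (1 - alpha) * cosine(desc_vec, desc_profiles[label])
        for label in full_profiles
    }
    return max(scores, key=scores.get)

# Toy usage with invented training text (not data from the paper).
train_full = {
    "sports": ["football match goal team", "tennis player match"],
    "science": ["physics experiment energy", "chemistry experiment molecule"],
}
train_desc = {
    "sports": ["football team", "tennis player"],
    "science": ["physics energy", "chemistry molecule"],
}
full_profiles = build_profiles(train_full)
desc_profiles = build_profiles(train_desc)
print(classify("goal scored by the team in the match", "football team",
               full_profiles, desc_profiles))
```

A weighted sum is one simple way to combine two classifiers. Voting schemes or combining ranks would also fit the abstract's description, and the fractional similarity measure could replace `cosine` once its definition is known.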

Year: 2010
DOI: 10.1504/IJKEDM.2010.032582
Venue: IJKEDM
Keywords: wikipedia document, fractional similarity measure, negative category information, initial description, class profile, subsection title, individual classifier, wikipedia document classification, classifying xml document, wikipedia xml document classification, unstructured text, feature selection, cosine
Field: Document classification, Data mining, Information retrieval, Feature selection, XML, Computer science, Exploit
DocType: Journal
Volume: 1
Issue: 1
Citations: 1
PageRank: 0.37
References: 10
Authors: 3

Authors

Name	Order	Citations	PageRank
Meenakshi Sundaram Murugeshan	1	5	1.51
K. Lakshmi	2	10	2.65
Saswati Mukherjee	3	29	7.25
