Multi-label Wikipedia classification with textual and link features - Citegraph

Paper Info

Title
Multi-label Wikipedia classification with textual and link features

Abstract
We address the problem of categorizing a large set of linked documents with important content and structure aspects, in particular, from the Wikipedia collection proposed at the INEX 2009 XML Mining challenge. We analyze the network of collection pages and turn it into valuable features for the classification. We combine the content-based and link-based features of pages to train an accurate categorizer for unlabelled pages. In the multi-label setting, we revise a number of existing techniques and test some which show a good scalability. We report evaluation results obtained with a variety of learning methods and techniques on the training set of the Wikipedia corpus.

Year	DOI	Venue
2009	10.1007/978-3-642-14556-8_38	INEX
Keywords	Field	DocType
Multi-label Wikipedia classification, Wikipedia collection, important content, evaluation result, accurate categorizer, large set, Wikipedia corpus, good scalability, link feature, link-based feature, collection page, XML Mining challenge	Training set, XML, Information retrieval, Computer science, Betweenness centrality, Report evaluation, Scalability	Conference
Volume	ISSN	ISBN
6203	0302-9743	3-642-14555-8
Citations	PageRank	References
1	0.36	12
Authors
Name	Order	Citations	PageRank
Boris Chidlovskii	1	411	52.58
