Title
CUCS: A Web Page Classification Algorithm for Large Training Set
Abstract
This paper presents CUCS (Combined UC and SVM), a new web page classification algorithm for large training sets. CUCS combines the advantages of SVM (Support Vector Machine) and UC (Unsupervised Clustering) to achieve both high precision and high speed. In the training stage, CUCS uses UC to obtain cluster centers, including centers for the positive examples and centers for the negative examples. It then prunes the training set and trains an SVM classifier on the pruned set. In the classifying stage, CUCS calculates the minimum distance from a web page to the positive centers and the minimum distance from the page to the negative centers. If the difference between these two distances is large enough, the page is classified by UC; otherwise, it is classified by the pruned SVM. In experiments, CUCS achieves precision much higher than UC and slightly higher than SVM. In running time, CUCS takes more time than UC but far less than SVM.
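The two stages described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: pages are assumed to be already vectorized (e.g. as TF-IDF vectors), plain k-means stands in for the unspecified UC step, a Pegasos-style linear SVM stands in for the SVM, and the pruning rule (keep only training examples whose distance gap falls below the threshold) is an assumption, since the abstract does not say how pruning is done. All function names and the `gap` parameter are hypothetical.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two page feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means, used here as a stand-in for the paper's UC step."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centers[j]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; an empty cluster keeps its center.
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers

def distance_gap(x, pos_centers, neg_centers):
    """Min distance to negative centers minus min distance to positive centers.
    Positive values mean x lies closer to the positive class."""
    d_pos = min(dist(x, c) for c in pos_centers)
    d_neg = min(dist(x, c) for c in neg_centers)
    return d_neg - d_pos

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM trained by Pegasos-style stochastic subgradient descent.
    A constant 1.0 feature is appended, so the bias is learned (and regularized) inside w."""
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    order = list(range(len(Xb)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            score = sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # step on lam/2 * ||w||^2
            if y[i] * score < 1.0:                      # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w

def svm_predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

def train_cucs(X, y, k=2, gap=1.0):
    """Training stage: cluster each class, prune the training set, then train the SVM."""
    pos_c = kmeans([x for x, lab in zip(X, y) if lab == 1], k)
    neg_c = kmeans([x for x, lab in zip(X, y) if lab == -1], k)
    # ASSUMED pruning rule: keep only examples whose distance gap is below the
    # threshold, i.e. the ones UC alone would not classify.
    kept = [(x, lab) for x, lab in zip(X, y)
            if abs(distance_gap(x, pos_c, neg_c)) < gap]
    if len({lab for _, lab in kept}) < 2:
        kept = list(zip(X, y))  # an SVM needs both classes; fall back to all data
    w = train_linear_svm([x for x, _ in kept], [lab for _, lab in kept])
    return {"pos": pos_c, "neg": neg_c, "w": w, "gap": gap, "n_kept": len(kept)}

def classify_cucs(model, x):
    """Classifying stage: returns (label, which component decided)."""
    g = distance_gap(x, model["pos"], model["neg"])
    if abs(g) >= model["gap"]:
        return (1 if g > 0 else -1), "UC"
    return svm_predict(model["w"], x), "SVM"

if __name__ == "__main__":
    # Toy 2-D "pages": label 1 = positive class, -1 = negative class.
    X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 0.5],
         [4, 0], [5, 0], [4, 1], [5, 1], [3, 0.5]]
    y = [1, 1, 1, 1, 1, -1, -1, -1, -1, -1]
    model = train_cucs(X, y, k=1, gap=1.5)
    print(model["n_kept"], "of", len(X), "examples kept for the SVM")
    print(classify_cucs(model, [0.2, 0.3]))  # far from the negatives: decided by UC
    print(classify_cucs(model, [2.5, 0.5]))  # equidistant: falls back to the SVM
```

In this sketch a single threshold, `gap`, controls both how much of the training set survives pruning and how many pages reach the SVM at classification time; raising it sends more pages to the slower SVM branch, lowering it lets UC decide more of them.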
Year: 2008
DOI: 10.1109/NPC.2008.11
Venue: NPC Workshops
Keywords: web page classification, web page, large training set, cucs prunes training, cucs manifests precision, combined uc, high precision, large enough, classifying stage, training stage, web page classification algorithm, classification, clustering algorithm, internet, web pages, support vector machine, clustering algorithms, svm, support vector machines, unsupervised learning, classification algorithms
Field: Web page, Computer science, Unsupervised learning, Artificial intelligence, Cluster analysis, Classifier (linguistics), Training set, Distance measurement, Pattern recognition, Support vector machine, Algorithm, Statistical classification, Machine learning
DocType: Conference
Citations: 1
PageRank: 0.37
References: 8
Authors: 4
Name | Order | Citations | PageRank
Jing Wang | 1 | 1 | 0.71
Hongming Cai | 2 | 396 | 58.68
Boyi Xu | 3 | 261 | 20.53
Li-hong Jiang | 4 | 49 | 5.54