Abstract | ||
---|---|---|
This paper presents CUCS (Combined UC and SVM), a new web page classification algorithm for large training sets. CUCS combines the advantages of SVM (Support Vector Machine) and UC (Unsupervised Clustering) to achieve both high precision and fast speed. In the training stage, CUCS uses UC to obtain clustering centers, which comprise positive example centers and negative example centers. CUCS then prunes the training set and trains an SVM classifier on the pruned set. In the classifying stage, CUCS calculates the minimum distance from a web page to the positive centers and the minimum distance to the negative centers. If the difference between the two distances is large enough, the web page is classified by UC; otherwise, it is classified by the pruned SVM. In experiments, CUCS achieves precision that is much higher than that of UC and slightly higher than that of SVM. In terms of running time, CUCS takes more time than UC and far less than SVM. |
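
The classifying stage described in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not specify the distance metric, the threshold for "large enough", or the label encoding, so the Euclidean distance, the `threshold` parameter, the `+1`/`-1` labels, and the `svm_predict` callable (standing in for the pruned SVM) are all assumptions introduced here.

```python
import math
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def min_distance(x: Vector, centers: Sequence[Vector]) -> float:
    """Smallest Euclidean distance from x to any center (metric is an assumption)."""
    return min(math.dist(x, c) for c in centers)


def cucs_classify(
    x: Vector,
    pos_centers: Sequence[Vector],
    neg_centers: Sequence[Vector],
    threshold: float,
    svm_predict: Callable[[Vector], int],
) -> Tuple[int, str]:
    """Return (label, route), where route records which classifier decided.

    Labels are +1 (positive) and -1 (negative); this encoding is hypothetical.
    `svm_predict` is a placeholder for the SVM trained on the pruned set.
    """
    d_pos = min_distance(x, pos_centers)
    d_neg = min_distance(x, neg_centers)

    # If the two minimum distances differ enough, trust the cluster centers.
    if abs(d_pos - d_neg) >= threshold:
        return (1 if d_pos < d_neg else -1), "uc"

    # Otherwise the page lies near the boundary; defer to the pruned SVM.
    return svm_predict(x), "svm"
```

Under these assumptions, a page far from the boundary is decided by the cheap distance comparison alone, and only ambiguous pages pay the cost of an SVM evaluation, which is consistent with the abstract's claim that CUCS is slower than UC but much faster than SVM.
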
Year | DOI | Venue |
---|---|---|
2008 | 10.1109/NPC.2008.11 | NPC Workshops |
Keywords | Field | DocType |
web page classification,web page,large training set,cucs prunes training,cucs manifests precision,combined uc,high precision,large enough,classifying stage,training stage,web page classification algorithm,classification,clustering algorithm,internet,web pages,support vector machine,clustering algorithms,svm,support vector machines,unsupervised learning,classification algorithms | Web page,Computer science,Unsupervised learning,Artificial intelligence,Cluster analysis,Classifier (linguistics),Training set,Distance measurement,Pattern recognition,Support vector machine,Algorithm,Statistical classification,Machine learning | Conference |
Citations | PageRank | References |
1 | 0.37 | 8 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jing Wang | 1 | 1 | 0.71 |
Hongming Cai | 2 | 396 | 58.68 |
Boyi Xu | 3 | 261 | 20.53 |
Li-hong Jiang | 4 | 49 | 5.54 |