Abstract | ||
---|---|---|
Cross-language Web content quality assessment plays an important role in many Web content processing applications. In previous research, statistical systems based on natural language processing, heuristic content, and term frequency-inverse document frequency (TF-IDF) features have proven effective for Web content quality assessment. However, these features are language-dependent and therefore not suitable for cross-language ranking. This paper proposes a cross-language Web content quality assessment method. First, multi-modal language-independent features are extracted. The extracted features include character features, domain registration features, two-layer hyperlink analysis features, and third-party Web service features. All the extracted features are then fused. Based on the fused features, feature selection is carried out to obtain a new eigenspace. Finally, a cross-language Web content quality model is learned on the eigenspace. Experiments on the ECML/PKDD 2010 Discovery Challenge cross-language datasets demonstrate that features at every scale are discriminative; that different modalities of features are complementary to each other; and that feature selection is effective for statistical-learning-based cross-language Web content quality assessment. |
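
The pipeline outlined in the abstract (extract per-modality features, fuse them, select features, learn a model) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract names neither the selection criterion nor the learner, so a Fisher-style separation score and a nearest-centroid classifier are assumed stand-ins, and all function names and numeric values are hypothetical toy data.

```python
from statistics import mean, pstdev


def fuse(*feature_groups):
    """Early fusion: concatenate each page's vectors from every modality."""
    fused = []
    for rows in zip(*feature_groups):
        vec = []
        for row in rows:
            vec.extend(row)
        fused.append(vec)
    return fused


def fisher_scores(X, y):
    """Score each column by class separation (assumed criterion, not the paper's)."""
    scores = []
    for j in range(len(X[0])):
        pos = [x[j] for x, label in zip(X, y) if label == 1]
        neg = [x[j] for x, label in zip(X, y) if label == 0]
        spread = pstdev(pos) ** 2 + pstdev(neg) ** 2
        scores.append((mean(pos) - mean(neg)) ** 2 / (spread + 1e-9))
    return scores


def select_top_k(X, y, k):
    """Return the sorted indices of the k highest-scoring columns."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(X, keep):
    """Keep only the selected columns, giving the reduced feature space."""
    return [[x[j] for j in keep] for x in X]


def fit_centroids(X, y):
    """Nearest-centroid learner (assumed stand-in for the quality model)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, l in zip(X, y) if l == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the label whose centroid is closest in squared distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


# Hypothetical toy data: two modalities, four pages, label 1 = high quality.
char_feats = [[0.9, 5.0], [0.8, 5.1], [0.2, 4.9], [0.1, 5.0]]
link_feats = [[12.0], [10.0], [2.0], [1.0]]
labels = [1, 1, 0, 0]

X = fuse(char_feats, link_feats)
keep = select_top_k(X, labels, k=2)
model = fit_centroids(project(X, keep), labels)
new_page = project([[0.7, 5.0, 9.0]], keep)[0]
prediction = predict(model, new_page)
```

In this toy run the second character feature barely differs between the two classes, so the selection step drops it and the model is learned on the remaining two columns. Because none of the features depend on vocabulary, the same code path would apply to pages in any language, which is the property the abstract emphasizes.
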
Year | DOI | Venue |
---|---|---|
2012 | 10.1016/j.knosys.2012.05.018 | Knowl.-Based Syst. |
Keywords | Field | DocType |
discovery challenge cross-language datasets,third-party web service feature,feature selection,web content processing application,cross-language web content quality,web content quality assessment,heuristic content,cross-language ranking,character feature,assessment method,feature extraction,web spam | Data mining,Heuristic,Feature selection,Information retrieval,Ranking,Computer science,Feature extraction,Hyperlink,Web service,Web content,Spamdexing | Journal |
Volume | ISSN | Citations |
35 | 0950-7051 | 2 |
PageRank | References | Authors |
0.38 | 24 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Guanggang Geng | 1 | 141 | 20.78 |
Liming Wang | 2 | 13 | 8.75 |
Wei Wang | 3 | 271 | 25.20 |
An-Lei Hu | 4 | 9 | 2.18 |
Shuo Shen | 5 | 38 | 3.72 |