Statistical feature extraction for cross-language web content quality assessment - Citegraph

Paper Info

Title
Statistical feature extraction for cross-language web content quality assessment

Abstract
Web content quality assessment is a typical static ranking problem. Statistical systems based on heuristic content features and TF-IDF features have proven effective for this task, but these features are language-dependent and therefore unsuitable for cross-language ranking. In this paper, we fuse a series of language-independent features, including hostname features, domain registration features, two-layer hyperlink analysis features, and third-party Web service features, to assess Web content quality. Experiments on the ECML/PKDD 2010 Discovery Challenge cross-language datasets show that the assessment is effective.
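
The abstract names hostname features as one language-independent feature family but does not list the individual features. As a rough illustration only, the sketch below computes a few plausible ones from a URL using the Python standard library. The function name `hostname_features` and every feature in it are assumptions made for this example, not the authors' actual feature set.

```python
from urllib.parse import urlsplit


def hostname_features(url: str) -> dict:
    """Compute simple hostname statistics that do not depend on page language.

    Hypothetical feature set for illustration; the paper's own features
    are not specified in the abstract.
    """
    # urlsplit only fills in .hostname when the URL carries a scheme or "//".
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".") if host else []
    return {
        "length": len(host),                             # total characters in the hostname
        "num_labels": len(labels),                       # dot-separated parts
        "num_digits": sum(ch.isdigit() for ch in host),  # digit characters
        "num_hyphens": host.count("-"),                  # hyphen characters
        "tld": labels[-1] if labels else "",             # last label, e.g. "org"
        "starts_with_www": host.startswith("www."),
    }


if __name__ == "__main__":
    print(hostname_features("http://www.example-site123.org/path"))
```

None of these values requires parsing the page text, which is what makes features of this kind usable across languages. In a full system they would be concatenated with the other feature families before training a ranker.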

Year	DOI	Venue
2011	10.1145/2009916.2010083	SIGIR
Keywords	Field	DocType
hostname feature,discovery challenge cross-language datasets,third-party web service feature,web content quality,statistical feature extraction,typical static ranking problem,heuristic content,cross-language ranking,domain registration feature,language dependent feature,cross-language web content quality,web content quality assessment,machine learning,web service,feature extraction	Data mining,Heuristic,Information retrieval,Ranking,tf–idf,Computer science,Feature extraction,Hyperlink,Web service,Web content,Hostname	Conference
Citations	PageRank	References
0	0.34	3
Authors
5

Authors (5 rows)

Name	Order	Citations	PageRank
Guanggang Geng	1	141	20.78
Xiaodong Li	2	3	1.07
Liming Wang	3	13	8.75
Wei Wang	4	202	58.31
Shuo Shen	5	38	3.72
