Web page clustering enhanced by summarization - Citegraph

Paper Info

Title
Web page clustering enhanced by summarization

Abstract
Traditional Web page clustering algorithms use the full-text in the documents to generate feature vectors. Such methods often produce unsatisfactory results because there is much noisy information, such as decoration, interaction, and advertisement, in Web pages. The varying-length problem of the Web pages is also a significant negative factor affecting the performance. In this paper, we investigate the use of several summarization techniques to tackle these issues when clustering Web pages. Compared with the full-text representation of the Web pages, our experimental results indicate that our proposed approach effectively solves the problems of noisy information and varying-length, and thus significantly boosts the clustering performance.

Year	DOI	Venue
2004	10.1145/1031171.1031223	CIKM
Keywords	Field	DocType
clustering web page,web page,noisy information,varying-length problem,clustering performance,full-text representation,traditional web page,feature vector,summarization,latent semantic analysis,web pages	Data mining,Automatic summarization,HITS algorithm,Feature vector,Web page,Information retrieval,Web page clustering,Computer science,Website Parse Template,Cluster analysis,Latent semantic analysis	Conference
ISBN	Citations	PageRank
1-58113-874-1	3	0.43
References	Authors
5	5

Authors (5 rows)

Name	Order	Citations	PageRank
Xuanhui Wang	1	1394	68.85
Dou Shen	2	1224	59.46
Hua-Jun Zeng	3	1999	100.54
Zheng Chen	4	5019	256.89
Wei-ying Ma	5	14587	1003.11
