Title
Human Performance on Clustering Web Pages: A Preliminary Study
Abstract
With the increase in information on the World Wide Web it has become difficult to quickly find desired information without using multiple queries or using a topic-specific search engine. One way to help in the search is by grouping HTML pages together that appear in some way to be related. In order to better understand this task, we performed an initial study of human clustering of web pages, in the hope that it would provide some insight into the difficulty of automating this task. Our results show that subjects did not cluster identically; in fact, on average, any two subjects had little similarity in their web-page clusters. We also found that subjects generally created rather small clusters, and those with access only to URLs created fewer clusters than those with access to the full text of each web page. Generally the overlap of documents between clusters for any given subject increased when given the full text, as did the percentage of documents clustered. When analyzing individual subjects, we found that each had different behavior across queries, both in terms of overlap, size of clusters, and number of clusters. These results provide a sobering note on any quest for a single clearly correct clustering method for web pages.
Year
1998
Venue
KDD
Keywords
world wide web, web pages, human performance, document clustering
Field
Cluster (physics), Data mining, Fuzzy clustering, World Wide Web, Search engine, Information retrieval, Web page, Computer science, Cluster analysis
DocType
Conference
Citations
20
PageRank
6.46
References
12
Authors
4
Name | Order | Citations | PageRank
Sofus A. Macskassy | 1 | 613 | 47.11
Arunava Banerjee | 2 | 313 | 29.18
Brian D. Davison | 3 | 2312 | 150.97
Haym Hirsh | 4 | 1839 | 277.74