Title
Clustering of web search results based on an Iterative Fuzzy C-means Algorithm and Bayesian Information Criterion
Abstract
Clustering of web search results has become an active research area in the information retrieval community. Web search result clustering systems, also called Web Clustering Engines, seek to increase the coverage of documents presented to the user for review while reducing the time spent reviewing them. Several algorithms for web document clustering already exist, but results show there is room for improvement. This paper introduces a new description-centric algorithm for clustering web results, called IFCWR. IFCWR first selects a maximum estimated number of clusters using Forgy's strategy, then iteratively merges clusters until the results cannot be improved. Each merge operation entails running Fuzzy C-Means to cluster the web search results and computing the Bayesian Information Criterion to automatically evaluate the best solution and number of clusters. IFCWR was compared against other established web document clustering algorithms, among them Suffix Tree Clustering and Lingo. The comparison was carried out on the AMBIENT and MORESQUE datasets, using precision, recall, F-measure, SSLk, and other metrics. Results show a considerable improvement in clustering quality and performance.
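The loop outlined in the abstract (Forgy initialization, Fuzzy C-Means, BIC scoring, iterative merging) can be illustrated with a minimal sketch. This is not the authors' IFCWR implementation: the abstract does not specify the merge rule, the BIC formulation, or the document representation, so the sketch assumes plain numeric vectors as stand-ins for document vectors, merges the two closest centroids into their midpoint, and uses a simplified BIC of the form n*ln(SSE/n) + k*ln(n) with hard assignments (lower is better). All function names are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def forgy_init(points, k, rng):
    """Forgy's strategy: use k randomly chosen data points as initial centroids."""
    return [list(p) for p in rng.sample(points, k)]


def memberships(points, centers, m=2.0, eps=1e-12):
    """Fuzzy membership of every point in every cluster (each row sums to 1)."""
    rows = []
    for p in points:
        inv = [max(sq_dist(p, c), eps) ** (-1.0 / (m - 1.0)) for c in centers]
        total = sum(inv)
        rows.append([v / total for v in inv])
    return rows


def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Alternate membership and centroid updates for a fixed number of iterations."""
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(len(centers)):
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centers.append(
                [sum(wi * p[d] for wi, p in zip(w, points)) / total for d in range(dim)]
            )
        centers = new_centers
    return centers, memberships(points, centers, m)


def bic(points, centers):
    """Simplified BIC (assumed form, lower is better): n*ln(SSE/n) + k*ln(n)."""
    n = len(points)
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return n * math.log(max(sse, 1e-12) / n) + len(centers) * math.log(n)


def merge_closest(centers):
    """Assumed merge rule: replace the two nearest centroids with their midpoint."""
    pairs = [(sq_dist(centers[a], centers[b]), a, b)
             for a in range(len(centers)) for b in range(a + 1, len(centers))]
    _, a, b = min(pairs)
    mid = [(x + y) / 2 for x, y in zip(centers[a], centers[b])]
    return [c for i, c in enumerate(centers) if i not in (a, b)] + [mid]


def iterative_fcm(points, k_max, seed=0):
    """Start from k_max clusters and keep merging while the BIC score improves."""
    rng = random.Random(seed)
    centers, u = fuzzy_c_means(points, forgy_init(points, k_max, rng))
    best_score = bic(points, centers)
    while len(centers) > 1:
        cand_centers, cand_u = fuzzy_c_means(points, merge_closest(centers))
        score = bic(points, cand_centers)
        if score >= best_score:  # merging no longer improves the solution
            break
        centers, u, best_score = cand_centers, cand_u, score
    return centers, u, best_score
```

Calling `iterative_fcm(vectors, k_max=4)` returns the final centroids, the fuzzy membership matrix, and the score of the retained solution. In a real web clustering engine the vectors would presumably come from a term-weighting scheme over result snippets, and the description-centric labeling step mentioned in the abstract is not covered here.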
Year
2013
DOI
10.1109/IFSA-NAFIPS.2013.6608452
Venue
IFSA/NAFIPS
Keywords
belief networks, moresque dataset, clustering performance, pattern clustering, bayesian information criterion, sslk metric, ifcwr, lingo, fuzzy c-means, description-centric algorithm, internet, merge operation, suffix tree clustering, f-measure metric, recall metric, web clustering engines, web document clustering, iterative fuzzy c-means algorithm, document handling, precision metric, ambient dataset, iterative methods, clustering quality, web search results clustering, clustering algorithms, accuracy, algorithm design and analysis
Field
Fuzzy clustering, Data mining, CURE data clustering algorithm, Computer science, Artificial intelligence, Cluster analysis, Canopy clustering algorithm, Data stream clustering, Correlation clustering, Algorithm, Determining the number of clusters in a data set, Constrained clustering, Machine learning
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
5
Name                    Order  Citations  PageRank
Carlos Cobos            1      40         10.10
Martha Mendoza          2      2          1.38
Milos Manic             3      521        49.70
Elizabeth Leon          4      33         5.26
Enrique Herrera-Viedma  5      13105      642.24