Title | ||
---|---|---|
Clustering of web search results based on an Iterative Fuzzy C-means Algorithm and Bayesian Information Criterion |
Abstract | ||
---|---|---|
Clustering of web search results has become a very interesting research area among academic and scientific communities involved in information retrieval. Web search results clustering systems, also called Web Clustering Engines, seek to increase the coverage of documents presented for the user to review while reducing the time spent reviewing them. Several algorithms for web document clustering already exist, but results show there is room for improvement. This paper introduces a new description-centric algorithm for clustering web search results called IFCWR. IFCWR initially selects a maximum estimated number of clusters using Forgy's strategy, then iteratively merges clusters until results cannot be improved. Every merge operation involves executing Fuzzy C-Means to cluster the web search results and calculating the Bayesian Information Criterion to automatically evaluate the best solution and number of clusters. IFCWR was compared against other established web document clustering algorithms, among them Suffix Tree Clustering and Lingo. The comparison was executed on the AMBIENT and MORESQUE datasets, using precision, recall, F-measure, SSLk, and other metrics. Results show a considerable improvement in clustering quality and performance. |
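
The loop outlined in the abstract (Forgy initialization, Fuzzy C-Means, a BIC score, and repeated merging) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how clusters are merged or which BIC formula is used, so merging the two closest centers into their midpoint and the score `n * ln(SSE / n) + k * ln(n)` are assumptions, as are all function names below. Points are plain numeric tuples rather than document term vectors.

```python
import math
import random


def memberships(points, centers, m=2.0, eps=1e-9):
    """Fuzzy memberships u[i][j] = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))."""
    power = 2.0 / (m - 1.0)
    table = []
    for p in points:
        d = [max(math.dist(p, c), eps) for c in centers]
        table.append([1.0 / sum((dj / dk) ** power for dk in d) for dj in d])
    return table


def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Standard Fuzzy C-Means updates for a fixed number of iterations."""
    dims = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(len(centers)):
            w = [row[j] ** m for row in u]  # weights u_ij^m
            total = sum(w)
            new_centers.append(tuple(
                sum(wi * p[k] for wi, p in zip(w, points)) / total
                for k in range(dims)))
        centers = new_centers
    return centers, memberships(points, centers, m)


def bic(points, centers, u):
    """Assumed BIC-style score (lower is better), using hard assignments."""
    n, k = len(points), len(centers)
    sse = 0.0
    for p, row in zip(points, u):
        j = max(range(k), key=row.__getitem__)  # most likely cluster
        sse += math.dist(p, centers[j]) ** 2
    return n * math.log(max(sse, 1e-12) / n) + k * math.log(n)


def merge_closest(centers):
    """Assumed merge rule: replace the two closest centers by their midpoint."""
    _, i, j = min((math.dist(centers[a], centers[b]), a, b)
                  for a in range(len(centers))
                  for b in range(a + 1, len(centers)))
    mid = tuple((x + y) / 2 for x, y in zip(centers[i], centers[j]))
    return [c for idx, c in enumerate(centers) if idx not in (i, j)] + [mid]


def iterative_fcm(points, k_max, m=2.0, seed=0):
    """Start from k_max Forgy centers; merge while the score improves."""
    rng = random.Random(seed)
    centers = rng.sample(points, k_max)  # Forgy: random data points as centers
    centers, u = fuzzy_c_means(points, centers, m)
    best = (centers, u, bic(points, centers, u))
    while len(best[0]) > 1:
        centers, u = fuzzy_c_means(points, merge_closest(best[0]), m)
        score = bic(points, centers, u)
        if score >= best[2]:  # no improvement: keep the previous solution
            break
        best = (centers, u, score)
    return best
```

Calling `iterative_fcm(points, k_max=5)` on a list of equal-length numeric tuples returns the retained centers, the membership table, and the score. The number of clusters kept depends heavily on the assumed score, so a different BIC formulation may stop at a different point.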
Year | DOI | Venue |
---|---|---|
2013 | 10.1109/IFSA-NAFIPS.2013.6608452 | IFSA/NAFIPS |
Keywords | Field | DocType |
belief networks,moresque dataset,clustering performance,pattern clustering,bayesian information criterion,sslk metric,ifcwr,lingo,fuzzy c-means,description-centric algorithm,internet,merge operation,suffix tree clustering,f-measure metric,recall metric,web clustering engines,web document clustering,iterative fuzzy c-means algorithm,document handling,precision metric,ambient dataset,iterative methods,clustering quality,web search results clustering,clustering algorithms,accuracy,algorithm design and analysis | Fuzzy clustering,Data mining,CURE data clustering algorithm,Computer science,Artificial intelligence,Cluster analysis,Canopy clustering algorithm,Data stream clustering,Correlation clustering,Algorithm,Determining the number of clusters in a data set,Constrained clustering,Machine learning | Conference |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors | ||
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Carlos Cobos | 1 | 40 | 10.10 |
Martha Mendoza | 2 | 2 | 1.38 |
Milos Manic | 3 | 521 | 49.70 |
Elizabeth Leon | 4 | 33 | 5.26 |
Enrique Herrera-Viedma | 5 | 13105 | 642.24 |