Abstract |
---|
Several studies on Web search results clustering have recently reported performance gains from the use of external resources. Following this trend, we present a new algorithm called Dual C-Means, which provides a theoretical background for clustering in different representation spaces. Its originality lies in the fact that external resources can drive both the clustering process and the labeling task in a single step. To validate our hypotheses, a series of experiments is conducted over several standard datasets and, in particular, over a new dataset built from the TREC Web Track 2012 to take query log information into account. The comprehensive empirical evaluation of the proposed approach demonstrates its significant advantages over traditional clustering and labeling techniques. |
Year | DOI | Venue |
---|---|---|
2014 | 10.1145/2600428.2609583 | SIGIR |
Keywords | Field | DocType |
web search results clustering, automatic labeling, dual c-means, evaluation, clustering | Data mining, Fuzzy clustering, CURE data clustering algorithm, Computer science, Artificial intelligence, Cluster analysis, Canopy clustering algorithm, Clustering high-dimensional data, Data stream clustering, Correlation clustering, Information retrieval, Brown clustering, Machine learning | Conference |
Citations | PageRank | References |
7 | 0.47 | 28 |
Authors |
---|
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jose G. Moreno | 1 | 50 | 10.67 |
Gaël Dias | 2 | 354 | 41.95 |
Guillaume Cleuziou | 3 | 129 | 19.02 |