Abstract |
---|
In this paper, we present a system for clustering the search results of a news search engine. The news search interface includes the relevant news articles to a given query organized in terms of related news stories. Here each cluster corresponds to a news story and the news articles are clustered into stories. We present a system that clusters the search results of a news search system in a fast and scalable manner. The clustering system is organized into three components including offline clustering, incremental clustering and realtime clustering. We propose novel techniques for clustering the search results in realtime. The experimental results with large collections of news documents reveal that our system is both scalable and also achieves good accuracy in clustering the news search results. |

Year | DOI | Venue |
---|---|---|
2011 | 10.1145/1935826.1935918 | WSDM |

Keywords | Field | DocType |
---|---|---|
news search engine, news search interface, related news story, relevant news article, news search system, scalable clustering, news search result, news story, news document, news article, search result, clustering, search engine | Canopy clustering algorithm, Data mining, CURE data clustering algorithm, Clustering high-dimensional data, Data stream clustering, Information retrieval, Computer science, Consensus clustering, Brown clustering, Cluster analysis, DBSCAN | Conference |

Citations | PageRank | References |
---|---|---|
13 | 0.71 | 17 |

Authors |
---|
8 |

Name | Order | Citations | PageRank |
---|---|---|---|
Srinivas Vadrevu | 1 | 245 | 15.51 |
Choon Hui Teo | 2 | 623 | 47.52 |
Suju Rajan | 3 | 360 | 19.19 |
Kunal Punera | 4 | 648 | 36.78 |
Byron Dom | 5 | 2600 | 825.93 |
Alexander J. Smola | 6 | 19627 | 1967.09 |
Yi Chang | 7 | 1463 | 86.17 |
Zhaohui Zheng | 8 | 1437 | 69.55 |