Title
Scalable clustering of news search results
Abstract
In this paper, we present a system for clustering the search results of a news search engine in a fast and scalable manner. The news search interface presents the articles relevant to a given query organized into related news stories: each cluster corresponds to a news story, and the articles are clustered into stories. The clustering system is organized into three components: offline clustering, incremental clustering, and realtime clustering. We propose novel techniques for clustering the search results in realtime. Experimental results on large collections of news documents show that our system is scalable and achieves good accuracy in clustering news search results.
Year
2011
DOI
10.1145/1935826.1935918
Venue
WSDM
Keywords
news search engine, news search interface, related news story, relevant news article, news search system, scalable clustering, news search result, news story, news document, news article, search result, clustering, search engine
Field
Canopy clustering algorithm, Data mining, CURE data clustering algorithm, Clustering high-dimensional data, Data stream clustering, Information retrieval, Computer science, Consensus clustering, Brown clustering, Cluster analysis, DBSCAN
DocType
Conference
Citations
13
PageRank
0.71
References
17
Authors
8
Name                 Order  Citations  PageRank
Srinivas Vadrevu     1      245        15.51
Choon Hui Teo        2      623        47.52
Suju Rajan           3      360        19.19
Kunal Punera         4      648        36.78
Byron Dom            5      2600       825.93
Alexander J. Smola   6      19627      1967.09
Yi Chang             7      1463       86.17
Zhaohui Zheng        8      1437       69.55