Title
Combine-and-conquer: improving the diversity in similarity search through influence sampling
Abstract
Result diversification methods are intended to retrieve elements similar to a given object whereas also enforcing a certain degree of diversity among them, aimed at improving the answer relevance. Most of the methods are based on optimization, but bearing NP-hard solutions. Diversity is injected into an otherwise all-too-similar result set in two phases: in the first, the search space is reduced to speed up finding the optimal solution, whereas in the second a trade-off between diversity and similarity over the reduced space is obtained. It is assumed that the first phase is achieved by applying a traditional nearest neighbor algorithm, but no previous investigation evaluated the impact of the first over the second phase. In this paper, we devised alternative techniques to execute the first phase and evaluated how obtaining a better quality set of elements in the first phase can improve the diversity. Besides the traditional nearest neighbor-based pre-selection, we also considered naive random selection, cluster-based and influence-based ones. Thereafter, extensive experiments evaluated a number of state-of-the-art diversity algorithms employed in the second phase, regarding both processing time and answer quality. The obtained results have shown that although the much more elaborated (and much more time consuming) methods indeed provide best answers, other alternatives are able to provide a better commitment regarding quality and performance. Moreover, the pre-selection techniques can reduce the total running time by up to two orders of magnitude.
Year
DOI
Venue
2015
10.1145/2695664.2695798
SAC 2015: Symposium on Applied Computing Salamanca Spain April, 2015
Keywords
Field
DocType
Search result diversification, similarity search, sampling
k-nearest neighbors algorithm,Data mining,Result set,Computer science,Sampling (statistics),Order of magnitude,Nearest neighbor search,Speedup
Conference
ISBN
Citations 
PageRank 
978-1-4503-3196-8
0
0.34
References 
Authors
6
6