Title
Improving Neighborhood-Based Collaborative Filtering by Reducing Hubness
Abstract
For recommending multimedia items, collaborative filtering (CF) denotes the technique of automatically predicting a user's rating or preference for an item by exploiting the item preferences of a (large) group of other users. In traditional memory-based (or neighborhood-based) recommenders, this is accomplished by, first, selecting a number of similar users (or items) and, second, combining their ratings into a single predicted rating of the user for an item. Strategies both for defining similarity (i.e., to identify nearest neighbors) and for combining ratings (i.e., to weight their impact) have been studied extensively and have even produced inconsistent findings. In this paper, we investigate the effects of the high dimensionality of user × item matrices on the quality of memory-based movie rating prediction. By examining several publicly available real-world CF data sets, we show that the nearest neighbor selection step is affected by the phenomena of similarity concentration and hub occurrence, which arise from the high-dimensional data spaces and the class of similarity measures used. To mitigate this, we adapt a normalization technique called mutual proximity that has been shown to reduce these effects in classification tasks. Finally, we show that removing hubs and incorporating normalized similarity values into the neighbor weighting step increases rating prediction accuracy, as reflected in a lower root-mean-square error (RMSE) on all examined data sets.
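The two-step neighborhood scheme described above, together with a hubness diagnostic and mutual proximity (MP) reweighting, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: users are compared by cosine similarity over zero-filled rating vectors, hubness is measured as the skewness of the k-occurrence counts N_k (how often each user appears in the other users' k-nearest-neighbor lists), MP uses the empirical (counting) form adapted here from distances to similarities, and predictions are a plain weighted average of neighbor ratings. All function names and the toy matrix are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two rating vectors; 0 marks an unrated item (assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_matrix(R):
    """User-user similarities; the diagonal is 0 so a user never selects itself."""
    n = len(R)
    return [[0.0 if i == j else cosine(R[i], R[j]) for j in range(n)] for i in range(n)]


def k_occurrence(S, k):
    """N_k(x): number of other users whose k most similar users include x."""
    n = len(S)
    counts = [0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for j in sorted(others, key=lambda o: S[i][o], reverse=True)[:k]:
            counts[j] += 1
    return counts


def skewness(xs):
    """Sample skewness; a strongly positive value of N_k indicates hubs."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    if var == 0:
        return 0.0
    return sum((x - m) ** 3 for x in xs) / n / var ** 1.5


def mutual_proximity(S):
    """Empirical MP for similarities: share of third users j that are less similar
    to x than y is AND less similar to y than x is (hypothetical adaptation)."""
    n = len(S)
    MP = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x == y:
                continue
            others = [j for j in range(n) if j not in (x, y)]
            both = sum(1 for j in others if S[x][j] < S[x][y] and S[y][j] < S[y][x])
            MP[x][y] = both / len(others) if others else 0.0
    return MP


def predict(R, W, user, item, k, exclude=frozenset()):
    """Step 1: pick the k highest-weighted users who rated `item`.
    Step 2: return the weighted average of their ratings.
    `exclude` can hold users to skip, e.g. detected hubs."""
    cands = [v for v in range(len(R))
             if v != user and v not in exclude and R[v][item] > 0]
    nn = sorted(cands, key=lambda v: W[user][v], reverse=True)[:k]
    den = sum(abs(W[user][v]) for v in nn)
    if den == 0:
        return None  # no usable neighbors for this user/item pair
    return sum(W[user][v] * R[v][item] for v in nn) / den


# Toy user x item rating matrix (hypothetical data, 0 = unrated).
R = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
]
S = similarity_matrix(R)
N2 = k_occurrence(S, 2)
MP = mutual_proximity(S)
print("N_2:", N2, "skewness:", round(skewness(N2), 3))
print("cosine-weighted:", predict(R, S, user=1, item=1, k=2))
print("MP-weighted:", predict(R, MP, user=1, item=1, k=2))
```

On a matrix this small the hubness numbers carry no meaning; the point is the data flow. Hub removal could be approximated by passing users with unusually large N_k in `exclude`, though any threshold for "unusually large" is an assumption not taken from the abstract.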
Year
2014
DOI
10.1145/2578726.2578747
Venue
ICMR
Keywords
increased rating prediction accuracy, collaborative filtering, defining similarity, item preference, multimedia item, data space, CF data set, similarity concentration, normalized similarity value, memory-based movie rating prediction
Field
k-nearest neighbors algorithm, Data mining, Data set, Weighting, Normalization (statistics), Collaborative filtering, Pattern recognition, Computer science, Mean squared error, Curse of dimensionality, Artificial intelligence, Machine learning
DocType
Conference
Citations
8
PageRank
0.49
References
20
Authors
3
Name | Order | Citations | PageRank
Peter Knees | 1 | 594 | 51.71
Dominik Schnitzer | 2 | 324 | 18.33
Arthur Flexer | 3 | 599 | 48.03