Improving Neighborhood-Based Collaborative Filtering by Reducing Hubness - Citegraph

Paper Info

Title
Improving Neighborhood-Based Collaborative Filtering by Reducing Hubness

Abstract
For recommending multimedia items, collaborative filtering (CF) denotes the technique of automatically predicting a user's rating or preference for an item by exploiting item preferences of a (large) group of other users. In traditional memory-based (or neighborhood-based) recommenders, this is accomplished by, first, selecting a number of similar users (or items) and, second, combining their ratings into a single user's predicted rating for an item. Strategies for both defining similarity (i.e., to identify nearest neighbors) and for combining ratings (i.e., to weight their impact) have been extensively studied and even resulted in inconsistent findings. In this paper, we investigate the effects of the high dimensionality of user×item matrices on the quality of memory-based movie rating prediction. By examining several publicly available real-world CF data sets, we show that the step of nearest neighbor selection is affected by the phenomena of similarity concentration and hub occurrence due to high-dimensional data spaces and the class of similarity measures used. To mitigate this, we adapt a normalization technique called mutual proximity that has been shown to reduce these effects in classification tasks. Finally, we show that removing hubs and incorporating normalized similarity values into the neighbor weighting step leads to increased rating prediction accuracy, observable on all examined data sets in terms of lowered error measure (RMSE).
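
The pipeline outlined in the abstract (select similar users, detect hubs, normalize similarities, weight neighbor ratings) can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function names are hypothetical, a rating of 0 is assumed to mean "not rated", cosine similarity is an assumed choice of measure, and the mutual proximity shown is the simple empirical variant (the fraction of other users that are less similar to both members of a pair than the pair is to each other); whether the paper uses exactly this form is an assumption.

```python
import numpy as np

def cosine_similarity_matrix(ratings):
    """Pairwise cosine similarity between rows (users); 0 is assumed to mean 'not rated'."""
    norms = np.linalg.norm(ratings, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for users without ratings
    unit = ratings / norms
    return unit @ unit.T

def mutual_proximity(sim):
    """Empirical mutual proximity (assumed variant): for each pair (x, y), the
    fraction of other users j less similar to x AND less similar to y than sim[x, y]."""
    n = sim.shape[0]
    mp = np.zeros((n, n), dtype=float)
    for x in range(n):
        for y in range(x + 1, n):
            others = np.ones(n, dtype=bool)
            others[[x, y]] = False
            s = sim[x, y]
            both_lower = (sim[x, others] < s) & (sim[y, others] < s)
            mp[x, y] = mp[y, x] = both_lower.sum() / max(n - 2, 1)
    return mp

def k_occurrence(sim, k):
    """Count how often each user appears in other users' k-nearest-neighbor lists.
    Users with unusually high counts are the 'hubs'."""
    n = sim.shape[0]
    s = sim.astype(float).copy()
    np.fill_diagonal(s, -np.inf)  # a user is never its own neighbor
    neighbors = np.argsort(-s, axis=1)[:, :k]
    return np.bincount(neighbors.ravel(), minlength=n)

def predict_rating(ratings, sim, user, item, k):
    """Similarity-weighted mean over the k most similar users who rated the item."""
    candidates = [u for u in range(ratings.shape[0])
                  if u != user and ratings[u, item] > 0]
    if not candidates:
        return None  # nobody else rated this item
    top = sorted(candidates, key=lambda u: sim[user, u], reverse=True)[:k]
    weights = np.array([sim[user, u] for u in top], dtype=float)
    neighbor_ratings = ratings[top, item]
    if weights.sum() <= 0:
        return float(neighbor_ratings.mean())  # fall back to an unweighted mean
    return float(weights @ neighbor_ratings / weights.sum())
```

As a usage example, one could build a small user×item array, compute `sim = cosine_similarity_matrix(ratings)`, inspect `k_occurrence(sim, k)` for skewed counts, and then pass either `sim` or `mutual_proximity(sim)` to `predict_rating` to compare raw and normalized neighbor weighting. The double loop in `mutual_proximity` favors readability over speed and would need vectorizing for realistic data set sizes.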

Year	DOI	Venue
2014	10.1145/2578726.2578747	ICMR
Keywords	Field	DocType
increased rating prediction accuracy, collaborative filtering, defining similarity, item preference, multimedia item, data space, cf data set, similarity concentration, normalized similarity value, memory-based movie rating prediction	k-nearest neighbors algorithm, Data mining, Data set, Weighting, Normalization (statistics), Collaborative filtering, Pattern recognition, Computer science, Mean squared error, Curse of dimensionality, Artificial intelligence, Machine learning	Conference
Citations	PageRank	References
8	0.49	20
Authors
3

Authors (3 rows)

Name	Order	Citations	PageRank
Peter Knees	1	594	51.71
Dominik Schnitzer	2	324	18.33
Arthur Flexer	3	599	48.03
