Title
Collaborative filtering driven by fast semantic feature analysis on Spark
Abstract
Collaborative filtering (CF) is a prevailing technique for recommendation systems and has been extensively explored as a way to tackle information overload, particularly in the Big Data context. Traditional CF algorithms perform adequately under many circumstances; nevertheless, they suffer from shortcomings such as cold start and data sparsity. Moreover, a potential breakthrough lies in taking full advantage of the valuable semantic information contained in items. To alleviate these defects, we propose a two-stage collaborative filtering approach driven by Simhash-based semantic feature analysis: the first stage is Simhash-based semantic feature extraction for items and categories, and the second stage is reinforced CF rating prediction driven by highly compressed category features. In the first stage, the rich semantic features of a vast number of items and their categories are rapidly extracted and compressed with Simhash, and are then used to enhance the traditional collaborative filtering process. In addition, to address the problems of the Big Data context, we design a parallel algorithm on Spark to accelerate the time-consuming process of semantic feature extraction for a vast number of items. Finally, we conduct comprehensive experiments on practical datasets to validate the reinforced CF approach, and the results show that it achieves promising performance compared with traditional CF algorithms.
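The abstract does not spell out how the Simhash stage works, so the following is only a minimal sketch of the generic Simhash technique, not the authors' implementation. The tokenization, the term-frequency weighting, the use of MD5 as the per-token hash, the 64-bit fingerprint width, and the function names `simhash` and `hamming_similarity` are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def simhash(tokens, bits=64):
    """Compress a bag of tokens into a `bits`-wide Simhash fingerprint.

    Assumption: each token is weighted by its term frequency and hashed
    with MD5; `bits` must be a multiple of 8 and at most 128.
    """
    weights = Counter(tokens)
    totals = [0] * bits
    for token, weight in weights.items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Each set bit votes +weight, each cleared bit votes -weight.
            totals[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i, total in enumerate(totals):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_similarity(a, b, bits=64):
    """Similarity in [0, 1]: one minus the normalized Hamming distance."""
    return 1.0 - bin(a ^ b).count("1") / bits


# Hypothetical item descriptions, tokenized by whitespace.
item_a = simhash("space adventure science fiction film".split())
item_b = simhash("space adventure science fiction series".split())
similarity = hamming_similarity(item_a, item_b)
```

Because each fingerprint is a fixed-width integer computed independently per item, the extraction step parallelizes naturally, which is consistent with the Spark-based acceleration the abstract describes. How the resulting category features enter the rating prediction is not stated in the abstract and is left out of this sketch.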
Year: 2022
DOI: 10.1007/s11276-018-01901-8
Venue: Wireless Networks
Keywords: Collaborative filtering, Simhash, Semantic feature, Spark
DocType: Journal
Volume: 28
Issue: 3
ISSN: 1572-8196
Citations: 0
PageRank: 0.34
References: 26
Authors: 3
Name        Order  Citations  PageRank
Yang Peng   1      13         4.18
Liang Gu    2      3          1.72
Xuan Liu    3      297        38.07