Title
Effect of Spam on Hashtag Recommendation for Tweets.
Abstract
The presence of spam tweets in a dataset may affect the choices of feature selection, algorithm formulation, and system evaluation for many applications. However, most existing studies have not considered the impact of spam tweets. In this paper, we study the impact of spam tweets on hashtag recommendation for hyperlinked tweets (i.e., tweets containing URLs) in the HSpam14 dataset. HSpam14 is a collection of 14 million tweets, each annotated as spam or ham (i.e., non-spam). In our experiments, we observe that it is much easier to recommend "correct" hashtags for spam tweets than for ham tweets, because spam tweets contain many near-duplicates. Simple approaches such as recommending the most popular hashtags achieve very good accuracy on spam tweets. On the other hand, features that are highly effective on ham tweets may not be effective on spam tweets. Our findings suggest that if spam tweets are not removed from the data collection (as is the case in most studies), the results obtained for hashtag recommendation tasks could be misleading.
Year
2016
DOI
10.1145/2872518.2889404
Venue
WWW '16: 25th International World Wide Web Conference, Montréal, Québec, Canada, April 2016
Field
Data mining, Data collection, World Wide Web, Social media, Feature selection, Computer science, System evaluation, Microblogging
DocType
Conference
ISBN
978-1-4503-4144-8
Citations
1
PageRank
0.36
References
4
Authors (2)
Name, Order, Citations, PageRank
Surendra Sedhai, 1, 54, 2.83
Aixin Sun, 2, 3071, 156.89