Effect of Spam on Hashtag Recommendation for Tweets. - Citegraph

Paper Info

Title
Effect of Spam on Hashtag Recommendation for Tweets.

Abstract
The presence of spam tweets in a dataset may affect feature selection, algorithm formulation, and system evaluation for many applications. However, most existing studies have not considered the impact of spam tweets. In this paper, we study the impact of spam tweets on hashtag recommendation for hyperlinked tweets (i.e., tweets containing URLs) in the HSpam14 dataset. HSpam14 is a collection of 14 million tweets, each annotated as spam or ham (i.e., non-spam). In our experiments, we observe that it is much easier to recommend "correct" hashtags for spam tweets than for ham tweets, because of the near duplicates among spam tweets. Simple approaches such as recommending the most popular hashtags achieve very good accuracy on spam tweets. On the other hand, features that are highly effective on ham tweets may not be effective on spam tweets. Our findings suggest that without removing spam tweets from the data collection (as in most studies), the results obtained could be misleading for hashtag recommendation tasks.
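
The popularity baseline mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tweet representation (a dict with `label` and `hashtags` keys), the function names, the hit-rate metric, and the toy data are all assumptions made for illustration. It recommends the k most frequent training hashtags and scores spam and ham tweets separately, which is the kind of split evaluation the abstract argues for.

```python
from collections import Counter


def top_k_hashtags(train_tweets, k):
    """Return the k most frequent hashtags among the training tweets."""
    counts = Counter(tag for tweet in train_tweets for tag in tweet["hashtags"])
    return [tag for tag, _ in counts.most_common(k)]


def hit_rate(test_tweets, recommended):
    """Fraction of test tweets that use at least one recommended hashtag."""
    if not test_tweets:
        return 0.0
    recommended = set(recommended)
    hits = sum(1 for tweet in test_tweets if recommended & set(tweet["hashtags"]))
    return hits / len(test_tweets)


def evaluate_by_label(train, test, k=5):
    """Score the popularity baseline separately on spam and ham tweets."""
    results = {}
    for label in ("spam", "ham"):
        train_part = [t for t in train if t["label"] == label]
        test_part = [t for t in test if t["label"] == label]
        results[label] = hit_rate(test_part, top_k_hashtags(train_part, k))
    return results


# Hypothetical toy data (not taken from HSpam14): spam tweets are near
# duplicates, while ham tweets cover more varied topics.
train = [
    {"label": "spam", "hashtags": ["win", "free"]},
    {"label": "spam", "hashtags": ["win", "free"]},
    {"label": "spam", "hashtags": ["win"]},
    {"label": "ham", "hashtags": ["python"]},
    {"label": "ham", "hashtags": ["python"]},
    {"label": "ham", "hashtags": ["nba"]},
    {"label": "ham", "hashtags": ["news"]},
]
test = [
    {"label": "spam", "hashtags": ["win"]},
    {"label": "spam", "hashtags": ["free", "win"]},
    {"label": "ham", "hashtags": ["python"]},
    {"label": "ham", "hashtags": ["travel"]},
]

results = evaluate_by_label(train, test, k=1)
print(results)
```

On this toy data the baseline scores higher on the spam subset than on the ham subset, only because the spam examples were constructed as near duplicates. The numbers say nothing about the paper's actual results.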

Year	DOI	Venue
2016	10.1145/2872518.2889404	WWW '16: 25th International World Wide Web Conference Montréal Québec Canada April, 2016
Field	DocType	ISBN
Data mining,Data collection,World Wide Web,Social media,Feature selection,Computer science,System evaluation,Microblogging	Conference	978-1-4503-4144-8
Citations	PageRank	References
1	0.36	4
Authors
2

Authors (2 rows)

Name	Order	Citations	PageRank
Surendra Sedhai	1	54	2.83
Aixin Sun	2	3071	156.89
