Title
Vandals and Hoaxes on the Web.
Abstract
The Web is a space for all, where everybody can read, publish and share information. This has had a tremendous positive impact on the lives of billions of people. Wikipedia, being the largest encyclopedia and free, is a major source of information for many. However, since anyone can edit its articles, it is easy to add undesirable content and misinformation. This raises concerns about its credibility and safety, and that of the Web in general. In this talk, I will describe algorithms to identify two different aspects of undesirable actors and acts on Wikipedia: vandals and hoaxes. First, I will present VEWS, the Vandal Early Warning System [1], a state-of-the-art system for detecting vandals on Wikipedia. Vandals are editors who make unconstructive edits on Wikipedia. VEWS models the editing behavior of all editors on Wikipedia, both benign editors and vandals, and then builds upon the differences in their behavior to identify the vandals. VEWS achieves an accuracy of over 85% and outperforms ClueBot NG and STiki, the best-known algorithms that fight vandalism. Moreover, on average, VEWS detects vandals 2.39 edits before ClueBot NG. Furthermore, the combination of the two gives a fully automatic vandal early warning system with even higher accuracy. Second, I will present an in-depth study of hoaxes on Wikipedia [2]. Hoaxes are fake articles on Wikipedia that are deliberately created to mislead others. By studying over 22,000 hoaxes that have been created on Wikipedia, I will discuss their real-world impact, their characteristics and, finally, their detection. In terms of impact, while most hoaxes are detected quickly, a small number survive for a long time and are well cited across the Web. The characteristics of hoaxes are defined in terms of article structure and content, embeddedness in the rest of Wikipedia, and the creator of the article. Finally, I will discuss an algorithm that uses these findings to determine whether an article is a hoax.
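The "edits before" comparison in the abstract can be made concrete with a small sketch. This is not the evaluation code from [1]. It is a minimal illustration, assuming each system records the 1-based index of the edit at which it first flags a user, and assuming only users flagged by both systems are compared. The function name `mean_edit_lead` and the toy data are hypothetical.

```python
from statistics import mean


def mean_edit_lead(first_flag_a, first_flag_b):
    """Average number of edits by which system A flags a user before system B.

    Each argument maps a user id to the 1-based index of the edit at which
    that system first flagged the user. Only users flagged by both systems
    are compared (an assumption of this sketch, not taken from the paper).
    """
    shared = first_flag_a.keys() & first_flag_b.keys()
    if not shared:
        raise ValueError("no users flagged by both systems")
    # Positive values mean system A flagged the user earlier than system B.
    return mean(first_flag_b[user] - first_flag_a[user] for user in shared)


# Made-up toy data, for illustration only.
system_a = {"u1": 2, "u2": 3, "u3": 1}
system_b = {"u1": 5, "u2": 4, "u3": 3, "u4": 7}

lead = mean_edit_lead(system_a, system_b)
print(f"System A flags users {lead} edits earlier on average")
```

A positive result means the first system raises its warning earlier in a user's edit history. A reported figure such as 2.39 would be an average of this kind over many users, though the exact protocol used in [1] may differ.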
Year
2016
DOI
10.1145/3002137.3002139
Venue
CyberSafety@CIKM
Keywords
Web, Wikipedia, Vandals, Hoax, Misinformation
Field
Publication, Data mining, World Wide Web, Internet privacy, Credibility, Information retrieval, Computer science, Hoax, Encyclopedia, Early warning system
DocType
Conference
Citations
0
PageRank
0.34
References
1
Authors
1
Name
Srijan Kumar
Order
1
Citations
326
PageRank
24.97