Title
Mingling of Clear and Muddy Water: Understanding and Detecting Semantic Confusion in Blackhat SEO
Abstract
Search Engine Optimization (SEO) is a set of techniques that help website operators increase the visibility of their webpages to search engine users. However, there are also many unethical practices that abuse ranking algorithms of a search engine to promote illegal online content, called blackhat SEO. In this paper, we make the first attempt to systematically investigate a recent trend in blackhat SEO, semantic confusion, which mingles the content of a webpage to deceive existing detection of blackhat SEO. In particular, from a new perspective of content semantics, we propose an effective defense against the semantic confusion based blackhat SEO. We built a prototype of our defense called SCDS, and then we validated its effectiveness based on 4.5 million domains randomly selected from 11 zone files and passive DNS records. Our evaluation results show that SCDS can detect more than 82 thousand blackhat SEO websites with a precision of 98.35%. We further analyzed 57,477 long-tail keywords promoted by blackhat SEO and found more than 157 SEO campaigns. Finally, we deployed SCDS into the gateway of a campus network for ten months and detected 23,093 domains with malicious semantic confusion content, showing the effectiveness of SCDS in practice.
Year
2021
DOI
10.1007/978-3-030-88418-5_13
Venue
COMPUTER SECURITY - ESORICS 2021, PT I
DocType
Conference
Volume
12972
ISSN
0302-9743
Citations
0
PageRank
0.34
References
0
Authors
7
Name           Order  Citations  PageRank
Yang Hao       1      16         2.94
Kun Du         2      33         7.22
Yubao Zhang    3      9          4.22
Shuai Hao      4      62         9.39
Haining Wang   5      0          0.34
Jia Zhang      6      14         3.98
Haixin Duan    7      237        36.86