PhishStorm: Detecting Phishing With Streaming Analytics - Citegraph

Paper Info

Title
PhishStorm: Detecting Phishing With Streaming Analytics

Abstract
Despite the growth of prevention techniques, phishing remains an important threat since the principal countermeasures in use are still based on reactive URL blacklisting. This technique is inefficient due to the short lifetime of phishing Web sites, making recent approaches relying on real-time or proactive phishing URL detection techniques more appropriate. In this paper, we introduce PhishStorm, an automated phishing detection system that can analyze any URL in real time in order to identify potential phishing sites. PhishStorm can interface with any email server or HTTP proxy. We argue that phishing URLs usually have few relationships between the part of the URL that must be registered (low-level domain) and the remaining part of the URL (upper-level domain, path, query). We show in this paper that experimental evidence supports this observation and that it can be used to detect phishing sites. For this purpose, we define the new concept of intra-URL relatedness and evaluate it using features extracted from the words that compose a URL, based on query data from the Google and Yahoo search engines. These features are then used in machine-learning-based classification to detect phishing URLs from a real dataset. Our technique is assessed on 96 018 phishing and legitimate URLs, achieving a correct classification rate of 94.91% with only 1.44% false positives. An extension for a URL phishingness rating system exhibiting a high confidence rate (>99%) is proposed. We discuss in this paper efficient implementation patterns that allow real-time analytics using Big Data architectures such as STORM and advanced data structures based on the Bloom filter.
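The abstract's central idea, that the registered part of a phishing URL tends to be unrelated to the rest of the URL, can be illustrated with a toy computation. The Python sketch below is not PhishStorm's actual feature set. It makes several assumptions. It splits a URL into the registered domain label and the remaining text (subdomains, path, query) using a tiny hypothetical suffix set, whereas a real system would use the full Public Suffix List. It extracts words from each part and expands them through a hypothetical `RELATED` table that stands in for the search-engine query data the paper relies on. It then scores the overlap with a Jaccard index. All names (`split_url`, `words`, `intra_url_relatedness`, `RELATED`, `PUBLIC_SUFFIXES`) are illustrative.

```python
import re
from urllib.parse import urlsplit

# Hypothetical, tiny stand-in for the Public Suffix List.
PUBLIC_SUFFIXES = {"com", "net", "org", "co.uk"}

# Hypothetical stand-in for related-word data obtained from search engines.
RELATED = {"paypal": {"paypal", "login", "account", "payment"}}


def split_url(url: str) -> tuple[str, str]:
    """Return (registered domain label, remaining URL text)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    labels = host.split(".")
    # Scanning from the left, the first match is the longest public suffix.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            if i == 0:
                raise ValueError(f"host {host!r} is itself a public suffix")
            registered = labels[i - 1]
            subdomains = labels[: i - 1]
            break
    else:
        raise ValueError(f"no known public suffix in {host!r}")
    rest = " ".join(subdomains + [parts.path, parts.query])
    return registered, rest


def words(text: str) -> set[str]:
    """Lower-cased alphabetic tokens of at least 3 letters (assumed cutoff)."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= 3}


def expand(ws: set[str], related: dict[str, set[str]]) -> set[str]:
    """Add the related words of every word in ws."""
    out = set(ws)
    for w in ws:
        out |= related.get(w, set())
    return out


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def intra_url_relatedness(url: str, related: dict[str, set[str]]) -> float:
    """Toy relatedness between the registered label and the rest of the URL."""
    registered, rest = split_url(url)
    return jaccard(expand(words(registered), related),
                   expand(words(rest), related))


if __name__ == "__main__":
    print(intra_url_relatedness("https://login.paypal.com/signin/account", RELATED))  # → 0.4
    print(intra_url_relatedness("http://paypal.com.xyzhost.net/login/account", RELATED))  # → 0.0
```

In the second URL the brand name appears only in the subdomains, while the registered label (`xyzhost`) shares no words with the rest, so the score drops to zero. According to the abstract, PhishStorm feeds such relatedness features into a machine-learning classifier rather than applying a fixed threshold.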

Year	DOI	Venue
2014	10.1109/TNSM.2014.2377295	IEEE Transactions on Network and Service Management
Keywords: Big Data, Web sites, computer crime, data analysis, data structures, feature extraction, learning (artificial intelligence), pattern classification, search engines, unsolicited e-mail, Bloom filter, Google search engines, HTTP proxy, PhishStorm, STORM, URL blacklisting, URL phishingness rating system, Yahoo search engines, advanced data structures, automated phishing detection system, big data architectures, email server, intraURL relatedness, legitimate URLs, machine-learning-based classification, phishing Web sites, prevention techniques, proactive phishing URL detection techniques, query data, real-time analytics, real-time phishing URL detection techniques, streaming analytics, Machine Learning, Mining and Statistical Methods, Phishing Detection, Search Engine Query Data, Security Management, URL Rating, Word Relatedness
Field: Bloom filter, Data mining, Data structure, Phishing, Computer science, Semantic URL, Spoofed URL, Analytics, Big data, False positive paradox
DocType: Journal
Volume	Issue	ISSN
11	4	1932-4537
Citations	PageRank	References
22	0.82	27
Authors
4


Name	Order	Citations	PageRank
Samuel Marchal	1	146	11.72
Jérôme François	2	57	4.39
Radu State	3	623	86.87
Thomas Engel	4	455	42.34
