Title | ||
---|---|---|
A Crawler Architecture for Harvesting the Clear, Social, and Dark Web for IoT-Related Cyber-Threat Intelligence |
Abstract | ||
---|---|---|
The clear, social, and dark web have lately been identified as rich sources of valuable cyber-security information that, given the appropriate tools and methods, may be identified, crawled, and subsequently leveraged into actionable cyber-threat intelligence. In this work, we focus on the information gathering task, and present a novel crawling architecture for transparently harvesting data from security websites in the clear web, security forums in the social web, and hacker forums/marketplaces in the dark web. The proposed architecture adopts a two-phase approach to data harvesting. In the first phase, a machine learning-based crawler is used to direct the harvesting towards websites of interest, while in the second phase state-of-the-art statistical language modelling techniques are used to represent the harvested information in a latent low-dimensional feature space and rank it based on its potential relevance to the task at hand. The proposed architecture is realised using exclusively open-source tools, and a preliminary evaluation with crowdsourced results demonstrates its effectiveness. |
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/SERVICES.2019.00016 | 2019 IEEE World Congress on Services (SERVICES) |
Keywords | Field | DocType |
IoT, cyber security, cyber threat intelligence, crawling architecture, machine learning, language models | World Wide Web, Architecture, Feature vector, Crawling, Social web, Computer science, Hacker, Deep Web, Web crawler, Database, Language model | Conference |
Volume | ISSN | ISBN |
2642-939X | 2378-3818 | 978-1-7281-3852-7 |
Citations | PageRank | References |
2 | 0.40 | 25 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Paris Koloveas | 1 | 3 | 0.76 |
Thanasis Chantzios | 2 | 2 | 0.74 |
Christos Tryfonopoulos | 3 | 246 | 21.99 |
Spiros Skiadopoulos | 4 | 1139 | 65.60 |