Automated Discovery of Internet Censorship by Web Crawling. - Citegraph

Paper Info

Title
Automated Discovery of Internet Censorship by Web Crawling.

Abstract
Censorship of the Internet is widespread around the world. As access to the web becomes increasingly ubiquitous, filtering of this resource becomes more pervasive. Transparency about specific content and information that citizens are denied access to is atypical. To counter this, numerous techniques for maintaining URL filter lists have been proposed by various individuals, organisations and researchers. These aim to improve empirical data on censorship for the benefit of the public and wider censorship research community, while also increasing the transparency of filtering activity by oppressive regimes. We present a new approach for discovering filtered domains in different target countries. This method is fully automated and requires no human interaction. The system uses web crawling techniques to traverse between filtered sites and implements a robust method for determining if a domain is filtered. We demonstrate the effectiveness of the approach by running experiments to search for filtered content in four different censorship regimes. Our results show that we perform better than the current state of the art and have built domain filter lists an order of magnitude larger than the most widely available public lists as of April 2018. Further, we build a dataset mapping the interlinking nature of blocked content between domains and exhibit the tightly networked nature of censored web resources.
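
The abstract describes the approach only at a high level: crawl outward from filtered sites and test each newly found domain. The following is a minimal sketch of that general idea, not the authors' implementation. The names `crawl_filtered`, `get_links`, and `is_filtered` are hypothetical, the link graph and blocked set are toy in-memory data, and the filtering test is a stub standing in for whatever measurement the paper actually uses.

```python
from collections import deque
from typing import Callable, Iterable, List, Set, Tuple


def crawl_filtered(
    seeds: Iterable[str],
    get_links: Callable[[str], Iterable[str]],
    is_filtered: Callable[[str], bool],
    max_tests: int = 1000,
) -> Tuple[Set[str], List[Tuple[str, str]]]:
    """Breadth-first crawl that expands only domains judged to be filtered.

    Returns the filtered domains found and the links between them.
    All parameters are illustrative assumptions, not taken from the paper.
    """
    seen: Set[str] = set()
    queue: deque = deque()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            queue.append(seed)

    filtered: Set[str] = set()
    candidate_edges: List[Tuple[str, str]] = []
    tests = 0

    while queue and tests < max_tests:
        domain = queue.popleft()
        tests += 1
        if not is_filtered(domain):
            continue  # unfiltered domains are not expanded
        filtered.add(domain)
        for target in get_links(domain):
            if target == domain:
                continue  # ignore self-links
            candidate_edges.append((domain, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)

    # Keep only links whose target also turned out to be filtered.
    edges = [(src, dst) for src, dst in candidate_edges if dst in filtered]
    return filtered, edges


# Toy data standing in for real page fetches and real filtering measurements.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["a.example", "d.example"],
    "c.example": ["e.example"],
    "d.example": [],
    "e.example": [],
}
BLOCKED = {"a.example", "b.example", "d.example"}

found, links = crawl_filtered(
    seeds=["a.example"],
    get_links=lambda d: LINKS.get(d, []),
    is_filtered=lambda d: d in BLOCKED,
)
```

In this sketch the crawl stops expanding at any domain that is not filtered, so the search stays within the neighbourhood of blocked content; the returned edge list is one simple way to represent interlinking between filtered domains.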

Year	2018
DOI	10.1145/3201064.3201091
Venue	WebSci '18: 10th ACM Conference on Web Science, Amsterdam, Netherlands, May 2018
Keywords	censorship, DNS, filtering, transparency, monitoring
DocType	Conference
Volume	abs/1804.03056
ISBN	978-1-4503-5563-6
Citations	1
PageRank	0.37
References	16
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Alexander Darer	1	8	1.86
Oliver Farnan	2	8	1.86
Joss Wright	3	39	6.42
