Title
Research Report: Building a File Observatory for Secure Parser Development
Abstract
Parsing untrusted data is notoriously challenging. Failure to handle maliciously crafted data correctly can (and does) lead to a wide range of vulnerabilities. The language-theoretic security (LangSec) philosophy seeks to obviate the need for developers to apply ad hoc solutions by instead offering formally correct and verifiable input handling throughout the software development lifecycle. One of the key components in developing secure parsers is a broad-coverage corpus that helps developers understand the problem space for a given format and that can potentially serve as a source of seeds for fuzzing and other automated testing. In this paper, we offer an update on work initially reported at the LangSec 2020 conference on the development of a file observatory to gather and enable analysis of a diverse collection of files at scale. The initial focus of the observatory is on Portable Document Format (PDF) files and file formats typically embedded in PDFs. We report on the addition of a bug tracker corpus and new analytic methods.
Year
2021
DOI
10.1109/SPW53761.2021.00025
Venue
2021 IEEE Security and Privacy Workshops (SPW)
Keywords
LangSec, language-theoretic security, file corpus creation, file forensics, text extraction, parser resources
DocType
Conference
ISBN
978-1-6654-3733-2
Citations
2
PageRank
0.41
References
0
Authors
6
Name                  Order  Citations  PageRank
Tim Allison           1      4          1.17
Wayne Burke           2      4          1.17
Chris A. Mattmann     3      200        25.39
Anastasija Mensikova  4      2          0.41
Philip Southam        5      4          1.17
Ryan Stonebraker      6      4          1.17