Title
ReDeBug: Finding Unpatched Code Clones in Entire OS Distributions
Abstract
Programmers should never fix the same bug twice. Unfortunately, this often happens when patches to buggy code are not propagated to all code clones. Unpatched code clones represent latent bugs and, for security-critical problems, latent vulnerabilities, so they are important to detect quickly. In this paper we present ReDeBug, a system for quickly finding unpatched code clones in OS-distribution scale code bases. While there has been previous work on code clone detection, ReDeBug represents a unique design point: it uses a quick, syntax-based approach that scales to OS distribution-sized code bases that include code written in many different languages. Compared to previous approaches, ReDeBug may find fewer code clones, but it gains scale and speed, reduces the false detection rate, and is language agnostic. We evaluated ReDeBug by checking all code from all packages in Debian Lenny/Squeeze and Ubuntu Maverick/Oneiric, all SourceForge C and C++ projects, and the Linux kernel for unpatched code clones. ReDeBug processed over 2.1 billion lines of code at 700,000 LoC/min to build a source code database, then found 15,546 unpatched copies of known vulnerable code in currently deployed code by checking 376 Debian/Ubuntu security-related patches in 8 minutes on a commodity desktop machine. We show the real-world impact of ReDeBug by confirming 145 real bugs in the latest version of Debian Squeeze packages.
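The abstract describes a syntax-based check for whether code that a patch fixed still survives, unpatched, somewhere else. The sketch below illustrates that general idea only; it is not ReDeBug's implementation. The normalization rule (strip whitespace, lowercase), the window size of 4 lines, the use of SHA-1, and all function names are assumptions made for illustration, and the diff parser assumes a single-file unified diff.

```python
import hashlib
import re


def normalize(line: str) -> str:
    """Assumed normalization: drop all whitespace and lowercase the line."""
    return re.sub(r"\s+", "", line).lower()


def window_hashes(lines, n=4):
    """Hash every run of n consecutive normalized, non-empty lines."""
    norm = [s for s in (normalize(l) for l in lines) if s]
    return {
        hashlib.sha1("\n".join(norm[i:i + n]).encode()).hexdigest()
        for i in range(len(norm) - n + 1)
    }


def pre_patch_lines(unified_diff: str):
    """Recover the pre-patch code: context lines plus removed lines.

    Assumes a single-file unified diff; everything before the first
    '@@' hunk header is treated as file header and skipped.
    """
    out = []
    in_hunk = False
    for line in unified_diff.splitlines():
        if line.startswith("@@"):
            in_hunk = True
            continue
        if in_hunk and (line.startswith("-") or line.startswith(" ")):
            out.append(line[1:])
    return out


def is_unpatched_clone(unified_diff: str, target_source: str, n=4) -> bool:
    """True if every window of the pre-patch code also occurs in the target."""
    vulnerable = window_hashes(pre_patch_lines(unified_diff), n)
    if not vulnerable:
        return False  # patch too short to form even one window
    return vulnerable <= window_hashes(target_source.splitlines(), n)
```

Because the comparison works on normalized text windows rather than a parsed syntax tree, a check of this kind does not depend on the programming language of the target file. The trade-off, as the abstract notes for ReDeBug itself, is that clones that were edited more heavily than the normalization tolerates are missed.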
Year: 2012
DOI: 10.1109/SP.2012.13
Venue: IEEE Symposium on Security and Privacy
Keywords: debian lenny,unpatched code clone,vulnerable code,code clone,finding unpatched code clones,entire os distributions,debian squeeze package,os-distribution scale code base,fewer code clone,distribution-sized code base,source code database,code clone detection,cloning,computer bugs,scalability,security,linux,debug,kernel,linux kernel,source code,lines of code
Field: False detection,Duplicate code,Programming language,Computer security,Source code,Computer science,Code clone,Operating system,Linux kernel,Source lines of code,Debugging,Scalability
DocType: Conference
Volume: 37
Issue: 6
ISSN: 1081-6011
Citations: 45
PageRank: 1.41
References: 12
Authors: 3
Name | Order | Citations | PageRank
Jiyong Jang | 1 | 297 | 16.23
Abeer Agrawal | 2 | 45 | 1.41
David Brumley | 3 | 2940 | 142.75