STABILIZER: statistically sound performance evaluation - Citegraph

Paper Info

Title
STABILIZER: statistically sound performance evaluation

Abstract
Researchers and software developers require effective performance evaluation. Researchers must evaluate optimizations or measure overhead. Software developers use automatic performance regression tests to discover when changes improve or degrade performance. The standard methodology is to compare execution times before and after applying changes. Unfortunately, modern architectural features make this approach unsound. Statistically sound evaluation requires multiple samples to test whether one can or cannot (with high confidence) reject the null hypothesis that results are the same before and after. However, caches and branch predictors make performance dependent on machine-specific parameters and the exact layout of code, stack frames, and heap objects. A single binary constitutes just one sample from the space of program layouts, regardless of the number of runs. Since compiler optimizations and code changes also alter layout, it is currently impossible to distinguish the impact of an optimization from that of its layout effects. This paper presents Stabilizer, a system that enables the use of the powerful statistical techniques required for sound performance evaluation on modern architectures. Stabilizer forces executions to sample the space of memory configurations by repeatedly re-randomizing layouts of code, stack, and heap objects at runtime. Stabilizer thus makes it possible to control for layout effects. Re-randomization also ensures that layout effects follow a Gaussian distribution, enabling the use of statistical tests like ANOVA. We demonstrate Stabilizer's efficiency (<7% median overhead) and its effectiveness by evaluating the impact of LLVM's optimizations on the SPEC CPU2006 benchmark suite. We find that, while -O2 has a significant impact relative to -O1, the performance impact of -O3 over -O2 optimizations is indistinguishable from random noise.
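The null-hypothesis comparison described in the abstract can be illustrated with a minimal sketch. This is not Stabilizer's implementation, and it uses a permutation test on the difference in means in place of the ANOVA the abstract mentions, because that needs only the Python standard library. The function name `permutation_test` and the run times are hypothetical. The sketch assumes each value is the run time of one execution under a freshly randomized layout.

```python
import random
from statistics import fmean


def permutation_test(before, after, rounds=10_000, seed=0):
    """Two-sided permutation test on the difference in mean run time.

    Returns an estimated p-value for the null hypothesis that the
    'before' and 'after' samples come from the same distribution.
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    observed = abs(fmean(after) - fmean(before))
    pooled = list(before) + list(after)
    n = len(before)
    hits = 0
    for _ in range(rounds):
        # Under the null hypothesis the group labels are interchangeable,
        # so shuffle the pooled samples and re-split them.
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[n:]) - fmean(pooled[:n]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (rounds + 1)


# Hypothetical run times in seconds, one per re-randomized execution.
before = [10.2, 10.5, 9.9, 10.4, 10.1, 10.3, 10.0, 10.6]
after = [10.1, 10.4, 10.0, 10.5, 10.2, 10.2, 9.9, 10.6]

p_value = permutation_test(before, after)
print(f"p = {p_value:.3f}")
```

A large p-value means the null hypothesis cannot be rejected, which is the sense in which the abstract calls a difference "indistinguishable from random noise". A small one (conventionally below 0.05) suggests a real change.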

Year: 2013
DOI: 10.1145/2451116.2451141
Venue: Special Interest Group on Computer Architecture
Keywords: re-randomizing layout, automatic performance regression test, layout effect, software developer, exact layout, degrade performance, program layout, heap object, sound performance evaluation, effective performance evaluation, randomization, measurement bias
Field: Null hypothesis, Computer science, Parallel computing, Optimizing compiler, Real-time computing, Heap (data structure), Regression testing, Gaussian, Software, Statistical hypothesis testing, Binary number
DocType: Conference
Volume: 41
Issue: 1
ISSN: 0163-5964
Citations: 37
PageRank: 1.72
References: 18
Authors: 2

Authors (2 rows)

Name	Order	Citations	PageRank
Charlie Curtsinger	1	339	12.95
Emery D. Berger	2	1048	55.87
