Title
DeFT: Design space exploration for on-the-fly detection of coherence misses
Abstract
While multicore processors promise large performance benefits for parallel applications, writing these applications is notoriously difficult. Tuning a parallel application to achieve good performance, also known as performance debugging, is often more challenging than debugging the application for correctness. Parallel programs have many performance-related issues that are not seen in sequential programs. An increase in cache misses is one of the biggest challenges that programmers face. To minimize these misses, programmers must not only identify the source of the extra misses, but also perform the tricky task of determining if the misses are caused by interthread communication (i.e., coherence misses) and if so, whether they are caused by true or false sharing (since the solutions for these two are quite different). In this article, we propose a new programmer-centric definition of false sharing misses and describe our novel algorithm to perform coherence miss classification. We contrast our approach with existing data-centric definitions of false sharing. A straightforward implementation of our algorithm is too expensive to be incorporated in real hardware. Therefore, we explore the design space for low-cost hardware support that can classify coherence misses on-the-fly into true and false sharing misses, allowing existing performance counters and profiling tools to expose and attribute them. We find that our approximate schemes achieve good accuracy at only a fraction of the cost of the ideal scheme. Additionally, we demonstrate the usefulness of our work in a case study involving a real application.
Year: 2011
DOI: 10.1145/1970386.1970389
Venue: TACO
Keywords: low-cost hardware support, real application, existing performance counter, parallel application, design space exploration, good performance, on-the-fly detection, false sharing, large performance benefit, parallel program, performance debugging, good accuracy, multicore processors
Field: Cache, Computer science, Profiling (computer programming), Parallel computing, Correctness, False sharing, Real-time computing, Coherence (physics), Multi-core processor, Design space exploration, Debugging
DocType: Journal
Volume: 8
Issue: 2
ISSN: 1544-3566
Citations: 3
PageRank: 0.37
References: 19
Authors: 4
Name | Order | Citations | PageRank
Guru Venkataramani | 1 | 394 | 29.49
Christopher J. Hughes | 2 | 988 | 63.34
Sanjeev Kumar | 3 | 2727 | 139.04
Milos Prvulovic | 4 | 926 | 54.94