Title
HBTM: A Heartbeat-based Behavior Detection Mechanism for POSIX Threads and OpenMP Applications.
Abstract
Extreme-scale computing involves hundreds of millions of threads with multi-level parallelism running on large-scale hierarchical and heterogeneous hardware. In POSIX threads and OpenMP applications, key runtime behaviors such as thread failure, busy waiting, and exit need to be detected accurately and promptly. However, most of these applications lack a unified and efficient mechanism for doing so. This paper proposes HBTM, a heartbeat-based behavior detection mechanism for POSIX threads (Pthreads) and OpenMP applications. The design includes two implementations, one centralized and one decentralized. Both implementations share a unified API to keep the mechanism general. In addition, a ring-based detection algorithm is designed to ease the burden on the central thread at runtime. The mechanism is evaluated with the NAS Parallel Benchmarks (NPB). The experimental results show that HBTM detects the behaviors of POSIX threads and OpenMP applications with short latency and an overhead of about 1%.
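The abstract does not spell out the ring-based algorithm, so the following is only a minimal sketch of one plausible reading: each thread records a heartbeat timestamp, and instead of one central thread scanning every timestamp, each thread checks only its successor on a ring. The function names (`central_check`, `ring_check`), the timestamp list, and the timeout parameter are all assumptions made for illustration and are not taken from the paper. The sketch simulates the detection logic in plain Python rather than using Pthreads or OpenMP.

```python
from typing import List


def central_check(last_beat: List[float], now: float, timeout: float) -> List[int]:
    """Centralized variant (assumed): one monitor scans every thread's heartbeat."""
    return [tid for tid, beat in enumerate(last_beat) if now - beat > timeout]


def ring_check(last_beat: List[float], now: float, timeout: float) -> List[int]:
    """Ring variant (assumed): thread i inspects only thread (i + 1) mod n.

    Each watcher performs a single comparison, so no single thread has to
    scan all heartbeats. Returns the ids of threads reported as stale.
    """
    n = len(last_beat)
    suspects = []
    for watcher in range(n):
        watched = (watcher + 1) % n  # successor on the ring
        if now - last_beat[watched] > timeout:
            suspects.append(watched)
    return sorted(suspects)


if __name__ == "__main__":
    # Hypothetical timestamps: thread 2 last reported at t=4.0, the rest recently.
    beats = [10.0, 9.8, 4.0, 10.1]
    print(ring_check(beats, now=10.2, timeout=1.0))     # -> [2]
    print(central_check(beats, now=10.2, timeout=1.0))  # -> [2]
```

Both variants flag the same thread here; the difference is where the work happens. In the centralized version one monitor does all n comparisons, while in the ring version each thread does one.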
Year
2015
Venue
arXiv: Distributed, Parallel, and Cluster Computing
Field
setcontext, Heartbeat, Computer science, Latency (engineering), Parallel computing, Busy waiting, POSIX Threads, Real-time computing, Thread (computing), Implementation, Distributed computing
DocType
Journal
Volume
abs/1512.00665
Citations
0
PageRank
0.34
References
5
Authors
5
Name               Order  Citations  PageRank
Weidong Wang       1      8          1.47
Chunhua Liao       2      330        30.72
Liqiang Wang       3      703        56.71
Daniel J. Quinlan  4      652        80.13
Wei Lu             5      144        35.60