Title
A Performance Counter Based Workload Characterization on Blue Gene/P
Abstract
IBM's Blue Gene/P, the second generation of the Blue Gene supercomputer, is designed with a Universal Performance Counter (UPC) unit at each node that can monitor 256 events concurrently, unlike many microprocessors that provide only a few performance counters. In this paper we demonstrate the efficacy of an interface library that we have developed on top of the UPC unit. The library lets users instrument applications with little effort and gain detailed insight into their execution on the Blue Gene/P system, which can scale to thousands of nodes. It allows the user to monitor about 512 performance-related events out of a total of 1024 possible events, aggregate the data collected at different nodes, and compute meaningful metrics through data mining. Using the developed interface, we instrumented the NAS parallel benchmarks and collected the performance counter data. We studied the MFLOPS, the L3-DDR traffic, and the dynamic instruction mix, based on the counters in the FPU and the cache hierarchy, for different compiler optimizations, modes of operation of the system, and L3 and L2 configurations. Our analysis identifies that compiler optimization O5 along with "-qarch440d", which uses the architectural information of the chip in optimization, is very effective at generating SIMD instructions and results in the most efficient execution of the benchmarks. The experiments on the L3 size indicate that an L3 size of 4MB is optimal for the NAS benchmarks, which do not benefit from increasing it further. Also, the virtual node mode of operation of the Blue Gene/P system is very effective and yields superior performance for the selected benchmarks, taking advantage of the chip multiprocessor architecture of the quad-core HPC chip.
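The aggregation step described in the abstract (summing per-node counter data, then deriving MFLOPS and an instruction mix) can be illustrated with a minimal sketch. This is not the authors' interface library: the event names, the per-instruction flop weights, and the function names below are all hypothetical, chosen only to show the arithmetic.

```python
from collections import Counter

# Hypothetical event names; the abstract does not list the real UPC event
# identifiers. Each weight is the number of floating-point operations
# credited per counted instruction (an assumption: a fused multiply-add
# counts as 2, and a SIMD instruction operates on 2 doubles).
FLOP_WEIGHTS = {
    "fpu_add_sub": 1,
    "fpu_mult": 1,
    "fpu_fma": 2,
    "fpu_simd_add_sub": 2,
    "fpu_simd_mult": 2,
    "fpu_simd_fma": 4,
}


def aggregate(per_node_counts):
    """Sum each event's count over all nodes."""
    total = Counter()
    for counts in per_node_counts:
        total.update(counts)  # Counter.update adds counts, it does not overwrite
    return total


def mflops(total, elapsed_seconds):
    """Millions of floating-point operations per second over the whole run."""
    flops = sum(weight * total.get(event, 0)
                for event, weight in FLOP_WEIGHTS.items())
    return flops / elapsed_seconds / 1e6


def instruction_mix(total):
    """Fraction of floating-point instructions contributed by each event."""
    fp_instructions = sum(total.get(event, 0) for event in FLOP_WEIGHTS)
    if fp_instructions == 0:
        return {}
    return {event: total.get(event, 0) / fp_instructions
            for event in FLOP_WEIGHTS}


if __name__ == "__main__":
    # Two nodes with made-up counter readings, purely for illustration.
    nodes = [
        {"fpu_add_sub": 1_000_000, "fpu_simd_fma": 500_000},
        {"fpu_add_sub": 1_000_000, "fpu_simd_fma": 500_000},
    ]
    total = aggregate(nodes)
    print(mflops(total, elapsed_seconds=2.0))
    print(instruction_mix(total)["fpu_simd_fma"])
```

In this sketch a higher share of the SIMD events in the mix raises the flop count per instruction, which is the kind of effect the abstract attributes to the O5 and "-qarch440d" compiler settings.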
Year
2008
DOI
10.1109/ICPP.2008.57
Venue
Portland, OR
Keywords
workload characterization, interface library, performance counter, selected benchmarks, blue gene, l3 size, nas parallel benchmarks, blue gene supercomputer, p system, nas benchmarks, performance counter data, chip, benchmark testing, compiler optimization, radiation detectors, data collection, supercomputing, computer architecture, high performance computing, data mining, optimization
Field
IBM, Supercomputer, FLOPS, Computer science, Block cipher mode of operation, Parallel computing, SIMD, Chip, Optimizing compiler, Benchmark (computing), Operating system, Distributed computing
DocType
Conference
ISSN
0190-3918
ISBN
978-0-7695-3374-2 (also listed as E-ISBN)
Citations
12
PageRank
0.63
References
9
Authors
4
Name                Order  Citations  PageRank
Karthik Ganesan     1      128        9.41
Lizy Kurian John    2      2315       185.19
Valentina Salapura  3      383        42.12
James Sexton        4      48         4.11