Title
A Memory-Efficient Implementation of a Plasmonics Simulation Application on SX-ACE
Abstract
Because recent scientific and engineering simulations require heavy computation on large volumes of data, high-performance computing (HPC) systems need both high computational capability and large memory capacity. Most recent HPC systems adopt a parallel processing architecture in which the computational capability of the processors keeps increasing while the performance of the memory system remains constrained. As a result, the bytes per flop (B/F) ratio, defined as the ratio of memory bandwidth to floating-point performance (flop/s), has decreased as HPC systems have evolved. To fully exploit the potential of recent HPC systems and to meet the increasing demand for large memory, practical scientific and engineering applications must be optimized with regard not only to their parallelism but also to the limitations of the memory subsystems. In this paper, we discuss a set of approaches for optimizing the memory access behavior of applications so that they execute with improved performance on recent HPC systems. Our approaches include controlling the memory footprint, restructuring data structures for active elements, eliminating redundant data structures through combined calculations, and optimized re-calculation of data. To validate the effectiveness of our approaches, a plasmonics simulation application is evaluated on the vector platforms NEC SX-ACE and NEC SX-9 and on the Intel Xeon-based platform NEC LX 406-Re2. By applying our approaches to the implementation, the memory usage of the plasmonics simulation application is reduced to as little as about 1/71 of the original, which makes it possible to execute the application on a single node of a distributed parallel system with smaller memory capacity. The optimization results in 1.14 times faster execution on SX-ACE and 1.81 times faster execution on LX 406-Re2.
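As a rough illustration of the B/F ratio mentioned in the abstract, the sketch below computes memory bandwidth divided by peak floating-point performance. The function name `bytes_per_flop` and both example systems are hypothetical and chosen only for illustration; the figures are not taken from the paper or from any specific machine.

```python
def bytes_per_flop(mem_bandwidth_gbs: float, peak_gflops: float) -> float:
    """Return the B/F ratio: memory bandwidth (GB/s) over peak compute (Gflop/s)."""
    if peak_gflops <= 0:
        raise ValueError("peak_gflops must be positive")
    return mem_bandwidth_gbs / peak_gflops


# Hypothetical systems (illustrative numbers only, not from the paper):
# the same memory bandwidth paired with a slower and a faster processor.
older_system = bytes_per_flop(mem_bandwidth_gbs=256.0, peak_gflops=128.0)
newer_system = bytes_per_flop(mem_bandwidth_gbs=256.0, peak_gflops=1024.0)

print(f"older system B/F: {older_system}")
print(f"newer system B/F: {newer_system}")
```

When compute capability grows faster than memory bandwidth, the ratio falls, which is why memory-bound applications gain from reducing their memory traffic and footprint.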
Year
2016
Venue
IJNC
Field
Uniform memory access, Memory bandwidth, Computer science, Parallel computing, Distributed memory, Computing with Memory, Memory management, Non-uniform memory access, Memory footprint, CUDA Pinned memory, Embedded system
DocType
Journal
Volume
6
Issue
2
Citations
0
PageRank
0.34
References
8
Authors
6
Name
Order
Citations
PageRank
Raghunandan Mathur101.35
Hiroshi Matsuoka252.42
Osamu Watanabe3960104.55
Akihiro Musa4358.08
Ryusuke Egawa510928.68
Hiroaki Kobayashi69816.62