Title
Performance analysis and tuning for clusters with ccNUMA nodes for scientific computing - a case study.
Abstract
In the quest for higher performance, and with the increasing availability of multi-core chips, many systems now pack more processors into each node. Adopting a ccNUMA node architecture in these cases promises a balance between cost and performance. In this paper, a 2312-core Opteron system based on Sun Fire servers is considered as a case study to examine the performance issues associated with such architectures. We characterize the performance behavior of the system, with a focus on the node level, using different configurations. It will be shown that the benefits of larger nodes can be severely limited for several reasons. These reasons were isolated, the associated performance losses were assessed, and some potential solutions were proposed. With the proposed performance tunings, up to 30% application performance improvement was observed. The results revealed that such problems were mainly caused by topological imbalances, limitations of the cache coherence protocol used, the distribution of operating system services, and the lack of intelligent management of memory affinity. In addition, the experimental analysis provided can be used by HPC application developers to better understand clusters with ccNUMA nodes, and as a guideline for the use of such architectures in scientific computing.
Year
2009
Venue
COMPUTER SYSTEMS SCIENCE AND ENGINEERING
Keywords
ccNUMA, performance evaluation, memory locality, cluster computing, multi-core architectures, cache coherence protocol
Field
Cluster (physics), Computer science, Distributed computing
DocType
Journal
Volume
24
Issue
5
ISSN
0267-6192
Citations
0
PageRank
0.34
References
0
Authors
5
Name, Order, Citations, PageRank
Abdullah Kayi, 1, 40, 4.80
Edward Kornkven, 2, 10, 1.42
Tarek El-Ghazawi, 3, 697, 84.30
Samy Al-Bahra, 4, 5, 0.81
Gregory B. Newby, 5, 220, 32.13