Title
Exploiting Very-Wide Vectors on Intel Xeon Phi with Lattice-QCD Kernels
Abstract
Our target in this work is to study ways of exploiting the parallelism offered by vectorization on accelerators with very wide vector units. To this end, we implemented two kernels derived from the Wilson Dslash operator and investigated several data layout techniques for increasing the scalability of lattice QCD scientific kernels on the Intel Xeon Phi. In parts of the application where real numbers are used for computation, we see a 6.6x increase in bandwidth compared to scalar code, thanks to auto-vectorization by the compiler. In other kernels, where arithmetic operations on complex numbers dominate, our hand-vectorized code outperforms the compiler's auto-vectorization. We find that our proposed Hopping Vector-friendly Ordering allows for more efficient vectorization of complex floating-point arithmetic. Using this data layout, we increase the sustained bandwidth by approximately 1.8x.
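To illustrate why data layout matters for vectorizing complex arithmetic, the following is a minimal sketch in C. It is not the paper's Hopping Vector-friendly Ordering, which the abstract does not describe; it only contrasts a generic interleaved layout with a generic split layout. The type `complex_soa` and the functions `cmul_aos` and `cmul_soa` are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical split layout: real and imaginary parts are stored in
   separate contiguous arrays, so each loop iteration reads and writes
   unit-stride data. */
typedef struct {
    float *re;
    float *im;
    size_t n;
} complex_soa;

/* Interleaved layout: a[2*i] is the real part and a[2*i + 1] the
   imaginary part of element i. A vectorized version of this loop has
   to separate real and imaginary lanes with shuffles or strided loads. */
void cmul_aos(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float ar = a[2 * i], ai = a[2 * i + 1];
        float br = b[2 * i], bi = b[2 * i + 1];
        out[2 * i]     = ar * br - ai * bi;
        out[2 * i + 1] = ar * bi + ai * br;
    }
}

/* Split layout: the same element-wise product out[i] = a[i] * b[i],
   expressed as plain multiplies and adds over contiguous arrays. */
void cmul_soa(const complex_soa *a, const complex_soa *b, complex_soa *out)
{
    assert(a->n == b->n && a->n == out->n);
    for (size_t i = 0; i < a->n; ++i) {
        float ar = a->re[i], ai = a->im[i];
        float br = b->re[i], bi = b->im[i];
        out->re[i] = ar * br - ai * bi;
        out->im[i] = ar * bi + ai * br;
    }
}
```

Both functions compute the same result. Whether a given compiler vectorizes either loop, and how well, depends on the compiler, its flags, and the target, so any performance difference should be measured rather than assumed.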
Year: 2016
DOI: 10.1109/PDP.2016.116
Venue: 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)
Keywords: Lattice QCD, Xeon Phi, many-cores, accelerators
Field: Kernel (linear algebra), Xeon Phi, Floating point, Computer science, Parallel computing, Vectorization (mathematics), Compiler, Lattice QCD, Coprocessor, Scalability
DocType: Conference
ISSN: 1066-6192
Citations: 1
PageRank: 0.35
References: 7
Authors: 3
Name                Order  Citations  PageRank
Andreas Diavastos   1      9          4.48
Giannos Stylianou   2      1          1.03
Giannis Koutsou     3      1          0.69