Exploiting Very-Wide Vectors on Intel Xeon Phi with Lattice-QCD Kernels - Citegraph

Paper Info

Title
Exploiting Very-Wide Vectors on Intel Xeon Phi with Lattice-QCD Kernels

Abstract
This work studies ways of exploiting the parallelism offered by vectorization on accelerators with very wide vector units. To this end, we implement two kernels derived from the Wilson Dslash operator and investigate several data layout techniques for increasing the scalability of lattice QCD scientific kernels on the Intel Xeon Phi. In parts of the application where computation uses real numbers, we see a 6.6x increase in bandwidth compared to scalar code, thanks to the compiler's auto-vectorization. In other kernels, where arithmetic on complex numbers dominates, our hand-vectorized code outperforms the compiler's auto-vectorization. We find that our proposed Hopping Vector-friendly Ordering allows for more efficient vectorization of complex floating-point arithmetic. Using this data layout, we increase the sustained bandwidth by approximately 1.8x.

Year	2016
DOI	10.1109/PDP.2016.116
Venue	2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)
Keywords	Lattice QCD, Xeon Phi, many-cores, accelerators
Field	Kernel (linear algebra), Xeon Phi, Floating point, Computer science, Parallel computing, Vectorization (mathematics), Compiler, Lattice QCD, Coprocessor, Scalability
DocType	Conference
ISSN	1066-6192
Citations	1
PageRank	0.35
References	7
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Andreas Diavastos	1	9	4.48
Giannos Stylianou	2	1	1.03
Giannis Koutsou	3	1	0.69
