Title
Optimizing Overlapped Memory Accesses in User-directed Vectorization
Abstract
Current processors incorporate wide and powerful vector units whose optimal exploitation is crucial to reach peak performance. However, present autovectorizing compilers fall short of that goal. Exploiting some vector instructions requires aggressive approaches that are not affordable in production compilers. Thus, advanced programmers pursuing the best performance from their applications are compelled to manually vectorize them using low-level SIMD intrinsics. We propose a user-directed code optimization that targets overlapped vector loads, i.e., vector loads that read scalar elements redundantly from memory. Instead, our optimization loads these elements once and combines them using advanced register-to-register vector instructions. This code is potentially more efficient, and it uses advanced vector instructions that compilers do not widely exploit automatically. We also extend the OpenMP* SIMD directives with a new clause called overlap that allows users to easily enable and tune this optimization on demand. We implement our proposal for the Intel® Xeon Phi™ coprocessor. Our evaluation shows up to 29% speed-up over five highly-optimized stencil kernels and workloads from real-world applications. Results also demonstrate how important user hints are to maximize performance.
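The idea of overlapped vector loads can be illustrated with a minimal sketch in portable C. It is not the authors' implementation and does not use the overlap clause, whose syntax the abstract does not give. Vector registers are emulated with small arrays, and the names `vec_t`, `vload`, `valign`, `stencil_naive`, `stencil_reuse`, and the vector length `VLEN` are all hypothetical. The sketch assumes a three-point stencil `out[i] = in[i-1] + in[i] + in[i+1]`: the naive version issues three loads per block that read mostly the same elements, while the second version loads each element once and builds the shifted operands register to register.

```c
#include <assert.h>
#include <stddef.h>

#define VLEN 4  /* assumed vector length in elements; illustrative only */

/* A VLEN-element array stands in for one vector register. */
typedef struct { float lane[VLEN]; } vec_t;

size_t mem_loads = 0;  /* counts emulated vector loads from memory */

/* Emulated vector load: reads VLEN consecutive elements from memory. */
static vec_t vload(const float *p) {
    vec_t v;
    for (int k = 0; k < VLEN; ++k) v.lane[k] = p[k];
    ++mem_loads;
    return v;
}

/* Emulated register-to-register combine: selects VLEN lanes from the
   concatenation [lo | hi], starting at lane `shift` (0 <= shift <= VLEN). */
static vec_t valign(vec_t lo, vec_t hi, int shift) {
    vec_t r;
    for (int k = 0; k < VLEN; ++k) {
        int s = k + shift;
        r.lane[k] = (s < VLEN) ? lo.lane[s] : hi.lane[s - VLEN];
    }
    return r;
}

static vec_t vadd(vec_t a, vec_t b) {
    vec_t r;
    for (int k = 0; k < VLEN; ++k) r.lane[k] = a.lane[k] + b.lane[k];
    return r;
}

static void vstore(float *p, vec_t v) {
    for (int k = 0; k < VLEN; ++k) p[k] = v.lane[k];
}

/* Naive vectorization: three overlapped loads per block of VLEN outputs.
   Only the interior blocks [VLEN, n - VLEN) are written. */
void stencil_naive(const float *in, float *out, size_t n) {
    assert(n % VLEN == 0 && n >= 3 * VLEN);
    for (size_t i = VLEN; i + VLEN < n; i += VLEN) {
        vec_t left  = vload(in + i - 1);
        vec_t mid   = vload(in + i);
        vec_t right = vload(in + i + 1);
        vstore(out + i, vadd(vadd(left, mid), right));
    }
}

/* Load-once variant: one new load per block; the shifted operands are
   assembled from registers that are already loaded. */
void stencil_reuse(const float *in, float *out, size_t n) {
    assert(n % VLEN == 0 && n >= 3 * VLEN);
    vec_t prev = vload(in);
    vec_t cur  = vload(in + VLEN);
    for (size_t i = VLEN; i + VLEN < n; i += VLEN) {
        vec_t next  = vload(in + i + VLEN);
        vec_t left  = valign(prev, cur, VLEN - 1);  /* in[i-1 .. i+VLEN-2] */
        vec_t right = valign(cur, next, 1);         /* in[i+1 .. i+VLEN]   */
        vstore(out + i, vadd(vadd(left, cur), right));
        prev = cur;
        cur  = next;
    }
}
```

Calling both functions on the same input and comparing `mem_loads` shows the trade: fewer memory loads in exchange for extra register-to-register operations. On real hardware the `valign` step would map to a shuffle or align instruction, and whether the exchange pays off depends on the target, which is presumably why the abstract stresses user hints.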
Year: 2015
DOI: 10.1145/2751205.2751224
Venue: International Conference on Supercomputing
Keywords: SIMD, Vectorization, Compiler Optimization, OpenMP, Stencil, Intel Many Integrated Core Architecture
Field: Program optimization, Computer science, Xeon Phi, Parallel computing, SIMD, Vectorization (mathematics), Optimizing compiler, Compiler, Coprocessor, Intrinsics
DocType: Conference
Citations: 4
PageRank: 0.39
References: 17
Authors (5)
Name               Order  Citations  PageRank
Diego Caballero    1      20         2.51
Sara Royuela       2      24         5.23
Roger Ferrer       3      39         4.04
Alejandro Duran    4      943        61.43
Xavier Martorell   5      1470       125.40