Title
Use of SIMD Vector Operations to Accelerate Application Code Performance on Low-Powered ARM and Intel Platforms
Abstract
Augmenting a processor with special hardware that is able to apply a Single Instruction to Multiple Data (SIMD) at the same time is a cost-effective way of improving processor performance. It also offers a means of improving the ratio of processor performance to power usage due to reduced and more effective data movement and intrinsically lower instruction counts. This paper considers and compares the NEON SIMD instruction set used on the ARM Cortex-A series of RISC processors with the SSE2 SIMD instruction set found on Intel platforms within the context of the Open Computer Vision (OpenCV) library. The performance obtained using compiler auto-vectorization is compared with that achieved using hand-tuning across a range of five different benchmarks and ten different hardware platforms. On the ARM platforms the hand-tuned NEON benchmarks were between 1.05x and 13.88x faster than the auto-vectorized code, while for the Intel platforms the hand-tuned SSE benchmarks were between 1.34x and 5.54x faster.
Year
2013
DOI
10.1109/IPDPSW.2013.207
Venue
IPDPS Workshops
Keywords
simd vector operations,neon simd instruction,processor performance,hand-tuned neon benchmarks,lower instruction count,different benchmarks,low-powered arm,hand-tuned sse benchmarks,risc processor,sse2 simd instruction,arm cortex-a series,intel platforms,intel platform,accelerate application code performance,risc processors,registers,image processing,neon,simd,benchmark testing,arm,parallel processing,vectorization,assembly,vectors,reduced instruction set computing
Field
MMX,SSE2,Instruction set,Computer science,Parallel computing,SIMD,Compiler,SSE3,Reduced instruction set computing,Benchmark (computing)
DocType
Conference
Citations
25
PageRank
1.33
References
11
Authors
5
Name | Order | Citations | PageRank
Gaurav Mitra | 1 | 49 | 4.29
Beau Johnston | 2 | 26 | 4.05
Alistair P. Rendell | 3 | 209 | 34.55
Eric McCreath | 4 | 132 | 14.64
Jun Zhou | 5 | 25 | 1.33