Title
Optimizing N-dimensional, Winograd-based convolution for manycore CPUs.
Abstract
Recent work on Winograd-based convolution allows for a great reduction of computational complexity, but existing implementations are limited to 2D data and a single kernel size of 3 by 3. They achieve only slightly better, and often worse, performance than better-optimized direct convolution implementations. We propose and implement an algorithm for N-dimensional Winograd-based convolution that allows arbitrary kernel sizes and is optimized for manycore CPUs. Our algorithm achieves high hardware utilization through a series of optimizations. Our experiments show that on modern ConvNets, our optimized implementation is on average more than 3x, and sometimes 8x, faster than other state-of-the-art CPU implementations on Intel Xeon Phi manycore processors. Moreover, our implementation on the Xeon Phi achieves competitive performance for 2D ConvNets and superior performance for 3D ConvNets, compared with the best GPU implementations.
Year
2018
DOI
10.1145/3200691.3178496
Venue
PPOPP
Keywords
convolution, parallelization, vectorization, winograd
Field
Kernel (linear algebra), Xeon Phi, Convolution, Computer science, Parallel computing, Vectorization (mathematics), Implementation, Computational complexity theory
DocType
Conference
Volume
53
Issue
1
ISSN
0362-1340
Citations
3
PageRank
0.39
References
23
Authors
4
Name                 Order  Citations  PageRank
Zhen Jia             1      338        17.82
Aleksandar Zlateski  2      39         5.65
Frédo Durand         3      8625       414.94
Kai Li               4      6492       584.91