Title
ZCOMP: Reducing DNN Cross-Layer Memory Footprint Using Vector Extensions
Abstract
Deep Neural Networks (DNNs) are becoming the prevalent approach in computer vision, machine learning, natural language processing, and speech recognition applications. Although DNNs are perceived as compute-intensive tasks, they also place intense pressure on the capacity and bandwidth of the memory hierarchy, primarily due to the large intermediate data communicated across network layers. Prior work on hardware DNN accelerators leverages cross-layer data sparsity via fully customized datapaths. However, dynamically compressing/expanding such data is a challenging task for general-purpose multi-processors with virtual memory and hardware-managed coherent cache hierarchies. In this paper, we observe that DNN intermediate data is either sequentially streamed or reshaped with a regular transformation between layers. Hence, accesses to this data can tolerate sequential or block-sequential compression/expansion without requiring random element retrieval. Based on this insight, we propose ZCOMP, a CPU vector ISA extension tailored for DNN cross-layer communication. ZCOMP compactly represents zero-value compression/expansion and fully automates metadata generation, storage, and retrieval, eliminating the need for several extra instruction executions and register usage. ZCOMP can be targeted for both inference and training to dynamically compress/expand cross-layer data before it is written to memory. Our evaluations of individual layers and end-to-end DNN networks demonstrate that ZCOMP offers substantial data traffic reduction, both on-chip across the cache hierarchy and off-chip to DRAM, as well as performance improvements over no compression and existing AVX512 compression approaches.
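For context, the software AVX-512 compression baseline the abstract refers to can be approximated with the vcompressps/vexpandps-style intrinsics: the non-zero mask has to be computed, the packed payload stored, and the mask kept as explicit per-vector metadata, which is exactly the bookkeeping the ZCOMP ISA extension automates. The sketch below is illustrative only and is not taken from the paper; the buffer layout and the names compress_block/expand_block are assumptions.

/* Illustrative sketch of a software AVX-512 zero-value compression baseline.
 * Not the ZCOMP extension itself: here the per-vector non-zero mask must be
 * generated, stored, and reloaded explicitly, i.e. the metadata handling that
 * ZCOMP automates. Assumed layout: one 16-bit mask per 16-float vector plus a
 * packed non-zero payload; n is assumed to be a multiple of 16 for brevity.
 * Compile with -mavx512f. */
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Compress: drop zero elements, return the number of floats written to 'packed'. */
static size_t compress_block(const float *src, size_t n,
                             float *packed, uint16_t *masks)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i += 16) {
        __m512 v = _mm512_loadu_ps(src + i);
        /* Per-lane "is non-zero" predicate becomes the metadata word. */
        __mmask16 m = _mm512_cmp_ps_mask(v, _mm512_setzero_ps(), _CMP_NEQ_OQ);
        masks[i / 16] = (uint16_t)m;
        /* vcompressps: store only the non-zero lanes, contiguously. */
        _mm512_mask_compressstoreu_ps(packed + out, m, v);
        out += (size_t)_mm_popcnt_u32(m);
    }
    return out;
}

/* Expand: re-insert zeros so the consumer layer sees the dense layout again. */
static void expand_block(const float *packed, const uint16_t *masks,
                         float *dst, size_t n)
{
    size_t in = 0;
    for (size_t i = 0; i < n; i += 16) {
        __mmask16 m = (__mmask16)masks[i / 16];
        /* vexpandps with zero-masking restores zeros in the empty lanes. */
        __m512 v = _mm512_maskz_expandloadu_ps(m, packed + in);
        _mm512_storeu_ps(dst + i, v);
        in += (size_t)_mm_popcnt_u32(m);
    }
}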
Year
2019
DOI
10.1145/3352460.3358305
Venue
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture
Keywords
CPU, Deep learning, ISA, compression, memory system, sparsity
Field
DRAM, Metadata, Memory hierarchy, Computer science, Inference, Virtual memory, Parallel computing, Bandwidth (signal processing), Artificial intelligence, Deep learning, Memory footprint
DocType
Conference
ISBN
978-1-4503-6938-1
Citations
5
PageRank
0.44
References
0
Authors
3
Name                 Order  Citations  PageRank
Berkin Akin          1      83         5.59
Zeshan Chishti       2      723        34.65
Alaa R. Alameldeen   3      1672       80.06