Title
Compiler supports for VLIW DSP processors with SIMD intrinsics
Abstract
To sustain growing multimedia workloads, modern digital signal processing (DSP) processors are commonly equipped with subword instructions to accelerate signal processing. Besides subword instructions, the functional units of very long instruction word (VLIW) DSP processors can also be employed to process multiple data streams in parallel. However, because of power and area concerns, many embedded VLIW DSP processors adopt distributed register files, which reduce read/write ports and wire connections by privatizing register files for clusters and even for individual functional units. This distributed design presents great challenges to compilers in distributing single instruction, multiple data (SIMD) workloads to functional units. In this paper, we address the issue of supporting SIMD parallelism on VLIW DSP processors with subword instructions and distributed register files. In current industrial practice, intrinsics allow developers to utilize hardware resources and compete with hand-coded assembly in performance. However, providing such a solution for VLIW DSP processors with distributed register files is still an open issue. In this work, we provide SIMD intrinsics that allow programmers to write highly optimized code by following the given programming guidelines. In addition, an enhanced register allocation scheme and data replication optimizations are devised to enable efficient code generation. In our experiments, the DSPstone benchmark and a set of H.264 kernels are used to evaluate the proposed programming and optimization schemes. The results show that by combining SIMD intrinsics and compiler optimizations, one is able to obtain remarkable performance improvements: speedups of 2.9 and 3.5 for the DSPstone and H.264 kernels, respectively. Copyright © 2011 John Wiley & Sons, Ltd.
Year
2012
DOI
10.1002/cpe.1845
Venue
Concurrency and Computation: Practice and Experience
Keywords
VLIW DSP processor, register file, functional unit, SIMD intrinsics, enhanced register allocation scheme, H.264 kernel, subword instruction, embedded VLIW DSP processor, compiler support, SIMD parallelism, DSP processor
Field
Computer architecture, Register allocation, Computer science, Very long instruction word, Parallel computing, SIMD, Code generation, Compiler, Optimizing compiler, Texas Instruments DaVinci, Intrinsics
DocType
Journal
Volume
24
Issue
5
ISSN
1532-0626
Citations
4
PageRank
0.48
References
14
Authors
2
Chi-Bang Kuan (order 1; citations 19; PageRank 4.06)
Jenq Kuen Lee (order 2; citations 459; PageRank 48.71)