Title
A Dynamic Modulo Scheduling with Binary Translation: Loop optimization with software compatibility
Abstract
In recent years, many works have demonstrated the applicability of Coarse-Grained Reconfigurable Array (CGRA) accelerators to optimize loops by using software pipelining approaches. They have proven effective in reducing the total execution time of multimedia and signal processing applications. However, the run-time reconfigurability of CGRAs is hampered by the overheads introduced by the required translation and mapping steps. In this work, we present a novel run-time translation technique for the modulo scheduling approach that can convert binary code on-the-fly to run on a CGRA. We propose a greedy approach, since modulo scheduling for a CGRA is an NP-complete problem. In addition to read-after-write dependences, dynamic modulo scheduling faces new challenges, such as register insertion to resolve recurrence dependences and to balance the pipelining paths. Our results demonstrate that the greedy run-time algorithm can reach a near-optimal ILP rate, better than an off-line compiler approach for a 16-issue VLIW processor. The proposed mechanism ensures software compatibility, as it supports different source ISAs. As a proof of concept of scaling, a change in the memory bandwidth has been evaluated. This analysis demonstrates that, when changing from one memory access per cycle to two memory accesses per cycle, the modulo scheduling algorithm is able to exploit the increase in memory bandwidth and enhance performance accordingly. Additionally, to measure area and performance, the proposed CGRA was prototyped on an FPGA. The area comparisons show that a crossbar CGRA (with 16 processing elements and including a 4-issue VLIW host processor) is only 1.11× larger than a standalone 8-issue VLIW softcore processor.
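The greedy modulo scheduling idea summarized in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes unit-latency operations, only same-iteration read-after-write dependences (no recurrences, register insertion, or routing), and a single resource model of `num_pes` processing elements and `mem_ports` memory accesses per cycle. The function name `greedy_modulo_schedule` and all parameter names are hypothetical.

```python
from math import ceil

def greedy_modulo_schedule(ops, preds, num_pes, mem_ports):
    """Return (II, start_times) for one loop body.

    ops:   list of (name, kind) in topological order; kind is "alu" or "mem".
    preds: dict mapping an op name to the op names it reads from
           (read-after-write, same iteration only).
    Assumptions (illustrative, not from the paper): unit latency,
    no loop-carried dependences, num_pes >= 1 and mem_ports >= 1.
    """
    n_mem = sum(1 for _, kind in ops if kind == "mem")
    # Resource-constrained lower bound on the initiation interval (II).
    ii = max(ceil(len(ops) / num_pes), ceil(n_mem / mem_ports), 1)
    while True:
        pe_used = [0] * ii    # PEs occupied in each modulo slot
        mem_used = [0] * ii   # memory ports occupied in each modulo slot
        start = {}
        placed_all = True
        for name, kind in ops:
            # Earliest cycle at which every producer has finished.
            earliest = max((start[p] + 1 for p in preds.get(name, [])), default=0)
            # Greedy: take the first of II consecutive cycles with free resources.
            for t in range(earliest, earliest + ii):
                slot = t % ii
                if pe_used[slot] >= num_pes:
                    continue
                if kind == "mem" and mem_used[slot] >= mem_ports:
                    continue
                pe_used[slot] += 1
                if kind == "mem":
                    mem_used[slot] += 1
                start[name] = t
                break
            else:
                placed_all = False  # no slot fits at this II
                break
        if placed_all:
            return ii, start
        ii += 1  # relax the initiation interval and retry

# Hypothetical loop body: two loads feed a multiply, then an add, then a store.
ops = [("ld_a", "mem"), ("ld_b", "mem"), ("mul", "alu"), ("add", "alu"), ("st", "mem")]
preds = {"mul": ["ld_a", "ld_b"], "add": ["mul"], "st": ["add"]}

ii_one_port, sched_one = greedy_modulo_schedule(ops, preds, num_pes=16, mem_ports=1)
ii_two_ports, sched_two = greedy_modulo_schedule(ops, preds, num_pes=16, mem_ports=2)
print(ii_one_port, ii_two_ports)
```

In this toy loop the three memory operations bound the initiation interval, so moving from one to two memory accesses per cycle shortens it. That mirrors, in a much simplified form, the memory-bandwidth scaling experiment the abstract mentions.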
Year: 2016
DOI: 10.1007/s11265-015-0974-8
Venue: Journal of Signal Processing Systems
Keywords: Modulo scheduling, Binary translation, Run-time, Coarse-grained reconfigurable accelerator
Field: Reconfigurability, Memory bandwidth, Software pipelining, Computer science, Scheduling (computing), Modulo, Very long instruction word, Parallel computing, Loop optimization, Real-time computing, Binary translation, Embedded system
DocType: Journal
Volume: 85
Issue: 1
ISSN: 1939-8018
Citations: 0
PageRank: 0.34
References: 35
Authors: 6
Name | Order | Citations | PageRank
Ricardo Ferreira | 1 | 49 | 13.81
Waldir Denver | 2 | 0 | 0.34
Monica Magalhaes Pereira | 3 | 16 | 2.60
Stephan Wong | 4 | 119 | 12.80
Carlos A. Lisbôa | 5 | 0 | 0.34
Luigi Carro | 6 | 1393 | 166.42