Title
Speculative hardware/software co-designed floating-point multiply-add fusion
Abstract
Fused Multiply-Add (FMA) instructions are currently available in many general-purpose processors. FMA increases performance by reducing the latency of dependent operations and increases precision by computing the result as an indivisible operation with no intermediate rounding. However, because the arithmetic behavior of a single-rounding FMA operation differs from that of an FP multiply followed by an independent FP add, some algorithms require significant revalidation and rewriting to work as expected when compiled to use FMA, a cost that developers may not be willing to pay. As a result, many legacy applications cannot use FMA instructions. In this paper we propose a novel HW/SW collaborative technique that efficiently executes workloads with increased utilization of FMA by adding the option to obtain the same numerical result as separate FP multiply and FP add pairs. In particular, we extended the host ISA of a HW/SW co-designed processor with a new Combined Multiply-Add (CMA) instruction that performs an FMA operation with an intermediate rounding. This new instruction is used by a transparent dynamic translation software layer that applies a speculative instruction-fusion optimization to transform FP multiply and FP add sequences into CMA instructions. The FMA unit has been slightly modified to support both single-rounding and double-rounding fused instructions without increasing their latency and to provide a conservative fall-back path in case of misspeculation. Evaluation on a cycle-accurate timing simulator showed that CMA improved SPECfp performance by 6.3% and reduced executed instructions by 4.7%.
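The numerical difference between a single-rounding FMA and a multiply-add with an intermediate rounding (the behavior the abstract attributes to CMA) can be illustrated with a small sketch. This code is not taken from the paper: the function names `fma_single_rounding` and `cma_intermediate_rounding` are hypothetical, the fused result is emulated in software with exact rational arithmetic rather than with a hardware instruction, and IEEE 754 binary64 arithmetic with round-to-nearest-even is assumed.

```python
from fractions import Fraction


def fma_single_rounding(a: float, b: float, c: float) -> float:
    """Emulate a*b + c with one final rounding (hypothetical helper).

    The product and sum are computed exactly as rationals, and only the
    final conversion back to float rounds the result.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def cma_intermediate_rounding(a: float, b: float, c: float) -> float:
    """Multiply, round, then add and round again (hypothetical helper).

    This matches the result of a separate FP multiply followed by an FP add.
    """
    product = a * b      # first rounding happens here
    return product + c   # second rounding happens here


# Inputs chosen so that the exact product 1 - 2**-54 is not representable
# in binary64 and rounds to 1.0 when the multiply is rounded on its own.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

fused = fma_single_rounding(a, b, c)
combined = cma_intermediate_rounding(a, b, c)
print("single rounding      :", fused)
print("intermediate rounding:", combined)
```

With these inputs the single-rounding result keeps the small residual -2^-54, while the intermediate rounding loses it and yields 0.0. Code validated against separate multiply and add instructions may depend on the second behavior, which is the compatibility problem the abstract describes.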
Year: 2014
DOI: 10.1145/2541940.2541978
Venue: ASPLOS
Keywords: speculative hardware, floating-point multiply-add fusion, fma instruction, fma operation, separate fp, single-rounding fma operation, dependent operation, independent fp, fused instruction, intermediate rounding, cma instruction, fma unit, fma
Field: Computer science, Floating point, Latency (engineering), Machine translation, Parallel computing, SPECfp, Real-time computing, Rounding, Rewriting, Hardware software, Legacy system
DocType: Conference
Volume: 42
Issue: 1
ISSN: 0163-5964
Citations: 4
PageRank: 0.39
References: 25
Authors: 7
Name; Order; Citations and PageRank
Marc Lupon; 1; 814.08
Enric Gibert; 2; 877.85
Grigorios Magklis; 3; 70245.64
Sridhar Samudrala; 4; 93.23
Raúl Martínez; 5; 50.74
Kyriakos Stavrou; 6; 1038.61
David Ditzel; 7; 5333.98