Title
Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design
Abstract
Attention-based neural networks have become pervasive in many AI tasks. Despite their excellent algorithmic performance, the use of the attention mechanism and feedforward network (FFN) demands excessive computational and memory resources, which often compromises their hardware performance. Although various sparse variants have been introduced, most approaches only focus on mitigating the quadratic scaling of attention at the algorithm level, without explicitly considering the efficiency of mapping their methods onto real hardware designs. Furthermore, most efforts focus on either the attention mechanism or the FFNs, but not on jointly optimizing both parts, leaving most current designs unable to scale across different input lengths. This paper systematically considers the sparsity patterns of different variants from a hardware perspective. At the algorithm level, we propose FABNet, a hardware-friendly variant that adopts a unified butterfly sparsity pattern to approximate both the attention mechanism and the FFNs. At the hardware level, we propose a novel adaptable butterfly accelerator that can be configured at runtime via dedicated hardware control to accelerate different butterfly layers using a single unified hardware engine. On the Long-Range-Arena dataset, FABNet achieves the same accuracy as the vanilla Transformer while reducing the amount of computation by $10\sim66\times$ and the number of parameters by $2\sim22\times$. By jointly optimizing the algorithm and hardware, our FPGA-based butterfly accelerator achieves a $14.2\sim23.2\times$ speedup over state-of-the-art accelerators normalized to the same computational budget. Compared with optimized CPU and GPU designs on Raspberry Pi 4 and Jetson Nano, our system is up to $273.8\times$ and $15.1\times$ faster under the same power budget.
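For readers unfamiliar with the butterfly sparsity the abstract refers to, below is a minimal NumPy sketch of a generic butterfly matrix multiply: a product of log2(n) sparse factors, each holding one learnable 2x2 block per element pair, giving O(n log n) parameters instead of the n^2 of a dense weight. This is only an illustration of the general idea, not the authors' FABNet layers or the accelerator's dataflow; the function name, factor layout, and parameters are our own assumptions.

```python
import numpy as np

def butterfly_multiply(factors, x):
    """Apply a product of log2(n) butterfly factors to the last axis of x.

    factors[level] has shape (n // 2, 2, 2): one 2x2 block per element pair.
    Level 0 mixes elements at stride n/2, level 1 at stride n/4, and so on,
    following the FFT-like butterfly access pattern.
    """
    n = x.shape[-1]
    out = x.reshape(-1, n).copy()
    for level, blocks in enumerate(factors):
        stride = n >> (level + 1)          # distance between paired elements
        new = np.empty_like(out)
        for pair in range(n // 2):
            block_id, offset = divmod(pair, stride)
            i = block_id * 2 * stride + offset
            j = i + stride
            a, b = out[:, i], out[:, j]
            w = blocks[pair]               # 2x2 block for this pair
            new[:, i] = w[0, 0] * a + w[0, 1] * b
            new[:, j] = w[1, 0] * a + w[1, 1] * b
        out = new
    return out.reshape(x.shape)


# Toy usage: an 8-wide butterfly "linear layer" with random factors.
n = 8
rng = np.random.default_rng(0)
factors = [rng.standard_normal((n // 2, 2, 2)) for _ in range(int(np.log2(n)))]
y = butterfly_multiply(factors, rng.standard_normal((4, n)))
print(y.shape)  # (4, 8)
```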
Year
2022
DOI
10.1109/MICRO56248.2022.00050
Venue
2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO)
Keywords
Adaptable Butterfly Accelerator, Attention-based Neural Networks, Butterfly Sparsity, Algorithm and Hardware Co-Design
DocType
Conference
ISBN
978-1-6654-7428-3
Citations
0
PageRank
0.34
References
16
Authors
8
Name                     Order  Citations  PageRank
Hongxiang Fan            1      0          0.34
Thomas C. P. Chau        2      7          2.64
Stylianos I. Venieris    3      106        12.98
Royson Lee               4      8          2.85
Alexandros Kouris        5      0          0.34
Wayne Luk                6      3752       438.09
Nicholas D. Lane         7      4247       248.15
Mohamed S. Abdelfattah   8      144        13.65