Title | ||
---|---|---|
CAMA: Energy and Memory Efficient Automata Processing in Content-Addressable Memories |
Abstract | ||
---|---|---|
Accelerating finite automata processing is critical for advancing real-time analytics in pattern matching, data mining, bioinformatics, intrusion detection, and machine learning. Recent in-memory automata accelerators leveraging SRAMs and DRAMs have shown exciting improvements over conventional digital designs. However, the bit-vector representation of state transitions used by all state-of-the-art (SOTA) designs is only optimal in processing worst-case completely random patterns, while a significant amount of memory and energy is wasted in running most real-world benchmarks. We present CAMA, a Content-Addressable Memory (CAM) enabled Automata accelerator for processing homogeneous non-deterministic finite automata (NFA). A radically different state representation scheme, along with co-designed novel circuits and data encoding schemes, greatly reduces energy, memory, and chip area for most realistic NFAs. CAMA is holistically optimized with the following major contributions: (1) a 16 × 256 8-transistor (8T) CAM array for state matching, replacing the 256 × 256 6T SRAM array or two 16 × 256 6T SRAM banks in SOTA designs; (2) a novel encoding scheme that enables content searching within 8T SRAMs and adapts to different applications; (3) a reconfigurable and scalable architecture that improves efficiency on all tested benchmarks, without losing support for any NFA that’s compatible with SOTA designs; (4) an optimization framework that automates the choice of encoding schemes and maps a given NFA to the proposed hardware. Two versions of CAMA, one optimized for energy (CAMA-E) and the other for throughput (CAMA-T), are comprehensively evaluated in a 28nm CMOS process, and across 21 real-world and synthetic benchmarks. CAMA-E achieves 2.1×, 2.8×, and 2.04× lower energy than CA, 2-stride Impala, and eAP, respectively. CAMA-T shows 2.68×, 3.87×, and 2.62× higher average compute density than 2-stride Impala, CA, and eAP, respectively. Both versions reduce the chip area required for the largest tested benchmark by 2.48× over CA, 1.91× over 2-stride Impala, and 1.78× over eAP. |
Year | DOI | Venue |
---|---|---|
2022 | 10.1109/HPCA53966.2022.00011 | 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA) |
Keywords | DocType | ISSN |
Automata,Processing In Memory,Content Addressable Memory,Automata Processor,Pattern Matching | Conference | 1530-0897 |
ISBN | Citations | PageRank |
978-1-6654-2028-0 | 0 | 0.34 |
References | Authors | |
30 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yi Huang | 1 | 0 | 0.34 |
Zhiyu Chen | 2 | 0 | 0.34 |
Dai Li | 3 | 0 | 0.34 |
Kuiyuan Yang | 4 | 148 | 20.89 |