Abstract | ||
---|---|---|
Inability to hide main memory latency has been increasingly limiting the performance of modern processors. The problem is worse in large-scale shared memory systems, where remote memory latencies are hundreds, and soon thousands, of processor cycles. To mitigate this problem, we propose an intelligent memory and cache coherence controller (AMC) that can execute Active Memory Operations (AMOs). AMOs are select operations sent to and executed on the home memory controller of data. AMOs can eliminate a significant number of coherence messages, minimize intranode and internode memory traffic, and create opportunities for parallelism. Our implementation of AMOs is cache-coherent and requires no changes to the processor core or DRAM chips. In this paper, we present the microarchitecture design of AMC, and the programming model of AMOs. We compare AMOs' performance to that of several other memory architectures on a variety of scientific and commercial benchmarks. Through simulation, we show that AMOs offer dramatic performance improvements for an important set of data-intensive operations, e.g., up to 50× faster barriers, 12× faster spinlocks, 8.5× to 15× faster stream/array operations, and 3× faster database queries. We also present an analytical model that can predict the performance benefits of using AMOs with decent accuracy. The silicon cost required to support AMOs is less than 1% of the die area of a typical high performance processor, based on a standard cell implementation. |
Year | DOI | Venue |
---|---|---|
2012 | 10.1007/s11227-011-0735-9 | The Journal of Supercomputing |
Keywords | Field | DocType |
performance benefit,remote memory latency,home memory controller,dramatic performance improvement,main memory latency,large-scale shared memory system,memory architecture,internode memory traffic,typical high performance processor,active memory controller,intelligent memory,chip,support,cache coherence,synchronization,distributed shared memory | Registered memory,Extended memory,Uniform memory access,Computer science,Non-uniform memory access,Memory controller,Distributed computing,Interleaved memory,Parallel computing,Distributed memory,Memory map,Operating system,Embedded system | Journal |
Volume | Issue | ISSN |
62 | 1 | 0920-8542 |
Citations | PageRank | References |
4 | 0.42 | 34 |
Authors | ||
7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Zhen Fang | 1 | 86 | 4.87 |
Lixin Zhang | 2 | 571 | 45.96 |
John B. Carter | 3 | 1785 | 162.82 |
Sally A. McKee | 4 | 1928 | 152.59 |
Ali Ibrahim | 5 | 4 | 0.42 |
Michael A. Parker | 6 | 4 | 0.42 |
Xiaowei Jiang | 7 | 7 | 1.51 |