Abstract | ||
---|---|---|
Recent technology improvements allow multiprocessor designers to put some key components inside the processor chip, such as the memory controller, the coherence hardware and the network interface/router. In this work we exploit this level of integration, presenting a novel node architecture aimed at two problems that characterize cc-NUMA machines and limit their scalability: long L2 miss latencies and the memory overhead of directories. Our proposal replaces the traditional directory with a novel three-level directory architecture and adds a small shared data cache to each node of a multiprocessor system. Because of their small size, the first-level directory and the shared data cache are integrated into the processor chip in every node. A taxonomy of L2 misses, classified according to the actions the directory performs to satisfy them, is also presented. Using execution-driven simulations, we show significant L2 miss latency reductions (more than 60% in some cases). These improvements translate into reductions of more than 30% in application execution time in some cases. |
Year | DOI | Venue |
---|---|---|
2002 | 10.1109/IPDPS.2002.1015554 | IPDPS |
Keywords | DocType | ISBN |
protocols,hardware,taxonomy,memory controller,process design,computer networks,network interfaces,scalability,network interface | Conference | 0-7695-1573-8 |
Citations | PageRank | References |
8 | 0.60 | 9 |
Authors | ||
---|---|---|
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
M. E. Acacio | 1 | 419 | 41.45 |
José González | 2 | 526 | 35.85 |
J. M. García | 3 | 588 | 58.90 |
José Duato | 4 | 3481 | 294.85 |