Title
SCORPIO: A 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering
Abstract
In the many-core era, scalable coherence and on-chip interconnects are crucial for shared memory processors. While snoopy coherence is common in small multicore systems, directory-based coherence is the de facto choice for scalability to many cores, as snoopy coherence relies on ordered interconnects, which do not scale. However, directory-based coherence does not scale beyond tens of cores due to excessive directory area overhead or inaccurate sharer tracking. Prior techniques supporting ordering on arbitrary unordered networks are impractical for full multicore chip designs. We present SCORPIO, an ordered mesh Network-on-Chip (NoC) architecture with a separate fixed-latency, bufferless network to achieve distributed global ordering. Message delivery is decoupled from the ordering, allowing messages to arrive in any order and at any time, and still be correctly ordered. The architecture is designed to plug-and-play with existing multicore IP and with practicality, timing, area, and power as top concerns. Full-system 36- and 64-core simulations on SPLASH-2 and PARSEC benchmarks show an average application runtime reduction of 24.1% and 12.9%, in comparison to distributed directory and AMD HyperTransport coherence protocols, respectively. The SCORPIO architecture is incorporated in an 11 mm-by-13 mm chip prototype, fabricated in IBM 45 nm SOI technology, comprising 36 Freescale e200 Power Architecture™ cores with private L1 and L2 caches interfacing with the NoC via ARM AMBA, along with two Cadence on-chip DDR2 controllers. The chip prototype achieves a post-synthesis operating frequency of 1 GHz (833 MHz post-layout) with an estimated power of 28.8 W (768 mW per tile), while the network consumes only 10% of tile area and 19% of tile power.
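The decoupling of message delivery from ordering described in the abstract can be illustrated with a small sketch. This is not the SCORPIO implementation: the abstract does not say how the order is derived, so the per-window notification set, the ascending-source-ID ordering rule, the per-source FIFO delivery, and all names (`Node`, `on_notification`, `on_message`, `simulate`) are assumptions made for illustration only. The idea shown is that every node receives the same notification about who sent a request, derives the same global order from it, and holds back any message that arrives early until its turn comes.

```python
from collections import deque


class Node:
    """Hypothetical network interface that releases messages in a global order."""

    def __init__(self):
        self.expected = deque()  # source IDs, in the order all nodes agree on
        self.arrived = {}        # source ID -> deque of buffered payloads
        self.processed = []      # (source, payload) pairs released in order

    def on_notification(self, senders):
        """Receive the merged set of senders for one time window.

        Assumption: every node sees the same set for the same window, and
        ties are broken by ascending source ID.
        """
        self.expected.extend(sorted(senders))
        self._drain()

    def on_message(self, src, payload):
        """Receive a message from the main network, at any time, in any order."""
        self.arrived.setdefault(src, deque()).append(payload)
        self._drain()

    def _drain(self):
        # Release messages only while the next expected source has arrived.
        while self.expected and self.arrived.get(self.expected[0]):
            src = self.expected.popleft()
            self.processed.append((src, self.arrived[src].popleft()))


def simulate():
    """Two nodes see the same two requests in opposite arrival orders."""
    a, b = Node(), Node()

    # Node a: the message from source 2 arrives before the notification.
    a.on_message(2, "GETX A")
    a.on_notification({0, 2})
    a.on_message(0, "GETS A")

    # Node b: the notification arrives first, then messages from 0 and 2.
    b.on_notification({0, 2})
    b.on_message(0, "GETS A")
    b.on_message(2, "GETX A")

    return a.processed, b.processed


if __name__ == "__main__":
    order_a, order_b = simulate()
    print(order_a)
    print(order_b)
```

In this sketch both nodes release the request from source 0 before the request from source 2, even though node a received them in the opposite order, which is the property snoopy coherence needs from the interconnect.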
Year
2014
DOI
10.1109/ISCA.2014.6853232
Venue
Computer Architecture
Keywords
cache storage,network-on-chip,shared memory systems,36-core research chip,amd hypertransport coherence protocols,cadence on-chip ddr2 controllers,parsec benchmark,scorpio,soi technology,splash-2 benchmark,directory-based coherence,distributed directory protocols,frequency 1 ghz,in-network ordering,many-core era,message delivery,multicore chip design,multicore systems,power 28.8 w,private l1 cache,private l2 cache,scalable mesh noc,snoopy coherence,tile power
DocType
Conference
ISSN
1063-6897
ISBN
978-1-4799-4394-4
Citations
34
PageRank
1.05
References
21
Authors
10
Name                      Order  Citations  PageRank
Bhavya Kishor Daya        1      43         2.23
Owen Chia-Hsin Chen       2      507        18.69
Suvinay Subramanian       3      169        9.54
Woo-Cheol Kwon            4      300        15.08
Sunghyun Park             5      154        10.83
Tushar Krishna            6      1864       86.95
J. R. Holt                7      34         1.05
Anantha P. Chandrakasan   8      14442      1946.93
Li-Shiuan Peh             9      5077       398.57
Jim Holt                  10     43         3.06