Title | ||
---|---|---|
Energy-Efficient Processor Design Using Multiple Clock Domains with Dynamic Voltage and Frequency Scaling |
Abstract | ||
---|---|---|
As clock frequency increases and feature size decreases, clock distribution and wire delays present a growing challenge to the designers of singly-clocked, globally synchronous systems. We describe an alternative approach, which we call a Multiple Clock Domain (MCD) processor, in which the chip is divided into several (coarse-grained) clock domains, within which independent voltage and frequency scaling can be performed. Boundaries between domains are chosen to exploit existing queues, thereby minimizing inter-domain synchronization costs. We propose four clock domains, corresponding to the front end (including L1 instruction cache), integer units, floating point units, and load-store units (including L1 data cache and L2 cache). We evaluate this design using a simulation infrastructure based on SimpleScalar and Wattch. In an attempt to quantify potential energy savings independent of any particular on-line control strategy, we use off-line analysis of traces from a single-speed run of each of our benchmark applications to identify profitable reconfiguration points for a subsequent dynamic scaling run. Dynamic runs incorporate a detailed model of inter-domain synchronization delays, with latencies for intra-domain scaling similar to the whole-chip scaling latencies of Intel XScale and Transmeta LongRun technologies. Using applications from the MediaBench, Olden, and SPEC2000 benchmark suites, we obtain an average energy-delay product improvement of 20% with MCD compared to a modest 3% savings from voltage scaling a single clock and voltage system. |
Year | DOI | Venue |
---|---|---|
2002 | 10.1109/HPCA.2002.995696 | HPCA |
Keywords | Field | DocType |
multiple clock domains,dynamic voltage,single clock,clock frequency increase,subsequent dynamic scaling run,frequency scaling,energy-efficient processor design,l1 instruction cache,whole-chip scaling latency,clock domain,l1 data cache,l2 cache,clock distribution,chip,logic simulation,profitability,integrated circuit design,synchronisation,energy efficient,front end,potential energy,low power electronics | Clock gating,Computer science,Underclocking,Parallel computing,Real-time computing,Clock synchronization,Synchronous circuit,Clock skew,Digital clock manager,CPU multiplier,Clock rate | Conference |
Citations | PageRank | References |
136 | 11.45 | 11 |
Authors | ||
6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Greg Semeraro | 1 | 448 | 29.82 |
Grigorios Magklis | 2 | 702 | 45.64 |
Rajeev Balasubramonian | 3 | 2302 | 116.79 |
David H. Albonesi | 4 | 2091 | 165.88 |
Sandhya Dwarkadas | 5 | 3504 | 257.31 |
Michael L. Scott | 6 | 2843 | 248.01 |