Title
Staged Reads: Mitigating the impact of DRAM writes on DRAM reads
Abstract
Main memory latency has always been a concern for system performance. Because reads are on the critical path for CPU progress, they are prioritized over writes. However, writes must eventually be processed, and they often delay pending reads. In fact, a single channel in the main memory system offers almost no parallelism between reads and writes: a single off-chip memory bus is shared by reads and writes, and the direction of the bus must be explicitly turned around when switching from writes to reads. This turnaround is expensive, so its cost is amortized by carrying out a burst of writes or reads each time the bus direction is switched. As a result, no reads can be processed while a memory channel is busy servicing writes. This paper proposes a novel mechanism that boosts read-write parallelism by performing useful components of read operations even while the memory system is busy performing writes. If some of the banks are busy servicing writes, we issue reads to the other, idle banks and store their results in a few registers near the memory chip's I/O pads. These buffered results are then returned immediately after the bus turnaround. We refer to this process as a Staged Read because it decouples a single read operation into two stages, the first of which is performed in parallel with writes. The mechanism can also be viewed as a form of prefetch that is internal to the memory chip. The proposed technique works best when there is bank imbalance in the write stream, so we also introduce a write scheduling algorithm that artificially creates bank imbalance and allows useful read operations to be performed during the write drain. Across a suite of memory-intensive workloads, we show that Staged Reads can boost throughput by up to 33% (average 7%), with an average DRAM access latency improvement of 17%, while incurring a very small cost (0.25%) in memory chip area. The throughput improvements are even greater for write-intensive workloads (average 11%) or future systems (average 12%).
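The two-stage idea in the abstract can be illustrated with a toy timing model. This is a minimal sketch with made-up cycle counts, not the paper's simulator: `T_DRAIN`, `T_TURN`, `T_ACCESS`, `T_BURST`, and `WRITE_BANKS` are all assumptions chosen for illustration. Reads to banks not involved in the write drain complete their array access early and park the data in registers near the I/O pads, so after the turnaround only the data burst remains.

```python
# Toy timing model of Staged Reads vs. a conventional write drain.
# All constants are illustrative cycle counts for this sketch,
# not parameters taken from the paper.
T_DRAIN = 200          # cycles the channel spends streaming writes
T_TURN = 10            # write-to-read bus turnaround penalty
T_ACCESS = 40          # bank array access (activate + column read)
T_BURST = 4            # data-bus occupancy per read burst
WRITE_BANKS = {0, 1}   # the write drain keeps only these banks busy

def avg_read_latency(read_banks, staged):
    """Average completion time of reads that all arrive at cycle 0.

    For brevity, reads are serviced serially once the bus turns
    around (bank-level overlap among ordinary reads is ignored).
    """
    bus_free = T_DRAIN + T_TURN  # bus unavailable until writes finish
    latencies = []
    for bank in read_banks:
        if staged and bank not in WRITE_BANKS:
            # Stage 1 overlaps the drain: the array access completes
            # early and the data waits in a register near the I/O pads.
            data_ready = T_ACCESS
        else:
            # Conventional read: array access starts after turnaround.
            data_ready = bus_free + T_ACCESS
        finish = max(data_ready, bus_free) + T_BURST
        bus_free = finish        # the burst occupies the shared bus
        latencies.append(finish)
    return sum(latencies) / len(latencies)

# Four reads to idle banks arriving during a drain to banks 0-1:
reads = [2, 3, 2, 3]
print(avg_read_latency(reads, staged=False))  # 320.0
print(avg_read_latency(reads, staged=True))   # 220.0
```

Even in this crude model, overlapping the array-access stage with the write drain removes the post-turnaround access latency from every read to an idle bank, mirroring the latency benefit the paper reports.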
Year
2012
DOI
10.1109/HPCA.2012.6168943
Venue
HPCA
Keywords
average dram access latency, memory system, staged reads, memory chip area, main memory latency, main memory system, busy servicing, single off-chip memory bus, bank imbalance, memory chip, memory channel, optimization, registers, system performance, memory latency, scheduling algorithm, critical path, chip
Field
Dram, Interleaved memory, Computer science, Scheduling (computing), Parallel computing, Real-time computing, Memory bus, Instruction prefetch, Critical path method, Throughput, Multi-channel memory architecture, Operating system
DocType
Conference
ISSN
1530-0897
Citations
26
PageRank
0.96
References
24
Authors
5
Name                      Order  Citations  PageRank
Niladrish Chatterjee      1      267        11.53
Naveen Muralimanohar      2      1295       57.58
Rajeev Balasubramonian    3      2302       116.79
Al Davis                  4      986        54.47
Norman P. Jouppi          5      6042       791.53