Title
Staged Reads: Mitigating the impact of DRAM writes on DRAM reads
Abstract
Main memory latency has always been a concern for system performance. Because reads are on the critical path for CPU progress, they are prioritized over writes. However, writes must eventually be processed, and they often delay pending reads. In fact, a single channel in the main memory system offers almost no parallelism between reads and writes: a single off-chip memory bus is shared by reads and writes, and the direction of the bus must be explicitly turned around when switching from writes to reads. This turnaround is expensive, so its cost is amortized by carrying out a burst of writes or reads each time the bus direction is switched. As a result, no reads can be processed while a memory channel is busy servicing writes. This paper proposes a novel mechanism that boosts read-write parallelism by performing useful components of read operations even while the memory system is busy performing writes. If some of the banks are busy servicing writes, we issue reads to the other, idle banks and store their results in a few registers near the memory chip's I/O pads. These buffered results are then returned immediately after the bus turnaround. We refer to this process as a Staged Read because it decouples a single read operation into two stages, the first of which is performed in parallel with writes. The mechanism can also be viewed as a form of prefetch that is internal to the memory chip. The proposed technique works best when there is bank imbalance in the write stream, so we also introduce a write scheduling algorithm that artificially creates bank imbalance and allows useful read operations to be performed during the write drain. Across a suite of memory-intensive workloads, we show that Staged Reads can boost throughput by up to 33% (average 7%), with an average DRAM access latency improvement of 17%, while incurring a very small cost (0.25%) in memory chip area. The throughput improvements are even greater for write-intensive workloads (average 11%) or future systems (average 12%).
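The two-stage idea in the abstract can be illustrated with a toy timing model. This is a minimal sketch with made-up cycle counts, not the paper's simulator: `T_DRAIN`, `T_TURN`, `T_ACCESS`, `T_BURST`, and `WRITE_BANKS` are all assumptions chosen for illustration. Reads to banks not involved in the write drain complete their array access early and park the data in registers near the I/O pads, so after the turnaround only the data burst remains.

```python
# Toy timing model of Staged Reads vs. a conventional write drain.
# All constants are illustrative cycle counts for this sketch,
# not parameters taken from the paper.
T_DRAIN = 200          # cycles the channel spends streaming writes
T_TURN = 10            # write-to-read bus turnaround penalty
T_ACCESS = 40          # bank array access (activate + column read)
T_BURST = 4            # data-bus occupancy per read burst
WRITE_BANKS = {0, 1}   # the write drain keeps only these banks busy

def avg_read_latency(read_banks, staged):
    """Average completion time of reads that all arrive at cycle 0.

    For brevity, reads are serviced serially once the bus turns
    around (bank-level overlap among ordinary reads is ignored).
    """
    bus_free = T_DRAIN + T_TURN  # bus unavailable until writes finish
    latencies = []
    for bank in read_banks:
        if staged and bank not in WRITE_BANKS:
            # Stage 1 overlaps the drain: the array access completes
            # early and the data waits in a register near the I/O pads.
            data_ready = T_ACCESS
        else:
            # Conventional read: array access starts after turnaround.
            data_ready = bus_free + T_ACCESS
        finish = max(data_ready, bus_free) + T_BURST
        bus_free = finish        # the burst occupies the shared bus
        latencies.append(finish)
    return sum(latencies) / len(latencies)

# Four reads to idle banks arriving during a drain to banks 0-1:
reads = [2, 3, 2, 3]
print(avg_read_latency(reads, staged=False))  # 320.0
print(avg_read_latency(reads, staged=True))   # 220.0
```

Even in this crude model, overlapping the array-access stage with the write drain removes the post-turnaround access latency from every read to an idle bank, mirroring the latency benefit the paper reports.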
Year
2012
DOI
10.1109/HPCA.2012.6168943
Venue
HPCA
Keywords
average dram access latency, memory system, staged reads, memory chip area, main memory latency, main memory system, busy servicing, single off-chip memory bus, bank imbalance, memory chip, memory channel, optimization, registers, system performance, memory latency, scheduling algorithm, critical path, chip
Field
Dram, Interleaved memory, Computer science, Scheduling (computing), Parallel computing, Real-time computing, Memory bus, Instruction prefetch, Critical path method, Throughput, Multi-channel memory architecture, Operating system
DocType
Conference
ISSN
1530-0897
Citations
26
PageRank
0.96
References
24
Authors
5
Name                      Order  Citations  PageRank
Niladrish Chatterjee      1      267        11.53
Naveen Muralimanohar      2      1295       57.58
Rajeev Balasubramonian    3      2302       116.79
Al Davis                  4      986        54.47
Norman P. Jouppi          5      6042       791.53