Title
A comparison of architectural support for messaging in the TMC CM-5 and the Cray T3D
Abstract
Programming models based on messaging continue to be important for parallel machines. Messaging costs are strongly influenced by a machine's network interface architecture. We examine the impact of architectural support for messaging in two machines, the TMC CM-5 and the Cray T3D, by exploring the design and performance of several messaging implementations. The additional features in the T3D support remote operations: memory access, fetch-and-increment, atomic swaps, and prefetch. Experiments on the CM-5 show that requiring processor involvement for message reception can increase the communication overheads from 60% to 300% for moderate variations in computation grain size at the destination. In contrast, the T3D hardware for remote operations decouples message reception from processor activity, producing high-performance messaging independent of computation grain size or variability. In addition, hardware support for a shared address space in the T3D can be used to solve the output contention problem (output hot spots), producing messaging implementations that are robust over a wide variety of traffic patterns. Atomic swap hardware can be used to build a distributed message queue, enabling a "pull" messaging scheme where the destination requests data transfer upon receive. This scheme uses prefetches to mask receive latency. While this yields performance robust over output contention, its base cost is competitive only for small messages (up to 64 bytes) because of the high cost of issuing and resolving prefetches in the T3D. Emulation shows that if the interaction costs can be reduced by a factor of eight (250 ns to 31 ns), perhaps by moving the prefetch queue on chip, and there is a corresponding increase in the prefetch queue size, the pull scheme can give superior performance in all cases.
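The "pull" scheme described in the abstract can be illustrated with a small sketch. This is not the authors' T3D implementation: the class and method names (Node, PullQueue, send, receive) are hypothetical, a lock stands in for the atomic swap hardware, and an ordinary field read stands in for the remote read or prefetch. It only shows the general idea that a sender enqueues a small descriptor with one atomic swap on the destination's tail pointer, and the destination later pulls the data when it chooses to receive.

```python
import threading


class Node:
    """Descriptor published by a sender; the payload stays with the sender."""

    def __init__(self, sender, payload):
        self.sender = sender
        self.payload = payload
        self.next = None


class PullQueue:
    """Per-destination queue; many senders, one receiver (assumed)."""

    def __init__(self):
        # Assumption: this lock emulates a hardware atomic swap.
        self._swap_lock = threading.Lock()
        self.head = Node(None, None)  # dummy node owned by the receiver
        self.tail = self.head

    def _atomic_swap_tail(self, new_node):
        # Exchange the tail pointer and return the previous tail.
        with self._swap_lock:
            old, self.tail = self.tail, new_node
        return old

    def send(self, sender, payload):
        # One swap claims a slot; the link is filled in afterwards.
        node = Node(sender, payload)
        prev = self._atomic_swap_tail(node)
        prev.next = node

    def receive(self):
        # Pull the next message, or return None if none is linked yet.
        node = self.head.next
        if node is None:
            return None
        self.head = node  # consumed node becomes the new dummy
        # On real hardware this would be a remote read or prefetch.
        return node.sender, node.payload
```

Because senders only touch the tail pointer and the receiver only advances the head, the receiver's processor is not involved until it decides to receive, which is the decoupling the abstract attributes to the remote-operation hardware.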
Year
1995
DOI
10.1145/225830.224440
Venue
international symposium on computer architecture
Keywords
computer architecture,parallel machines,performance evaluation,programming,Cray T3D,TMC CM-5,architectural support,atomic swap hardware,atomic swaps,distributed message queue,fetch-and-increment,hardware support,memory access,messaging,prefetch,programming model,receive latency,remote operations,shared address space,traffic patterns
Field
Address space,Byte,Programming paradigm,Computer science,Parallel computing,Queue,Messaging pattern,Real-time computing,Message queue,Instruction prefetch,Operating system,Network interface
DocType
Conference
Volume
23
Issue
2
ISSN
0163-5964
ISBN
0-89791-698-0
Citations
35
PageRank
6.50
References
11
Authors
2
Name               Order  Citations  PageRank
Vijay Karamcheti   1      646        67.03
Andrew A. Chien    2      3696       405.97