Title
Performance Modelling and Optimization of Memory Access on Cellular Computer Architecture Cyclops64
Abstract
This paper focuses on the Cyclops64 computer architecture and presents an analytical model and performance simulation results for preloading and loop unrolling, two approaches to optimizing the performance of the singular value decomposition (SVD) benchmark. A performance model that breaks down the total execution cycles is presented. Data preloading, implemented with either “memcpy” or hand-optimized inline assembly code, and loop unrolling are compared in terms of the total number of memory access cycles. The key idea is to preload data from off-chip to on-chip memory and to store the data back after the computation. These approaches reduce the total memory access cycles and can thus improve benchmark performance significantly.
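The preload, compute, store-back pattern described in the abstract can be sketched in portable C. This is only an illustration under stated assumptions, not the authors' code: standard C cannot place data in Cyclops64 on-chip memory, so a local buffer stands in for it; the name rotate_preloaded, the TILE capacity, and the choice of a plane rotation of two columns as the kernel are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TILE 256 /* hypothetical capacity of the stand-in "on-chip" buffer */

/*
 * Apply the plane rotation
 *     x[i] <- c*x[i] - s*y[i]
 *     y[i] <- s*x[i] + c*y[i]
 * to two columns of length n (n <= TILE).
 * Assumption: x and y model data in slow off-chip memory, while bx and by
 * model fast on-chip memory; portable C cannot express that placement.
 */
void rotate_preloaded(double *x, double *y, size_t n, double c, double s)
{
    double bx[TILE], by[TILE];
    size_t i = 0;

    assert(n <= TILE);

    /* Preload: one bulk copy per column instead of many scattered reads. */
    memcpy(bx, x, n * sizeof bx[0]);
    memcpy(by, y, n * sizeof by[0]);

    /* Compute on the local copies; the loop body is unrolled by two. */
    for (; i + 2 <= n; i += 2) {
        double x0 = bx[i],     y0 = by[i];
        double x1 = bx[i + 1], y1 = by[i + 1];
        bx[i]     = c * x0 - s * y0;
        by[i]     = s * x0 + c * y0;
        bx[i + 1] = c * x1 - s * y1;
        by[i + 1] = s * x1 + c * y1;
    }
    /* Remainder iteration when n is odd. */
    for (; i < n; i++) {
        double x0 = bx[i], y0 = by[i];
        bx[i] = c * x0 - s * y0;
        by[i] = s * x0 + c * y0;
    }

    /* Store back: write the results to the original arrays in bulk. */
    memcpy(x, bx, n * sizeof bx[0]);
    memcpy(y, by, n * sizeof by[0]);
}
```

Calling rotate_preloaded(x, y, n, c, s) updates both columns in place. Whether the bulk copies pay off depends on how many times each preloaded element is reused and on the actual off-chip and on-chip access latencies, which this sketch does not model.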
Year: 2005
DOI: 10.1007/11577188_18
Venue: NPC
Keywords: singular value decomposition, analytical model, total number, total execution cycle, cellular computer architecture cyclops64, performance modelling, cyclops64 computer architecture, memory access cycle, benchmark performance, total memory access cycle, performance model, performance simulation result, computer architecture
Field: Singular value decomposition, Computer architecture, Computer science, Parallel algorithm, Parallel computing, Assembly language, Performance model, Loop unrolling, Computation
DocType: Conference
Volume: 3779
ISSN: 0302-9743
ISBN: 3-540-29810-X
Citations: 2
PageRank: 0.42
References: 4
Authors: 4
Name | Order | Citations | PageRank
Yanwei Niu | 1 | 18 | 2.33
Ziang Hu | 2 | 217 | 14.98
Kenneth Barner | 3 | 22 | 3.50
Guang R. Gao | 4 | 2661 | 265.87