Title | ||
---|---|---|
16-Bit FP Sub-Word Parallelism to Facilitate Compiler Vectorization and Improve Performance of Image and Media Processing |
Abstract | ||
---|---|---|
We consider the implementation of 16-bit floating-point instructions on a Pentium 4 and a PowerPC G5 for image and media processing. By measuring the execution time of benchmarks with these new simulated instructions, we show that a significant speed-up is obtained compared to 32-bit FP versions. For image processing, the speed-up comes both from doubling the number of operations per SIMD instruction and from the better cache behavior with byte storage. For data stream processing with arrays of structures, the speed-up mainly comes from the wider SIMD instructions. |
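
The arithmetic behind "doubling the number of operations per SIMD instruction" can be illustrated with a minimal sketch. It assumes 128-bit SIMD registers (the width of SSE2 on the Pentium 4 and AltiVec on the PowerPC G5); the function names and the 512x512 image size are hypothetical and chosen only for illustration, not taken from the paper.

```python
# Illustrative arithmetic only: lane counts and memory footprint per element width.
# Assumption: 128-bit SIMD registers (SSE2 / AltiVec). Names are hypothetical.

SIMD_BITS = 128


def lanes_per_register(element_bits, register_bits=SIMD_BITS):
    """Number of elements of the given width that fit in one SIMD register."""
    return register_bits // element_bits


def image_bytes(width, height, element_bits):
    """Memory footprint of a single-channel image stored with the given element width."""
    return width * height * element_bits // 8


def simd_ops_needed(n_elements, element_bits, register_bits=SIMD_BITS):
    """SIMD instructions needed to apply one element-wise operation to n_elements."""
    lanes = lanes_per_register(element_bits, register_bits)
    return -(-n_elements // lanes)  # ceiling division


if __name__ == "__main__":
    pixels = 512 * 512  # hypothetical image size
    for bits in (32, 16, 8):
        print(
            f"{bits:2d}-bit elements: "
            f"{lanes_per_register(bits)} lanes, "
            f"{simd_ops_needed(pixels, bits)} SIMD ops per pass, "
            f"{image_bytes(512, 512, bits)} bytes"
        )
```

Under these assumptions, a 16-bit format packs 8 elements per register instead of 4, so one element-wise pass needs half as many SIMD instructions as the 32-bit version, and narrower storage shrinks the working set that must fit in cache. The sketch counts instructions and bytes only; it does not model execution time.
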
Year | DOI | Venue |
---|---|---|
2004 | 10.1109/ICPP.2004.1 | ICPP |
Keywords | Field | DocType |
improve performance, facilitate compiler vectorization, 16-bit floating point instruction, wider simd instruction, media processing, powerpc g5, significant speed-up, image processing, simd instruction, 16-bit fp sub-word parallelism, data stream processing, byte storage, 32-bit fp version, instruction sets, parallel processing, floating point arithmetic, floating point | Central processing unit, Cache, Instruction set, Computer science, Parallel computing, Image processing, SIMD, Pentium, Digital image processing, Computer hardware, PowerPC | Conference |
ISSN | ISBN | Citations |
0190-3918 | 0-7695-2197-5 | 1 |
PageRank | References | Authors |
0.44 | 6 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Daniel Etiemble | 1 | 300 | 42.43 |
Lionel Lacassagne | 2 | 127 | 23.17 |