The Next 700 Accelerated Layers: From Mathematical Expressions of Network Computation Graphs to Accelerated GPU Kernels, Automatically | 2 | 0.36 | 2020 |
Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions | 17 | 0.68 | 2018 |
A Tale of Three Runtimes | 3 | 0.41 | 2014 |
Tiling and optimizing time-iterated computations on periodic domains | 14 | 0.57 | 2014 |
Runnemede: An architecture for Ubiquitous High-Performance Computing | 38 | 1.17 | 2013 |
Memory reuse optimizations in the R-Stream compiler | 3 | 0.40 | 2013 |
Automatic communication optimizations through memory reuse strategies | 0 | 0.34 | 2012 |
A mapping path for multi-GPGPU accelerated computers from a portable high level programming abstraction | 33 | 2.35 | 2010 |
Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time | 52 | 3.82 | 2007 |
Automatic Correction of Loop Transformations | 7 | 0.58 | 2007 |
Violated dependence analysis | 17 | 1.28 | 2006 |
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies | 101 | 3.63 | 2006 |
Polyhedral code generation in the real world | 30 | 1.93 | 2006 |
Facilitating the search for compositions of program transformations | 37 | 1.91 | 2005 |