Abstract |
---|
We propose a static loop vectorization optimization on top of the high-level dataflow IR used by frameworks like TensorFlow. A new statically vectorized parallel-for abstraction is provided on top of TensorFlow and used for applications ranging from auto-batching and per-example gradients to Jacobian computation, optimized map functions, and input pipeline optimization. We report huge speedups compared to both loop-based implementations and the run-time batching adopted by the DyNet framework. |

Year | Venue | DocType |
---|---|---|
2019 | arXiv: Distributed, Parallel, and Cluster Computing | Journal |

Volume | Citations | PageRank |
---|---|---|
abs/1903.04243 | 0 | 0.34 |

References | Authors |
---|---|
5 | 2 |

Name | Order | Citations | PageRank |
---|---|---|---|
Ashish Agarwal | 1 | 1110 | 67.41 |
Igor Ganichev | 2 | 1 | 1.03 |