Name: TAL BEN-NUN
Affiliation: The Hebrew University, Jerusalem, Israel
Papers: 34
Collaborators: 74
Citations: 116
PageRank: 14.21
Referers: 458
Referees: 1457
References: 486
Title | Citations | PageRank | Year
A data-centric optimization framework for machine learning | 0 | 0.34 | 2022
Lifting C semantics for dataflow optimization | 0 | 0.34 | 2022
Clairvoyant prefetching for distributed machine learning I/O | 2 | 0.37 | 2021
On the parallel I/O optimality of linear algebra kernels: near-optimal matrix factorizations | 0 | 0.34 | 2021
Pebbles, Graphs, and a Pinch of Combinatorics: Towards Tight I/O Lower Bounds for Statically Analyzable Programs | 0 | 0.34 | 2021
Breaking (Global) Barriers in Parallel Stochastic Optimization With Wait-Avoiding Group Averaging | 1 | 0.36 | 2021
StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems | 0 | 0.34 | 2021
On the parallel I/O optimality of linear algebra kernels: near-optimal LU factorization | 0 | 0.34 | 2021
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks | 0 | 0.34 | 2021
Programl: A Graph-Based Program Representation For Data Flow Analysis And Compiler Optimizations | 0 | 0.34 | 2021
Workflows are the New Applications: Challenges in Performance, Portability, and Productivity | 1 | 0.35 | 2020
Augment Your Batch: Improving Generalization Through Instance Repetition | 0 | 0.34 | 2020
Substream-Centric Maximum Matchings on FPGA | 0 | 0.34 | 2020
Groute: Asynchronous Multi-GPU Programming Model with Applications to Large-scale Graph Processing | 0 | 0.34 | 2020
Taming unbalanced training workloads in deep learning with partial collective operations. | 4 | 0.46 | 2020
Stateful dataflow multigraphs: a data-centric model for performance portability on heterogeneous architectures | 1 | 0.37 | 2019
A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning | 3 | 0.46 | 2019
Stateful Dataflow Multigraphs: A Data-Centric Model for High-Performance Parallel Programs. | 0 | 0.34 | 2019
Substream-Centric Maximum Matchings on FPGA. | 1 | 0.40 | 2019
Augment your batch: better training with larger batches. | 1 | 0.35 | 2019
Graph Processing on FPGAs: Taxonomy, Survey, Challenges. | 0 | 0.34 | 2019
Optimizing the data movement in quantum transport simulations via data-centric parallel programming | 0 | 0.34 | 2019
Accelerating Deep Learning Frameworks with Micro-Batches | 3 | 0.40 | 2018
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis. | 46 | 1.44 | 2018
Neural Code Comprehension: A Learnable Representation of Code Semantics. | 1 | 0.35 | 2018
Optimizing Parallel Graph Connectivity Computation via Subgraph Sampling | 4 | 0.40 | 2018
Big data causing big (TLB) problems: taming random memory accesses on the GPU. | 2 | 0.37 | 2017
Groute: An Asynchronous Multi-GPU Programming Model for Irregular Computations. | 20 | 0.61 | 2017
Adaptive Work-Efficient Connected Components on the GPU. | 0 | 0.34 | 2016
Spline-based parallel nonlinear optimization of function sequences. | 0 | 0.34 | 2016
Reciprocal Grids: A Hierarchical Algorithm for Computing Solution X-ray Scattering Curves from Supramolecular Complexes at High Resolution. | 0 | 0.34 | 2016
Memory access patterns: the missing piece of the multi-GPU puzzle | 14 | 0.61 | 2015
Design and implementation of a generic resource sharing virtual time dispatcher | 3 | 0.51 | 2010
A global scheduling framework for virtualization environments | 9 | 0.66 | 2009