Asynchronous Distributed-Memory Triangle Counting and LCC with RMA Caching | 1 | 0.37 | 2022 |
I/O-Optimal Cache-Oblivious Sparse Matrix-Sparse Matrix Multiplication | 0 | 0.34 | 2022 |
Motif Prediction with Graph Neural Networks | 0 | 0.34 | 2022 |
SeBS: a serverless benchmark suite for function-as-a-service computing | 2 | 0.39 | 2021 |
On the parallel I/O optimality of linear algebra kernels: near-optimal matrix factorizations | 0 | 0.34 | 2021 |
GraphMineSuite: enabling high-performance and programmable graph mining algorithms with set algebra | 1 | 0.35 | 2021 |
The future is big graphs: a community view on graph processing systems | 1 | 0.40 | 2021 |
SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems | 4 | 0.38 | 2021 |
Pebbles, Graphs, and a Pinch of Combinatorics: Towards Tight I/O Lower Bounds for Statically Analyzable Programs | 0 | 0.34 | 2021 |
On the parallel I/O optimality of linear algebra kernels: near-optimal LU factorization | 0 | 0.34 | 2021 |
High-Performance Routing With Multipathing and Path Diversity in Ethernet and HPC Networks | 1 | 0.40 | 2021 |
GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra | 0 | 0.34 | 2021 |
Parallel Algorithms for Finding Large Cliques in Sparse Graphs | 1 | 0.35 | 2021 |
Substream-Centric Maximum Matchings on FPGA | 0 | 0.34 | 2020 |
High-performance parallel graph coloring with strong guarantees on work, depth, and quality | 0 | 0.34 | 2020 |
FatPaths: routing in supercomputers and data centers when shortest paths fall short | 0 | 0.34 | 2020 |
Slim graph: practical lossy graph compression for approximate graph processing, storage, and analytics | 0 | 0.34 | 2019 |
A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning | 3 | 0.46 | 2019 |
Substream-Centric Maximum Matchings on FPGA | 1 | 0.40 | 2019 |
FatPaths: Routing in Supercomputers, Data Centers, and Clouds with Low-Diameter Networks when Shortest Paths Fall Short | 0 | 0.34 | 2019 |
Graph Processing on FPGAs: Taxonomy, Survey, Challenges | 0 | 0.34 | 2019 |
Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication | 6 | 0.48 | 2019 |
Network-accelerated non-contiguous memory transfers | 0 | 0.34 | 2019 |
Enabling highly scalable remote memory access programming with MPI-3 one sided | 0 | 0.34 | 2018 |
Log(graph): a near-optimal high-performance graph representation | 0 | 0.34 | 2018 |
Survey and Taxonomy of Lossless Graph Compression and Space-Efficient Graph Representations | 0 | 0.34 | 2018 |
Slim NoC: A Low-Diameter On-Chip Network Topology for High Energy Efficiency and Scalability | 2 | 0.35 | 2018 |
Communication-avoiding parallel minimum cuts and connected components | 2 | 0.36 | 2018 |
To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations | 26 | 0.73 | 2017 |
SlimSell: A Vectorizable Graph Representation for Breadth-First Search | 6 | 0.47 | 2017 |
Scaling betweenness centrality using communication-efficient sparse matrix multiplication | 6 | 0.41 | 2017 |
High-Performance Distributed RMA Locks | 1 | 0.35 | 2016 |
Betweenness Centrality is more Parallelizable than Dense Matrix Multiplication | 0 | 0.34 | 2016 |
Evaluating the Cost of Atomic Operations on Modern Architectures | 16 | 0.84 | 2015 |
Active Access: A Mechanism for High-Performance Distributed Data-Centric Computations | 3 | 0.40 | 2015 |
Accelerating Irregular Computations with Hardware Transactional Memory and Active Messages | 3 | 0.37 | 2015 |
Slim Fly: A Cost Effective Low-Diameter Network Topology | 65 | 2.12 | 2014 |
Fault tolerance for remote memory access programming models | 8 | 0.51 | 2014 |
Enabling highly-scalable remote memory access programming with MPI-3 one sided | 34 | 1.41 | 2013 |