Designing a 3D Parallel Memory-Aware Lattice Boltzmann Algorithm on Manycore Systems | 0 | 0.34 | 2021 |
An Extended Roofline Model with Communication-Awareness for Distributed-Memory HPC Systems | 0 | 0.34 | 2019 |
Building a scientific workflow framework to enable real-time machine learning and visualization | 0 | 0.34 | 2019 |
Interactive 3D simulation for fluid–structure interactions using dual coupled GPUs | 0 | 0.34 | 2018 |
Designing a Parallel Memory-Aware Lattice Boltzmann Algorithm on Manycore Systems | 1 | 0.35 | 2018 |
Designing a Synchronization-reducing Clustering Method on Manycores: Some Issues and Improvements | 0 | 0.34 | 2017 |
A Simpler and More Direct Derivation of System Reliability Using Markov Chain Usage Models | 0 | 0.34 | 2017 |
Correcting soft errors online in fast fourier transform | 3 | 0.37 | 2017 |
Modeling and Implementation of an Asynchronous Approach to Integrating HPC and Big Data Analysis | 0 | 0.34 | 2016 |
SuCAQR: A Simplified Communication-Avoiding QR Factorization Solver Using the TBLAS Framework | 0 | 0.34 | 2016 |
A scalable approach to solving dense linear algebra problems on hybrid CPU-GPU systems | 5 | 0.58 | 2015 |
Quality Assurance through Rigorous Software Specification and Testing: A Case Study | 0 | 0.34 | 2015 |
LBM-IB: A Parallel Library to Solve 3D Fluid-Structure Interaction Problems on Manycore Systems | 2 | 0.39 | 2015 |
Scaling up matrix computations on shared-memory manycore systems with 1000 CPU cores | 3 | 0.39 | 2014 |
Implementing a high-performance recommendation system using Phoenix++ | 2 | 0.39 | 2013 |
KV-Cache: A Scalable High-Performance Web-Object Cache for Manycore | 6 | 0.47 | 2013 |
A scalable framework for heterogeneous GPU-based clusters | 22 | 1.33 | 2012 |
Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems | 48 | 1.84 | 2012 |
Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems | 17 | 0.90 | 2010 |
A Scalable Non-blocking Multicast Scheme for Distributed DAG Scheduling | 0 | 0.34 | 2009 |
Analytical modeling and optimization for affinity based thread scheduling on multicore systems | 10 | 0.66 | 2009 |
Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems | 52 | 3.00 | 2009 |
L2 Cache Modeling for Scientific Applications on Chip Multi-Processors | 14 | 0.63 | 2007 |
Feedback-directed thread scheduling with memory considerations | 10 | 0.77 | 2007 |
Performance instrumentation and compiler optimizations for MPI/OpenMP applications | 5 | 0.49 | 2006 |
Automatic Experimental Analysis of Communication Patterns in Virtual Topologies | 1 | 0.35 | 2005 |
An Algebra for Cross-Experiment Performance Analysis | 31 | 3.59 | 2004 |