Optimizing Performance and Storage of Memory-Mapped Persistent Data Structures | 0 | 0.34 | 2022 |
On the Characterization of the Performance-Productivity Gap for FPGA | 0 | 0.34 | 2022 |
Scaling Out a Combinatorial Algorithm for Discovering Carcinogenic Gene Combinations to Thousands of GPUs | 0 | 0.34 | 2021 |
Mitigating Catastrophic Forgetting in Deep Learning in a Streaming Setting Using Historical Summary | 0 | 0.34 | 2021 |
A Feasibility Study for MPI over HDFS | 0 | 0.34 | 2020 |
SparkLeBLAST: Scalable Parallelization of BLAST Sequence Alignment Using Spark | 1 | 0.36 | 2020 |
GPU-Based Iterative Medical CT Image Reconstructions | 2 | 0.38 | 2019 |
Adaptive Task Aggregation for High-Performance Sparse Solvers on GPUs | 0 | 0.34 | 2019 |
C to D-Wave: A High-level C Compilation Framework for Quantum Annealers | 0 | 0.34 | 2019 |
A Composable Workflow for Productive Heterogeneous Computing on FPGAs via Whole-Program Analysis and Transformation | 0 | 0.34 | 2018 |
Making A Case For Green High-Performance Visualization Via Embedded Graphics Processors | 0 | 0.34 | 2018 |
A language and hardware independent approach to quantum–classical computing | 2 | 0.41 | 2018 |
Exploring FPGA-specific Optimizations for Irregular OpenCL Applications | 1 | 0.48 | 2018 |
Highly Efficient Compensation-Based Parallelism for Wavefront Loops on GPUs | 1 | 0.35 | 2018 |
Taming irregular applications via advanced dynamic parallelism on GPUs | 2 | 0.38 | 2018 |
Fast segmented sort on GPUs | 13 | 0.60 | 2017 |
Towards Scalable Deep Learning via I/O Analysis and Optimization | 5 | 0.44 | 2017 |
Center for High-Performance Reconfigurable Computing (CHREC): A Ten-Year Odyssey | 0 | 0.34 | 2017 |
A framework for fast and fair evaluation of automata processing hardware | 1 | 0.35 | 2017 |
PaPar: A Parallel Data Partitioning Framework for Big Data Applications | 2 | 0.36 | 2017 |
Eliminating Irregularities of Protein Sequence Search on Multicore Architectures | 2 | 0.36 | 2017 |
Congestion Control Scheme Performance Analysis Based on Nonlinear RED | 0 | 0.34 | 2017 |
cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on CPU+GPU | 13 | 0.59 | 2017 |
OpenDwarfs: Characterization of Dwarf-Based Benchmarks on Fixed and Reconfigurable Architectures | 7 | 0.58 | 2016 |
Characterizing Performance and Power towards Efficient Synchronization of GPU Kernels | 0 | 0.34 | 2016 |
Fast Detection of Transformed Data Leaks | 11 | 0.65 | 2016 |
Bridging the FPGA programmability-portability Gap via automatic OpenCL code generation and tuning | 0 | 0.34 | 2016 |
MPI-ACC: Accelerator-Aware MPI for Scientific Applications | 7 | 0.50 | 2016 |
O3FA: A Scalable Finite Automata-based Pattern-Matching Engine for Out-of-Order Deep Packet Inspection | 4 | 0.41 | 2016 |
Design and Evaluation of Scalable Concurrent Queues for Many-Core Architectures | 8 | 0.56 | 2015 |
On The Greenness Of In-Situ And Post-Processing Visualization Pipelines | 1 | 0.35 | 2015 |
On the Energy Proportionality of Scale-Out Workloads | 1 | 0.36 | 2015 |
Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL | 6 | 0.46 | 2015 |
On the Performance, Energy, and Power of Data-Access Methods in Heterogeneous Computing Systems | 0 | 0.34 | 2015 |
CoreTSAR: Core Task-Size Adapting Runtime | 3 | 0.42 | 2015 |
pDindel: Accelerating indel detection on a multicore CPU architecture with SIMD | 1 | 0.35 | 2015 |
Runtime Adaptation for Autonomic Heterogeneous Computing | 1 | 0.41 | 2014 |
Locality-Aware Memory Association for Multi-Target Worksharing in OpenMP | 0 | 0.34 | 2014 |
SDAFT: A novel scalable data access framework for parallel BLAST | 4 | 0.41 | 2014 |
A power-measurement methodology for large-scale, high-performance computing | 15 | 0.73 | 2014 |
Aeromancer: A Workflow Manager for Large-Scale MapReduce-Based Scientific Workflows | 0 | 0.34 | 2014 |
cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on a GPU | 5 | 0.46 | 2014 |
Delivering Parallel Programmability to the Masses via the Intel MIC Ecosystem: A Case Study | 5 | 0.54 | 2014 |
On the performance and energy efficiency of FPGAs and GPUs for polyphase channelization | 0 | 0.34 | 2014 |
Petascale Application of a Coupled CPU-GPU Algorithm for Simulation and Analysis of Multiphase Flow Solutions in Porous Medium Systems | 1 | 0.35 | 2014 |
On the efficacy of GPU-integrated MPI for scientific applications | 8 | 0.56 | 2013 |
On the Portability of the OpenCL Dwarfs on Fixed and Reconfigurable Parallel Platforms | 1 | 0.37 | 2013 |
Online Performance Projection for Clusters with Heterogeneous GPUs | 2 | 0.37 | 2013 |
Cascaded TCP: Applying pipelining to TCP for efficient communication over wide-area networks | 3 | 0.42 | 2013 |
Synchronization and Ordering Semantics in Hybrid MPI+GPU Programming | 2 | 0.40 | 2013 |