Virginia Tech, Dept Comp Sci, Blacksburg, VA 24061 USA
Optimizing Performance and Storage of Memory-Mapped Persistent Data Structures | 0 | 0.34 | 2022
On the Characterization of the Performance-Productivity Gap for FPGA | 0 | 0.34 | 2022
Scaling Out a Combinatorial Algorithm for Discovering Carcinogenic Gene Combinations to Thousands of GPUs | 0 | 0.34 | 2021
Mitigating Catastrophic Forgetting in Deep Learning in a Streaming Setting Using Historical Summary | 0 | 0.34 | 2021
A Feasibility Study for MPI over HDFS | 0 | 0.34 | 2020
SparkLeBLAST: Scalable Parallelization of BLAST Sequence Alignment Using Spark | 1 | 0.36 | 2020
GPU-Based Iterative Medical CT Image Reconstructions | 2 | 0.38 | 2019
Adaptive Task Aggregation for High-Performance Sparse Solvers on GPUs | 0 | 0.34 | 2019
C to D-Wave: A High-level C Compilation Framework for Quantum Annealers | 0 | 0.34 | 2019
A Composable Workflow for Productive Heterogeneous Computing on FPGAs via Whole-Program Analysis and Transformation | 0 | 0.34 | 2018
Making A Case For Green High-Performance Visualization Via Embedded Graphics Processors | 0 | 0.34 | 2018
A language and hardware independent approach to quantum–classical computing | 2 | 0.41 | 2018
Exploring FPGA-specific Optimizations for Irregular OpenCL Applications | 1 | 0.48 | 2018
Highly Efficient Compensation-Based Parallelism for Wavefront Loops on GPUs | 1 | 0.35 | 2018
Taming irregular applications via advanced dynamic parallelism on GPUs | 2 | 0.38 | 2018
Fast segmented sort on GPUs | 13 | 0.60 | 2017
Towards Scalable Deep Learning via I/O Analysis and Optimization | 5 | 0.44 | 2017
Center for High-Performance Reconfigurable Computing (CHREC): A Ten-Year Odyssey | 0 | 0.34 | 2017
A framework for fast and fair evaluation of automata processing hardware | 1 | 0.35 | 2017
PaPar: A Parallel Data Partitioning Framework for Big Data Applications | 2 | 0.36 | 2017
Eliminating Irregularities of Protein Sequence Search on Multicore Architectures | 2 | 0.36 | 2017
Congestion Control Scheme Performance Analysis Based on Nonlinear RED | 0 | 0.34 | 2017
cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on CPU+GPU | 13 | 0.59 | 2017
OpenDwarfs: Characterization of Dwarf-Based Benchmarks on Fixed and Reconfigurable Architectures | 7 | 0.58 | 2016
Characterizing Performance and Power towards Efficient Synchronization of GPU Kernels | 0 | 0.34 | 2016
Fast Detection of Transformed Data Leaks | 11 | 0.65 | 2016
Bridging the FPGA programmability-portability Gap via automatic OpenCL code generation and tuning | 0 | 0.34 | 2016
MPI-ACC: Accelerator-Aware MPI for Scientific Applications | 7 | 0.50 | 2016
O3FA: A Scalable Finite Automata-based Pattern-Matching Engine for Out-of-Order Deep Packet Inspection | 4 | 0.41 | 2016
Design and Evaluation of Scalable Concurrent Queues for Many-Core Architectures | 8 | 0.56 | 2015
On The Greenness Of In-Situ And Post-Processing Visualization Pipelines | 1 | 0.35 | 2015
On the Energy Proportionality of Scale-Out Workloads | 1 | 0.36 | 2015
Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL | 6 | 0.46 | 2015
On the Performance, Energy, and Power of Data-Access Methods in Heterogeneous Computing Systems | 0 | 0.34 | 2015
CoreTSAR: Core Task-Size Adapting Runtime | 3 | 0.42 | 2015
pDindel: Accelerating indel detection on a multicore CPU architecture with SIMD | 1 | 0.35 | 2015
Runtime Adaptation for Autonomic Heterogeneous Computing | 1 | 0.41 | 2014
Locality-Aware Memory Association for Multi-Target Worksharing in OpenMP | 0 | 0.34 | 2014
SDAFT: A novel scalable data access framework for parallel BLAST | 4 | 0.41 | 2014
A power-measurement methodology for large-scale, high-performance computing | 15 | 0.73 | 2014
Aeromancer: A Workflow Manager for Large-Scale MapReduce-Based Scientific Workflows | 0 | 0.34 | 2014
cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on a GPU | 5 | 0.46 | 2014
Delivering Parallel Programmability to the Masses via the Intel MIC Ecosystem: A Case Study | 5 | 0.54 | 2014
On the performance and energy efficiency of FPGAs and GPUs for polyphase channelization | 0 | 0.34 | 2014
Petascale Application of a Coupled CPU-GPU Algorithm for Simulation and Analysis of Multiphase Flow Solutions in Porous Medium Systems | 1 | 0.35 | 2014
On the efficacy of GPU-integrated MPI for scientific applications | 8 | 0.56 | 2013
On the Portability of the OpenCL Dwarfs on Fixed and Reconfigurable Parallel Platforms | 1 | 0.37 | 2013
Online Performance Projection for Clusters with Heterogeneous GPUs | 2 | 0.37 | 2013
Cascaded TCP: Applying pipelining to TCP for efficient communication over wide-area networks | 3 | 0.42 | 2013
Synchronization and Ordering Semantics in Hybrid MPI+GPU Programming | 2 | 0.40 | 2013