Name: WU-CHUN FENG
Affiliation: Virginia Tech, Dept Comp Sci, Blacksburg, VA 24061 USA
Papers: 236
Collaborators: 320
Citations: 2812
PageRank: 232.50
Referers: 5487
Referees: 4302
References: 2566
Title | Citations | PageRank | Year
Optimizing Performance and Storage of Memory-Mapped Persistent Data Structures | 0 | 0.34 | 2022
On the Characterization of the Performance-Productivity Gap for FPGA | 0 | 0.34 | 2022
Scaling Out a Combinatorial Algorithm for Discovering Carcinogenic Gene Combinations to Thousands of GPUs | 0 | 0.34 | 2021
Mitigating Catastrophic Forgetting in Deep Learning in a Streaming Setting Using Historical Summary | 0 | 0.34 | 2021
A Feasibility Study for MPI over HDFS | 0 | 0.34 | 2020
SparkLeBLAST: Scalable Parallelization of BLAST Sequence Alignment Using Spark | 1 | 0.36 | 2020
GPU-Based Iterative Medical CT Image Reconstructions | 2 | 0.38 | 2019
Adaptive Task Aggregation for High-Performance Sparse Solvers on GPUs | 0 | 0.34 | 2019
C to D-Wave: A High-level C Compilation Framework for Quantum Annealers | 0 | 0.34 | 2019
A Composable Workflow for Productive Heterogeneous Computing on FPGAs via Whole-Program Analysis and Transformation | 0 | 0.34 | 2018
Making A Case For Green High-Performance Visualization Via Embedded Graphics Processors | 0 | 0.34 | 2018
A language and hardware independent approach to quantum–classical computing | 2 | 0.41 | 2018
Exploring FPGA-specific Optimizations for Irregular OpenCL Applications | 1 | 0.48 | 2018
Highly Efficient Compensation-Based Parallelism for Wavefront Loops on GPUs | 1 | 0.35 | 2018
Taming irregular applications via advanced dynamic parallelism on GPUs | 2 | 0.38 | 2018
Fast segmented sort on GPUs | 13 | 0.60 | 2017
Towards Scalable Deep Learning via I/O Analysis and Optimization | 5 | 0.44 | 2017
Center for High-Performance Reconfigurable Computing (CHREC): A Ten-Year Odyssey | 0 | 0.34 | 2017
A framework for fast and fair evaluation of automata processing hardware | 1 | 0.35 | 2017
PaPar: A Parallel Data Partitioning Framework for Big Data Applications | 2 | 0.36 | 2017
Eliminating Irregularities of Protein Sequence Search on Multicore Architectures | 2 | 0.36 | 2017
Congestion Control Scheme Performance Analysis Based on Nonlinear RED | 0 | 0.34 | 2017
cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on CPU+GPU | 13 | 0.59 | 2017
OpenDwarfs: Characterization of Dwarf-Based Benchmarks on Fixed and Reconfigurable Architectures | 7 | 0.58 | 2016
Characterizing Performance and Power towards Efficient Synchronization of GPU Kernels | 0 | 0.34 | 2016
Fast Detection of Transformed Data Leaks | 11 | 0.65 | 2016
Bridging the FPGA programmability-portability Gap via automatic OpenCL code generation and tuning | 0 | 0.34 | 2016
MPI-ACC: Accelerator-Aware MPI for Scientific Applications | 7 | 0.50 | 2016
O3FA: A Scalable Finite Automata-based Pattern-Matching Engine for Out-of-Order Deep Packet Inspection | 4 | 0.41 | 2016
Design and Evaluation of Scalable Concurrent Queues for Many-Core Architectures | 8 | 0.56 | 2015
On The Greenness Of In-Situ And Post-Processing Visualization Pipelines | 1 | 0.35 | 2015
On the Energy Proportionality of Scale-Out Workloads | 1 | 0.36 | 2015
Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL | 6 | 0.46 | 2015
On the Performance, Energy, and Power of Data-Access Methods in Heterogeneous Computing Systems | 0 | 0.34 | 2015
CoreTSAR: Core Task-Size Adapting Runtime | 3 | 0.42 | 2015
pDindel: Accelerating indel detection on a multicore CPU architecture with SIMD | 1 | 0.35 | 2015
Runtime Adaptation for Autonomic Heterogeneous Computing | 1 | 0.41 | 2014
Locality-Aware Memory Association for Multi-Target Worksharing in OpenMP | 0 | 0.34 | 2014
SDAFT: A novel scalable data access framework for parallel BLAST | 4 | 0.41 | 2014
A power-measurement methodology for large-scale, high-performance computing | 15 | 0.73 | 2014
Aeromancer: A Workflow Manager for Large-Scale MapReduce-Based Scientific Workflows | 0 | 0.34 | 2014
cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on a GPU | 5 | 0.46 | 2014
Delivering Parallel Programmability to the Masses via the Intel MIC Ecosystem: A Case Study | 5 | 0.54 | 2014
On the performance and energy efficiency of FPGAs and GPUs for polyphase channelization | 0 | 0.34 | 2014
Petascale Application of a Coupled CPU-GPU Algorithm for Simulation and Analysis of Multiphase Flow Solutions in Porous Medium Systems | 1 | 0.35 | 2014
On the efficacy of GPU-integrated MPI for scientific applications | 8 | 0.56 | 2013
On the Portability of the OpenCL Dwarfs on Fixed and Reconfigurable Parallel Platforms | 1 | 0.37 | 2013
Online Performance Projection for Clusters with Heterogeneous GPUs | 2 | 0.37 | 2013
Cascaded TCP: Applying pipelining to TCP for efficient communication over wide-area networks | 3 | 0.42 | 2013
Synchronization and Ordering Semantics in Hybrid MPI+GPU Programming | 2 | 0.40 | 2013