Name | Affiliation | Papers
FENGGUANG SONG | University of Tennessee | 27
Collaborators | Citations | PageRank
39 | 232 | 19.88
Referers | Referees | References
616 | 771 | 303
Search Limit
100771
Title | Citations | PageRank | Year
Designing a 3D Parallel Memory-Aware Lattice Boltzmann Algorithm on Manycore Systems | 0 | 0.34 | 2021
An Extended Roofline Model with Communication-Awareness for Distributed-Memory HPC Systems. | 0 | 0.34 | 2019
Building a scientific workflow framework to enable real-time machine learning and visualization. | 0 | 0.34 | 2019
Interactive 3D simulation for fluid–structure interactions using dual coupled GPUs | 0 | 0.34 | 2018
Designing a Parallel Memory-Aware Lattice Boltzmann Algorithm on Manycore Systems | 1 | 0.35 | 2018
Designing a Synchronization-reducing Clustering Method on Manycores: Some Issues and Improvements | 0 | 0.34 | 2017
A Simpler and More Direct Derivation of System Reliability Using Markov Chain Usage Models. | 0 | 0.34 | 2017
Correcting soft errors online in fast fourier transform | 3 | 0.37 | 2017
Modeling and Implementation of an Asynchronous Approach to Integrating HPC and Big Data Analysis. | 0 | 0.34 | 2016
Sucaqr: A Simplified Communication-Avoiding Qr Factorization Solver Using The Tblas Framework | 0 | 0.34 | 2016
A scalable approach to solving dense linear algebra problems on hybrid CPU‐GPU systems | 5 | 0.58 | 2015
Quality Assurance through Rigorous Software Specification and Testing: A Case Study | 0 | 0.34 | 2015
LBM-IB: A Parallel Library to Solve 3D Fluid-Structure Interaction Problems on Manycore Systems. | 2 | 0.39 | 2015
Scaling up matrix computations on shared-memory manycore systems with 1000 CPU cores | 3 | 0.39 | 2014
Implementing a high-performance recommendation system using Phoenix++ | 2 | 0.39 | 2013
KV-Cache: A Scalable High-Performance Web-Object Cache for Manycore | 6 | 0.47 | 2013
A scalable framework for heterogeneous GPU-based clusters | 22 | 1.33 | 2012
Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems | 48 | 1.84 | 2012
Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems | 17 | 0.90 | 2010
A Scalable Non-blocking Multicast Scheme for Distributed DAG Scheduling | 0 | 0.34 | 2009
Analytical modeling and optimization for affinity based thread scheduling on multicore systems | 10 | 0.66 | 2009
Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems | 52 | 3.00 | 2009
L2 Cache Modeling for Scientific Applications on Chip Multi-Processors | 14 | 0.63 | 2007
Feedback-directed thread scheduling with memory considerations | 10 | 0.77 | 2007
Performance instrumentation and compiler optimizations for MPI/OpenMP applications | 5 | 0.49 | 2006
Automatic Experimental Analysis of Communication Patterns in Virtual Topologies | 1 | 0.35 | 2005
An Algebra for Cross-Experiment Performance Analysis | 31 | 3.59 | 2004