Name: GUANG R. GAO
Papers: 269
Collaborators: 389
Citations: 2661
PageRank: 265.87
Referers: 4746
Referees: 3233
References: 2674
| Title | Citations | PageRank | Year |
|---|---|---|---|
| Parallel Turing Machine, a Proposal. | 2 | 0.42 | 2017 |
| Generating Fine-Grain Multithreaded Applications Using a Multigrain Approach. | 2 | 0.43 | 2017 |
| Hamr: A Dataflow-Based Real-Time In-Memory Cluster Computing Engine | 2 | 0.41 | 2017 |
| The Importance of Efficient Fine-Grain Synchronization for Many-Core Systems. | 0 | 0.34 | 2016 |
| The Design and Implementation of TIDeFlow: A Dataflow-Inspired Execution Model for Parallel Loops and Task Pipelining. | 3 | 0.37 | 2016 |
| Toward a Parallel Turing Machine Model. | 1 | 0.36 | 2016 |
| Gregarious Data Re-structuring in a Many Core Architecture | 0 | 0.34 | 2015 |
| Energy efficient multi-level tiling for dense matrix multiplication on many-core architecture. | 0 | 0.34 | 2015 |
| Locality aware concurrent start for stencil applications | 5 | 0.43 | 2015 |
| Position Paper: Locality-Driven Scheduling of Tasks for Data-Dependent Multithreading | 0 | 0.34 | 2014 |
| The TERAFLUX Project: Exploiting the DataFlow Paradigm in Next Generation Teradevices | 17 | 0.76 | 2013 |
| Automatic Locality Exploitation in the Codelet Model | 0 | 0.34 | 2013 |
| Towards Memory-Load Balanced Fast Fourier Transformations in Fine-Grain Execution Models | 0 | 0.34 | 2013 |
| An implementation of the codelet model | 9 | 0.62 | 2013 |
| StreamTMC: Stream compilation for tiled multi-core architectures | 2 | 0.41 | 2013 |
| Optimizing the LU Factorization for Energy Efficiency on a Many-Core Architecture. | 6 | 0.45 | 2013 |
| Dynamic percolation: a case of study on the shortcomings of traditional optimization in many-core architectures | 6 | 0.48 | 2012 |
| A Discussion in Favor of Dynamic Scheduling for Regular Applications in Many-core Architectures | 8 | 0.70 | 2012 |
| Software Pipelining for Stream Programs on Resource Constrained Multicore Architectures | 9 | 0.66 | 2012 |
| Toward high-throughput algorithms on many-core architectures | 14 | 0.78 | 2012 |
| Demystifying Performance Predictions of Distributed FFT3D Implementations. | 0 | 0.34 | 2012 |
| Analysis and performance results of computing betweenness centrality on IBM Cyclops64 | 11 | 0.71 | 2011 |
| DEEP: an iterative fpga-based many-core emulation system for chip verification and architecture research | 3 | 0.58 | 2011 |
| Experiments with the Fresh Breeze tree-based memory model | 6 | 0.48 | 2011 |
| Performance analysis of Cooley-Tukey FFT algorithms for a many-core architecture | 7 | 0.61 | 2010 |
| Locality optimization of stencil applications using data dependency graphs | 18 | 0.80 | 2010 |
| Optimized dense matrix multiplication on a many-core architecture | 14 | 0.80 | 2010 |
| A study of a software cache implementation of the OpenMP memory model for multicore and manycore architectures | 7 | 0.52 | 2010 |
| Tile Reduction: The First Step towards Tile Aware Parallelization in OpenMP | 3 | 0.44 | 2009 |
| Open64 compiler infrastructure for emerging multicore/manycore architecture All Symposium Tutorial | 1 | 0.38 | 2008 |
| Concurrency Analysis for Shared Memory Programs with Textually Unaligned Barriers | 8 | 0.54 | 2007 |
| Exploring a Multithreaded Methodology to Implement a Network Communication Protocol on the Cyclops-64 Multithreaded Architecture | 0 | 0.34 | 2007 |
| Performance characteristics of OpenMP language constructs on a many-core-on-a-chip architecture | 8 | 0.67 | 2006 |
| Hierarchical multithreading: programming model and system software | 2 | 0.52 | 2006 |
| Sustained Petaflop and Beyond: Can Parallel Computing Systems Meet The Challenges? | 0 | 0.34 | 2005 |
| Performance portability on EARTH: a case study across several parallel architectures | 4 | 0.53 | 2005 |
| Madd Operation Aware Redundancy Elimination | 0 | 0.34 | 2005 |
| Identifying Multiply-Add Operations in Kylin Compiler | 0 | 0.34 | 2005 |
| Improving power efficiency with compiler-assisted cache replacement | 5 | 0.46 | 2005 |
| Performance modelling and optimization of memory access on cellular computer architecture cyclops64 | 2 | 0.42 | 2005 |
| TiNy Threads: A Thread Virtual Machine for the Cyclops64 Cellular Architecture | 31 | 1.96 | 2005 |
| Embedded and Ubiquitous Computing, International Conference EUC 2004, Aizu-Wakamatsu City, Japan, August 25-27, 2004, Proceedings | 107 | 10.03 | 2004 |
| An Improved Hidden Markov Model for Transmembrane Topology Prediction | 1 | 0.47 | 2004 |
| Network and Parallel Computing, IFIP International Conference, NPC 2004, Wuhan, China, October 18-20, 2004, Proceedings | 99 | 10.95 | 2004 |
| A cluster-based solution for high performance hmmpfam using EARTH execution model | 5 | 0.48 | 2003 |
| Compiler-Assisted Cache Replacement: Problem Formulation and Performance Evaluation | 9 | 0.59 | 2003 |
| Implementation Of The Earth Programming Model On Smp Clusters: A Multi-Threaded Language And Runtime System | 3 | 0.46 | 2003 |
| Evaluation and choice of various branch predictors for low-power embedded processor | 1 | 0.43 | 2003 |
| Special issue on compilers, architecture, and synthesis for embedded systems | 0 | 0.34 | 2003 |
| An Executable Analytical Performance Evaluation Approach for Early Performance Prediction | 5 | 0.63 | 2003 |