Guang R. Gao - Citegraph

Author Info

Name	Papers	Collaborators	Citations	PageRank	Referers	Referees	References
Guang R. Gao	269	389	2661	265.87	4746	3233	2674

Publications (100 rows)

Title	Citations	PageRank	Year
Parallel Turing Machine, a Proposal.	2	0.42	2017
Generating Fine-Grain Multithreaded Applications Using a Multigrain Approach.	2	0.43	2017
Hamr: A Dataflow-Based Real-Time In-Memory Cluster Computing Engine	2	0.41	2017
The Importance of Efficient Fine-Grain Synchronization for Many-Core Systems.	0	0.34	2016
The Design and Implementation of TIDeFlow: A Dataflow-Inspired Execution Model for Parallel Loops and Task Pipelining.	3	0.37	2016
Toward a Parallel Turing Machine Model.	1	0.36	2016
Gregarious Data Re-structuring in a Many Core Architecture	0	0.34	2015
Energy efficient multi-level tiling for dense matrix multiplication on many-core architecture.	0	0.34	2015
Locality aware concurrent start for stencil applications	5	0.43	2015
Position Paper: Locality-Driven Scheduling of Tasks for Data-Dependent Multithreading	0	0.34	2014
The TERAFLUX Project: Exploiting the DataFlow Paradigm in Next Generation Teradevices	17	0.76	2013
Automatic Locality Exploitation in the Codelet Model	0	0.34	2013
Towards Memory-Load Balanced Fast Fourier Transformations in Fine-Grain Execution Models	0	0.34	2013
An implementation of the codelet model	9	0.62	2013
StreamTMC: Stream compilation for tiled multi-core architectures	2	0.41	2013
Optimizing the LU Factorization for Energy Efficiency on a Many-Core Architecture.	6	0.45	2013
Dynamic percolation: a case of study on the shortcomings of traditional optimization in many-core architectures	6	0.48	2012
A Discussion in Favor of Dynamic Scheduling for Regular Applications in Many-core Architectures	8	0.70	2012
Software Pipelining for Stream Programs on Resource Constrained Multicore Architectures	9	0.66	2012
Toward high-throughput algorithms on many-core architectures	14	0.78	2012
Demystifying Performance Predictions of Distributed FFT3D Implementations.	0	0.34	2012
Analysis and performance results of computing betweenness centrality on IBM Cyclops64	11	0.71	2011
DEEP: an iterative fpga-based many-core emulation system for chip verification and architecture research	3	0.58	2011
Experiments with the Fresh Breeze tree-based memory model	6	0.48	2011
Performance analysis of Cooley-Tukey FFT algorithms for a many-core architecture	7	0.61	2010
Locality optimization of stencil applications using data dependency graphs	18	0.80	2010
Optimized dense matrix multiplication on a many-core architecture	14	0.80	2010
A study of a software cache implementation of the OpenMP memory model for multicore and manycore architectures	7	0.52	2010
Tile Reduction: The First Step towards Tile Aware Parallelization in OpenMP	3	0.44	2009
Open64 compiler infrastructure for emerging multicore/manycore architecture All Symposium Tutorial	1	0.38	2008
Concurrency Analysis for Shared Memory Programs with Textually Unaligned Barriers	8	0.54	2007
Exploring a Multithreaded Methodology to Implement a Network Communication Protocol on the Cyclops-64 Multithreaded Architecture	0	0.34	2007
Performance characteristics of OpenMP language constructs on a many-core-on-a-chip architecture	8	0.67	2006
Hierarchical multithreading: programming model and system software	2	0.52	2006
Sustained Petaflop and Beyond: Can Parallel Computing Systems Meet The Challenges?	0	0.34	2005
Performance portability on EARTH: a case study across several parallel architectures	4	0.53	2005
Madd Operation Aware Redundancy Elimination	0	0.34	2005
Identifying Multiply-Add Operations in Kylin Compiler	0	0.34	2005
Improving power efficiency with compiler-assisted cache replacement	5	0.46	2005
Performance modelling and optimization of memory access on cellular computer architecture cyclops64	2	0.42	2005
TiNy Threads: A Thread Virtual Machine for the Cyclops64 Cellular Architecture	31	1.96	2005
Embedded and Ubiquitous Computing, International Conference EUC 2004, Aizu-Wakamatsu City, Japan, August 25-27, 2004, Proceedings	107	10.03	2004
An Improved Hidden Markov Model for Transmembrane Topology Prediction	1	0.47	2004
Network and Parallel Computing, IFIP International Conference, NPC 2004, Wuhan, China, October 18-20, 2004, Proceedings	99	10.95	2004
A cluster-based solution for high performance hmmpfam using EARTH execution model	5	0.48	2003
Compiler-Assisted Cache Replacement: Problem Formulation and Performance Evaluation	9	0.59	2003
Implementation Of The Earth Programming Model On Smp Clusters: A Multi-Threaded Language And Runtime System	3	0.46	2003
Evaluation and choice of various branch predictors for low-power embedded processor	1	0.43	2003
Special issue on compilers, architecture, and synthesis for embedded systems	0	0.34	2003
An Executable Analytical Performance Evaluation Approach for Early Performance Prediction	5	0.63	2003
