Parallel Turing Machine, a Proposal. | 2 | 0.42 | 2017 |
Generating Fine-Grain Multithreaded Applications Using a Multigrain Approach. | 2 | 0.43 | 2017 |
Hamr: A Dataflow-Based Real-Time In-Memory Cluster Computing Engine | 2 | 0.41 | 2017 |
The Importance of Efficient Fine-Grain Synchronization for Many-Core Systems. | 0 | 0.34 | 2016 |
The Design and Implementation of TIDeFlow: A Dataflow-Inspired Execution Model for Parallel Loops and Task Pipelining. | 3 | 0.37 | 2016 |
Toward a Parallel Turing Machine Model. | 1 | 0.36 | 2016 |
Gregarious Data Re-structuring in a Many Core Architecture | 0 | 0.34 | 2015 |
Energy efficient multi-level tiling for dense matrix multiplication on many-core architecture. | 0 | 0.34 | 2015 |
Locality aware concurrent start for stencil applications | 5 | 0.43 | 2015 |
Position Paper: Locality-Driven Scheduling of Tasks for Data-Dependent Multithreading | 0 | 0.34 | 2014 |
The TERAFLUX Project: Exploiting the DataFlow Paradigm in Next Generation Teradevices | 17 | 0.76 | 2013 |
Automatic Locality Exploitation in the Codelet Model | 0 | 0.34 | 2013 |
Towards Memory-Load Balanced Fast Fourier Transformations in Fine-Grain Execution Models | 0 | 0.34 | 2013 |
An implementation of the codelet model | 9 | 0.62 | 2013 |
StreamTMC: Stream compilation for tiled multi-core architectures | 2 | 0.41 | 2013 |
Optimizing the LU Factorization for Energy Efficiency on a Many-Core Architecture. | 6 | 0.45 | 2013 |
Dynamic percolation: a case of study on the shortcomings of traditional optimization in many-core architectures | 6 | 0.48 | 2012 |
A Discussion in Favor of Dynamic Scheduling for Regular Applications in Many-core Architectures | 8 | 0.70 | 2012 |
Software Pipelining for Stream Programs on Resource Constrained Multicore Architectures | 9 | 0.66 | 2012 |
Toward high-throughput algorithms on many-core architectures | 14 | 0.78 | 2012 |
Demystifying Performance Predictions of Distributed FFT3D Implementations. | 0 | 0.34 | 2012 |
Analysis and performance results of computing betweenness centrality on IBM Cyclops64 | 11 | 0.71 | 2011 |
DEEP: an iterative FPGA-based many-core emulation system for chip verification and architecture research | 3 | 0.58 | 2011 |
Experiments with the Fresh Breeze tree-based memory model | 6 | 0.48 | 2011 |
Performance analysis of Cooley-Tukey FFT algorithms for a many-core architecture | 7 | 0.61 | 2010 |
Locality optimization of stencil applications using data dependency graphs | 18 | 0.80 | 2010 |
Optimized dense matrix multiplication on a many-core architecture | 14 | 0.80 | 2010 |
A study of a software cache implementation of the OpenMP memory model for multicore and manycore architectures | 7 | 0.52 | 2010 |
Tile Reduction: The First Step towards Tile Aware Parallelization in OpenMP | 3 | 0.44 | 2009 |
Open64 compiler infrastructure for emerging multicore/manycore architecture All Symposium Tutorial | 1 | 0.38 | 2008 |
Concurrency Analysis for Shared Memory Programs with Textually Unaligned Barriers | 8 | 0.54 | 2007 |
Exploring a Multithreaded Methodology to Implement a Network Communication Protocol on the Cyclops-64 Multithreaded Architecture | 0 | 0.34 | 2007 |
Performance characteristics of OpenMP language constructs on a many-core-on-a-chip architecture | 8 | 0.67 | 2006 |
Hierarchical multithreading: programming model and system software | 2 | 0.52 | 2006 |
Sustained Petaflop and Beyond: Can Parallel Computing Systems Meet The Challenges? | 0 | 0.34 | 2005 |
Performance portability on EARTH: a case study across several parallel architectures | 4 | 0.53 | 2005 |
Madd Operation Aware Redundancy Elimination | 0 | 0.34 | 2005 |
Identifying Multiply-Add Operations in Kylin Compiler | 0 | 0.34 | 2005 |
Improving power efficiency with compiler-assisted cache replacement | 5 | 0.46 | 2005 |
Performance modelling and optimization of memory access on cellular computer architecture Cyclops64 | 2 | 0.42 | 2005 |
TiNy Threads: A Thread Virtual Machine for the Cyclops64 Cellular Architecture | 31 | 1.96 | 2005 |
Embedded and Ubiquitous Computing, International Conference EUC 2004, Aizu-Wakamatsu City, Japan, August 25-27, 2004, Proceedings | 107 | 10.03 | 2004 |
An Improved Hidden Markov Model for Transmembrane Topology Prediction | 1 | 0.47 | 2004 |
Network and Parallel Computing, IFIP International Conference, NPC 2004, Wuhan, China, October 18-20, 2004, Proceedings | 99 | 10.95 | 2004 |
A cluster-based solution for high performance hmmpfam using EARTH execution model | 5 | 0.48 | 2003 |
Compiler-Assisted Cache Replacement: Problem Formulation and Performance Evaluation | 9 | 0.59 | 2003 |
Implementation of the EARTH Programming Model on SMP Clusters: A Multi-Threaded Language and Runtime System | 3 | 0.46 | 2003 |
Evaluation and choice of various branch predictors for low-power embedded processor | 1 | 0.43 | 2003 |
Special issue on compilers, architecture, and synthesis for embedded systems | 0 | 0.34 | 2003 |
An Executable Analytical Performance Evaluation Approach for Early Performance Prediction | 5 | 0.63 | 2003 |