Fast Networks and Slow Memories: A Mechanism for Mitigating Bandwidth Mismatches | 0 | 0.34 | 2017 |
Enabling Scalable High-Performance Systems with the Intel Omni-Path Architecture | 6 | 0.59 | 2016 |
Remote Memory Access Programming in MPI-3 | 15 | 0.72 | 2015 |
Exploiting Offload Enabled Network Interfaces | 3 | 0.37 | 2015 |
Intel® Omni-path Architecture: Enabling Scalable, High Performance Fabrics | 30 | 1.26 | 2015 |
Reducing Synchronization Overhead Through Bundled Communication | 6 | 0.68 | 2014 |
Evaluating on-die interconnects for a 4 TB/s router | 1 | 0.37 | 2013 |
Exploiting communication and packaging locality for cost-effective large scale networks | 4 | 0.42 | 2012 |
A low impact flow control implementation for offload communication interfaces | 2 | 0.42 | 2012 |
Enhanced Support for OpenSHMEM Communication in Portals | 8 | 0.71 | 2011 |
Using triggered operations to offload rendezvous messages | 6 | 0.47 | 2011 |
Scientific Application Demands on a Reconfigurable Functional Unit Interface | 3 | 0.39 | 2011 |
Enabling Flexible Collective Communication Offload with Triggered Operations | 10 | 0.66 | 2011 |
Challenges for High-Performance Networking for Exascale Computing | 3 | 0.47 | 2010 |
Fast, Efficient Floating-Point Adders and Multipliers for FPGAs | 11 | 0.81 | 2010 |
Using triggered operations to offload collective communication operations | 8 | 0.57 | 2010 |
Performance evaluation of the Red Storm dual-core upgrade | 0 | 0.34 | 2010 |
Architectural Modifications to Enhance the Floating-Point Performance of FPGAs | 20 | 1.06 | 2008 |
High Message Rate, NIC-Based Atomics: Design and Performance Considerations | 1 | 0.38 | 2008 |
Evaluating NIC hardware requirements to achieve high message rate PGAS support on multi-core processors | 8 | 0.60 | 2007 |
An architecture to perform NIC based MPI matching | 4 | 0.48 | 2007 |
Scientific Application Acceleration with Reconfigurable Functional Units | 11 | 0.97 | 2007 |
Floating-point divider design for FPGAs | 6 | 0.50 | 2007 |
SeaStar Interconnect: Balanced Bandwidth for Scalable Performance | 56 | 3.73 | 2006 |
Reconfigurable supercomputing - Is high-performance reconfigurable computing the next supercomputing paradigm? | 0 | 0.34 | 2006 |
Tools and techniques for performance - Architectures and APIs: assessing requirements for delivering FPGA performance to applications | 0 | 0.34 | 2006 |
Embedded floating-point units in FPGAs | 28 | 2.59 | 2006 |
Open Source High Performance Floating-Point Modules | 14 | 1.09 | 2006 |
Challenges and issues in benchmarking MPI | 2 | 0.45 | 2006 |
A preliminary analysis of the InfiniPath and XD1 network interfaces | 8 | 1.07 | 2006 |
Implications of application usage characteristics for collective communication offload | 18 | 0.91 | 2006 |
Poster reception - The structural simulation toolkit: exploring novel architectures | 2 | 0.36 | 2006 |
The implications of working set analysis on supercomputing memory hierarchy design | 12 | 1.43 | 2005 |
A Hardware Acceleration Unit for MPI Queue Processing | 20 | 1.01 | 2005 |
An Analysis of the Double-Precision Floating-Point FFT on FPGAs | 25 | 1.89 | 2005 |
Analyzing the Impact of Overlap, Offload, and Independent Progress for Message Passing Interface Applications | 27 | 1.62 | 2005 |
Enhancing NIC Performance for MPI using Processing-in-Memory | 5 | 0.50 | 2005 |
Initial Performance Evaluation of the Cray SeaStar Interconnect | 19 | 1.76 | 2005 |
Considering the Relative Importance of Network Performance and Network Features | 0 | 0.34 | 2005 |
RC-BLAST: Towards a Portable, Cost-Effective Open Source Hardware Implementation | 30 | 2.18 | 2005 |
The Impact of MPI Queue Usage on Message Latency | 27 | 2.09 | 2004 |
An Initial Analysis of the Impact of Overlap and Independent Progress for MPI | 11 | 1.13 | 2004 |
An Analysis of NIC Resource Usage for Offloading MPI | 26 | 1.96 | 2004 |
Closing the Gap: CPU and FPGA Trends in Sustainable Floating-Point BLAS Performance | 69 | 6.87 | 2004 |
FPGAs vs. CPUs: trends in peak floating-point performance | 129 | 11.02 | 2004 |
An Analysis of the Cost Effectiveness of an Adaptable Computing Cluster | 5 | 0.57 | 2004 |
An analysis of the impact of MPI overlap and independent progress | 28 | 2.39 | 2004 |
A Configurable Network Protocol for Cluster Based Communications using Modular Hardware Primitives on an Intelligent NIC | 3 | 0.49 | 2003 |
Analysis Of A Prototype Intelligent Network Interface | 4 | 0.53 | 2003 |
Evaluation of an Eager Protocol Optimization for MPI | 20 | 1.34 | 2003 |