A Deep Neural Network Accelerator using Residue Arithmetic in a Hybrid Optoelectronic System | 0 | 0.34 | 2022 |
iSample: Intelligent Client Sampling in Federated Learning | 0 | 0.34 | 2022 |
Adaptive and Efficient Resource Allocation in Cloud Datacenters Using Actor-Critic Deep Reinforcement Learning | 0 | 0.34 | 2022 |
ReCPE: A PE for Reconfigurable Lightweight Cryptography | 1 | 0.38 | 2021 |
A Machine-Learning-Based Framework for Productive Locality Exploitation | 0 | 0.34 | 2021 |
Virtualizing Analog Mesh Computers: The Case of a Photonic PDE Solving Accelerator | 0 | 0.34 | 2020 |
Software stack for an analog mesh computer: the case of a nanophotonic PDE accelerator | 0 | 0.34 | 2020 |
Photonic Processor for Fully Discretized Neural Networks | 0 | 0.34 | 2019 |
Can Photonic Computing be the Answer to Green and Sustainable Computing? | 0 | 0.34 | 2019 |
LAPPS: Locality-Aware Productive Prefetching Support for PGAS | 0 | 0.34 | 2018 |
HyPPI NoC: Bringing Hybrid Plasmonics to an Opto-Electronic Network-on-Chip | 0 | 0.34 | 2017 |
Reordering GPU Kernel Launches to Enable Efficient Concurrent Execution | 1 | 0.35 | 2015 |
Adaptive Cache Coherence Mechanisms with Producer–Consumer Sharing Optimization for Chip Multiprocessors | 3 | 0.41 | 2015 |
Bandwidth Adaptive Cache Coherence Optimizations for Chip Multiprocessors | 0 | 0.34 | 2014 |
An Adaptive Hybrid OLAP Architecture with optimized memory access patterns | 5 | 0.46 | 2013 |
Application-specific processors for web-browsing: An exploration and evaluation of the design space | 2 | 0.36 | 2013 |
Accelerated high-performance computing through efficient multi-process GPU resource sharing | 1 | 0.34 | 2012 |
Distributed Shared Memory Programming in the Cloud | 1 | 0.35 | 2012 |
A convolve-and-merge approach for exact computations on high-performance reconfigurable computers | 0 | 0.34 | 2012 |
Bandwidth Adaptive Write-update Optimizations for Chip Multiprocessors | 1 | 0.35 | 2012 |
Task Scheduling for GPU Accelerated Hybrid OLAP Systems with Multi-core Support and Text-to-Integer Translation | 4 | 0.42 | 2012 |
Productivity of GPUs under different programming paradigms | 8 | 0.77 | 2012 |
Efficient Mapping of Task Graphs onto Reconfigurable Hardware Using Architectural Variants | 9 | 0.49 | 2012 |
A Comparative Study of Cloud Computing Middleware | 0 | 0.34 | 2012 |
Towards efficient GPU sharing on multicore processors | 7 | 0.61 | 2012 |
Exploiting Hierarchical Parallelism Using UPC | 1 | 0.36 | 2011 |
A Static Task Scheduling Framework for Independent Tasks Accelerated Using a Shared Graphics Processing Unit | 6 | 0.52 | 2011 |
An Architecture for Reconfigurable Multi-core Explorations | 3 | 0.40 | 2011 |
New Hardware Architectures for Montgomery Modular Multiplication Algorithm | 33 | 1.82 | 2011 |
GPU Resource Sharing and Virtualization on High Performance Computing Systems | 19 | 1.07 | 2011 |
Task scheduling for GPU accelerated OLAP systems | 1 | 0.36 | 2011 |
A Framework for Evaluating High-Level Design Methodologies for High-Performance Reconfigurable Computers | 4 | 0.52 | 2011 |
Scaling scientific applications on clusters of hybrid multicore/GPU nodes | 8 | 0.97 | 2011 |
Modelling the performance of an SSD-Aware storage system using least squares regression | 1 | 0.36 | 2011 |
Reflex Barrier: A Scalable Network-Based Synchronization Barrier | 0 | 0.34 | 2011 |
Reconfiguration and Communication-Aware Task Scheduling for High-Performance Reconfigurable Computing | 18 | 0.74 | 2010 |
Efficient cache design for solid-state drives | 4 | 0.48 | 2010 |
An adaptive cache coherence protocol for chip multiprocessors | 4 | 0.39 | 2010 |
Parameterized hardware design on reconfigurable computers: an image processing case study | 3 | 0.47 | 2010 |
Space and time sharing of reconfigurable hardware for accelerated parallel processing | 1 | 0.37 | 2010 |
RDMS: A hardware task scheduling algorithm for Reconfigurable Computing | 8 | 0.67 | 2009 |
Performance issues in emerging homogeneous multi-core architectures | 10 | 0.62 | 2009 |
Efficient Mapping of Hardware Tasks on Reconfigurable Computers Using Libraries of Architecture Variants | 3 | 0.41 | 2009 |
Exploiting Partial Runtime Reconfiguration for High-Performance Reconfigurable Computing | 26 | 1.58 | 2009 |
Performance Evaluation of Clusters with ccNUMA Nodes - A Case Study | 5 | 0.47 | 2008 |
Portable library development for reconfigurable computing systems: A case study | 4 | 0.49 | 2008 |
An optimized hardware architecture for the Montgomery multiplication algorithm | 17 | 1.41 | 2008 |
Extreme parallel architectures for the masses | 0 | 0.34 | 2008 |
Application Performance Tuning for Clusters with ccNUMA Nodes | 5 | 0.61 | 2008 |
DNA and Protein Sequence Alignment with High Performance Reconfigurable Systems | 5 | 0.48 | 2007 |