Delegated Replies: Alleviating Network Clogging in Heterogeneous Architectures | 0 | 0.34 | 2022 |
LMT: Accurate and Resource-Scalable Slowdown Prediction | 0 | 0.34 | 2022 |
Modeling Periodic Energy-Harvesting Computing Systems | 0 | 0.34 | 2021 |
Fast and Accurate Edge Computing Energy Modeling and DVFS Implementation in gem5 Using System Call Emulation Mode | 0 | 0.34 | 2021 |
TIP: Time-Proportional Instruction Profiling | 0 | 0.34 | 2021 |
HSM: A Hybrid Slowdown Model for Multitasking GPUs | 4 | 0.47 | 2020 |
DCMI: A Scalable Strategy for Accelerating Iterative Stencil Loops on FPGAs | 0 | 0.34 | 2020 |
Selective Replication in Memory-Side GPU Caches | 0 | 0.34 | 2020 |
MDM: The GPU Memory Divergence Model | 1 | 0.35 | 2020 |
Scalability analysis of AVX-512 extensions | 0 | 0.34 | 2020 |
Modeling Emerging Memory-Divergent GPU Applications | 1 | 0.36 | 2019 |
GDP: Using Dataflow Properties to Accurately Estimate Interference-Free Performance at Runtime | 0 | 0.34 | 2018 |
Get Out of the Valley: Power-Efficient Address Mapping for GPUs | 3 | 0.36 | 2018 |
Supporting Utilities for Heterogeneous Embedded Image Processing Platforms (STHEM): An Overview | 1 | 0.37 | 2018 |
Streamlined Deployment for Quantized Neural Networks | 1 | 0.47 | 2017 |
Scaling Binarized Neural Networks on Reconfigurable Logic | 10 | 0.56 | 2017 |
FINN: A Framework for Fast, Scalable Binarized Neural Network Inference | 143 | 4.87 | 2017 |
Extending OMPT to Support Grain Graphs | 2 | 0.43 | 2017 |
The READEX formalism for automatic tuning for energy efficiency | 6 | 0.49 | 2017 |
Towards Efficient Design Space Exploration of FPGA-based Accelerators for Streaming HPC Applications (Abstract Only) | 0 | 0.34 | 2017 |
DTP: Enabling Exhaustive Exploration of FPGA Temporal Partitions for Streaming HPC Applications | 0 | 0.34 | 2017 |
TULIPP: Towards ubiquitous low-power image processing platforms | 1 | 0.37 | 2016 |
Random access schemes for efficient FPGA SpMV acceleration | 1 | 0.34 | 2016 |
Tuning the victim selection policy of Intel TBB | 0 | 0.34 | 2015 |
A Vector Caching Scheme for Streaming FPGA SpMV Accelerators | 2 | 0.39 | 2015 |
Hybrid breadth-first search on a single-chip FPGA-CPU heterogeneous platform | 10 | 0.60 | 2015 |
ParVec: vectorizing the PARSEC benchmark suite | 4 | 0.49 | 2015 |
An energy efficient column-major backend for FPGA SpMV accelerators | 5 | 0.45 | 2014 |
Optimized hardware for suboptimal software: The case for SIMD-aware benchmarks | 6 | 0.46 | 2014 |
Victim Selection Policies for Intel TBB: Overheads and Energy Footprint | 1 | 0.36 | 2014 |
Challenges of Reducing Cycle-Accurate Simulation Time for TBP Applications | 0 | 0.34 | 2013 |
On the energy footprint of task based parallel applications | 2 | 0.38 | 2013 |
A high performance adaptive miss handling architecture for chip multiprocessors | 4 | 0.40 | 2011 |
Storage Efficient Hardware Prefetching using Delta-Correlating Prediction Tables | 4 | 0.42 | 2011 |
Exploring the prefetcher/memory controller design space: an opportunistic prefetch scheduling strategy | 0 | 0.34 | 2011 |
Computational Computer Architecture Research at NTNU | 0 | 0.34 | 2010 |
DIEF: an accurate interference feedback mechanism for chip multiprocessor memory systems | 1 | 0.37 | 2010 |
Multi-level hardware prefetching using low complexity delta correlating prediction tables with partial matching | 3 | 0.46 | 2010 |
A light-weight fairness mechanism for chip multiprocessor memory systems | 5 | 0.42 | 2009 |
A Quantitative Study of Memory System Interference in Chip Multiprocessor Architectures | 3 | 0.42 | 2009 |
Low-cost open-page prefetch scheduling in chip multiprocessors | 2 | 0.39 | 2008 |