Compressing RNNs to Kilobyte Budget for IoT Devices Using Kronecker Products | 0 | 0.34 | 2021 |
Rank and run-time aware compression of NLP Applications | 0 | 0.34 | 2020 |
Efficient Winograd or Cook-Toom Convolution Kernel Implementation on Widely Used Mobile CPUs | 1 | 0.35 | 2019 |
Run-Time Efficient RNN Compression for Inference on Edge Devices | 1 | 0.36 | 2019 |
Measuring scheduling efficiency of RNNs for NLP applications | 0 | 0.34 | 2019 |
Skipping RNN State Updates without Retraining the Original Model | 0 | 0.34 | 2019 |
Compressing RNNs for IoT devices by 15-38x using Kronecker Products | 0 | 0.34 | 2019 |
Ternary Hybrid Neural-Tree Networks for Highly Constrained IoT Applications | 0 | 0.34 | 2019 |
Guest Editors' Introduction: Frontiers of Hardware and Algorithms for On-chip Learning | 0 | 0.34 | 2018 |
BONSEYES: Platform for Open Development of Systems of Artificial Intelligence: Invited paper | 3 | 0.51 | 2017 |
Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism | 30 | 0.81 | 2017 |
Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks | 122 | 4.82 | 2016 |
Predicting room occupancy with a single passive infrared (PIR) sensor through behavior extraction | 4 | 0.53 | 2016 |
APOGEE: adaptive prefetching on GPUs for energy efficiency | 24 | 0.78 | 2013 |
A 1GHz hardware loop-accelerator with razor-based dynamic adaptation for energy-efficient operation | 2 | 0.39 | 2013 |
A Customized Processor for Energy Efficient Scientific Computing | 7 | 0.57 | 2012 |
Correction to "A Power-Efficient 32 bit ARM Processor Using Timing-Error Detection and Correction for Transient-Error Tolerance and Adaptation to PVT Variation" | 73 | 4.39 | 2011 |
PEPSC: A Power-Efficient Processor for Scientific Computing | 10 | 0.61 | 2011 |
MEDICS: ultra-portable processing for medical image reconstruction | 4 | 0.97 | 2010 |
CoreGenesis: erasing core boundaries for robust and configurable performance | 2 | 0.39 | 2010 |
Bridging The Computation Gap Between Programmable Processors And Hardwired Accelerators | 42 | 2.68 | 2009 |
Power-Efficient Medical Image Processing Using Puma | 2 | 0.38 | 2009 |
DVFS in loop accelerators using BLADES | 9 | 0.68 | 2008 |
Partitioning Variables across Register Windows to Reduce Spill Code in a Low-Power Processor | 18 | 0.88 | 2005 |
Compiler Managed Dynamic Instruction Placement in a Low-Power Code Cache | 27 | 1.17 | 2005 |
Increasing the number of effective registers in a low-power processor using a windowed register file | 6 | 0.67 | 2003 |