High performance lattice regression on FPGAs via a high level hardware description language | 0 | 0.34 | 2021 |
Aurochs: An Architecture for Dataflow Threads | 1 | 0.34 | 2021 |
Chopping off the Tail: Bounded Non-Determinism for Real-Time Accelerators | 0 | 0.34 | 2021 |
SARA: Scaling a Reconfigurable Dataflow Accelerator | 4 | 0.42 | 2021 |
Capstan: A Vector RDA for Sparsity | 3 | 0.36 | 2021 |
Gorgon: Accelerating Machine Learning from Relational Data | 5 | 0.40 | 2020 |
DeepFreak: Learning Crystallography Diffraction Patterns with Automated Machine Learning | 0 | 0.34 | 2019 |
Scalable interconnects for reconfigurable spatial architectures | 0 | 0.34 | 2019 |
Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark | 6 | 0.55 | 2018 |
LevelHeaded: A Unified Engine for Business Intelligence and Linear Algebra Querying | 1 | 0.35 | 2018 |
High-Accuracy Low-Precision Training | 0 | 0.34 | 2018 |
Exploring the Utility of Developer Exhaust | 0 | 0.34 | 2018 |
Practical Design Space Exploration | 1 | 0.36 | 2018 |
Mind the gap: bridging multi-domain query workloads with EmptyHeaded | 1 | 0.35 | 2017 |
Flare: Native Compilation for Heterogeneous Workloads in Apache Spark | 3 | 0.37 | 2017 |
Infrastructure for Usable Machine Learning: The Stanford DAWN Project | 3 | 0.37 | 2017 |
LevelHeaded: Making Worst-Case Optimal Joins Work in the Common Case | 0 | 0.34 | 2017 |
Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent | 14 | 0.64 | 2017 |
Ensuring Rapid Mixing and Low Bias for Asynchronous Gibbs Sampling | 6 | 0.55 | 2016 |
EmptyHeaded: A Relational Engine for Graph Processing | 29 | 0.79 | 2016 |
Automatic Generation of Efficient Accelerators for Reconfigurable Hardware | 19 | 0.70 | 2016 |
Old techniques for new join algorithms: A case study in RDF processing | 7 | 0.47 | 2016 |
GraphOps: A Dataflow Library for Graph Analytics Acceleration | 19 | 0.66 | 2016 |
Scaling Data Analytics with Moore's Law | 0 | 0.34 | 2016 |
Automatic support for multi-module parallelism from computational patterns | 4 | 0.45 | 2015 |
Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width | 6 | 0.50 | 2015 |
Energy-Efficient Abundant-Data Computing: The N3XT 1,000x | 24 | 1.82 | 2015 |
Taming the Wild: A Unified Analysis of Hogwild!-Style Algorithms | 0 | 0.34 | 2015 |
Simplifying Scalable Graph Processing with a Domain-Specific Language | 16 | 0.65 | 2014 |
Delite: A Compiler Architecture for Performance-Oriented Embedded Domain-Specific Languages | 48 | 1.40 | 2014 |
Beyond parallel programming with domain specific languages | 1 | 0.35 | 2014 |
Hardware acceleration of database operations | 56 | 1.84 | 2014 |
Global Convergence of Stochastic Gradient Descent for Some Nonconvex Matrix Problems | 2 | 0.50 | 2014 |
Hardware system synthesis from Domain-Specific Languages | 20 | 0.88 | 2014 |
Locality-Aware Mapping of Nested Parallel Patterns on GPUs | 20 | 0.88 | 2014 |
Surgical precision JIT compilers | 16 | 0.77 | 2014 |
Composition and reuse with compiled domain-specific languages | 26 | 0.94 | 2013 |
On fast parallel detection of strongly connected components (SCC) in small-world graphs | 33 | 1.00 | 2013 |
Optimizing data structures in high-level programs: new directions for extensible compilers based on staging | 50 | 1.42 | 2013 |
Green-Marl: a DSL for easy and efficient graph analysis | 119 | 4.03 | 2012 |
A case of system-level hardware/software co-design and co-verification of a commodity multi-processor system with custom hardware | 1 | 0.35 | 2012 |
High performance embedded domain specific languages | 1 | 0.36 | 2012 |
Implementing Domain-Specific Languages for Heterogeneous Parallel Computing | 37 | 1.42 | 2011 |
OptiML: An Implicitly Parallel Domain-Specific Language for Machine Learning | 65 | 2.53 | 2011 |
Runtime automatic speculative parallelization | 13 | 0.60 | 2011 |
Efficient Parallel Graph Exploration on Multi-Core CPU and GPU | 130 | 4.60 | 2011 |
A domain-specific approach to heterogeneous parallelism | 73 | 3.47 | 2011 |
Accelerating CUDA graph algorithms at maximum warp | 171 | 5.87 | 2011 |
Hardware/software co-design for high performance computing: challenges and opportunities | 4 | 0.68 | 2010 |
Eigenbench: A simple exploration tool for orthogonal TM characteristics | 31 | 0.99 | 2010 |