Name: JINGWEN LENG
Affiliation: Univ Texas Austin, Dept Elect & Comp Engn, Austin, TX 78712 USA
Papers: 36
Collaborators: 98
Citations: 49
PageRank: 12.97
Referers: 171
Referees: 701
References: 208
Title | Citations | PageRank | Year
SALO: an efficient spatial accelerator enabling hybrid sparse attention mechanisms for long sequences | 0 | 0.34 | 2022
Transkimmer: Transformer Learns to Layer-wise Skim | 1 | 0.35 | 2022
PAME: precision-aware multi-exit DNN serving for reducing latencies of batched inferences | 0 | 0.34 | 2022
SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation | 0 | 0.34 | 2022
Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS | 0 | 0.34 | 2022
Block-Skim: Efficient Question Answering for Transformer. | 0 | 0.34 | 2022
ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization | 0 | 0.34 | 2022
Dual-side Sparse Tensor Core | 5 | 0.41 | 2021
Exploiting Intra-SM Parallelism in GPUs via Persistent and Elastic Blocks | 1 | 0.38 | 2021
AlphaR: Learning-Powered Resource Management for Irregular, Dynamic Microservice Graph | 1 | 0.35 | 2021
System-level Early-stage Modeling and Evaluation of IVR-assisted Processor Power Delivery System | 0 | 0.34 | 2021
Predictive Guardbanding: Program-Driven Timing Margin Reduction for GPUs | 1 | 0.39 | 2021
Enable simultaneous DNN services based on deterministic operator overlap and precise latency prediction | 1 | 0.35 | 2021
Erratum to “Predictive Guardbanding: Program-Driven Timing Margin Reduction for GPUs” | 0 | 0.34 | 2021
How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention | 0 | 0.34 | 2020
Probabilistic robust regression with adaptive weights - a case study on face recognition. | 0 | 0.34 | 2020
URSA - Precise Capacity Planning and Fair Scheduling based on Low-level Statistics for Public Clouds. | 2 | 0.36 | 2020
Asymmetric Resilience: Exploiting Task-Level Idempotency for Transient Error Recovery in Accelerator-Based Systems | 3 | 0.35 | 2020
Sturgeon: Preference-aware Co-location for Improving Utilization of Power Constrained Computers | 1 | 0.35 | 2020
Balancing Efficiency and Flexibility for DNN Acceleration via Temporal GPU-Systolic Array Integration | 1 | 0.35 | 2020
Ptolemy: Architecture Support for Robust Deep Learning | 1 | 0.35 | 2020
CODA: Improving Resource Utilization by Slimming and Co-locating DNN and CPU Jobs | 1 | 0.35 | 2020
DLFusion: An Auto-Tuning Compiler for Layer Fusion on Deep Neural Network Accelerator | 0 | 0.34 | 2020
Survey and design of paleozoic: a high-performance compiler tool chain for deep learning inference accelerator. | 0 | 0.34 | 2020
Accelerating sparse DNN models without hardware-support via tile-wise sparsity | 0 | 0.34 | 2020
Predicting and reining in application-level slowdown on spatial multitasking GPUs | 3 | 0.38 | 2020
Avalon: towards QoS awareness and improved utilization through multi-resource management in datacenters | 3 | 0.36 | 2019
DR Refresh: Releasing DRAM Potential by Enabling Read Accesses under Refresh | 0 | 0.34 | 2019
Adversarial Defense Through Network Profiling Based Path Extraction | 2 | 0.36 | 2019
Characterizing Perception Module Performance and Robustness in Production-Scale Autonomous Driving System. | 2 | 0.43 | 2019
Ebird: Elastic Batch for Improving Responsiveness and Throughput of Deep Learning Services | 3 | 0.43 | 2019
Themis: Predicting And Reining In Application-Level Slowdown On Spatial Multitasking Gpus | 1 | 0.34 | 2019
DR DRAM: Accelerating Memory-Read-Intensive Applications | 0 | 0.34 | 2018
Ivory: Early-Stage Design Space Exploration Tool for Integrated Voltage Regulators. | 0 | 0.34 | 2017
GPU voltage noise: Characterization and hierarchical smoothing of spatial and temporal voltage noise interference in GPU architectures | 12 | 0.50 | 2015
Exploiting Webpage Characteristics for Energy-Efficient Mobile Web Browsing | 4 | 0.42 | 2014