Accelerating Fourier and Number Theoretic Transforms using Tensor Cores and Warp Shuffles. PACT, 2021.

PDF Project

FFT Blitz: the Tensor Cores Strike Back. PPoPP, 2021.

PDF Project

★ The Design and Implementation of a Scalable DL Benchmarking Platform. CLOUD, 2020.

PDF Project website best paper

DLSpec: A Deep Learning Task Exchange Specification. USENIX OpML, 2020.

PDF Project website

MLModelScope: A Distributed Platform for Model Evaluation and Benchmarking at Scale. arXiv, 2020.

Preprint PDF Project

★ XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs. IPDPS, 2020.

PDF Project Slides Video best paper website

The Design and Implementation of the Wolfram Language Compiler. CGO, 2020.

PDF Project

MLModelScope: Evaluate and Introspect Cognitive Pipelines. SERVICES, 2019.

PDF Project

Challenges and Pitfalls of Reproducing Machine Learning Artifacts. arXiv, 2019.

PDF Project

Accelerating Reduction and Scan Using Tensor Core Units. ICS, 2019.

PDF Code Project Slides website

★ Evaluating Characteristics of CUDA Communication Primitives on High-Bandwidth Interconnects. ICPE, 2019.

PDF Code Project Slides best paper

Accelerating Reduction Using Tensor Core Units. HPCaML, 2019.

PDF Code Project Slides

SCOPE: C3SR Systems Characterization and Benchmarking Framework. arXiv, 2018.

PDF Code Project Source Document website