The Design and Implementation of a Scalable DL Benchmarking Platform. To appear in CLOUD, 2020.

PDF Project website

DLSpec: A Deep Learning Task Exchange Specification. To appear in USENIX OpML, 2020.

PDF Project website

MLModelScope: A Distributed Platform for Model Evaluation and Benchmarking at Scale. In arXiv, 2020.

Preprint PDF Project

XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs. To appear in IPDPS, 2020.

PDF Project Slides Video

The Design and Implementation of the Wolfram Language Compiler. CGO, 2020.

PDF Project

DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs. To appear in ICPE, 2020.

PDF Project Slides extended

Challenges and Pitfalls of Reproducing Machine Learning Artifacts. In arXiv, 2019.

PDF Project

Accelerating Reduction and Scan Using Tensor Core Units. ICS, 2019.

PDF Code Project Slides website

Evaluating Characteristics of CUDA Communication Primitives on High-Bandwidth Interconnects. In ICPE, 2019.

PDF Code Project Slides

Accelerating Reduction Using Tensor Core Units. HPCaML, 2019.

PDF Code Project Slides

SCOPE: C3SR Systems Characterization and Benchmarking Framework. In arXiv, 2018.

PDF Code Project Source Document website