DLSpec: A Deep Learning Task Exchange Specification. To appear in USENIX OpML, 2020.

MLModelScope: A Distributed Platform for Model Evaluation and Benchmarking at Scale. In arXiv, 2020.

XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs. To appear in IPDPS, 2020.

The Design and Implementation of the Wolfram Language Compiler. CGO, 2020.

DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs. To appear in ICPE, 2020.

The Design and Implementation of a Scalable DL Benchmarking Platform. In arXiv, 2019.

Challenges and Pitfalls of Reproducing Machine Learning Artifacts. In arXiv, 2019.

Accelerating Reduction and Scan Using Tensor Core Units. ICS, 2019.

Evaluating Characteristics of CUDA Communication Primitives on High-Bandwidth Interconnects. In ICPE, 2019.

Accelerating Reduction Using Tensor Core Units. HPCaML, 2019.

SCOPE: C3SR Systems Characterization and Benchmarking Framework. In arXiv, 2018.

MLModelScope: Evaluate and Introspect Cognitive Pipelines. IEEE Services, 2018.
