Publications

The Design and Implementation of the Wolfram Language Compiler. To appear in CGO 2020.

XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs. To appear in IPDPS 2020.

The Design and Implementation of a Scalable DL Benchmarking Platform. In arXiv, 2019.

DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs. To appear in ICPE 2020.

Challenges and Pitfalls of Reproducing Machine Learning Artifacts. In arXiv, 2019.

Accelerating Reduction and Scan Using Tensor Core Units. In ICS, 2019.

Evaluating Characteristics of CUDA Communication Primitives on High-Bandwidth Interconnects. In ICPE, 2019.

Accelerating Reduction Using Tensor Core Units. In HPCaML, 2019.

SCOPE: C3SR Systems Characterization and Benchmarking Framework. In arXiv, 2018.

MLModelScope: Evaluate and Introspect Cognitive Pipelines. In IEEE Services, 2018.

RAI: A Scalable Project Submission System for Parallel Programming Courses. In IPDPSW, 2017.