Senior Compiler Developer @ Wolfram Research

PhD candidate in CS @ UIUC


Abdul Dakkak is a Ph.D. candidate in Computer Science at the University of Illinois at Urbana-Champaign (UIUC), advised by Professor Wen-mei Hwu. He is a senior compiler developer at Wolfram Research, leading the Wolfram Compiler effort. Abdul’s research interests lie at the intersection of programming languages and accelerated computing, with a focus on compiling high-level languages into performant code that runs on different hardware. In the process, he has developed industry-grade tools for compiling, running, profiling, and introspecting real-world applications to optimize their performance across the hardware and software stack. As a primary developer of the Wolfram Compiler, Abdul developed the Wolfram type system and architected the Wolfram runtime. As a result, compiled Wolfram code matches the speed of hand-optimized C code and can target accelerators and multi-node systems.

Abdul has also been active in teaching. He developed tools that enable teaching in large classrooms and is the author of WebGPU and RAI. Both WebGPU and RAI have over 100k users and are used at more than 14 universities (including the University of Michigan, BSC/UPC, UIC, the University of Tennessee, …) to evaluate over 2.5 million labs. He has helped teach the Coursera HPP course (3 times), the introductory and advanced CUDA courses (2 times), and the PUMPS summer school at BSC (4 times).

Abdul has also been developing MLModelScope, a distributed platform that allows people to deploy, profile, and experiment with ML/DL frameworks and models. The tools are used to inform system design for deep learning model serving and to develop highly tuned GPU kernels for model inference.


  • Compilers and Systems
  • Performance Optimizations
  • Artificial Intelligence


  • PhD Candidate in Computer Science, 2013–Present

    University of Illinois Urbana-Champaign

  • B.A. in Pure Mathematics, 2009

    University of Toledo



Senior Compiler Developer

Wolfram Research

Jan 2019 – Present Illinois
Responsibilities include:

  • Co-lead the Wolfram Compiler effort
  • Researched and developed the Wolfram type system, code generation, and optimizations
  • Prototyped paths to compile to accelerators and to JavaScript

Ph.D. Candidate in Computer Science

University of Illinois, Urbana-Champaign

Aug 2013 – Present Illinois
Responsibilities include:

  • Performed research on compilers, GPU computing, and AI
  • Developed widely used systems including WebGPU, RAI, D4P, and MLModelScope
  • Mentored undergraduate, master's, and graduate students

Kernel Developer

Wolfram Research

Apr 2010 – Jan 2019 Illinois
Responsibilities include:

  • Led a team to develop GPU integration in Mathematica
  • Developed a domain-specific language for writing financial code for Wolfram Finance Platform
  • Optimized the core Mathematica engine and designed next-gen Wolfram runtime
  • Developed primitives for the Wolfram Geometry project

Junior Kernel Developer

Wolfram Research

Apr 2009 – Apr 2010 Illinois
Responsibilities include:

  • Developed and architected CUDALink and OpenCLLink
  • Optimized C foreign function interface path for Mathematica
  • Developed NVIDIA Compiler bindings for C compiler driver

Recent Publications

The Design and Implementation of a Scalable DL Benchmarking Platform. To appear in CLOUD, 2020.


DLSpec: A Deep Learning Task Exchange Specification. To appear in USENIX OpML, 2020.


MLModelScope: A Distributed Platform for Model Evaluation and Benchmarking at Scale. arXiv preprint, 2020.


DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs. To appear in ICPE, 2020.



Wolfram Compiler

The Wolfram Compiler compiles the Wolfram Language into optimized native machine code.


An open-source, framework- and hardware-agnostic, extensible, and customizable distributed platform for evaluating and profiling ML models across datasets, frameworks, and systems.


Automatic μBenchmark Generation to Compute “Lower-bound” Latency and Inform Optimizations of Deep Learning Models on GPUs.


A Scalable Project Submission System for Parallel Programming Courses.


A Scalable Lab Submission System for Parallel Programming Courses.

Recent & Upcoming Talks

Workshop on Benchmarking Machine Learning Workloads

With evolving system architectures, hardware and software stacks, diverse machine learning (ML) workloads, and data, it is important to understand how these components interact with each other. Well-defined benchmarking procedures help evaluate and reason about the performance gains of ML workload-to-system mappings. We welcome all novel submissions in benchmarking machine learning workloads from all disciplines, such as image and speech recognition, language processing, drug discovery, simulations, …

Digging Deep into Model Performance using Across-Stack Profiling

This talk presents Across-Stack Profiling (XSP), a leveled profiling design that leverages existing profiling tools to give a drill-down view of model, system, and hardware bottlenecks. The design does so despite the overheads that profiling itself incurs. We coupled the profiling capability with an automatic analysis pipeline to systematically characterize over 150 state-of-the-art ML models. Through this characterization, we show that our across-stack profiling solution …

Using Benanza & DLBricks to Inform Optimizations

Benanza and DLBricks offer a sustainable way to develop ML benchmarks and analyze the results to inform and pinpoint optimization opportunities. Benanza and DLBricks consist of: a model processor that parses models into an internal representation, a benchmark generator that automatically generates micro-benchmarks given a set of models, a database of benchmark results, and an analyzer that computes the “lower-bound” latency of DL models using the benchmark data and informs optimizations …

SC 2019 - Across-Stack Profiling and Characterization of State-of-the-Art Machine Learning Models on GPUs

The past few years have seen a surge in the use of Machine Learning (ML) and Deep Learning (DL) algorithms for traditional HPC tasks such as feature detection, numerical analysis, and graph analytics. While ML and DL help solve HPC tasks, their adoption has been hampered in part by the complexity of understanding ML/DL and their interactions with system utilization. Optimizing these algorithms requires characterizing their performance and resource utilization across the hardware/software …

Recognitions & Awards

Best Paper XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs

ACM Artifact Evaluation Stamp for The Design and Implementation of the Wolfram Language Compiler

Best Paper and ACM Artifact Evaluation Stamp for Evaluating CUDA Communication Primitives on High-Bandwidth Interconnects

Best Poster

Top-20 Poster
