Abdul Dakkak

Senior Compiler Developer @ Wolfram Research

PhD candidate in CS @ UIUC


Abdul Dakkak is a Ph.D. candidate in Computer Science at the University of Illinois at Urbana-Champaign (UIUC), advised by Professor Wen-mei Hwu. He is a senior compiler developer at Wolfram Research, where he leads the Wolfram Compiler effort. Abdul's research interests lie at the intersection of programming languages and accelerated computing, with a focus on compiling high-level languages into performant code that runs on different hardware. In the process, he has developed industry-grade tools for compiling, running, profiling, and introspecting real-world applications to optimize their performance across both the hardware and software stacks. As a primary developer of the Wolfram Compiler, Abdul developed the Wolfram type system and architected the Wolfram runtime. As a result, compiled Wolfram code matches the speed of hand-optimized C code and can target accelerators and multi-node systems.

Abdul has also been active in teaching. He developed tools that enable teaching in large classrooms and is the author of WebGPU and RAI. Both WebGPU and RAI have over 100k users and are used at more than 14 universities (including the University of Michigan, BSC/UPC, UIC, the University of Tennessee, …) to evaluate over 2.5 million labs. He has helped teach the Coursera HPP course (3 times), the introductory and advanced CUDA courses (2 times), and the PUMPS summer school at BSC (4 times).

Abdul has also been developing MLModelScope, a distributed platform that lets people deploy, profile, and experiment with ML/DL frameworks and models. The tools are used to inform system design for deep learning model serving and to develop highly tuned GPU kernels for model inference.


  • Compilers and Systems
  • Performance Optimizations
  • Artificial Intelligence


  • PhD Candidate in Computer Science, 2013 – Present

    University of Illinois Urbana-Champaign

  • B.A. in Pure Mathematics, 2009

    University of Toledo



Senior Compiler Developer

Wolfram Research

Jan 2019 – Present Illinois
Responsibilities include:

  • Co-lead the Wolfram Compiler effort
  • Researched and developed the Wolfram type-system, code generation, and optimizations
  • Prototyped paths to compile to accelerators and to JavaScript

Visiting Researcher

IBM Research

Jun 2018 – Aug 2018 New York
Responsibilities include:

  • Developed a performance estimator and benchmarking platform for deep neural networks
  • Developed fast reduction and scan primitives that utilize the systolic matrix-array available on GPUs

Accelerated Computing Teaching Kit Developer

University of Illinois, Urbana-Champaign

Oct 2016 – May 2017 Illinois
Responsibilities include:

  • Authored labs downloaded by over 1000 instructors for GPU computing
  • Developed infrastructure to run, evaluate, and set up the labs
  • Developed a Syllabus website to host all course videos

Ph.D. Candidate in Computer Science

University of Illinois, Urbana-Champaign

Aug 2013 – Present Illinois
Responsibilities include:

  • Performed research on cutting-edge compilers, GPUs, and AI
  • Developed highly-used systems including WebGPU, RAI, D4P, and MLModelScope
  • Mentored undergraduate, master's, and graduate students

Kernel Developer

Wolfram Research

Apr 2010 – Jan 2019 Illinois
Responsibilities include:

  • Led a team to develop GPU integration in Mathematica
  • Optimized the core Mathematica engine and designed next-gen Wolfram runtime
  • Developed primitives for the Wolfram Geometry project

Junior Kernel Developer

Wolfram Research

Apr 2009 – Apr 2010 Illinois
Responsibilities include:

  • Developed and architected CUDALink and OpenCLLink
  • Developed CUDA capabilities of Wolfram Finance Platform


Wolfram Compiler

The Wolfram Compiler compiles the Wolfram Language into optimized native machine code.


An open-source, framework and hardware agnostic, extensible and customizable, distributed platform design for evaluating and profiling …


Leveraging NVIDIA’s Tensor Cores to express Collectives with matrix multiplication and exploring the benefits in terms of program …


Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs.


Automatic μBenchmark Generation to Compute “Lower-bound” Latency and Inform Optimizations of Deep Learning Models on GPUs.


Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments.


An extendable and customizable GPU benchmarking framework


A Scalable Project Submission System for Parallel Programming Courses.


A Scalable Lab Submission System for Parallel Programming Courses.

Recent Publications

The Design and Implementation of the Wolfram Language Compiler. To appear in CGO 20, 2020.

XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs. To appear in IPDPS 2020, 2019.

The Design and Implementation of a Scalable DL Benchmarking Platform. In arXiv, 2019.

DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs. To appear in ICPE 2020, 2019.

Recent & Upcoming Talks

SC 2019 - Across-Stack Profiling and Characterization of State-of-the-Art Machine Learning Models on GPUs

The past few years have seen a surge of using Machine Learning (ML) and Deep Learning (DL) algorithms for traditional HPC tasks such as …

Developing in the Wolfram Compiler

The Wolfram Language is a dynamic untyped language that has a 30-year history. The talk will describe current work in developing a …

HotChips 2019 - MLModelScope: Evaluate and Profile ML Models at Scale and Across Stack

The current landscape of Machine Learning (ML) and Deep Learning (DL) is rife with non-uniform frameworks, models, and system stacks …

Recognitions & Awards

Best Paper Candidate for XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs

ACM Artifact Evaluation Stamp for The Design and Implementation of the Wolfram Language Compiler

Best Paper and ACM Artifact Evaluation Stamp for Evaluating CUDA Communication Primitives on High-Bandwidth Interconnects

Best Poster

Top-20 Poster
