Using Benanza & DLBricks to Inform Optimizations

Abstract

Benanza and DLBricks offer a sustainable way to develop ML benchmarks, along with analyses of the results that inform and pinpoint optimization opportunities. They consist of: a model processor that parses models into an internal representation, a benchmark generator that automatically generates micro-benchmarks from a set of models, a database of benchmark results, and an analyzer that computes the “lower-bound” latency of DL models using the benchmark data and informs optimizations of model execution. These metrics estimate the ideal model execution on a GPU system and serve as the basis for identifying optimization opportunities in frameworks or system libraries. We used Benanza and DLBricks to evaluate over 80 models in MXNet, ONNX Runtime, and PyTorch on 7 GPUs and 5 CPUs.
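To illustrate the kind of computation the analyzer performs, here is a minimal sketch (not the actual Benanza code) of one way a "lower-bound" latency could be estimated: sum, over the model's layers, the fastest latency measured for each layer's micro-benchmark, assuming layers run sequentially with no framework overhead. The function name and data layout are assumptions for illustration.

```python
def lower_bound_latency(layer_benchmarks):
    """Estimate a model's lower-bound latency (ms).

    layer_benchmarks: dict mapping layer name -> list of measured
    latencies (ms) for that layer's micro-benchmark runs.
    Takes the fastest measurement per layer and sums across layers,
    assuming sequential execution with no framework overhead.
    """
    return sum(min(latencies) for latencies in layer_benchmarks.values())

# Hypothetical example: three layers, each benchmarked several times.
benchmarks = {
    "conv1": [1.20, 1.15, 1.18],
    "relu1": [0.05, 0.04, 0.06],
    "fc1":   [0.80, 0.82, 0.79],
}
print(lower_bound_latency(benchmarks))
```

Comparing such an estimate against the measured end-to-end latency exposes the gap attributable to the framework or system libraries.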

Date
Apr 18, 2020 3:30 PM
Location
Lausanne, Switzerland