The emergence of Machine Learning (ML) and Deep Learning (DL) algorithms as a viable programming model has inspired many innovations in the software development, algorithms, and systems and architecture communities. Today, many software, hardware, and cloud providers are vying to capture the emerging market and make claims that are not easy to verify. Furthermore, since different communities optimize for different targets and use different terminology, users are overloaded with concepts and assumptions, which discourages many novice users. MLModelScope is a tool that lowers the entry barrier for model management and measurement, with capabilities that simplify the usage, deployment, evaluation, and profiling of models across frameworks, systems, and datasets.
Users in the ML domain range from model consumers (who use models for either training or inference) to model and framework developers (who develop ML/DL models and frameworks) to system builders (who develop the underlying hardware systems and infrastructure that support ML/DL workloads). A common challenge faced by all of these communities is the diversity and differing usage patterns of the algorithms and frameworks:
- Models are posted daily on arXiv and GitHub with results that are not easily replicated.
- Frameworks such as Torch, Caffe, TensorFlow, and MXNet have incompatible APIs and are not trivial to install.
- Datasets such as CIFAR, MNIST, and ImageNet employ different formats and are not optimized for system I/O.
- Hardware infrastructures such as x86, POWER, GPUs, and FPGAs have different design points.
MLModelScope is an open-source distributed platform that allows users to easily deploy, evaluate, experiment with, and benchmark machine learning frameworks and models across different hardware infrastructures, all through a common interface. Users can try different ML models with a click through the website, validate a model's performance and accuracy, and optimize model, framework, and hardware selection based on their own data and their performance, power, and cost constraints. For ML/DL model researchers, MLModelScope is also a deployment platform designed to promote the researcher's model. Researchers can receive feedback from the public on where the model breaks, and can validate and correlate their results against their peers' results. For system architects, MLModelScope is an end-to-end workload characterization platform for understanding system bottlenecks. It integrates with GPU and CPU profilers to provide distributed tracing and health monitoring, and it can be integrated with hardware simulators to explore next-generation software-hardware co-designs. We are currently using MLModelScope for research in memory persistence, edge computing, and the development of customized inference processors.
MLModelScope is under active development and currently has built-in support for the Caffe, Caffe2, CNTK, MXNet, TensorFlow, and TensorRT frameworks. It has over 170 models available in its model repository, and all the popular vision datasets available in its dataset repository. It runs on x86, ARM, and POWER architectures, with or without a GPU.
This talk will discuss the current challenges in ML in both research and practice. We will then describe MLModelScope’s architecture and design points, and showcase the tool’s profiling and evaluation reporting capabilities. Finally, time permitting, we will demonstrate MLModelScope from both the command line and the website.