
Open Neural Network Exchange (ONNX) Explained

With plenty of options in the machine learning and artificial intelligence (AI) frameworks ecosystem, how does one ensure compatibility between them? That's where the Open Neural Network Exchange (ONNX) comes in.

What is Open Neural Network Exchange (ONNX)?

Developed as an open-source initiative, ONNX is a common format that bridges the gap between different AI frameworks and enables seamless interoperability and model portability.

Think of ONNX as a common language that allows AI models to be transferred between various frameworks such as PyTorch, TensorFlow, and Caffe2. This flexibility makes it easier for developers to leverage the strengths of different tools without being locked into a single ecosystem.
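As a rough sketch (the tiny network, tensor shapes, and file name below are illustrative placeholders, not a recipe from any specific project), exporting a trained PyTorch model to the ONNX format looks something like this:

    import torch
    import torch.nn as nn

    # A tiny placeholder model; in practice this is your trained network.
    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
    model.eval()

    # A dummy input with the expected shape is used to trace the graph.
    dummy_input = torch.randn(1, 10)

    torch.onnx.export(
        model,
        dummy_input,
        "model.onnx",                 # the portable, framework-neutral artifact
        input_names=["input"],
        output_names=["output"],
        dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    )

The resulting .onnx file can then be loaded by any tool or runtime that understands the ONNX format, regardless of the framework used to train the model.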

The rise of ONNX represents a significant milestone in the quest for a more collaborative and integrated AI development environment, facilitating smoother transitions and more efficient workflows.

Similar to other open-source initiatives, ONNX is community-driven and welcomes contributions from developers and researchers from all over the world. More information on this format can be found in the ONNX GitHub repository.

How ONNX works: Exploring interoperability and model portability

At its core, ONNX defines:

  • An extensible computation graph model that represents a model as a flow of operations.
  • A set of built-in operators (the building blocks of machine learning and deep learning models).
  • Standard data types for the tensors and values passed between those operators.

This structure makes it easy to transfer your ML models across frameworks.

Interoperability is achieved through ONNX's ability to represent complex neural network graphs and operations consistently. This consistency ensures that models perform predictably, regardless of the platform on which they are executed.

Model portability is another critical aspect of ONNX, enabling developers to deploy their models across various environments, including cloud services, edge devices, and mobile applications. This versatility is crucial for creating scalable solutions that can adapt to diverse deployment scenarios.
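Once a model is in the ONNX format, it can be loaded, validated, and inspected with the onnx Python package, independently of the framework that produced it. A minimal sketch, assuming the model.onnx file from the export example above:

    import onnx

    # Load the serialized model (a protobuf file) produced by any framework's exporter.
    model = onnx.load("model.onnx")

    # Validate it against the ONNX specification (operator schemas, types, opset).
    onnx.checker.check_model(model)

    # Inspect the framework-neutral graph: its nodes (operators), inputs, and outputs.
    print(onnx.helper.printable_graph(model.graph))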

Advantages of using ONNX

For developers, ONNX offers a range of benefits that streamline the AI development lifecycle. These include:

  • Flexibility in utilizing different frameworks and tools without constraints.
  • Easy model transfer and deployment across various platforms.
  • Ability to switch between frameworks without losing progress or performance.
  • Improved collaboration between teams using different AI frameworks.
  • Accelerated inference and improved model performance.

Additionally, one of ONNX's most significant advantages is that it simplifies the process of optimizing and deploying models, as the format is supported by numerous hardware and software vendors. This broad support ensures that ONNX models can be executed efficiently on a wide array of devices, from high-performance GPUs to resource-constrained edge devices.

For enterprises, ONNX offers the following advantages:

  • Reduced costs and time-to-market by eliminating the need for reimplementation when switching between frameworks.
  • Increased compatibility between different parts of an AI solution, promoting a more integrated system.
  • Ability to leverage the latest advancements in multiple frameworks simultaneously, enabling faster innovation and better performance.

These are substantial benefits that can help any data team within an organization achieve greater efficiency and productivity.

(Related reading: AI-augmented software engineering & secure AI system development.)

ONNX in MLOps and model serving

ONNX is also gaining popularity in the MLOps (Machine Learning Operations) world, as it facilitates smooth integration with model serving platforms. ONNX models can be easily deployed on a variety of serving systems, such as Azure Machine Learning and Azure Cognitive Services.

This compatibility enables seamless orchestration between different tools and stages of the machine learning pipeline, from development to deployment.

Furthermore, ONNX models are also supported by popular frameworks used for model serving, such as TensorFlow Serving and TorchServe. A common problem in MLOps is managing different versions of models, especially when dealing with multiple frameworks. ONNX solves this issue by serving as a single format that can be used to transfer and deploy models consistently.

For example, large language models (LLMs) can make use of the features provided by ONNX. LLMs tend to be resource-intensive and can take a long time to process requests. In this article on optimizing LLM performance, the ONNX format was used to speed up these processing times.

What is ONNX Runtime?

ONNX Runtime is a high-performance engine for executing ONNX models, developed and maintained by Microsoft. It offers cross-platform support, including Windows, Linux, macOS, and mobile devices.

ONNX Runtime provides fast and efficient inferencing, with support for hardware acceleration backends such as NVIDIA TensorRT, Apple Core ML, and many others. This optimization makes it an ideal choice for deploying ONNX models in production environments.
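As a minimal sketch of inference with ONNX Runtime (reusing the hypothetical model.onnx file and input/output names from the export example above), the engine loads the model into a session and runs it on whichever execution providers are available:

    import numpy as np
    import onnxruntime as ort

    # Keep only the execution providers available in this ONNX Runtime build,
    # in order of preference (TensorRT/GPU first, CPU as the fallback).
    preferred = ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]
    providers = [p for p in preferred if p in ort.get_available_providers()]

    session = ort.InferenceSession("model.onnx", providers=providers)

    # Run a batch through the model; input/output names match the export example.
    batch = np.random.randn(4, 10).astype(np.float32)
    outputs = session.run(["output"], {"input": batch})
    print(outputs[0].shape)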

In addition to performance optimizations, ONNX Runtime also supports the most recent versions of ONNX specifications, ensuring compatibility with the latest features introduced in different AI frameworks.

ONNX Runtime also has tight integrations with common AI platforms, such as:

  • PyTorch (via torch.onnx export and ONNX Runtime inference)
  • Hugging Face (via the Optimum library)
  • Azure Machine Learning and Azure Cognitive Services
  • scikit-learn (via converters such as skl2onnx)

More information can be found in the ONNX Runtime GitHub repository.

Examples of ONNX Runtime applications

Here are some ways ONNX Runtime is already being used to great effect:

Optimizing the BERT model for Intel CPU cores. BERT (Bidirectional Encoder Representations from Transformers) is a widely used model for Natural Language Processing (NLP). Powered by deep neural networks, it leverages the Transformer architecture to achieve state-of-the-art results in various NLP tasks.

Using the ONNX Runtime engine can increase the throughput and performance of the model. Read more in this article on the Microsoft Open Source Blog.
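The exact recipe is in the linked article; as a hedged sketch, ONNX Runtime also ships transformer-specific graph optimizations that can be applied offline to an exported BERT model. The file name and the BERT-base dimensions below are illustrative assumptions:

    from onnxruntime.transformers import optimizer

    # Fuse attention, layer normalization, and GELU subgraphs for faster inference.
    # "bert-base-uncased.onnx" is a hypothetical exported model; 12 heads and a
    # hidden size of 768 correspond to BERT-base.
    optimized = optimizer.optimize_model(
        "bert-base-uncased.onnx",
        model_type="bert",
        num_heads=12,
        hidden_size=768,
    )
    optimized.save_model_to_file("bert-base-uncased-optimized.onnx")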

Optimizing MiniLM Sentence Transformers Model. ONNX Runtime can also optimize models for deployment on edge devices, such as mobile phones and IoT devices. In this tutorial by Philipp Schmid, ONNX Runtime and Hugging Face Optimum were used to optimize the MiniLM Sentence Transformers Model, resulting in a 2.03x reduction in latency.
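A hedged sketch of that workflow, assuming a recent version of Optimum and using the public all-MiniLM-L6-v2 checkpoint as an illustrative model ID (the pooling step is a simplification, not the exact code from the tutorial):

    from transformers import AutoTokenizer
    from optimum.onnxruntime import ORTModelForFeatureExtraction

    model_id = "sentence-transformers/all-MiniLM-L6-v2"  # illustrative checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # export=True converts the PyTorch weights to ONNX and runs them with ONNX Runtime.
    model = ORTModelForFeatureExtraction.from_pretrained(model_id, export=True)

    inputs = tokenizer(["ONNX makes models portable."], return_tensors="pt")
    token_embeddings = model(**inputs).last_hidden_state

    # Mean-pool token embeddings into a single sentence embedding (simplified pooling).
    sentence_embedding = token_embeddings.mean(dim=1)
    print(sentence_embedding.shape)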

Accelerating NLP pipelines. In an article by Morgan Funtowicz from Hugging Face and Tianlei Wu from Microsoft, they used ONNX Runtime to optimize and deploy NLP models, resulting in a 5x inference speedup compared to the default PyTorch implementation.

Accelerating scikit-learn model inference. ONNX Runtime can also optimize and accelerate traditional machine learning models. In this article, ONNX Runtime was used to achieve a 5x performance improvement for different scikit-learn models.
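A minimal sketch of that workflow using the skl2onnx converter (the classifier, dataset, and file name are illustrative choices, not the exact setup from the article):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from skl2onnx import convert_sklearn
    from skl2onnx.common.data_types import FloatTensorType
    import onnxruntime as ort

    # Train a small scikit-learn model on an illustrative dataset.
    X, y = load_iris(return_X_y=True)
    clf = RandomForestClassifier(n_estimators=50).fit(X, y)

    # Convert it to ONNX, declaring the expected input signature.
    onnx_model = convert_sklearn(
        clf, initial_types=[("input", FloatTensorType([None, X.shape[1]]))]
    )
    with open("rf_iris.onnx", "wb") as f:
        f.write(onnx_model.SerializeToString())

    # Run inference with ONNX Runtime instead of the original estimator.
    session = ort.InferenceSession("rf_iris.onnx", providers=["CPUExecutionProvider"])
    labels = session.run(None, {"input": X[:5].astype(np.float32)})[0]
    print(labels)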

These are just a few examples of how ONNX and ONNX Runtime are being used to improve the performance of AI solutions across different industries and applications. As adoption continues to grow, we can expect even more innovations and advancements in this field.

Final words

ONNX offers a valuable contribution to the world of machine learning, providing a standardized format for transferring and deploying models across different frameworks and platforms. Its compatibility with a wide range of tools and systems makes it essential for data teams developing and deploying AI solutions.

With the continued development of ONNX Runtime, we can expect smoother and faster deployment of models in production environments, driving greater efficiency and productivity for data teams within organizations.