8-bit integer models using the AI Model Efficiency Toolkit
Jan 22, 2021
Qualcomm products mentioned within this post are offered by Qualcomm Technologies, Inc. and/or its subsidiaries.
Making neural network models smaller is crucial for the widespread deployment of AI. Qualcomm AI Research has been developing state-of-the-art quantization techniques that enable power-efficient fixed-point inference while preserving model accuracy, such as Data-Free Quantization (DFQ) and AdaRound, post-training techniques that achieve accurate 8-bit quantization with little or no training data.
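Fixed-point inference of this kind rests on mapping each tensor's floating-point range onto 256 integer levels through a scale and a zero point. The sketch below is a minimal NumPy illustration of the basic asymmetric INT8 scheme; the helper names are hypothetical and this is not AIMET code.

```python
import numpy as np

def calibrate(x):
    """Pick scale and zero point so the observed range [min, max] spans
    the signed 8-bit grid [-128, 127] (asymmetric scheme; real zero is
    exactly representable because the range is widened to include 0)."""
    x_min = min(float(x.min()), 0.0)
    x_max = max(float(x.max()), 0.0)
    scale = (x_max - x_min) / 255.0
    zero_point = int(round(-128 - x_min / scale))
    return scale, zero_point

def quantize(x, scale, zero_point):
    """Map float values onto the signed 8-bit integer grid."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    """Map integers back to approximate float values."""
    return scale * (q.astype(np.float32) - zero_point)

# Round-trip a tensor of weights: the reconstruction error is bounded
# by roughly one grid step (one unit of `scale`).
weights = np.random.default_rng(1).normal(size=1024).astype(np.float32)
scale, zp = calibrate(weights)
restored = dequantize(quantize(weights, scale, zp), scale, zp)
max_err = float(np.abs(weights - restored).max())
```

Post-training methods like DFQ and AdaRound work on top of this scheme: they adjust weight ranges and rounding decisions so that the small per-value error above does not compound into an accuracy loss.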
To make this research more accessible and contribute to the open-source community, Qualcomm Innovation Center (QuIC) launched the AI Model Efficiency Toolkit (AIMET) on GitHub in May 2020. AIMET's goal is to enable power efficient integer inference by providing a simple library plugin for AI developers to utilize for state-of-the-art model efficiency performance. The AIMET project is flourishing with regularly updated quantization techniques based on work from Qualcomm AI Research and active use by the broader AI community, including multiple mobile OEMs, ISVs, and researchers in academia.
QuIC is now taking it a step further by contributing a collection of popular pre-trained models optimized for 8-bit inference to GitHub in the form of the 'AIMET Model Zoo.' Together with the models, AIMET Model Zoo also provides the recipe for quantizing popular 32-bit floating-point (FP32) models to 8-bit integer (INT8) models with little loss in accuracy. The tested and verified recipes include scripts that optimize TensorFlow or PyTorch models across a broad range of categories: image classification, object detection, semantic segmentation, pose estimation, super resolution, and speech recognition.
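The INT8 evaluation step in such a recipe can be approximated with "fake quantization": values stay in floating point but are snapped to the 8-bit grid, so an ordinary float forward pass predicts integer-inference accuracy. The sketch below is a generic illustration of the idea, not the AIMET API; the `fake_quant` helper and the toy layer are hypothetical.

```python
import numpy as np

def fake_quant(x, num_bits=8):
    """Quantize-dequantize: output stays float32, but every value is
    snapped to the nearest point of a symmetric signed 8-bit grid."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = max(float(np.abs(x).max()) / qmax, 1e-8)
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return (q * scale).astype(np.float32)

# Toy "layer": compare the FP32 output with the simulated-INT8 output.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
x = rng.normal(size=4).astype(np.float32)
y_fp32 = w @ x
y_int8 = fake_quant(w) @ fake_quant(x)  # simulated INT8 forward pass
rel_err = float(np.abs(y_fp32 - y_int8).max() / np.abs(y_fp32).max())
```

Running the whole validation set through such a simulated model is what lets a recipe report an INT8 accuracy number before the model ever touches fixed-point hardware.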
This will give researchers and developers direct access to highly accurate quantized models, saving them time in achieving performance benefits such as reduced energy consumption, latency, and memory requirements for on-target inference. For example, imagine you are a developer who wants to do semantic segmentation for image beautification or autonomous driving use cases using the DeepLabv3+ model. AIMET Model Zoo provides a DeepLabv3+ model optimized with the DFQ and Quantization-Aware Training (QAT) features from AIMET. The corresponding AIMET Model Zoo recipe points to this optimized model and provides the calls to the AIMET library needed to run INT8 simulation and assess performance. In fact, the AIMET-quantized version has a Mean Intersection over Union (mIoU) score of 72.08%, virtually equivalent to the 72.32% of the original FP32 model. The image below visually shows how the quantized model in AIMET Model Zoo results in accurate semantic segmentation.
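The mIoU metric quoted above averages per-class intersection-over-union between the predicted and ground-truth label maps. A minimal NumPy version follows; this is a hypothetical helper for illustration, not the evaluation script shipped with the zoo.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union across classes, skipping classes
    absent from both prediction and ground truth."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class not present in this image
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

# Tiny 2x3 label maps: class 0 matches exactly, class 1 has one
# extra ground-truth pixel, class 2 has one extra predicted pixel.
pred = np.array([[0, 0, 1], [1, 2, 2]])
target = np.array([[0, 0, 1], [1, 1, 2]])
miou = mean_iou(pred, target, num_classes=3)  # (1 + 2/3 + 1/2) / 3
```

Comparing this score for the FP32 model and its INT8 simulation is how the 72.32% versus 72.08% gap above is measured.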
This is one example. The AIMET Model Zoo has many INT8 quantized neural network models that provide accurate inference comparable to FP32 models. With this initial contribution of 14 INT8 models to AIMET Model Zoo, we are easing the hurdles for the ecosystem in using quantized models in their AI workloads, and thus marching toward making fixed-point, power-efficient inference ubiquitous. You can get the best of both worlds: the high accuracy of a floating-point model and the efficiency of an 8-bit integer model.
Check out our AIMET Model Zoo and AIMET.
Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries ('Qualcomm'). The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.
Chirag Patel
Engineer, Principal/Mgr., Qualcomm Technologies
Disclaimer
Qualcomm Inc. published this content on 22 January 2021 and is solely responsible for the information contained therein. Distributed by Public, unedited and unaltered, on 23 January 2021 03:03:02 UTC