NPU Development Guide
The SoC on the Quectel Pi H1 is equipped with a Qualcomm® Hexagon™ Processor (NPU), a hardware accelerator designed specifically for AI inference. To run model inference on the NPU, the Qualcomm® AI Runtime SDK (QAIRT) is required to port pre-trained models. Qualcomm® provides a series of SDKs to help developers port their AI models to the NPU:
- Model quantization library: AIMET
- Model porting SDK: QAIRT
- Model application library: QAI-APP-BUILDER
- Online model conversion library: QAI-HUB
Preparation
Create Python Execution Environment
sudo apt install python3-numpy
python3.10 -m venv venv-quecpi-alpha-ai
. venv-quecpi-alpha-ai/bin/activate
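After activating the virtual environment, a quick sanity check can catch setup problems early. This is a minimal sketch that only confirms the interpreter version and that numpy imports; note that a plain venv does not see the apt-installed numpy unless it was created with `--system-site-packages` or numpy was installed into the venv with pip.

```python
import sys

import numpy as np

# The commands above create the venv with python3.10; an older
# interpreter here is a sign the wrong python was used for venv.
print(f"Python {sys.version_info.major}.{sys.version_info.minor}, numpy {np.__version__}")
```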
Download the test program
Download ai-test.zip, extract it, and switch into the extracted directory
unzip ai-test.zip
cd ai-test
Execute AI inference
Run the program to load the model and the test dataset
./qnn-net-run --backend ./libQnnHtp.so \
--retrieve_context resnet50_aimet_quantized_6490.bin \
--input_list test_list.txt --output_dir output_bin
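In typical qnn-net-run usage, the file passed via `--input_list` is a plain text input list: each line names the preprocessed raw input tensor for one inference (multi-input models use `input_name:=path` entries). A sketch that builds such a list, assuming the preprocessed images live in `./images` as in the output shown later:

```python
from pathlib import Path

# Collect preprocessed .raw input tensors into an input list for
# qnn-net-run, one file path per line (one inference per line).
raw_files = sorted(Path("./images").glob("*.raw"))
Path("test_list.txt").write_text("".join(f"{p}\n" for p in raw_files))
print(f"wrote {len(raw_files)} entries to test_list.txt")
```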
View Results
Execute the script to view the results
python3 show_resnet50_classifications.py \
--input_list test_list.txt -o output_bin/ \
--labels_file imagenet_classes.txt
Script output:
Classification results
./images/ILSVRC2012_val_00003441.raw [acoustic guitar]
./images/ILSVRC2012_val_00008465.raw [trifle]
./images/ILSVRC2012_val_00010218.raw [tabby]
./images/ILSVRC2012_val_00044076.raw [proboscis monkey]
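Under the hood, the classification step amounts to an argmax over each raw output plus a label lookup. The sketch below assumes each output is a flat float32 logit vector whose entries line up with the lines of imagenet_classes.txt; the actual file layout under output_bin/ depends on the model's output tensor names.

```python
import numpy as np

def top1_label(raw_output: str, labels_file: str) -> str:
    # Assumption: the output tensor is a flat float32 vector of class
    # scores in the same order as the labels file (one label per line).
    logits = np.fromfile(raw_output, dtype=np.float32)
    with open(labels_file) as f:
        labels = [line.strip() for line in f]
    return labels[int(np.argmax(logits))]
```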
Test image collection
NPU software stack
QAIRT
The QAIRT (Qualcomm® AI Runtime) SDK is an integrated package of Qualcomm® AI software, including Qualcomm® AI Engine Direct, the Qualcomm® Neural Processing SDK, and Qualcomm® Genie. QAIRT gives developers all the tools required for porting and deploying AI models on Qualcomm® hardware accelerators, as well as the runtimes for executing models on the CPU, GPU, and NPU.
Supported inference backends
- CPU
- GPU
- NPU

SoC Architecture Comparison Table
| SoC | dsp_arch | soc_id |
|---|---|---|
| QCS6490 | v68 | 35 |
| QCS9075 | v73 | 77 |
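When a conversion or runtime tool asks for these values, the table can be kept as a small lookup. The values below are copied from the comparison table above; the field names are simply the column headers.

```python
# SoC -> DSP architecture and SoC id, copied from the comparison table.
SOC_INFO = {
    "QCS6490": {"dsp_arch": "v68", "soc_id": 35},
    "QCS9075": {"dsp_arch": "v73", "soc_id": 77},
}

print(SOC_INFO["QCS6490"]["dsp_arch"])  # → v68
```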
AIMET
AIMET (AI Model Efficiency Toolkit) is a quantization toolkit for deep learning models in frameworks such as PyTorch and ONNX. AIMET improves the performance of deep learning models by reducing their computational load and memory usage. With AIMET, developers can iterate quickly to find the quantization configuration that best balances accuracy and latency. Quantized models exported by AIMET can be compiled and deployed on the Qualcomm NPU with QAIRT, or run directly with ONNX Runtime.

QAI-APPBUILDER
Quick AI Application Builder (QAI AppBuilder) helps developers use the Qualcomm® AI Runtime SDK to deploy AI models and design AI applications on Qualcomm® SoC platforms equipped with the Qualcomm® Hexagon™ Processor (NPU). It wraps the model deployment APIs in a simplified set of interfaces for loading models onto the NPU and performing inference. QAI AppBuilder greatly reduces the complexity of model deployment and provides multiple demos that developers can reference when designing their own AI applications.

QAI-Hub
Qualcomm® AI Hub (QAI-Hub) is a one-stop cloud platform for model conversion, providing online model compilation, quantization, performance profiling, inference, and download services. Qualcomm® AI Hub automatically handles the transformation from pre-trained models to device-ready runtimes, and automatically provisions devices in the cloud for on-device profiling and inference. Qualcomm® AI Hub Models (QAI-Hub-Models), built on the cloud services provided by QAI-Hub, lets developers quantize, compile, run, profile, and download the models in its model list on cloud devices from the command line.
