Qualcomm Model Deployment
Qualcomm provides the qai_hub_models Python library, which makes it easy to convert, quantize, and export models into .bin files that can be loaded by the Qualcomm NPU. It also supports running inference and validation on Qualcomm-hosted online devices.
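Once the library is installed (see Preparation below), every supported model is packaged behind a common Python interface. A minimal sketch, using mobilenet_v2 as an illustrative choice (any model module from the tables below follows the same pattern):

```python
import torch

# Each model package in qai_hub_models wraps a pretrained PyTorch model
# behind a common interface; mobilenet_v2 is just an illustrative choice.
from qai_hub_models.models.mobilenet_v2 import Model

model = Model.from_pretrained()      # downloads weights and builds the torch module
sample = torch.rand(1, 3, 224, 224)  # NCHW float input; 224x224 is standard for MobileNet-v2
with torch.no_grad():
    logits = model(sample)
print(logits.shape)                  # classification logits, e.g. (1, 1000)
```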
Preparation
- Install qai_hub_models

```bash
pip3 install qai_hub_models
```
- Configure the API token

Register an account on Qualcomm® AI Hub and log in to obtain your API token, then configure it as follows (the sketch after these commands can be used to verify the setup):

```bash
export PATH=~/.local/bin/:$PATH
qai-hub configure --api_token <API_TOKEN>
```
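To confirm the token is configured correctly, you can list the hosted devices visible to your account. A minimal check using the qai_hub client API:

```python
import qai_hub as hub

# Lists the Qualcomm-hosted devices available to your account; an
# authentication error here usually means the API token is missing or invalid.
for device in hub.get_devices():
    print(device.name, device.attributes)
```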
Supported Models
The models supported by qai_hub_models fall into four main categories: Computer Vision, Multimodal, Audio, and Generative AI. Please refer to the tables below for details; the full collection is hosted in the Qualcomm Model Download Center (GitHub repository).
Computer Vision
| Model | Python Module |
|---|---|
| **Image Classification** | |
| Beit | qai_hub_models.models.beit |
| ConvNext-Base | qai_hub_models.models.convnext_base |
| ConvNext-Tiny | qai_hub_models.models.convnext_tiny |
| DLA-102-X | qai_hub_models.models.dla102x |
| DenseNet-121 | qai_hub_models.models.densenet121 |
| EfficientFormer | qai_hub_models.models.efficientformer |
| EfficientNet-B0 | qai_hub_models.models.efficientnet_b0 |
| EfficientNet-B4 | qai_hub_models.models.efficientnet_b4 |
| EfficientNet-V2-s | qai_hub_models.models.efficientnet_v2_s |
| EfficientViT-b2-cls | qai_hub_models.models.efficientvit_b2_cls |
| EfficientViT-l2-cls | qai_hub_models.models.efficientvit_l2_cls |
| GoogLeNet | qai_hub_models.models.googlenet |
| Inception-v3 | qai_hub_models.models.inception_v3 |
| LeViT | qai_hub_models.models.levit |
| MNASNet05 | qai_hub_models.models.mnasnet05 |
| Mobile-VIT | qai_hub_models.models.mobile_vit |
| MobileNet-v2 | qai_hub_models.models.mobilenet_v2 |
| MobileNet-v3-Large | qai_hub_models.models.mobilenet_v3_large |
| MobileNet-v3-Small | qai_hub_models.models.mobilenet_v3_small |
| NASNet | qai_hub_models.models.nasnet |
| RegNet | qai_hub_models.models.regnet |
| ResNeXt101 | qai_hub_models.models.resnext101 |
| ResNeXt50 | qai_hub_models.models.resnext50 |
| ResNet101 | qai_hub_models.models.resnet101 |
| ResNet18 | qai_hub_models.models.resnet18 |
| ResNet50 | qai_hub_models.models.resnet50 |
| Sequencer2D | qai_hub_models.models.sequencer2d |
| Shufflenet-v2 | qai_hub_models.models.shufflenet_v2 |
| SqueezeNet-1.1 | qai_hub_models.models.squeezenet1_1 |
| Swin-Base | qai_hub_models.models.swin_base |
| Swin-Small | qai_hub_models.models.swin_small |
| Swin-Tiny | qai_hub_models.models.swin_tiny |
| VIT | qai_hub_models.models.vit |
| WideResNet50 | qai_hub_models.models.wideresnet50 |
| **Image Editing** | |
| AOT-GAN | qai_hub_models.models.aotgan |
| LaMa-Dilated | qai_hub_models.models.lama_dilated |
| **Image Generation** | |
| Simple-Bev | qai_hub_models.models.simple_bev_cam |
| **Super Resolution** | |
| ESRGAN | qai_hub_models.models.esrgan |
| QuickSRNetLarge | qai_hub_models.models.quicksrnetlarge |
| QuickSRNetMedium | qai_hub_models.models.quicksrnetmedium |
| QuickSRNetSmall | qai_hub_models.models.quicksrnetsmall |
| Real-ESRGAN-General-x4v3 | qai_hub_models.models.real_esrgan_general_x4v3 |
| Real-ESRGAN-x4plus | qai_hub_models.models.real_esrgan_x4plus |
| SESR-M5 | qai_hub_models.models.sesr_m5 |
| XLSR | qai_hub_models.models.xlsr |
| **Semantic Segmentation** | |
| BGNet | qai_hub_models.models.bgnet |
| BiseNet | qai_hub_models.models.bisenet |
| DDRNet23-Slim | qai_hub_models.models.ddrnet23_slim |
| DeepLabV3-Plus-MobileNet | qai_hub_models.models.deeplabv3_plus_mobilenet |
| DeepLabV3-ResNet50 | qai_hub_models.models.deeplabv3_resnet50 |
| DeepLabXception | qai_hub_models.models.deeplab_xception |
| EfficientViT-l2-seg | qai_hub_models.models.efficientvit_l2_seg |
| FCN-ResNet50 | qai_hub_models.models.fcn_resnet50 |
| FFNet-122NS-LowRes | qai_hub_models.models.ffnet_122ns_lowres |
| FFNet-40S | qai_hub_models.models.ffnet_40s |
| FFNet-54S | qai_hub_models.models.ffnet_54s |
| FFNet-78S | qai_hub_models.models.ffnet_78s |
| FFNet-78S-LowRes | qai_hub_models.models.ffnet_78s_lowres |
| FastSam-S | qai_hub_models.models.fastsam_s |
| FastSam-X | qai_hub_models.models.fastsam_x |
| HRNet-W48-OCR | qai_hub_models.models.hrnet_w48_ocr |
| Mask2Former | qai_hub_models.models.mask2former |
| MediaPipe-Selfie-Segmentation | qai_hub_models.models.mediapipe_selfie |
| MobileSam | qai_hub_models.models.mobilesam |
| PidNet | qai_hub_models.models.pidnet |
| SINet | qai_hub_models.models.sinet |
| SalsaNext | qai_hub_models.models.salsanext |
| Segformer-Base | qai_hub_models.models.segformer_base |
| Segment-Anything-Model-2 | qai_hub_models.models.sam2 |
| Unet-Segmentation | qai_hub_models.models.unet_segmentation |
| YOLOv11-Segmentation | qai_hub_models.models.yolov11_seg |
| YOLOv8-Segmentation | qai_hub_models.models.yolov8_seg |
| **Video Classification** | |
| ResNet-2Plus1D | qai_hub_models.models.resnet_2plus1d |
| ResNet-3D | qai_hub_models.models.resnet_3d |
| ResNet-Mixed-Convolution | qai_hub_models.models.resnet_mixed |
| Video-MAE | qai_hub_models.models.video_mae |
| **Video Generation** | |
| First-Order-Motion-Model | qai_hub_models.models.fomm |
| **Object Detection** | |
| 3D-Deep-BOX | qai_hub_models.models.deepbox |
| Conditional-DETR-ResNet50 | qai_hub_models.models.conditional_detr_resnet50 |
| DETR-ResNet101 | qai_hub_models.models.detr_resnet101 |
| DETR-ResNet101-DC5 | qai_hub_models.models.detr_resnet101_dc5 |
| DETR-ResNet50 | qai_hub_models.models.detr_resnet50 |
| DETR-ResNet50-DC5 | qai_hub_models.models.detr_resnet50_dc5 |
| Facial-Attribute-Detection | qai_hub_models.models.face_attrib_net |
| Lightweight-Face-Detection | qai_hub_models.models.face_det_lite |
| MediaPipe-Face-Detection | qai_hub_models.models.mediapipe_face |
| MediaPipe-Hand-Detection | qai_hub_models.models.mediapipe_hand |
| PPE-Detection | qai_hub_models.models.gear_guard_net |
| Person-Foot-Detection | qai_hub_models.models.foot_track_net |
| RF-DETR | qai_hub_models.models.rf_detr |
| RTMDet | qai_hub_models.models.rtmdet |
| YOLOv10-Detection | qai_hub_models.models.yolov10_det |
| YOLOv11-Detection | qai_hub_models.models.yolov11_det |
| YOLOv8-Detection | qai_hub_models.models.yolov8_det |
| Yolo-X | qai_hub_models.models.yolox |
| Yolo-v3 | qai_hub_models.models.yolov3 |
| Yolo-v5 | qai_hub_models.models.yolov5 |
| Yolo-v6 | qai_hub_models.models.yolov6 |
| Yolo-v7 | qai_hub_models.models.yolov7 |
| **Pose Estimation** | |
| Facial-Landmark-Detection | qai_hub_models.models.facemap_3dmm |
| HRNetPose | qai_hub_models.models.hrnet_pose |
| LiteHRNet | qai_hub_models.models.litehrnet |
| MediaPipe-Pose-Estimation | qai_hub_models.models.mediapipe_pose |
| Movenet | qai_hub_models.models.movenet |
| Posenet-Mobilenet | qai_hub_models.models.posenet_mobilenet |
| RTMPose-Body2d | qai_hub_models.models.rtmpose_body2d |
| **Depth Estimation** | |
| Depth-Anything | qai_hub_models.models.depth_anything |
| Depth-Anything-V2 | qai_hub_models.models.depth_anything_v2 |
| Midas-V2 | qai_hub_models.models.midas |
Multimodal
| Model | Python Module |
|---|---|
| EasyOCR | qai_hub_models.models.easyocr |
| Nomic-Embed-Text | qai_hub_models.models.nomic_embed_text |
| OpenAI-Clip | qai_hub_models.models.openai_clip |
| TrOCR | qai_hub_models.models.trocr |
Audio
| Model | Python Module |
|---|---|
| **Speech Recognition** | |
| HuggingFace-WavLM-Base-Plus | qai_hub_models.models.huggingface_wavlm_base_plus |
| Whisper-Base | qai_hub_models.models.whisper_base |
| Whisper-Large-V3-Turbo | qai_hub_models.models.whisper_large_v3_turbo |
| Whisper-Small | qai_hub_models.models.whisper_small |
| Whisper-Tiny | qai_hub_models.models.whisper_tiny |
| **Audio Classification** | |
| YamNet | qai_hub_models.models.yamnet |
Generative AI
| Model | Python Module |
|---|---|
| **Image Generation** | |
| ControlNet-Canny | qai_hub_models.models.controlnet_canny |
| Stable-Diffusion-v1.5 | qai_hub_models.models.stable_diffusion_v1_5 |
| Stable-Diffusion-v2.1 | qai_hub_models.models.stable_diffusion_v2_1 |
| **Text Generation** | |
| ALLaM-7B | qai_hub_models.models.allam_7b |
| Baichuan2-7B | qai_hub_models.models.baichuan2_7b |
| Falcon3-7B-Instruct | qai_hub_models.models.falcon_v3_7b_instruct |
| IBM-Granite-v3.1-8B-Instruct | qai_hub_models.models.ibm_granite_v3_1_8b_instruct |
| IndusQ-1.1B | qai_hub_models.models.indus_1b |
| JAIS-6p7b-Chat | qai_hub_models.models.jais_6p7b_chat |
| Llama-SEA-LION-v3.5-8B-R | qai_hub_models.models.llama_v3_1_sea_lion_3_5_8b_r |
| Llama-v2-7B-Chat | qai_hub_models.models.llama_v2_7b_chat |
| Llama-v3-8B-Instruct | qai_hub_models.models.llama_v3_8b_instruct |
| Llama-v3.1-8B-Instruct | qai_hub_models.models.llama_v3_1_8b_instruct |
| Llama-v3.2-1B-Instruct | qai_hub_models.models.llama_v3_2_1b_instruct |
| Llama-v3.2-3B-Instruct | qai_hub_models.models.llama_v3_2_3b_instruct |
| Llama3-TAIDE-LX-8B-Chat-Alpha1 | qai_hub_models.models.llama_v3_taide_8b_chat |
| Ministral-3B | qai_hub_models.models.ministral_3b |
| Mistral-3B | qai_hub_models.models.mistral_3b |
| Mistral-7B-Instruct-v0.3 | qai_hub_models.models.mistral_7b_instruct_v0_3 |
| PLaMo-1B | qai_hub_models.models.plamo_1b |
| Phi-3.5-Mini-Instruct | qai_hub_models.models.phi_3_5_mini_instruct |
| Qwen2-7B-Instruct | qai_hub_models.models.qwen2_7b_instruct |
| Qwen2.5-7B-Instruct | qai_hub_models.models.qwen2_5_7b_instruct |
Model compilation
Taking Real-ESRGAN-x4plus as an example:

```bash
export PRODUCT_CHIP=qualcomm-qcs6490-proxy
python3 -m qai_hub_models.models.real_esrgan_x4plus.export --chipset ${PRODUCT_CHIP} \
    --target-runtime qnn_context_binary --height 128 --width 128 --quantize w8a8 \
    --num-calibration-samples 10
```
- --chipset: Specifies the target chipset to run on
- --target-runtime: Specifies the target runtime
- --height: Specifies the target model input height
- --width: Specifies the target model input width
- --quantize: Specifies the quantization scheme
- --num-calibration-samples: Specifies the number of images in the quantization calibration set
The above command prints a hub model ID and produces the QNN context binary for real_esrgan_x4plus (a .bin file), which is the artifact used to run the model on the target chip.
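If you need the compiled artifact again later, it can be fetched by its model ID with the core qai_hub API. A minimal sketch; "m1234abcd" is a placeholder ID:

```python
import qai_hub as hub

# Retrieve a previously compiled model by the hub model ID printed during
# export ("m1234abcd" is a placeholder; use the ID from your own run).
compiled = hub.get_model("m1234abcd")
compiled.download("real_esrgan_x4plus.bin")  # saves the QNN context binary locally
```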
Run demo
```bash
python3 -m qai_hub_models.models.real_esrgan_x4plus.demo --eval-mode on-device --hub-model-id <model-id> --chipset ${PRODUCT_CHIP}
```
- Set --hub-model-id to the hub model ID printed at the end of the model compilation step
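For reference, the demo's on-device evaluation amounts to submitting an inference job to a hosted device. A rough equivalent using the core qai_hub API; the input name "image" and NCHW layout are assumptions, so check the model's actual I/O spec:

```python
import numpy as np
import qai_hub as hub

# Target a hosted device by chipset attribute (same chipset as the export step).
device = hub.Device(attributes="chipset:qualcomm-qcs6490-proxy")

# The compiled model, fetched by its hub model ID ("m1234abcd" is a placeholder).
model = hub.get_model("m1234abcd")

# Input name "image" and NCHW layout are assumptions; verify against the model.
inputs = {"image": [np.random.rand(1, 3, 128, 128).astype(np.float32)]}

job = hub.submit_inference_job(model=model, device=device, inputs=inputs)
outputs = job.download_output_data()  # dict mapping output names to arrays
```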


Use the local device's NPU for inference verification
Refer to the NPU Development Guide.
Q&A
How do I use these models to develop an app?
Qualcomm provides the ai-engine-direct-helper SDK, which offers Python and C++ interfaces for developing apps, loading models, and running inference. For details, please refer to ai-engine-direct-helper.
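As a rough illustration of the Python side, the SDK's samples follow a pattern like the sketch below. The qai_appbuilder names and signatures here are taken from those examples and should be treated as assumptions; consult the ai-engine-direct-helper repository for the authoritative API:

```python
import numpy as np
# qai_appbuilder ships with ai-engine-direct-helper; names follow its samples.
from qai_appbuilder import QNNConfig, QNNContext, Runtime, LogLevel, ProfilingLevel

# Point the runtime at the QNN libraries and select the HTP (NPU) backend.
QNNConfig.Config("/path/to/qnn/libs", Runtime.HTP, LogLevel.WARN, ProfilingLevel.BASIC)

class RealESRGAN(QNNContext):
    pass  # QNNContext loads the compiled context binary and runs inference

# Load the .bin produced by the export step (paths here are placeholders).
model = RealESRGAN("real_esrgan_x4plus", "/path/to/real_esrgan_x4plus.bin")

input_image = np.random.rand(1, 128, 128, 3).astype(np.float32)  # example input
outputs = model.Inference([input_image])
```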