NPU Application Scenarios
2025-05-20
The QuecPi Alpha single-board computer utilizes NPU resources for applications including image classification, object detection, and semantic segmentation. This document demonstrates image classification using the ResNeXt50 model trained on the ImageNet dataset as an example; ResNeXt50 can also serve as a backbone for building more complex models for specific use cases.
Download Quantized Model
Deploy Model
- Push the quantized model and the label file to the /opt directory on QuecPi Alpha via adb or scp.
- Model Specifications:
- Checkpoint: ImageNet
- Input Resolution: 224x224
- Parameter Count: 88.7M
- Model Size: 87.3 MB
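The deployment step can be sketched as follows, assuming the device is reachable via adb (use scp with the device's IP address instead if you connect over SSH):

```shell
# Sketch: push the quantized model and label file to the device via adb.
# File names match those used in the pipeline command later in this document.
adb push resnext50_quantized.tflite /opt/
adb push imagenet_labels.txt /opt/

# Verify that both files arrived on the device:
adb shell ls -lh /opt/resnext50_quantized.tflite /opt/imagenet_labels.txt
```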
Connect Display
- Prepare and connect a display to QuecPi Alpha.
Execute Commands
- Run the following command in the device terminal to ensure results appear on the connected display:
export XDG_RUNTIME_DIR=/dev/socket/weston && export WAYLAND_DISPLAY=wayland-1
- Execute the following command on the device:
gst-launch-1.0 -e --gst-debug=2 filesrc location=/opt/video11.mp4 ! qtdemux ! queue ! h264parse ! v4l2h264dec capture-io-mode=5 output-io-mode=5 ! queue ! tee name=split split. ! queue ! qtivcomposer name=mixer sink_1::position="<30, 30>" sink_1::dimensions="<640, 360>" ! queue ! waylandsink sync=true fullscreen=true split. ! queue ! qtimlvconverter ! queue ! qtimltflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so external-delegate-options="QNNExternalDelegate,backend_type=htp;" model=/opt/resnext50_quantized.tflite ! queue ! qtimlvclassification threshold=35.0 results=5 module=mobilenet labels=/opt/imagenet_labels.txt extra-operation=softmax constants="Resnetnet,q-offsets=<30.0>,q-scales=<0.06314703077077866>;" ! video/x-raw,format=BGRA,width=256,height=144 ! queue ! mixer.
Command Explanation:
- gst-launch-1.0: A GStreamer command-line tool that builds and runs a pipeline. The -e option forces an EOS (end-of-stream) event on shutdown so the pipeline terminates cleanly rather than remaining idle.
- --gst-debug=2: Sets the GStreamer debug level to 2, which prints errors and warnings.
- filesrc location=/opt/video11.mp4: Specifies the input video file path.
- !: A connection symbol that connects the output of the previous element to the input of the next element.
- qtdemux: Demultiplexes the MP4 (QuickTime) container into its elementary streams.
- queue: Buffers the input data.
- h264parse: Parses the H.264 elementary stream into a form the decoder can consume.
- v4l2h264dec: Decodes H.264 data into raw YUV frames through the Video4Linux2 hardware decoder.
- capture-io-mode=5: Configures the memory I/O mode of the decoder's capture side (the decoded-frame output queue); mode 5 selects DMA-BUF import.
- output-io-mode=5: Configures the memory I/O mode of the decoder's output side (the encoded-bitstream input queue); mode 5 selects DMA-BUF import.
- tee name=split: Splits the input data into two branches.
- split.: Primary branch for raw video stream display.
- qtivcomposer: Composes multiple video streams.
- waylandsink: Displays composited video streams.
- split.: Secondary branch for image classification.
- qtimlvconverter: Converts the input buffers into the tensor format the model expects.
- qtimltflite: Executes model inference.
- delegate=external: Uses an external delegate (rather than TFLite's default CPU backend) to execute inference, typically to invoke a hardware accelerator.
- external-delegate-path=libQnnTFLiteDelegate.so: Specifies the path to the TensorFlow Lite external delegate library; here, the QNN delegate.
- external-delegate-options="QNNExternalDelegate,backend_type=htp;": Configures the options of the external delegate; here the HTP (Hexagon Tensor Processor) backend is selected.
- model=/opt/resnext50_quantized.tflite: Specifies the model file path.
- qtimlvclassification: Processes classification results of the model output.
- video/x-raw,format=BGRA,width=256,height=144: Output video specifications including format, width and height.
- mixer.: Feeds this branch into the qtivcomposer element named mixer, which combines the outputs of the two branches.
- threshold=35.0: Sets the confidence threshold to filter out low-confidence classification results.
- results=5: Sets the number of classification results to return.
- module=mobilenet: Selects the post-processing module used to interpret the model's output tensor.
- labels=/opt/imagenet_labels.txt: Specifies the label file path.
- extra-operation=softmax: Applies softmax normalization to the model output to get the probability of each class.
- constants="Resnetnet,q-offsets=<30.0>,q-scales=<0.06314703077077866>;": Specifies the quantization parameters (offset and scale) used to dequantize the model output.
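The extra-operation=softmax and constants settings correspond to a dequantize-then-softmax step applied to the model's raw output. The arithmetic can be sketched in awk with the q-offset and q-scale from the command above; the three sample logits are made up for illustration, and the affine convention float = scale × (q − offset) is an assumption (the plugin's actual sign convention may differ):

```shell
# Dequantize three hypothetical quantized logits using the q-offset/q-scale
# from the constants property, then apply softmax to obtain probabilities.
# Sample inputs and the offset sign convention are assumptions for illustration.
probs=$(echo "100 120 90" | awk -v scale=0.06314703077077866 -v offset=30.0 '{
  for (i = 1; i <= NF; i++) {
    x[i] = scale * ($i - offset)          # dequantize: float = scale * (q - offset)
    if (i == 1 || x[i] > m) m = x[i]      # track the max for numerical stability
  }
  for (i = 1; i <= NF; i++) { e[i] = exp(x[i] - m); sum += e[i] }
  for (i = 1; i <= NF; i++) printf "%.4f ", e[i] / sum   # softmax probabilities
}')
echo "$probs"
```

The resulting probabilities sum to 1, and the largest logit maps to the highest class probability; qtimlvclassification then keeps the top results=5 entries whose confidence exceeds threshold.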
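When bringing the pipeline up for the first time, it can help to verify the decode-and-display path before adding the inference branch. A reduced sketch using the same file, decoder settings, and sink as the full command (no NPU elements involved):

```shell
# Decode-only sketch: exercises demux -> parse -> decode -> display without
# the ML branch. Requires the same Wayland environment as the full pipeline.
export XDG_RUNTIME_DIR=/dev/socket/weston
export WAYLAND_DISPLAY=wayland-1
gst-launch-1.0 -e filesrc location=/opt/video11.mp4 ! qtdemux ! queue ! \
  h264parse ! v4l2h264dec capture-io-mode=5 output-io-mode=5 ! queue ! \
  waylandsink sync=true fullscreen=true
```

If this reduced pipeline plays correctly, any remaining failures in the full command are confined to the tee, composer, or inference elements.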