
General Image Classification Pipeline Tutorial

1. Introduction to the General Image Classification Pipeline

Image classification is a technique that assigns images to predefined categories. It is widely applied in object recognition, scene understanding, and automatic annotation. Image classification can identify various objects such as animals, plants, and traffic signs, and categorize them based on their features. By leveraging deep learning models, image classification can automatically extract image features and perform accurate classification. This pipeline also offers a flexible service-oriented deployment approach, supporting multiple programming languages on various hardware platforms. Moreover, the pipeline supports secondary development: you can train and optimize models on your own dataset, and the trained models can then be seamlessly integrated.

The General Image Classification Pipeline includes an image classification module. If you prioritize model accuracy, choose a model with higher accuracy. If you prioritize inference speed, select a model with faster inference. If you prioritize model storage size, choose a model with a smaller storage size.

| Model | Model Download Link | Top-1 Acc (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Storage Size (M) | Description |
|---|---|---|---|---|---|---|
| CLIP_vit_base_patch16_224 | Inference Model / Trained Model | 85.36 | 12.84 / 2.82 | 60.52 / 60.52 | 306.5 M | A general high-precision image classification model obtained by fine-tuning the large vision model CLIP on the ImageNet-1k dataset. |
| MobileNetV3_small_x1_0 | Inference Model / Trained Model | 68.2 | 3.76 / 0.53 | 5.11 / 1.43 | 10.5 M | MobileNetV3 is a NAS-based lightweight network proposed by Google in 2019. To further improve performance, the relu and sigmoid activation functions are replaced with hard_swish and hard_sigmoid, respectively, and several strategies aimed at reducing the network's computational load are introduced. |
| PP-HGNet_small | Inference Model / Trained Model | 81.51 | 5.12 / 1.73 | 25.01 / 25.01 | 86.5 M | PP-HGNet (High Performance GPU Net) is a high-performance backbone network developed by the Baidu PaddlePaddle vision team and optimized for GPU platforms. It builds on VOVNet, incorporates a learnable downsampling layer (LDS Layer), and integrates the advantages of models such as ResNet_vd and PPHGNet, achieving higher accuracy than other state-of-the-art (SOTA) models at the same speed on GPU platforms. |
| PP-HGNetV2-B0 | Inference Model / Trained Model | 77.77 | 3.83 / 0.57 | 9.95 / 2.37 | 21.4 M | PP-HGNetV2 (High Performance GPU Network V2) is the next-generation version of PP-HGNet developed by the Baidu PaddlePaddle vision team. Building upon PP-HGNet, it has been further optimized and improved; on NVIDIA GPU devices it achieves an extreme "Accuracy-Latency Balance", with accuracy significantly surpassing other models of the same inference speed. |
| PP-HGNetV2-B4 | Inference Model / Trained Model | 83.57 | 5.47 / 1.10 | 14.42 / 9.89 | 70.4 M | |
| PP-HGNetV2-B6 | Inference Model / Trained Model | 86.30 | 12.25 / 3.76 | 62.29 / 62.29 | 268.4 M | |
| PP-LCNet_x1_0 | Inference Model / Trained Model | 71.32 | 2.35 / 0.47 | 4.03 / 1.35 | 10.5 M | PP-LCNet_x1_0 is a backbone network designed for Intel CPU devices and their acceleration library MKLDNN. Compared with other lightweight SOTA models, it further improves model performance without increasing inference time, ultimately surpassing existing SOTA models by a significant margin. |
| ResNet50 | Inference Model / Trained Model | 76.5 | 6.44 / 1.16 | 15.04 / 11.63 | 90.8 M | The ResNet series of models was proposed in 2015 and won the ILSVRC 2015 championship with a top-5 error rate of 3.57%. The network innovatively introduced the residual structure; stacking multiple residual blocks builds the ResNet network. |
| SwinTransformer_tiny_patch4_window7_224 | Inference Model / Trained Model | 81.10 | 6.66 / 2.15 | 60.45 / 60.45 | 100.1 M | SwinTransformer is a new type of vision Transformer that can serve as a general-purpose backbone in computer vision. It is a hierarchical Transformer built from shifted windows, which confine self-attention computation to non-overlapping local windows while allowing cross-window connections, thereby improving performance. |

❗ The above list features the 9 core models that the image classification module primarily supports. In total, this module supports 80 models. The complete list of models is as follows:

👉Details of Model List
| Model | Model Download Link | Top-1 Accuracy (%) | GPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | CPU Inference Time (ms)<br>[Normal Mode / High-Performance Mode] | Model Size (M) | Description |
|---|---|---|---|---|---|---|
| CLIP_vit_base_patch16_224 | Inference Model / Trained Model | 85.36 | 12.84 / 2.82 | 60.52 / 60.52 | 306.5 M | CLIP is an image classification model based on the correlation between vision and language. It uses contrastive learning and pre-training to achieve unsupervised or weakly supervised image classification, and is especially suitable for large-scale datasets. By mapping images and texts into the same representation space, the model learns general features, exhibiting good generalization ability and interpretability, and performs well in many downstream tasks. |
| CLIP_vit_large_patch14_224 | Inference Model / Trained Model | 88.1 | 51.72 / 11.13 | 238.07 / 238.07 | 1.04 G | |
| ConvNeXt_base_224 | Inference Model / Trained Model | 83.84 | 13.18 / 12.14 | 128.39 / 81.78 | 313.9 M | The ConvNeXt series was proposed by Meta in 2022 and is based on a pure CNN architecture. Building on ResNet, it incorporates the advantages of SwinTransformer, including its training strategies and network structure optimizations, to explore the performance limits of convolutional neural networks. The series retains many advantages of CNNs, including high inference efficiency and easy transfer to downstream tasks. |
| ConvNeXt_base_384 | Inference Model / Trained Model | 84.90 | 32.15 / 30.52 | 279.36 / 220.35 | 313.9 M | |
| ConvNeXt_large_224 | Inference Model / Trained Model | 84.26 | 26.51 / 7.21 | 213.32 / 157.22 | 700.7 M | |
| ConvNeXt_large_384 | Inference Model / Trained Model | 85.27 | 67.07 / 65.26 | 494.04 / 438.97 | 700.7 M | |
| ConvNeXt_small | Inference Model / Trained Model | 83.13 | 9.05 / 8.21 | 97.94 / 55.29 | 178.0 M | |
| ConvNeXt_tiny | Inference Model / Trained Model | 82.03 | 5.12 / 2.06 | 63.96 / 29.77 | 104.1 M | |
| FasterNet-L | Inference Model / Trained Model | 83.5 | 15.67 / 3.10 | 52.24 / 52.24 | 357.1 M | FasterNet is a neural network designed to improve runtime speed. Its key improvements: (1) it re-examines popular operators and finds that low FLOPS mainly stem from frequent memory accesses, especially in depthwise convolutions; (2) it proposes Partial Convolution (PConv) to extract image features more efficiently by reducing redundant computation and memory access; (3) based on PConv, it launches the FasterNet series, a new design scheme that achieves significantly higher runtime speed on various devices without compromising task performance. |
| FasterNet-M | Inference Model / Trained Model | 83.0 | 9.72 / 2.30 | 35.29 / 35.29 | 204.6 M | |
| FasterNet-S | Inference Model / Trained Model | 81.3 | 5.46 / 1.27 | 20.46 / 18.03 | 119.3 M | |
| FasterNet-T0 | Inference Model / Trained Model | 71.9 | 4.18 / 0.60 | 6.34 / 3.44 | 15.1 M | |
| FasterNet-T1 | Inference Model / Trained Model | 75.9 | 4.24 / 0.64 | 9.57 / 5.20 | 29.2 M | |
| FasterNet-T2 | Inference Model / Trained Model | 79.1 | 3.87 / 0.78 | 11.14 / 9.98 | 57.4 M | |
| MobileNetV1_x0_5 | Inference Model / Trained Model | 63.5 | 1.39 / 0.28 | 2.74 / 1.02 | 4.8 M | MobileNetV1 is a network released by Google in 2017 for mobile and embedded devices. It decomposes traditional convolutions into depthwise separable convolutions (a combination of depthwise and pointwise convolutions), which significantly reduces the number of parameters and computations compared to traditional convolutional networks. It can be used for image classification and other vision tasks. |
| MobileNetV1_x0_25 | Inference Model / Trained Model | 51.4 | 1.32 / 0.30 | 2.04 / 0.58 | 1.8 M | |
| MobileNetV1_x0_75 | Inference Model / Trained Model | 68.8 | 1.75 / 0.33 | 3.41 / 1.57 | 9.3 M | |
| MobileNetV1_x1_0 | Inference Model / Trained Model | 71.0 | 1.89 / 0.34 | 4.01 / 2.17 | 15.2 M | |
| MobileNetV2_x0_5 | Inference Model / Trained Model | 65.0 | 3.17 / 0.48 | 4.52 / 1.35 | 7.1 M | MobileNetV2 is a lightweight network proposed by Google as the successor to MobileNetV1. It introduces linear bottlenecks and inverted residual blocks as its basic building structure, which are stacked extensively to form the network. It ultimately achieves higher classification accuracy with only half the FLOPs of MobileNetV1. |
| MobileNetV2_x0_25 | Inference Model / Trained Model | 53.2 | 2.80 / 0.46 | 3.92 / 0.98 | 5.5 M | |
| MobileNetV2_x1_0 | Inference Model / Trained Model | 72.2 | 3.57 / 0.49 | 5.63 / 2.51 | 12.6 M | |
| MobileNetV2_x1_5 | Inference Model / Trained Model | 74.1 | 3.58 / 0.62 | 8.02 / 4.49 | 25.0 M | |
| MobileNetV2_x2_0 | Inference Model / Trained Model | 75.2 | 3.56 / 0.74 | 10.24 / 6.83 | 41.2 M | |
| MobileNetV3_large_x0_5 | Inference Model / Trained Model | 69.2 | 3.79 / 0.62 | 6.76 / 1.61 | 9.6 M | MobileNetV3 is a NAS-based lightweight network proposed by Google in 2019. To further improve performance, the relu and sigmoid activation functions are replaced with hard_swish and hard_sigmoid, respectively, and several strategies designed to reduce network computation are introduced. |
| MobileNetV3_large_x0_35 | Inference Model / Trained Model | 64.3 | 3.70 / 0.60 | 5.54 / 1.41 | 7.5 M | |
| MobileNetV3_large_x0_75 | Inference Model / Trained Model | 73.1 | 4.82 / 0.66 | 7.45 / 2.00 | 14.0 M | |
| MobileNetV3_large_x1_0 | Inference Model / Trained Model | 75.3 | 4.86 / 0.68 | 6.88 / 2.61 | 19.5 M | |
| MobileNetV3_large_x1_25 | Inference Model / Trained Model | 76.4 | 5.08 / 0.71 | 7.37 / 3.58 | 26.5 M | |
| MobileNetV3_small_x0_5 | Inference Model / Trained Model | 59.2 | 3.41 / 0.57 | 5.60 / 1.14 | 6.8 M | |
| MobileNetV3_small_x0_35 | Inference Model / Trained Model | 53.0 | 3.49 / 0.60 | 4.63 / 1.07 | 6.0 M | |
| MobileNetV3_small_x0_75 | Inference Model / Trained Model | 66.0 | 3.49 / 0.60 | 5.19 / 1.28 | 8.5 M | |
| MobileNetV3_small_x1_0 | Inference Model / Trained Model | 68.2 | 3.76 / 0.53 | 5.11 / 1.43 | 10.5 M | |
| MobileNetV3_small_x1_25 | Inference Model / Trained Model | 70.7 | 4.23 / 0.58 | 6.48 / 1.68 | 13.0 M | |
| MobileNetV4_conv_large | Inference Model / Trained Model | 83.4 | 8.33 / 2.24 | 33.56 / 23.70 | 125.2 M | MobileNetV4 is an efficient architecture specifically designed for mobile devices. Its core is the UIB (Universal Inverted Bottleneck) module, a unified and flexible structure that integrates IB (Inverted Bottleneck), ConvNeXt, FFN (Feed Forward Network), and the latest ExtraDW (Extra Depthwise) module. Alongside UIB, Mobile MQA, a customized attention block for mobile accelerators, delivers up to 39% acceleration. MobileNetV4 also introduces a novel Neural Architecture Search (NAS) scheme to improve the effectiveness of the search process. |
| MobileNetV4_conv_medium | Inference Model / Trained Model | 79.9 | 6.81 / 0.92 | 12.47 / 6.27 | 37.6 M | |
| MobileNetV4_conv_small | Inference Model / Trained Model | 74.6 | 3.25 / 0.46 | 4.42 / 1.54 | 14.7 M | |
| MobileNetV4_hybrid_large | Inference Model / Trained Model | 83.8 | 12.27 / 4.18 | 58.64 / 58.64 | 145.1 M | |
| MobileNetV4_hybrid_medium | Inference Model / Trained Model | 80.5 | 12.08 / 1.34 | 24.69 / 8.10 | 42.9 M | |
| PP-HGNet_base | Inference Model / Trained Model | 85.0 | 14.10 / 4.19 | 68.92 / 68.92 | 249.4 M | PP-HGNet (High Performance GPU Net) is a high-performance backbone network developed by Baidu PaddlePaddle's vision team for GPU platforms. It combines the fundamentals of VOVNet with a learnable downsampling layer (LDS Layer) and incorporates the advantages of models such as ResNet_vd and PPHGNet. On GPU platforms, at the same speed it achieves higher accuracy than other SOTA models: it outperforms ResNet34-D by 3.8 percentage points and ResNet50-D by 2.4 percentage points, and under the same SLSD conditions it ultimately surpasses ResNet50-D by 4.7 percentage points. In addition, at the same level of accuracy, its inference speed significantly exceeds that of mainstream Vision Transformers. |
| PP-HGNet_small | Inference Model / Trained Model | 81.51 | 5.12 / 1.73 | 25.01 / 25.01 | 86.5 M | |
| PP-HGNet_tiny | Inference Model / Trained Model | 79.83 | 3.28 / 1.29 | 16.40 / 15.97 | 52.4 M | |
| PP-HGNetV2-B0 | Inference Model / Trained Model | 77.77 | 3.83 / 0.57 | 9.95 / 2.37 | 21.4 M | PP-HGNetV2 (High Performance GPU Network V2) is the next-generation version of Baidu PaddlePaddle's PP-HGNet, with further optimizations and improvements over its predecessor. On NVIDIA GPUs it achieves an extreme "Accuracy-Latency Balance", significantly outperforming other models with similar inference speed in terms of accuracy, and it performs strongly in various label classification and evaluation scenarios. |
| PP-HGNetV2-B1 | Inference Model / Trained Model | 79.18 | 3.87 / 0.62 | 8.77 / 3.79 | 22.6 M | |
| PP-HGNetV2-B2 | Inference Model / Trained Model | 81.74 | 5.73 / 0.86 | 15.11 / 7.05 | 39.9 M | |
| PP-HGNetV2-B3 | Inference Model / Trained Model | 82.98 | 6.26 / 1.01 | 18.47 / 10.34 | 57.9 M | |
| PP-HGNetV2-B4 | Inference Model / Trained Model | 83.57 | 5.47 / 1.10 | 14.42 / 9.89 | 70.4 M | |
| PP-HGNetV2-B5 | Inference Model / Trained Model | 84.75 | 10.24 / 1.96 | 29.71 / 29.71 | 140.8 M | |
| PP-HGNetV2-B6 | Inference Model / Trained Model | 86.30 | 12.25 / 3.76 | 62.29 / 62.29 | 268.4 M | |
| PP-LCNet_x0_5 | Inference Model / Trained Model | 63.14 | 2.28 / 0.42 | 2.86 / 0.83 | 6.7 M | PP-LCNet is a lightweight backbone network developed by Baidu PaddlePaddle's vision team. It improves model performance without increasing inference time, significantly surpassing other lightweight SOTA models. |
| PP-LCNet_x0_25 | Inference Model / Trained Model | 51.86 | 1.89 / 0.45 | 2.49 / 0.68 | 5.5 M | |
| PP-LCNet_x0_35 | Inference Model / Trained Model | 58.09 | 1.94 / 0.41 | 2.73 / 0.77 | 5.9 M | |
| PP-LCNet_x0_75 | Inference Model / Trained Model | 68.18 | 2.30 / 0.41 | 2.95 / 1.07 | 8.4 M | |
| PP-LCNet_x1_0 | Inference Model / Trained Model | 71.32 | 2.35 / 0.47 | 4.03 / 1.35 | 10.5 M | |
| PP-LCNet_x1_5 | Inference Model / Trained Model | 73.71 | 2.33 / 0.53 | 4.17 / 2.29 | 16.0 M | |
| PP-LCNet_x2_0 | Inference Model / Trained Model | 75.18 | 2.40 / 0.51 | 5.37 / 3.46 | 23.2 M | |
| PP-LCNet_x2_5 | Inference Model / Trained Model | 76.60 | 2.36 / 0.61 | 6.29 / 5.05 | 32.1 M | |
| PP-LCNetV2_base | Inference Model / Trained Model | 77.05 | 3.33 / 0.55 | 6.86 / 3.77 | 23.7 M | PP-LCNetV2 is the next-generation version of PP-LCNet, self-developed by Baidu PaddlePaddle's vision team. Building on PP-LCNet, it applies further optimizations, primarily re-parameterization strategies that combine depthwise convolutions with varying kernel sizes, along with optimized pointwise convolutions, shortcuts, etc. Without additional data, the PPLCNetV2_base model achieves over 77% Top-1 accuracy on the ImageNet classification dataset while keeping inference time below 4.4 ms on Intel CPU platforms. |
| PP-LCNetV2_large | Inference Model / Trained Model | 78.51 | 4.37 / 0.71 | 9.43 / 8.07 | 37.3 M | |
| PP-LCNetV2_small | Inference Model / Trained Model | 73.97 | 2.53 / 0.41 | 5.14 / 1.98 | 14.6 M | |
| ResNet18_vd | Inference Model / Trained Model | 72.3 | 2.47 / 0.61 | 6.97 / 5.15 | 41.5 M | The ResNet series was introduced in 2015 and won the ILSVRC 2015 competition with a top-5 error rate of 3.57%. The network innovatively proposed the residual structure; stacking residual blocks builds the ResNet network. Experiments show that residual blocks effectively improve convergence speed and accuracy. |
| ResNet18 | Inference Model / Trained Model | 71.0 | 2.35 / 0.67 | 6.35 / 4.61 | 41.5 M | |
| ResNet34_vd | Inference Model / Trained Model | 76.0 | 4.01 / 1.03 | 11.99 / 9.86 | 77.3 M | |
| ResNet34 | Inference Model / Trained Model | 74.6 | 3.99 / 1.02 | 12.42 / 9.81 | 77.3 M | |
| ResNet50_vd | Inference Model / Trained Model | 79.1 | 6.04 / 1.16 | 16.08 / 12.07 | 90.8 M | |
| ResNet50 | Inference Model / Trained Model | 76.5 | 6.44 / 1.16 | 15.04 / 11.63 | 90.8 M | |
| ResNet101_vd | Inference Model / Trained Model | 80.2 | 11.16 / 2.07 | 32.14 / 32.14 | 158.4 M | |
| ResNet101 | Inference Model / Trained Model | 77.6 | 10.91 / 2.06 | 31.14 / 22.93 | 158.4 M | |
| ResNet152_vd | Inference Model / Trained Model | 80.6 | 15.96 / 2.99 | 49.33 / 49.33 | 214.3 M | |
| ResNet152 | Inference Model / Trained Model | 78.3 | 15.61 / 2.90 | 47.33 / 36.60 | 214.2 M | |
| ResNet200_vd | Inference Model / Trained Model | 80.9 | 24.20 / 3.69 | 62.62 / 62.62 | 266.0 M | |
| StarNet-S1 | Inference Model / Trained Model | 73.6 | 6.33 / 1.98 | 7.56 / 3.26 | 11.2 M | StarNet explores the untapped potential of "star operations" (i.e., element-wise multiplication) in network design. It shows that star operations can map inputs to high-dimensional, nonlinear feature spaces, a process akin to kernel tricks but without expanding the network size. StarNet, a simple yet powerful prototype network, demonstrates exceptional performance and low latency under compact network structures and limited computational resources. |
| StarNet-S2 | Inference Model / Trained Model | 74.8 | 4.49 / 1.55 | 7.38 / 3.38 | 14.3 M | |
| StarNet-S3 | Inference Model / Trained Model | 77.0 | 6.70 / 1.62 | 11.05 / 4.76 | 22.2 M | |
| StarNet-S4 | Inference Model / Trained Model | 79.0 | 8.50 / 2.86 | 15.40 / 6.76 | 28.9 M | |
| SwinTransformer_base_patch4_window7_224 | Inference Model / Trained Model | 83.37 | 14.29 / 5.13 | 130.89 / 130.89 | 310.5 M | SwinTransformer is a novel vision Transformer that can serve as a general-purpose backbone for computer vision tasks. It is a hierarchical Transformer built from shifted windows, which restrict self-attention computation to non-overlapping local windows while allowing cross-window connections, thereby enhancing network performance. |
| SwinTransformer_base_patch4_window12_384 | Inference Model / Trained Model | 84.17 | 37.74 / 10.10 | 362.56 / 362.56 | 311.4 M | |
| SwinTransformer_large_patch4_window7_224 | Inference Model / Trained Model | 86.19 | 26.48 / 7.94 | 228.23 / 228.23 | 694.8 M | |
| SwinTransformer_large_patch4_window12_384 | Inference Model / Trained Model | 87.06 | 74.72 / 18.16 | 652.04 / 652.04 | 696.1 M | |
| SwinTransformer_small_patch4_window7_224 | Inference Model / Trained Model | 83.21 | 10.37 / 3.90 | 94.20 / 94.20 | 175.6 M | |
| SwinTransformer_tiny_patch4_window7_224 | Inference Model / Trained Model | 81.10 | 6.66 / 2.15 | 60.45 / 60.45 | 100.1 M | |
**Test Environment Description**:

- **Performance Test Environment**
  - **Test Dataset**: ImageNet-1k validation set.
  - **Hardware Configuration**:
    - GPU: NVIDIA Tesla T4
    - CPU: Intel Xeon Gold 6271C @ 2.60 GHz
    - Other Environments: Ubuntu 20.04 / cuDNN 8.6 / TensorRT 8.5.2.2
- **Inference Mode Description**

| Mode | GPU Configuration | CPU Configuration | Acceleration Technology Combination |
|-------------|----------------------------------------|-------------------|---------------------------------------------------|
| Normal Mode | FP32 precision / no TRT acceleration | FP32 precision / 8 threads | PaddleInference |
| High-Performance Mode | Optimal combination of pre-selected precision types and acceleration strategies | FP32 precision / 8 threads | Pre-selected optimal backend (Paddle/OpenVINO/TRT, etc.) |

2. Quick Start

All model pipelines provided by PaddleX can be quickly experienced. You can experience the general image classification pipeline online in the Star River Community, or you can use the command line or Python locally to experience the effects of the general image classification pipeline.

2.1 Online Experience

You can experience the effects of the general image classification pipeline online using the demo images provided by the official platform.

If you are satisfied with the performance of the pipeline, you can directly integrate and deploy it. You can choose to download the deployment package from the cloud, or refer to the methods in Section 2.2 Local Experience for local deployment. If you are not satisfied with the results, you can fine-tune the models in the pipeline using your private data. If you have local hardware resources for training, you can start training directly on your local machine; if not, the Star River Zero-Code Platform provides a one-click training service. You don't need to write any code—just upload your data and start the training task with one click.

2.2 Local Experience

Before using the general image classification pipeline locally, please ensure that you have completed the installation of the PaddleX wheel package according to the PaddleX Local Installation Guide.

2.2.1 Command Line Experience

You can quickly experience the image classification pipeline with a single command. Use the test image general_image_classification_001.jpg, or replace --input with a local path of your own, to run prediction:

paddlex --pipeline image_classification --input general_image_classification_001.jpg --device gpu:0 --save_path ./output/

The relevant parameter descriptions can be found in the parameter explanation section of 2.2.2 Python Script Integration.

After running, the result obtained is:

{'res': {'input_path': 'general_image_classification_001.jpg', 'page_index': None, 'class_ids': array([296, 170, 356, 258, 248], dtype=int32), 'scores': array([0.62736, 0.03752, 0.03256, 0.0323 , 0.03194], dtype=float32), 'label_names': ['ice bear, polar bear, Ursus Maritimus, Thalarctos maritimus', 'Irish wolfhound', 'weasel', 'Samoyed, Samoyede', 'Eskimo dog, husky']}}

For the explanation of the running result parameters, you can refer to the result interpretation in Section 2.2.2 Integration via Python Script.

The visualization results are saved under save_path.


2.2.2 Integration via Python Script

  • The command line method above is for quick experience and viewing of results. In projects, integration through code is usually required. You can complete the pipeline's fast inference with just a few lines of code, as follows:
from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="image_classification")

output = pipeline.predict("general_image_classification_001.jpg")
for res in output:
    res.print() ## Print the structured prediction output
    res.save_to_img(save_path="./output/") ## Save the visualization image of the result
    res.save_to_json(save_path="./output/") ## Save the structured prediction output

In the above Python script, the following steps are executed:

(1) The general image classification pipeline object is instantiated via create_pipeline(). The specific parameter descriptions are as follows:

| Parameter | Description | Type | Default |
|---|---|---|---|
| pipeline | The name of the pipeline or the path to the pipeline configuration file. If it is a pipeline name, it must be a pipeline supported by PaddleX. | str | None |
| config | Specific configuration information for the pipeline (if set together with pipeline, it takes priority over pipeline, and the pipeline name inside it must be consistent with pipeline). | dict[str, Any] | None |
| device | The device used for pipeline inference. Supports specifying a GPU card number, such as "gpu:0", other hardware card numbers, such as "npu:0", or the CPU, as "cpu". | str | gpu:0 |
| use_hpip | Whether to enable high-performance inference; available only if the pipeline supports it. | bool | False |
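
As a quick illustration, here is a minimal sketch combining these initialization parameters (the values shown are illustrative; use_hpip is shown at its default):

from paddlex import create_pipeline

# Minimal sketch: instantiate the pipeline with an explicit device.
# use_hpip is left at its default (False).
pipeline = create_pipeline(
    pipeline="image_classification",
    device="gpu:0",
    use_hpip=False,
)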

(2) The predict() method of the image classification pipeline object is called to perform inference prediction. This method returns a generator. Below are the parameters and their descriptions for the predict() method:

| Parameter | Description | Type | Options | Default |
|---|---|---|---|---|
| input | The data to be predicted; required. Supports multiple input types. | Python Var / str / list | **Python Var**: image data represented by numpy.ndarray.<br>**str**: the local path of an image file, such as /root/data/img.jpg; the network URL of an image file; or a local directory containing images to be predicted, such as /root/data/.<br>**list**: elements of the above types, such as [numpy.ndarray, numpy.ndarray], ["/root/data/img1.jpg", "/root/data/img2.jpg"], ["/root/data1", "/root/data2"]. | None |
| device | The device used for pipeline inference. | str / None | **cpu**: use the CPU for inference.<br>**gpu:0**: use the first GPU; similarly **npu:0**, **xpu:0**, **mlu:0**, **dcu:0** use the first NPU, XPU, MLU, or DCU.<br>**None**: use the default from pipeline initialization, which prioritizes the local GPU device 0 and falls back to the CPU if unavailable. | None |
| topk | The number of top classes to return in the prediction result. If not specified, the default configuration of the official PaddleX model is used. | int | An integer such as 5, meaning the top 5 classes and their corresponding classification probabilities are printed (returned). | 5 |
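
Continuing from the pipeline created above, here is a hedged sketch of a call combining these parameters, predicting a list of inputs and limiting the returned classes with topk (the file names are illustrative):

output = pipeline.predict(
    input=["./general_image_classification_001.jpg", "./general_image_classification_002.jpg"],  # illustrative paths
    topk=3,  # return the top-3 classes per image
)
for res in output:
    res.print()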

(3) Process the prediction results. The prediction result for each sample is of type dict, and supports operations such as printing, saving as an image, and saving as a json file:

| Method | Description | Parameter | Type | Parameter Description | Default |
|---|---|---|---|---|---|
| print() | Print the result to the terminal | format_json | bool | Whether to format the output content with JSON indentation | True |
| | | indent | int | Indentation level used to beautify the JSON output, making it more readable; effective only when format_json is True | 4 |
| | | ensure_ascii | bool | Whether to escape non-ASCII characters to Unicode. If True, all non-ASCII characters are escaped; False retains the original characters. Effective only when format_json is True | False |
| save_to_json() | Save the result as a JSON file | save_path | str | The file path for saving. When a directory is provided, the saved file name matches the input file name | None |
| | | indent | int | Indentation level used to beautify the JSON output, making it more readable; effective only when format_json is True | 4 |
| | | ensure_ascii | bool | Whether to escape non-ASCII characters to Unicode. If True, all non-ASCII characters are escaped; False retains the original characters. Effective only when format_json is True | False |
| save_to_img() | Save the result as an image file | save_path | str | The file path for saving; supports both directory and file paths | None |

  • Calling the print() method will print the results to the terminal, with the following explanations for the printed content:

    • input_path: (str) The input path of the image to be predicted.
    • page_index: (Union[int, None]) If the input is a PDF file, it indicates the current page number of the PDF; otherwise, it is None.
    • class_ids: (List[numpy.ndarray]) The class IDs of the prediction results.
    • scores: (List[numpy.ndarray]) The confidence scores of the prediction results.
    • label_names: (List[str]) The names of the predicted classes.
  • Calling the save_to_json() method will save the above content to the specified save_path. If a directory is specified, the saved path will be save_path/{your_img_basename}_res.json. If a file is specified, the results will be saved directly to that file. Since JSON files do not support saving NumPy arrays, numpy.array types will be converted to lists.

  • Calling the save_to_img() method will save the visualized results to the specified save_path. If a directory is specified, the saved path will be save_path/{your_img_basename}_res.{your_img_extension}. If a file is specified, the results will be saved directly to that file. (It is not recommended to specify a specific file path directly, as multiple images will be overwritten, leaving only the last one.)

  • Additionally, you can access the visualized image with results and the prediction results through attributes, as follows:

| Attribute | Description |
|---|---|
| json | Get the prediction results in JSON format. |
| img | Get the visualized image(s) as a dict. |
  • The prediction results obtained through the json attribute are of type dict, with content consistent with what is saved by calling the save_to_json() method.
  • The prediction results returned by the img attribute are of type dict. The key res corresponds to an Image.Image object: a visualized image for displaying classification results.
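
For instance, assuming res is one item yielded by predict(), the two attributes can be used as follows (a sketch; the output file name is illustrative):

prediction = res.json        # dict, same content as what save_to_json() writes
vis = res.img["res"]         # Image.Image object with the visualized classification result
vis.save("./output/visualized.jpg")  # save the visualization manually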

Additionally, you can obtain the configuration file for the image classification pipeline and load it for prediction. You can run the following command to save the results in my_path:

paddlex --get_pipeline_config image_classification --save_path ./my_path

If you have obtained the configuration file, you can customize the settings for the image classification pipeline by simply changing the pipeline parameter value in the create_pipeline method to the path of the configuration file. An example follows:

from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="./my_path/image_classification.yaml")

output = pipeline.predict(
    input="./general_image_classification_001.jpg",
)
for res in output:
    res.print()
    res.save_to_img("./output/")
    res.save_to_json("./output/")

Note: The parameters in the configuration file are pipeline initialization parameters. If you wish to change the initialization parameters for the general image classification pipeline, you can directly modify the parameters in the configuration file and load it for prediction. Additionally, CLI prediction also supports passing a configuration file, simply specify the path of the configuration file with --pipeline.
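
For example, a CLI call that uses the saved configuration file (assuming it was saved to ./my_path as above) might look like this:

paddlex --pipeline ./my_path/image_classification.yaml \
        --input general_image_classification_001.jpg \
        --device gpu:0 \
        --save_path ./output/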

3. Development Integration/Deployment

If the pipeline meets your requirements for inference speed and accuracy, you can proceed directly with development integration/deployment.

If you need to integrate the pipeline directly into your Python project, you can refer to the example code in Section 2.2.2 Integration via Python Script.

In addition, PaddleX also provides three other deployment methods, detailed as follows:

🚀 High-Performance Inference: In practical production environments, many applications have strict performance requirements (especially response speed) for deployment strategies to ensure efficient system operation and smooth user experience. To this end, PaddleX offers a high-performance inference plugin that deeply optimizes model inference and pre/post-processing to significantly accelerate the end-to-end process. For detailed procedures, please refer to the PaddleX High-Performance Inference Guide.

☁️ Service Deployment: Service deployment is a common form of deployment in practical production environments. By encapsulating inference functionality as a service, clients can access these services via network requests to obtain inference results. PaddleX supports multiple pipeline service deployment solutions. For detailed procedures, please refer to the PaddleX Service Deployment Guide.

Below are the API references for basic service deployment and examples of multi-language service calls:

API Reference

For the main operations provided by the service:

  • The HTTP request method is POST.
  • Both the request body and response body are JSON data (JSON objects).
  • When the request is processed successfully, the response status code is 200, and the attributes of the response body are as follows:
| Name | Type | Description |
|---|---|---|
| logId | string | The UUID of the request. |
| errorCode | integer | Error code. Fixed at 0. |
| errorMsg | string | Error message. Fixed at "Success". |
| result | object | The result of the operation. |
  • When the request is not processed successfully, the attributes of the response body are as follows:
| Name | Type | Description |
|---|---|---|
| logId | string | The UUID of the request. |
| errorCode | integer | Error code. Same as the response status code. |
| errorMsg | string | Error message. |

The main operations provided by the service are as follows:

  • infer

Classify the image.

POST /image-classification

  • The attributes of the request body are as follows:
| Name | Type | Description | Required |
|---|---|---|---|
| image | string | The URL of an image file accessible to the server, or the Base64-encoded content of the image file. | Yes |
| topk | integer \| null | See the topk parameter description in the pipeline's predict() method documentation. | No |

  • When the request is processed successfully, the result in the response body has the following properties:

| Name | Type | Description |
|---|---|---|
| categories | array | Image category information. |
| image | string | The image classification result image, in JPEG format and encoded in Base64. |

Each element in categories is an object with the following properties:

| Name | Type | Description |
|---|---|---|
| id | integer | Category ID. |
| name | string | Category name. |
| score | number | Category score. |

An example of result is as follows:

{
  "categories": [
    {
      "id": 5,
      "name": "rabbit",
      "score": 0.93
    }
  ],
  "image": "xxxxxx"
}
Multi-language Service Call Examples
Python
import base64
import requests

API_URL = "http://localhost:8080/image-classification"  # Service URL
image_path = "./demo.jpg"
output_image_path = "./out.jpg"

# Encode a local image using Base64
with open(image_path, "rb") as file:
    image_bytes = file.read()
    image_data = base64.b64encode(image_bytes).decode("ascii")

payload = {"image": image_data}  # Base64-encoded file content or image URL

# Call the API
response = requests.post(API_URL, json=payload)

# Process the response data
assert response.status_code == 200
result = response.json()["result"]
with open(output_image_path, "wb") as file:
    file.write(base64.b64decode(result["image"]))
print(f"Output image saved at {output_image_path}")
print("\nCategories:")
print(result["categories"])
C++
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include "cpp-httplib/httplib.h" // https://github.com/Huiyicc/cpp-httplib
#include "nlohmann/json.hpp" // https://github.com/nlohmann/json
#include "base64.hpp" // https://github.com/tobiaslocker/base64

int main() {
    httplib::Client client("localhost:8080");
    const std::string imagePath = "./demo.jpg";
    const std::string outputImagePath = "./out.jpg";

    httplib::Headers headers = {
        {"Content-Type", "application/json"}
    };

    // Encode a local image using Base64
    std::ifstream file(imagePath, std::ios::binary | std::ios::ate);
    std::streamsize size = file.tellg();
    file.seekg(0, std::ios::beg);

    std::vector<char> buffer(size);
    if (!file.read(buffer.data(), size)) {
        std::cerr << "Error reading file." << std::endl;
        return 1;
    }
    std::string bufferStr(reinterpret_cast<const char*>(buffer.data()), buffer.size());
    std::string encodedImage = base64::to_base64(bufferStr);

    nlohmann::json jsonObj;
    jsonObj["image"] = encodedImage;
    std::string body = jsonObj.dump();

    // Call the API
    auto response = client.Post("/image-classification", headers, body, "application/json");
    // Process the response data
    if (response && response->status == 200) {
        nlohmann::json jsonResponse = nlohmann::json::parse(response->body);
        auto result = jsonResponse["result"];

        encodedImage = result["image"];
        std::string decodedString = base64::from_base64(encodedImage);
        std::vector<unsigned char> decodedImage(decodedString.begin(), decodedString.end());
        std::ofstream outputImage(outputImagePath, std::ios::binary | std::ios::out);
        if (outputImage.is_open()) {
            outputImage.write(reinterpret_cast<const char*>(decodedImage.data()), decodedImage.size());
            outputImage.close();
            std::cout << "Output image saved at " << outputImagePath << std::endl;
        } else {
            std::cerr << "Unable to open file for writing: " << outputImagePath << std::endl;
        }

        auto categories = result["categories"];
        std::cout << "\nCategories:" << std::endl;
        for (const auto& category : categories) {
            std::cout << category << std::endl;
        }
    } else {
        std::cout << "Failed to send HTTP request." << std::endl;
        return 1;
    }

    return 0;
}
Java
import okhttp3.*;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.node.ObjectNode;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Base64;

public class Main {
    public static void main(String[] args) throws IOException {
        String API_URL = "http://localhost:8080/image-classification"; // Service URL
        String imagePath = "./demo.jpg"; // Local image
        String outputImagePath = "./out.jpg"; // Output image

        // Encode the local image using Base64
        File file = new File(imagePath);
        byte[] fileContent = java.nio.file.Files.readAllBytes(file.toPath());
        String imageData = Base64.getEncoder().encodeToString(fileContent);

        ObjectMapper objectMapper = new ObjectMapper();
        ObjectNode params = objectMapper.createObjectNode();
        params.put("image", imageData); // Base64-encoded file content or image URL

        // Create an OkHttpClient instance
        OkHttpClient client = new OkHttpClient();
        MediaType JSON = MediaType.Companion.get("application/json; charset=utf-8");
        RequestBody body = RequestBody.Companion.create(params.toString(), JSON);
        Request request = new Request.Builder()
                .url(API_URL)
                .post(body)
                .build();

        // Call the API and process the returned data
        try (Response response = client.newCall(request).execute()) {
            if (response.isSuccessful()) {
                String responseBody = response.body().string();
                JsonNode resultNode = objectMapper.readTree(responseBody);
                JsonNode result = resultNode.get("result");
                String base64Image = result.get("image").asText();
                JsonNode categories = result.get("categories");

                byte[] imageBytes = Base64.getDecoder().decode(base64Image);
                try (FileOutputStream fos = new FileOutputStream(outputImagePath)) {
                    fos.write(imageBytes);
                }
                System.out.println("Output image saved at " + outputImagePath);
                System.out.println("\nCategories: " + categories.toString());
            } else {
                System.err.println("Request failed with code: " + response.code());
            }
        }
    }
}
Go
package main

import (
    "bytes"
    "encoding/base64"
    "encoding/json"
    "fmt"
    "io/ioutil"
    "net/http"
)

func main() {
    API_URL := "http://localhost:8080/image-classification"
    imagePath := "./demo.jpg"
    outputImagePath := "./out.jpg"

    // Encode the local image using Base64
    imageBytes, err := ioutil.ReadFile(imagePath)
    if err != nil {
        fmt.Println("Error reading image file:", err)
        return
    }
    imageData := base64.StdEncoding.EncodeToString(imageBytes)

    payload := map[string]string{"image": imageData} // Base64-encoded file content or image URL
    payloadBytes, err := json.Marshal(payload)
    if err != nil {
        fmt.Println("Error marshaling payload:", err)
        return
    }

    // Call the API
    client := &http.Client{}
    req, err := http.NewRequest("POST", API_URL, bytes.NewBuffer(payloadBytes))
    if err != nil {
        fmt.Println("Error creating request:", err)
        return
    }

    res, err := client.Do(req)
    if err != nil {
        fmt.Println("Error sending request:", err)
        return
    }
    defer res.Body.Close()

    // Process the response data
    body, err := ioutil.ReadAll(res.Body)
    if err != nil {
        fmt.Println("Error reading response body:", err)
        return
    }

    type Response struct {
        Result struct {
            Image      string   `json:"image"`
            Categories []map[string]interface{} `json:"categories"`
        } `json:"result"`
    }

    var respData Response
    err = json.Unmarshal([]byte(string(body)), &respData)
    if err != nil {
        fmt.Println("Error unmarshaling response body:", err)
        return
    }

    outputImageData, err := base64.StdEncoding.DecodeString(respData.Result.Image)
    if err != nil {
        fmt.Println("Error decoding base64 image data:", err)
        return
    }

    err = ioutil.WriteFile(outputImagePath, outputImageData, 0644)
    if err != nil {
        fmt.Println("Error writing image to file:", err)
        return
    }

    fmt.Printf("Image saved at %s\n", outputImagePath)
    fmt.Println("\nCategories:")
    for _, category := range respData.Result.Categories {
        fmt.Println(category)
    }
}
C#
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

class Program
{
    static readonly string API_URL = "http://localhost:8080/image-classification";
    static readonly string imagePath = "./demo.jpg";
    static readonly string outputImagePath = "./out.jpg";

    static async Task Main(string[] args)
    {
        var httpClient = new HttpClient();

        // Encode a local image using Base64
        byte[] imageBytes = File.ReadAllBytes(imagePath);
        string image_data = Convert.ToBase64String(imageBytes);

        var payload = new JObject{ { "image", image_data } }; // Base64-encoded file content or image URL
        var content = new StringContent(payload.ToString(), Encoding.UTF8, "application/json");

        // Call the API
        HttpResponseMessage response = await httpClient.PostAsync(API_URL, content);
        response.EnsureSuccessStatusCode();

        // Process the response data
        string responseBody = await response.Content.ReadAsStringAsync();
        JObject jsonResponse = JObject.Parse(responseBody);

        string base64Image = jsonResponse["result"]["image"].ToString();
        byte[] outputImageBytes = Convert.FromBase64String(base64Image);

        File.WriteAllBytes(outputImagePath, outputImageBytes);
        Console.WriteLine($"Output image saved at {outputImagePath}");
        Console.WriteLine("\nCategories:");
        Console.WriteLine(jsonResponse["result"]["categories"].ToString());
    }
}
Node.js
const axios = require('axios');
const fs = require('fs');

const API_URL = 'http://localhost:8080/image-classification';
const imagePath = './demo.jpg';
const outputImagePath = './out.jpg';

let config = {
   method: 'POST',
   maxBodyLength: Infinity,
   url: API_URL,
   data: JSON.stringify({
    'image': encodeImageToBase64(imagePath)  // Base64-encoded file content or image URL
  })
};

// Encode the local image using Base64
function encodeImageToBase64(filePath) {
  const bitmap = fs.readFileSync(filePath);
  return Buffer.from(bitmap).toString('base64');
}

// Call the API
axios.request(config)
.then((response) => {
    // Process the returned data
    const result = response.data['result'];
    const imageBuffer = Buffer.from(result['image'], 'base64');
    fs.writeFile(outputImagePath, imageBuffer, (err) => {
      if (err) throw err;
      console.log(`Output image saved at ${outputImagePath}`);
    });
    console.log("\nCategories:");
    console.log(result['categories']);
})
.catch((error) => {
  console.log(error);
});
PHP
<?php

$API_URL = "http://localhost:8080/image-classification"; // Service URL
$image_path = "./demo.jpg";
$output_image_path = "./out.jpg";

// Encode the local image using Base64
$image_data = base64_encode(file_get_contents($image_path));
$payload = array("image" => $image_data); // Base64-encoded file content or image URL

// Call the API
$ch = curl_init($API_URL);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($payload));
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);

// Process the response data
$result = json_decode($response, true)["result"];
file_put_contents($output_image_path, base64_decode($result["image"]));
echo "Output image saved at " . $output_image_path . "\n";
echo "\nCategories:\n";
print_r($result["categories"]);
?>


📱 Edge Deployment: Edge deployment is a method that places computing and data processing capabilities directly on the user's device, allowing the device to process data without relying on remote servers. PaddleX supports deploying models on edge devices such as Android. For detailed procedures, please refer to the PaddleX Edge Deployment Guide. You can choose the appropriate deployment method according to your needs to integrate the model pipeline into subsequent AI applications.

4. Secondary Development

If the default model weights provided by the general image classification pipeline are not satisfactory in terms of accuracy or speed in your scenario, you can try to fine-tune the existing model using your own domain-specific or application-specific data to improve the recognition performance of the general image classification pipeline in your scenario.

4.1 Model Fine-Tuning

Since the general image classification pipeline includes an image classification module, if the pipeline's performance does not meet expectations, you need to refer to the fine-tuning tutorial links in the table below for model fine-tuning.

| Scenario | Fine-Tuning Module | Fine-Tuning Reference Link |
|---|---|---|
| Inaccurate image classification | Image classification module | Link |

4.2 Model Application

After you complete fine-tuning with your private dataset, you will obtain the local model weight file.

To use the fine-tuned model weights, simply modify the pipeline configuration file by filling in the local path of the fine-tuned model weights at the corresponding position (model_dir):

SubModules:
  ImageClassification:
    module_name: image_classification
    model_name: PP-LCNet_x0_5
    model_dir: null # Replace with the path to the fine-tuned image classification model weights
    batch_size: 4
    topk: 5

Subsequently, refer to the command line method or Python script method in the local experience section to load the modified pipeline configuration file.

5. Multi-Hardware Support

PaddleX supports a variety of mainstream hardware devices, including NVIDIA GPU, Kunlunxin XPU, Ascend NPU, and Cambricon MLU. Simply modify the --device parameter to seamlessly switch between different hardware devices.

For example, if you are using an Ascend NPU for inference in the general image classification pipeline, the command is:

paddlex --pipeline image_classification \
        --input general_image_classification_001.jpg \
        --save_path ./output \
        --device npu:0
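
Equivalently, in a Python script you can pass the device when creating the pipeline; a minimal sketch, assuming an Ascend NPU is visible as npu:0:

from paddlex import create_pipeline

# Assumption: an Ascend NPU is available to PaddleX as npu:0
pipeline = create_pipeline(pipeline="image_classification", device="npu:0")
output = pipeline.predict("general_image_classification_001.jpg")
for res in output:
    res.print()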

If you want to use the general image classification pipeline on a wider variety of hardware, please refer to the PaddleX Multi-Device Usage Guide.
