PaddleX Model List (CPU/GPU)

PaddleX ships a number of pipelines, each composed of several modules, and each module offers several models. Use the benchmark data below to choose among them: pick a higher-accuracy model if accuracy matters most, a faster model if inference latency matters most, or a smaller model if storage footprint matters most.
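
Once a model name is picked from the tables below, it can be tried directly through the Python API. The following is a minimal sketch, assuming a PaddleX 3.x installation; the model name and image path are placeholders that can be swapped for any entry in this list:

```python
# Minimal single-model inference sketch (assumes PaddleX 3.x is installed).
from paddlex import create_model

model = create_model("PP-LCNet_x1_0")        # any model name from the tables below
output = model.predict("example_image.jpg", batch_size=1)
for res in output:
    res.print()                              # print predictions to stdout
    res.save_to_json("./output/")            # persist results for later inspection
```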

Image Classification Module

Model Name Top1 Acc (%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size yaml File Model Download Link
CLIP_vit_base_patch16_224 85.36 12.84 / 2.82 60.52 / 60.52 306.5 M CLIP_vit_base_patch16_224.yaml Inference Model/Training Model
CLIP_vit_large_patch14_224 88.1 51.72 / 11.13 238.07 / 238.07 1.04 G CLIP_vit_large_patch14_224.yaml Inference Model/Training Model
ConvNeXt_base_224 83.84 13.18 / 12.14 128.39 / 81.78 313.9 M ConvNeXt_base_224.yaml Inference Model/Training Model
ConvNeXt_base_384 84.90 32.15 / 30.52 279.36 / 220.35 313.9 M ConvNeXt_base_384.yaml Inference Model/Training Model
ConvNeXt_large_224 84.26 26.51 / 7.21 213.32 / 157.22 700.7 M ConvNeXt_large_224.yaml Inference Model/Training Model
ConvNeXt_large_384 85.27 67.07 / 65.26 494.04 / 438.97 700.7 M ConvNeXt_large_384.yaml Inference Model/Training Model
ConvNeXt_small 83.13 9.05 / 8.21 97.94 / 55.29 178.0 M ConvNeXt_small.yaml Inference Model/Training Model
ConvNeXt_tiny 82.03 5.12 / 2.06 63.96 / 29.77 101.4 M ConvNeXt_tiny.yaml Inference Model/Training Model
FasterNet-L 83.5 15.67 / 3.10 52.24 / 52.24 357.1 M FasterNet-L.yaml Inference Model/Training Model
FasterNet-M 83.0 9.72 / 2.30 35.29 / 35.29 204.6 M FasterNet-M.yaml Inference Model/Training Model
FasterNet-S 81.3 5.46 / 1.27 20.46 / 18.03 119.3 M FasterNet-S.yaml Inference Model/Training Model
FasterNet-T0 71.9 4.18 / 0.60 6.34 / 3.44 15.1 M FasterNet-T0.yaml Inference Model/Training Model
FasterNet-T1 75.9 4.24 / 0.64 9.57 / 5.20 29.2 M FasterNet-T1.yaml Inference Model/Training Model
FasterNet-T2 79.1 3.87 / 0.78 11.14 / 9.98 57.4 M FasterNet-T2.yaml Inference Model/Training Model
MobileNetV1_x0_5 63.5 1.39 / 0.28 2.74 / 1.02 4.8 M MobileNetV1_x0_5.yaml Inference Model/Training Model
MobileNetV1_x0_25 51.4 1.32 / 0.30 2.04 / 0.58 1.8 M MobileNetV1_x0_25.yaml Inference Model/Training Model
MobileNetV1_x0_75 68.8 1.75 / 0.33 3.41 / 1.57 9.3 M MobileNetV1_x0_75.yaml Inference Model/Training Model
MobileNetV1_x1_0 71.0 1.89 / 0.34 4.01 / 2.17 15.2 M MobileNetV1_x1_0.yaml Inference Model/Training Model
MobileNetV2_x0_5 65.0 3.17 / 0.48 4.52 / 1.35 7.1 M MobileNetV2_x0_5.yaml Inference Model/Training Model
MobileNetV2_x0_25 53.2 2.80 / 0.46 3.92 / 0.98 5.5 M MobileNetV2_x0_25.yaml Inference Model/Training Model
MobileNetV2_x1_0 72.2 3.57 / 0.49 5.63 / 2.51 12.6 M MobileNetV2_x1_0.yaml Inference Model/Training Model
MobileNetV2_x1_5 74.1 3.58 / 0.62 8.02 / 4.49 25.0 M MobileNetV2_x1_5.yaml Inference Model/Training Model
MobileNetV2_x2_0 75.2 3.56 / 0.74 10.24 / 6.83 41.2 M MobileNetV2_x2_0.yaml Inference Model/Training Model
MobileNetV3_large_x0_5 69.2 3.79 / 0.62 6.76 / 1.61 9.6 M MobileNetV3_large_x0_5.yaml Inference Model/Training Model
MobileNetV3_large_x0_35 64.3 3.70 / 0.60 5.54 / 1.41 7.5 M MobileNetV3_large_x0_35.yaml Inference Model/Training Model
MobileNetV3_large_x0_75 73.1 4.82 / 0.66 7.45 / 2.00 14.0 M MobileNetV3_large_x0_75.yaml Inference Model/Training Model
MobileNetV3_large_x1_0 75.3 4.86 / 0.68 6.88 / 2.61 19.5 M MobileNetV3_large_x1_0.yaml Inference Model/Training Model
MobileNetV3_large_x1_25 76.4 5.08 / 0.71 7.37 / 3.58 26.5 M MobileNetV3_large_x1_25.yaml Inference Model/Training Model
MobileNetV3_small_x0_5 59.2 3.41 / 0.57 5.60 / 1.14 6.8 M MobileNetV3_small_x0_5.yaml Inference Model/Training Model
MobileNetV3_small_x0_35 53.0 3.49 / 0.60 4.63 / 1.07 6.0 M MobileNetV3_small_x0_35.yaml Inference Model/Training Model
MobileNetV3_small_x0_75 66.0 3.49 / 0.60 5.19 / 1.28 8.5 M MobileNetV3_small_x0_75.yaml Inference Model/Training Model
MobileNetV3_small_x1_0 68.2 3.76 / 0.53 5.11 / 1.43 10.5 M MobileNetV3_small_x1_0.yaml Inference Model/Training Model
MobileNetV3_small_x1_25 70.7 4.23 / 0.58 6.48 / 1.68 13.0 M MobileNetV3_small_x1_25.yaml Inference Model/Training Model
MobileNetV4_conv_large 83.4 8.33 / 2.24 33.56 / 23.70 125.2 M MobileNetV4_conv_large.yaml Inference Model/Training Model
MobileNetV4_conv_medium 79.9 6.81 / 0.92 12.47 / 6.27 37.6 M MobileNetV4_conv_medium.yaml Inference Model/Training Model
MobileNetV4_conv_small 74.6 3.25 / 0.46 4.42 / 1.54 14.7 M MobileNetV4_conv_small.yaml Inference Model/Training Model
MobileNetV4_hybrid_large 83.8 12.27 / 4.18 58.64 / 58.64 145.1 M MobileNetV4_hybrid_large.yaml Inference Model/Training Model
MobileNetV4_hybrid_medium 80.5 12.08 / 1.34 24.69 / 8.10 42.9 M MobileNetV4_hybrid_medium.yaml Inference Model/Training Model
PP-HGNet_base 85.0 14.10 / 4.19 68.92 / 68.92 249.4 M PP-HGNet_base.yaml Inference Model/Training Model
PP-HGNet_small 81.51 5.12 / 1.73 25.01 / 25.01 86.5 M PP-HGNet_small.yaml Inference Model/Training Model
PP-HGNet_tiny 79.83 3.28 / 1.29 16.40 / 15.97 52.4 M PP-HGNet_tiny.yaml Inference Model/Training Model
PP-HGNetV2-B0 77.77 3.83 / 0.57 9.95 / 2.37 21.4 M PP-HGNetV2-B0.yaml Inference Model/Training Model
PP-HGNetV2-B1 79.18 3.87 / 0.62 8.77 / 3.79 22.6 M PP-HGNetV2-B1.yaml Inference Model/Training Model
PP-HGNetV2-B2 81.74 5.73 / 0.86 15.11 / 7.05 39.9 M PP-HGNetV2-B2.yaml Inference Model/Training Model
PP-HGNetV2-B3 82.98 6.26 / 1.01 18.47 / 10.34 57.9 M PP-HGNetV2-B3.yaml Inference Model/Training Model
PP-HGNetV2-B4 83.57 5.47 / 1.10 14.42 / 9.89 70.4 M PP-HGNetV2-B4.yaml Inference Model/Training Model
PP-HGNetV2-B5 84.75 10.24 / 1.96 29.71 / 29.71 140.8 M PP-HGNetV2-B5.yaml Inference Model/Training Model
PP-HGNetV2-B6 86.30 12.25 / 3.76 62.29 / 62.29 268.4 M PP-HGNetV2-B6.yaml Inference Model/Training Model
PP-LCNet_x0_5 63.14 2.28 / 0.42 2.86 / 0.83 6.7 M PP-LCNet_x0_5.yaml Inference Model/Training Model
PP-LCNet_x0_25 51.86 1.89 / 0.45 2.49 / 0.68 5.5 M PP-LCNet_x0_25.yaml Inference Model/Training Model
PP-LCNet_x0_35 58.09 1.94 / 0.41 2.73 / 0.77 5.9 M PP-LCNet_x0_35.yaml Inference Model/Training Model
PP-LCNet_x0_75 68.18 2.30 / 0.41 2.95 / 1.07 8.4 M PP-LCNet_x0_75.yaml Inference Model/Training Model
PP-LCNet_x1_0 71.32 2.35 / 0.47 4.03 / 1.35 10.5 M PP-LCNet_x1_0.yaml Inference Model/Training Model
PP-LCNet_x1_5 73.71 2.33 / 0.53 4.17 / 2.29 16.0 M PP-LCNet_x1_5.yaml Inference Model/Training Model
PP-LCNet_x2_0 75.18 2.40 / 0.51 5.37 / 3.46 23.2 M PP-LCNet_x2_0.yaml Inference Model/Training Model
PP-LCNet_x2_5 76.60 2.36 / 0.61 6.29 / 5.05 32.1 M PP-LCNet_x2_5.yaml Inference Model/Training Model
PP-LCNetV2_base 77.05 3.33 / 0.55 6.86 / 3.77 23.7 M PP-LCNetV2_base.yaml Inference Model/Training Model
PP-LCNetV2_large 78.51 4.37 / 0.71 9.43 / 8.07 37.3 M PP-LCNetV2_large.yaml Inference Model/Training Model
PP-LCNetV2_small 73.97 2.53 / 0.41 5.14 / 1.98 14.6 M PP-LCNetV2_small.yaml Inference Model/Training Model
ResNet18_vd 72.3 2.47 / 0.61 6.97 / 5.15 41.5 M ResNet18_vd.yaml Inference Model/Training Model
ResNet18 71.0 2.35 / 0.67 6.35 / 4.61 41.5 M ResNet18.yaml Inference Model/Training Model
ResNet34_vd 76.0 4.01 / 1.03 11.99 / 9.86 77.3 M ResNet34_vd.yaml Inference Model/Training Model
ResNet34 74.6 3.99 / 1.02 12.42 / 9.81 77.3 M ResNet34.yaml Inference Model/Training Model
ResNet50_vd 79.1 6.04 / 1.16 16.08 / 12.07 90.8 M ResNet50_vd.yaml Inference Model/Training Model
ResNet50 76.5 6.44 / 1.16 15.04 / 11.63 90.8 M ResNet50.yaml Inference Model/Training Model
ResNet101_vd 80.2 11.16 / 2.07 32.14 / 32.14 158.4 M ResNet101_vd.yaml Inference Model/Training Model
ResNet101 77.6 10.91 / 2.06 31.14 / 22.93 158.7 M ResNet101.yaml Inference Model/Training Model
ResNet152_vd 80.6 15.96 / 2.99 49.33 / 49.33 214.3 M ResNet152_vd.yaml Inference Model/Training Model
ResNet152 78.3 15.61 / 2.90 47.33 / 36.60 214.2 M ResNet152.yaml Inference Model/Training Model
ResNet200_vd 80.9 24.20 / 3.69 62.62 / 62.62 266.0 M ResNet200_vd.yaml Inference Model/Training Model
StarNet-S1 73.6 6.33 / 1.98 7.56 / 3.26 11.2 M StarNet-S1.yaml Inference Model/Training Model
StarNet-S2 74.8 4.49 / 1.55 7.38 / 3.38 14.3 M StarNet-S2.yaml Inference Model/Training Model
StarNet-S3 77.0 6.70 / 1.62 11.05 / 4.76 22.2 M StarNet-S3.yaml Inference Model/Training Model
StarNet-S4 79.0 8.50 / 2.86 15.40 / 6.76 28.9 M StarNet-S4.yaml Inference Model/Training Model
SwinTransformer_base_patch4_window7_224 83.37 14.29 / 5.13 130.89 / 130.89 310.5 M SwinTransformer_base_patch4_window7_224.yaml Inference Model/Training Model
SwinTransformer_base_patch4_window12_384 84.17 37.74 / 10.10 362.56 / 362.56 311.4 M SwinTransformer_base_patch4_window12_384.yaml Inference Model/Training Model
SwinTransformer_large_patch4_window7_224 86.19 26.48 / 7.94 228.23 / 228.23 694.8 M SwinTransformer_large_patch4_window7_224.yaml Inference Model/Training Model
SwinTransformer_large_patch4_window12_384 87.06 74.72 / 18.16 652.04 / 652.04 696.1 M SwinTransformer_large_patch4_window12_384.yaml Inference Model/Training Model
SwinTransformer_small_patch4_window7_224 83.21 10.37 / 3.90 94.20 / 94.20 175.6 M SwinTransformer_small_patch4_window7_224.yaml Inference Model/Training Model
SwinTransformer_tiny_patch4_window7_224 81.10 6.66 / 2.15 60.45 / 60.45 100.1 M SwinTransformer_tiny_patch4_window7_224.yaml Inference Model/Training Model

Note: The above accuracy metrics are based on the ImageNet-1k validation set Top1 Acc.
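
The yaml file listed for each model is the entry point for dataset checking, training, evaluation, and prediction through the PaddleX CLI. A hedged sketch of launching training from such a config follows; both paths are placeholders, and the exact config path layout depends on the PaddleX release:

```python
# Sketch: kicking off training from a listed yaml config via the PaddleX CLI.
# The config path and dataset directory are placeholders; adjust to your setup.
import subprocess

subprocess.run([
    "python", "main.py",
    "-c", "paddlex/configs/modules/image_classification/PP-LCNet_x1_0.yaml",
    "-o", "Global.mode=train",
    "-o", "Global.dataset_dir=./dataset/your_cls_dataset",
], check=True)
```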

Image Multi-label Classification Module

Model Name mAP (%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size yaml File Model Download Link
CLIP_vit_base_patch16_448_ML 89.15 54.75 / 14.30 280.23 / 280.23 325.6 M CLIP_vit_base_patch16_448_ML.yaml Inference Model/Training Model
PP-HGNetV2-B0_ML 80.98 6.47 / 1.38 21.56 / 13.69 39.6 M PP-HGNetV2-B0_ML.yaml Inference Model/Training Model
PP-HGNetV2-B4_ML 87.96 9.63 / 2.79 43.98 / 36.63 88.5 M PP-HGNetV2-B4_ML.yaml Inference Model/Training Model
PP-HGNetV2-B6_ML 91.06 37.07 / 9.43 188.58 / 188.58 286.5 M PP-HGNetV2-B6_ML.yaml Inference Model/Training Model
PP-LCNet_x1_0_ML 77.96 4.04 / 1.15 11.76 / 8.32 29.4 M PP-LCNet_x1_0_ML.yaml Inference Model/Training Model
ResNet50_ML 83.42 12.12 / 3.27 51.79 / 44.36 108.9 M ResNet50_ML.yaml Inference Model/Training Model

Note: The above accuracy metrics are mAP for the multi-label classification task on COCO2017.

Pedestrian Attribute Module

Model Name mA (%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Size yaml File Model Download Link
PP-LCNet_x1_0_pedestrian_attribute 92.2 2.35 / 0.49 3.17 / 1.25 6.7 M PP-LCNet_x1_0_pedestrian_attribute.yaml Inference Model/Training Model

Note: The above accuracy metrics are mA on PaddleX's internal dataset.

Vehicle Attribute Module

Model Name mA (%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size yaml File Model Download Link
PP-LCNet_x1_0_vehicle_attribute 91.7 2.32 / 2.32 3.22 / 1.26 6.7 M PP-LCNet_x1_0_vehicle_attribute.yaml Inference Model/Training Model

Note: The above accuracy metrics are based on the VeRi dataset mA.

Image Feature Module

Model Name recall@1 (%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size yaml File Model Download Link
PP-ShiTuV2_rec 84.2 3.48 / 0.55 8.04 / 4.04 16.3 M PP-ShiTuV2_rec.yaml Inference Model/Training Model
PP-ShiTuV2_rec_CLIP_vit_base 88.69 12.94 / 2.88 58.36 / 58.36 306.6 M PP-ShiTuV2_rec_CLIP_vit_base.yaml Inference Model/Training Model
PP-ShiTuV2_rec_CLIP_vit_large 91.03 51.65 / 11.18 255.78 / 255.78 1.05 G PP-ShiTuV2_rec_CLIP_vit_large.yaml Inference Model/Training Model

Note: The above accuracy metrics are recall@1 on the AliProducts dataset.

Document Orientation Classification Module

Model Name Top-1 Acc (%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size yaml File Model Download Link
PP-LCNet_x1_0_doc_ori 99.06 2.31 / 0.43 3.37 / 1.27 7 M PP-LCNet_x1_0_doc_ori.yaml Inference Model/Training Model

Note: The above accuracy metrics are Top-1 Acc on PaddleX's internal dataset.

Face Feature Module

Model Name Output Feature Dimension Acc (%)
AgeDB-30/CFP-FP/LFW
GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size (M) yaml File Model Download Link
MobileFaceNet 128 96.28/96.71/99.58 3.16 / 0.48 6.49 / 6.49 4.1 MobileFaceNet.yaml Inference Model/Training Model
ResNet50_face 512 98.12/98.56/99.77 5.68 / 1.09 14.96 / 11.90 87.2 ResNet50_face.yaml Inference Model/Training Model

Note: The above accuracy metrics are measured on the AgeDB-30, CFP-FP, and LFW datasets.

Main Body Detection Module

Model Name mAP (%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size yaml File Model Download Link
PP-ShiTuV2_det 41.5 12.79 / 4.51 44.14 / 44.14 27.54 M PP-ShiTuV2_det.yaml Inference Model/Training Model

Note: The above accuracy metrics are based on the PaddleClas Main Body Detection Dataset mAP(0.5:0.95).

Object Detection Module

Model Name mAP (%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size yaml File Model Download Link
Cascade-FasterRCNN-ResNet50-FPN 41.1 135.92 / 135.92 - 245.4 M Cascade-FasterRCNN-ResNet50-FPN.yaml Inference Model/Training Model
Cascade-FasterRCNN-ResNet50-vd-SSLDv2-FPN 45.0 138.23 / 138.23 - 246.2 M Cascade-FasterRCNN-ResNet50-vd-SSLDv2-FPN.yaml Inference Model/Training Model
CenterNet-DLA-34 37.6 - - 75.4 M CenterNet-DLA-34.yaml Inference Model/Training Model
CenterNet-ResNet50 38.9 - - 319.7 M CenterNet-ResNet50.yaml Inference Model/Training Model
DETR-R50 42.3 62.91 / 17.33 392.63 / 392.63 159.3 M DETR-R50.yaml Inference Model/Training Model
FasterRCNN-ResNet34-FPN 37.8 83.33 / 31.64 - 137.5 M FasterRCNN-ResNet34-FPN.yaml Inference Model/Training Model
FasterRCNN-ResNet50-FPN 38.4 107.08 / 35.40 - 148.1 M FasterRCNN-ResNet50-FPN.yaml Inference Model/Training Model
FasterRCNN-ResNet50-vd-FPN 39.5 109.36 / 36.00 - 148.1 M FasterRCNN-ResNet50-vd-FPN.yaml Inference Model/Training Model
FasterRCNN-ResNet50-vd-SSLDv2-FPN 41.4 109.06 / 36.19 - 148.1 M FasterRCNN-ResNet50-vd-SSLDv2-FPN.yaml Inference Model/Training Model
FasterRCNN-ResNet50 36.7 496.33 / 109.12 - 120.2 M FasterRCNN-ResNet50.yaml Inference Model/Training Model
FasterRCNN-ResNet101-FPN 41.4 148.21 / 42.21 - 216.3 M FasterRCNN-ResNet101-FPN.yaml Inference Model/Training Model
FasterRCNN-ResNet101 39.0 538.58 / 120.88 - 188.1 M FasterRCNN-ResNet101.yaml Inference Model/Training Model
FasterRCNN-ResNeXt101-vd-FPN 43.4 258.01 / 58.25 - 360.6 M FasterRCNN-ResNeXt101-vd-FPN.yaml Inference Model/Training Model
FasterRCNN-Swin-Tiny-FPN 42.6 - - 159.8 M FasterRCNN-Swin-Tiny-FPN.yaml Inference Model/Training Model
FCOS-ResNet50 39.6 106.13 / 28.32 721.79 / 721.79 124.2 M FCOS-ResNet50.yaml Inference Model/Training Model
PicoDet-L 42.6 14.68 / 5.81 47.32 / 47.32 20.9 M PicoDet-L.yaml Inference Model/Training Model
PicoDet-M 37.5 9.62 / 3.23 23.75 / 14.88 16.8 M PicoDet-M.yaml Inference Model/Training Model
PicoDet-S 29.1 7.98 / 2.33 14.82 / 5.60 4.4 M PicoDet-S.yaml Inference Model/Training Model
PicoDet-XS 26.2 9.66 / 2.75 19.15 / 7.24 5.7 M PicoDet-XS.yaml Inference Model/Training Model
PP-YOLOE_plus-L 52.9 33.55 / 10.46 189.05 / 189.05 185.3 M PP-YOLOE_plus-L.yaml Inference Model/Training Model
PP-YOLOE_plus-M 49.8 19.52 / 7.46 113.36 / 113.36 83.2 M PP-YOLOE_plus-M.yaml Inference Model/Training Model
PP-YOLOE_plus-S 43.7 12.16 / 4.58 73.86 / 52.90 28.3 M PP-YOLOE_plus-S.yaml Inference Model/Training Model
PP-YOLOE_plus-X 54.7 58.87 / 15.84 292.93 / 292.93 349.4 M PP-YOLOE_plus-X.yaml Inference Model/Training Model
RT-DETR-H 56.3 115.92 / 28.16 971.32 / 971.32 435.8 M RT-DETR-H.yaml Inference Model/Training Model
RT-DETR-L 53.0 35.00 / 10.45 495.51 / 495.51 113.7 M RT-DETR-L.yaml Inference Model/Training Model
RT-DETR-R18 46.5 20.21 / 6.23 266.01 / 266.01 70.7 M RT-DETR-R18.yaml Inference Model/Training Model
RT-DETR-R50 53.1 42.14 / 11.31 523.97 / 523.97 149.1 M RT-DETR-R50.yaml Inference Model/Training Model
RT-DETR-X 54.8 61.24 / 15.83 647.08 / 647.08 232.9 M RT-DETR-X.yaml Inference Model/Training Model
YOLOv3-DarkNet53 39.1 41.58 / 10.10 158.78 / 158.78 219.7 M YOLOv3-DarkNet53.yaml Inference Model/Training Model
YOLOv3-MobileNetV3 31.4 16.53 / 5.70 60.44 / 60.44 83.8 M YOLOv3-MobileNetV3.yaml Inference Model/Training Model
YOLOv3-ResNet50_vd_DCN 40.6 32.91 / 10.07 225.72 / 224.32 163.0 M YOLOv3-ResNet50_vd_DCN.yaml Inference Model/Training Model
YOLOX-L 50.1 121.19 / 13.55 295.38 / 274.15 192.5 M YOLOX-L.yaml Inference Model/Training Model
YOLOX-M 46.9 87.19 / 10.09 183.95 / 172.67 90.0 M YOLOX-M.yaml Inference Model/Training Model
YOLOX-N 26.1 53.31 / 45.02 69.69 / 59.18 3.4 M YOLOX-N.yaml Inference Model/Training Model
YOLOX-S 40.4 129.52 / 13.19 181.39 / 179.01 32.0 M YOLOX-S.yaml Inference Model/Training Model
YOLOX-T 32.9 66.81 / 61.31 92.30 / 83.90 18.1 M YOLOX-T.yaml Inference Model/Training Model
YOLOX-X 51.8 156.40 / 20.17 480.14 / 454.35 351.5 M YOLOX-X.yaml Inference Model/Training Model

Note: The above accuracy metrics are based on the COCO2017 validation set mAP(0.5:0.95).

Small Object Detection Module

Model Name mAP (%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size yaml File Model Download Link
PP-YOLOE_plus_SOD-S 25.1 135.68 / 122.94 188.09 / 107.74 77.3 M PP-YOLOE_plus_SOD-S.yaml Inference Model/Training Model
PP-YOLOE_plus_SOD-L 31.9 57.1448 1006.98 325.0 M PP-YOLOE_plus_SOD-L.yaml Inference Model/Training Model
PP-YOLOE_plus_SOD-largesize-L 42.7 458.521 11172.7 340.5 M PP-YOLOE_plus_SOD-largesize-L.yaml Inference Model/Training Model

Note: The above accuracy metrics are based on the validation set mAP(0.5:0.95) of VisDrone-DET.

Open-Vocabulary Object Detection Module

Model mAP(0.5:0.95) mAP(0.5) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Size (M) Model Download Link
GroundingDINO-T 49.4 64.4 253.72 1807.4 658.3 Inference Model

Note: The above accuracy metrics are based on the COCO val2017 validation set mAP(0.5:0.95). All models' GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speeds are based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.
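
Unlike the fixed-category detectors above, GroundingDINO takes a free-form text prompt naming the target categories at inference time. A hedged sketch of module-level usage follows; the `prompt` argument and its dot-separated phrase format are assumptions based on the open-vocabulary detection module's documented style, and the image path is a placeholder:

```python
# Open-vocabulary detection sketch: categories come from the text prompt.
from paddlex import create_model

model = create_model("GroundingDINO-T")
# Dot-separated phrases name the categories to detect (assumed format).
output = model.predict("street_scene.jpg", prompt="bus . pedestrian . traffic light .")
for res in output:
    res.print()
```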

Open Vocabulary Segmentation Module

Model GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size (M) Model Download Link
SAM-H_box 144.9 33920.7 2433.7 Inference Model
SAM-H_point 144.9 33920.7 2433.7 Inference Model

Note: All model GPU inference times are based on NVIDIA Tesla T4, with precision type FP32. CPU inference speed is based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads, and precision type FP32.

Rotated Object Detection Module

Model mAP(%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size (M) yaml File Model Download Link
PP-YOLOE-R-L 78.14 20.7039 157.942 211.0 M PP-YOLOE-R-L.yaml Inference Model/Training Model

Note: The above accuracy metrics are based on the DOTA validation set mAP(0.5:0.95). All model GPU inference times are based on an NVIDIA GeForce RTX 2080 Ti, with precision type FP16. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads, and precision type FP32.

Pedestrian Detection Module

Model Name mAP (%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size yaml File Model Download Link
PP-YOLOE-L_human 48.0 33.27 / 9.19 173.72 / 173.72 196.1 M PP-YOLOE-L_human.yaml Inference Model/Training Model
PP-YOLOE-S_human 42.5 9.94 / 3.42 54.48 / 46.52 28.8 M PP-YOLOE-S_human.yaml Inference Model/Training Model

Note: The above accuracy metrics are based on the validation set mAP(0.5:0.95) of CrowdHuman.

Vehicle Detection Module

Model Name mAP (%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size yaml File Model Download Link
PP-YOLOE-L_vehicle 63.9 32.84 / 9.03 176.60 / 176.60 196.1 M PP-YOLOE-L_vehicle.yaml Inference Model/Training Model
PP-YOLOE-S_vehicle 61.3 9.79 / 3.48 54.14 / 46.69 28.8 M PP-YOLOE-S_vehicle.yaml Inference Model/Training Model

Note: The above precision metrics are based on the validation set mAP(0.5:0.95) of PPVehicle.

Face Detection Module

Model Name AP (%)
Easy/Medium/Hard
GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size yaml File Model Download Link
BlazeFace 77.7/73.4/49.5 60.34 / 54.76 84.18 / 84.18 0.447 M BlazeFace.yaml Inference Model/Training Model
BlazeFace-FPN-SSH 83.2/80.5/60.5 69.29 / 63.42 86.96 / 86.96 0.606 M BlazeFace-FPN-SSH.yaml Inference Model/Training Model
PicoDet_LCNet_x2_5_face 93.7/90.7/68.1 35.37 / 12.88 126.24 / 126.24 28.9 M PicoDet_LCNet_x2_5_face.yaml Inference Model/Training Model
PP-YOLOE_plus-S_face 93.9/91.8/79.8 22.54 / 8.33 138.67 / 138.67 26.5 M PP-YOLOE_plus-S_face.yaml Inference Model/Training Model

Note: The above precision metrics are evaluated on the WIDER-FACE validation set with an input size of 640x640.

Anomaly Detection Module

Model Name mIoU GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size yaml File Model Download Link
STFPM 0.9901 2.97 / 1.57 38.86 / 13.24 22.5 M STFPM.yaml Inference Model/Training Model

Note: The above precision metrics are the average anomaly scores on the validation set of MVTec AD.

Human Keypoint Detection Module

Model Scheme Input Size AP(0.5:0.95) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size (M) yaml File Model Download Link
PP-TinyPose_128x96 Top-Down 128*96 58.4 - - 4.9 PP-TinyPose_128x96.yaml Inference Model/Training Model
PP-TinyPose_256x192 Top-Down 256*192 68.3 - - 4.9 PP-TinyPose_256x192.yaml Inference Model/Training Model

Note: The above accuracy metrics are based on the COCO dataset AP(0.5:0.95), with detection boxes obtained from ground truth annotations. All GPU inference times are based on an NVIDIA Tesla T4 machine, with precision type FP32. CPU inference speeds are based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads and precision type FP32.

3D Multi-modal Fusion Detection Module

Model mAP(%) NDS yaml File Model Download Link
BEVFusion 53.9 60.9 BEVFusion.yaml Inference Model/Training Model

Note: The above accuracy metrics are mAP(0.5:0.95) and NDS on the nuScenes validation set, with precision type FP32.

Semantic Segmentation Module

Model Name mIoU (%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size yaml File Model Download Link
Deeplabv3_Plus-R50 80.36 503.51 / 122.30 3543.91 / 3543.91 94.9 M Deeplabv3_Plus-R50.yaml Inference Model/Training Model
Deeplabv3_Plus-R101 81.10 803.79 / 175.45 5136.21 / 5136.21 162.5 M Deeplabv3_Plus-R101.yaml Inference Model/Training Model
Deeplabv3-R50 79.90 647.56 / 121.67 3803.09 / 3803.09 138.3 M Deeplabv3-R50.yaml Inference Model/Training Model
Deeplabv3-R101 80.85 950.43 / 178.50 5517.14 / 5517.14 205.9 M Deeplabv3-R101.yaml Inference Model/Training Model
OCRNet_HRNet-W18 80.67 286.12 / 80.76 1794.03 / 1794.03 43.1 M OCRNet_HRNet-W18.yaml Inference Model/Training Model
OCRNet_HRNet-W48 82.15 627.36 / 170.76 3531.61 / 3531.61 249.8 M OCRNet_HRNet-W48.yaml Inference Model/Training Model
PP-LiteSeg-T 73.10 30.16 / 14.03 420.07 / 235.01 28.5 M PP-LiteSeg-T.yaml Inference Model/Training Model
PP-LiteSeg-B 75.25 40.92 / 20.18 494.32 / 310.34 47.0 M PP-LiteSeg-B.yaml Inference Model/Training Model
SegFormer-B0 (slice) 76.73 11.1946 268.929 13.2 M SegFormer-B0.yaml Inference Model/Training Model
SegFormer-B1 (slice) 78.35 17.9998 403.393 48.5 M SegFormer-B1.yaml Inference Model/Training Model
SegFormer-B2 (slice) 81.60 48.0371 1248.52 96.9 M SegFormer-B2.yaml Inference Model/Training Model
SegFormer-B3 (slice) 82.47 64.341 1666.35 167.3 M SegFormer-B3.yaml Inference Model/Training Model
SegFormer-B4 (slice) 82.38 82.4336 1995.42 226.7 M SegFormer-B4.yaml Inference Model/Training Model
SegFormer-B5 (slice) 82.58 97.3717 2420.19 229.7 M SegFormer-B5.yaml Inference Model/Training Model

Note: The above accuracy metrics are based on the Cityscapes dataset mIoU.

Model Name mIoU (%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size yaml File Model Download Link
SeaFormer_base (slice) 40.92 24.4073 397.574 30.8 M SeaFormer_base.yaml Inference Model/Training Model
SeaFormer_large (slice) 43.66 27.8123 550.464 49.8 M SeaFormer_large.yaml Inference Model/Training Model
SeaFormer_small (slice) 38.73 19.2295 358.343 14.3 M SeaFormer_small.yaml Inference Model/Training Model
SeaFormer_tiny (slice) 34.58 13.9496 330.132 6.1 M SeaFormer_tiny.yaml Inference Model/Training Model

Note: The above accuracy metrics are based on the ADE20k dataset. "Slice" indicates that the input images have been cropped.

Instance Segmentation Module

Model Name Mask AP GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size yaml File Model Download Link
Mask-RT-DETR-H 50.6 172.36 / 172.36 1615.75 / 1615.75 449.9 M Mask-RT-DETR-H.yaml Inference Model/Training Model
Mask-RT-DETR-L 45.7 88.18 / 88.18 1090.84 / 1090.84 113.6 M Mask-RT-DETR-L.yaml Inference Model/Training Model
Mask-RT-DETR-M 42.7 78.69 / 78.69 - 66.6 M Mask-RT-DETR-M.yaml Inference Model/Training Model
Mask-RT-DETR-S 41.0 33.5007 - 51.8 M Mask-RT-DETR-S.yaml Inference Model/Training Model
Mask-RT-DETR-X 47.5 114.16 / 114.16 1240.92 / 1240.92 237.5 M Mask-RT-DETR-X.yaml Inference Model/Training Model
Cascade-MaskRCNN-ResNet50-FPN 36.3 141.69 / 141.69 - 254.8 M Cascade-MaskRCNN-ResNet50-FPN.yaml Inference Model/Training Model
Cascade-MaskRCNN-ResNet50-vd-SSLDv2-FPN 39.1 147.62 / 147.62 - 254.7 M Cascade-MaskRCNN-ResNet50-vd-SSLDv2-FPN.yaml Inference Model/Training Model
MaskRCNN-ResNet50-FPN 35.6 118.30 / 118.30 - 157.5 M MaskRCNN-ResNet50-FPN.yaml Inference Model/Training Model
MaskRCNN-ResNet50-vd-FPN 36.4 118.34 / 118.34 - 157.5 M MaskRCNN-ResNet50-vd-FPN.yaml Inference Model/Training Model
MaskRCNN-ResNet50 32.8 228.83 / 228.83 - 127.8 M MaskRCNN-ResNet50.yaml Inference Model/Training Model
MaskRCNN-ResNet101-FPN 36.6 148.14 / 148.14 - 225.4 M MaskRCNN-ResNet101-FPN.yaml Inference Model/Training Model
MaskRCNN-ResNet101-vd-FPN 38.1 151.12 / 151.12 - 225.1 M MaskRCNN-ResNet101-vd-FPN.yaml Inference Model/Training Model
MaskRCNN-ResNeXt101-vd-FPN 39.5 237.55 / 237.55 - 370.0 M MaskRCNN-ResNeXt101-vd-FPN.yaml Inference Model/Training Model
PP-YOLOE_seg-S 32.5 - - 31.5 M PP-YOLOE_seg-S.yaml Inference Model/Training Model
SOLOv2 35.5 - - 179.1 M SOLOv2.yaml Inference Model/Training Model

Note: The above accuracy metrics are based on the Mask AP(0.5:0.95) on the COCO2017 validation set.

Text Detection Module

Model Detection Hmean (%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size (M) yaml File Model Download Link
PP-OCRv4_server_det 82.56 83.34 / 80.91 442.58 / 442.58 109 PP-OCRv4_server_det.yaml Inference Model/Training Model
PP-OCRv4_mobile_det 77.35 8.79 / 3.13 51.00 / 28.58 4.7 PP-OCRv4_mobile_det.yaml Inference Model/Training Model
PP-OCRv3_mobile_det 78.68 8.44 / 2.91 27.87 / 27.87 2.1 PP-OCRv3_mobile_det.yaml Inference Model/Training Model
PP-OCRv3_server_det 80.11 65.41 / 13.67 305.07 / 305.07 102.1 PP-OCRv3_server_det.yaml Inference Model/Training Model

Note: The evaluation dataset for the above accuracy metrics is the self-built Chinese and English dataset of PaddleOCR, covering multiple scenarios such as street view, web images, documents, and handwriting, with 593 images for text detection. The GPU inference time for all models is based on an NVIDIA Tesla T4 machine with FP32 precision type, while the CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision type.
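
Detection models from this table are typically paired with a recognition model (see the Text Recognition Module below) through the OCR pipeline rather than used alone. A minimal sketch, assuming the pipeline name "OCR" with its default model choices; the image path is a placeholder:

```python
# OCR pipeline sketch: text detection + recognition chained in one call.
from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="OCR")
output = pipeline.predict("document_photo.jpg")  # placeholder image path
for res in output:
    res.print()
    res.save_to_img("./output/")  # visualization with boxes and transcripts
```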

Seal Text Detection Module

Model Name Detection Hmean (%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size yaml File Model Download Link
PP-OCRv4_mobile_seal_det 96.47 7.82 / 3.09 48.28 / 23.97 4.7 M PP-OCRv4_mobile_seal_det.yaml Inference Model/Training Model
PP-OCRv4_server_seal_det 98.21 74.75 / 67.72 382.55 / 382.55 108.3 M PP-OCRv4_server_seal_det.yaml Inference Model/Training Model

Note: The evaluation set for the above precision metrics is the seal dataset built by PaddleX, which includes 500 seal images.

Text Recognition Module

  • Chinese Text Recognition Models
Model Recognition Avg Accuracy(%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size (M) yaml File Model Download Link
PP-OCRv4_server_rec_doc 81.53 6.65 / 2.38 32.92 / 32.92 74.7 M PP-OCRv4_server_rec_doc.yaml Inference Model/Training Model
PP-OCRv4_mobile_rec 78.74 4.82 / 1.20 16.74 / 4.64 10.6 M PP-OCRv4_mobile_rec.yaml Inference Model/Training Model
PP-OCRv4_server_rec 80.61 6.58 / 2.43 33.17 / 33.17 71.2 M PP-OCRv4_server_rec.yaml Inference Model/Training Model
PP-OCRv3_mobile_rec 72.96 5.87 / 1.19 9.07 / 4.28 9.2 M PP-OCRv3_mobile_rec.yaml Inference Model/Training Model

Note: The evaluation set for the above accuracy metrics is a Chinese dataset built by PaddleOCR, covering multiple scenarios such as street view, web images, documents, and handwriting, with 8367 images for text recognition. All models' GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision, while CPU inference speeds are based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

Model Recognition Avg Accuracy(%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size (M) yaml File Model Download Link
ch_SVTRv2_rec 68.81 8.08 / 2.74 50.17 / 42.50 73.9 M ch_SVTRv2_rec.yaml Inference Model/Training Model

Note: The evaluation dataset for the above accuracy metrics is the PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition Task Leaderboard A. All model GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

Model Recognition Avg Accuracy(%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size (M) yaml File Model Download Link
ch_RepSVTR_rec 65.07 5.93 / 1.62 20.73 / 7.32 22.1 M ch_RepSVTR_rec.yaml Inference Model/Training Model

Note: The evaluation dataset for the above accuracy metrics is the PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition Task Leaderboard B. All model GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

  • English Recognition Model

Model Recognition Avg Accuracy(%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size (M) yaml File Model Download Link
en_PP-OCRv4_mobile_rec 70.39 4.81 / 0.75 16.10 / 5.31 6.8 M en_PP-OCRv4_mobile_rec.yaml Inference Model/Training Model
en_PP-OCRv3_mobile_rec 70.69 5.44 / 0.75 8.65 / 5.57 7.8 M en_PP-OCRv3_mobile_rec.yaml Inference Model/Training Model

Note: The evaluation set for the above accuracy metrics is an English dataset built by PaddleX. All model GPU inference times are based on NVIDIA Tesla T4 machines, with precision type FP32. CPU inference speed is based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads, and precision type FP32.

  • Multilingual Recognition Model

Model Recognition Avg Accuracy(%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size (M) yaml File Model Download Link
korean_PP-OCRv3_mobile_rec 60.21 5.40 / 0.97 9.11 / 4.05 8.6 M korean_PP-OCRv3_mobile_rec.yaml Inference Model/Training Model
japan_PP-OCRv3_mobile_rec 45.69 5.70 / 1.02 8.48 / 4.07 8.8 M japan_PP-OCRv3_mobile_rec.yaml Inference Model/Training Model
chinese_cht_PP-OCRv3_mobile_rec 82.06 5.90 / 1.28 9.28 / 4.34 9.7 M chinese_cht_PP-OCRv3_mobile_rec.yaml Inference Model/Training Model
te_PP-OCRv3_mobile_rec 95.88 5.42 / 0.82 8.10 / 6.91 7.8 M te_PP-OCRv3_mobile_rec.yaml Inference Model/Training Model
ka_PP-OCRv3_mobile_rec 96.96 5.25 / 0.79 9.09 / 3.86 8.0 M ka_PP-OCRv3_mobile_rec.yaml Inference Model/Training Model
ta_PP-OCRv3_mobile_rec 76.83 5.23 / 0.75 10.13 / 4.30 8.0 M ta_PP-OCRv3_mobile_rec.yaml Inference Model/Training Model
latin_PP-OCRv3_mobile_rec 76.93 5.20 / 0.79 8.83 / 7.15 7.8 M latin_PP-OCRv3_mobile_rec.yaml Inference Model/Training Model
arabic_PP-OCRv3_mobile_rec 73.55 5.35 / 0.79 8.80 / 4.56 7.8 M arabic_PP-OCRv3_mobile_rec.yaml Inference Model/Training Model
cyrillic_PP-OCRv3_mobile_rec 94.28 5.23 / 0.76 8.89 / 3.88 7.9 M cyrillic_PP-OCRv3_mobile_rec.yaml Inference Model/Training Model
devanagari_PP-OCRv3_mobile_rec 96.44 5.22 / 0.79 8.56 / 4.06 7.9 M devanagari_PP-OCRv3_mobile_rec.yaml Inference Model/Training Model

Note: The evaluation set for the above accuracy metrics is a multi-language dataset built by PaddleX. All model GPU inference times are based on NVIDIA Tesla T4 machines, with precision type FP32. CPU inference speed is based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads, and precision type FP32.

Formula Recognition Module

Model Avg-BLEU(%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size (M) yaml File Model Download Link
UniMERNet 86.13 2266.96/- -/- 1.4 G UniMERNet.yaml Inference Model/Training Model
PP-FormulaNet-S 87.12 202.25/- -/- 167.9 M PP-FormulaNet-S.yaml Inference Model/Training Model
PP-FormulaNet-L 92.13 1976.52/- -/- 535.2 M PP-FormulaNet-L.yaml Inference Model/Training Model
LaTeX_OCR_rec 71.63 -/- -/- 89.7 M LaTeX_OCR_rec.yaml Inference Model/Training Model

Note: The above accuracy metrics are measured from the internal formula recognition test set of PaddleX. The BLEU score of LaTeX_OCR_rec on the LaTeX-OCR formula recognition test set is 0.8821. All model GPU inference times are based on Tesla V100 GPUs, with precision type FP32.

Table Structure Recognition Module

Model Accuracy (%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size (M) yaml File Model Download Link
SLANet 59.52 103.08 / 103.08 197.99 / 197.99 6.9 M SLANet.yaml Inference Model/Training Model
SLANet_plus 63.69 140.29 / 140.29 195.39 / 195.39 6.9 M SLANet_plus.yaml Inference Model/Training Model
SLANeXt_wired 69.65 -- -- -- SLANeXt_wired.yaml Inference Model/Training Model
SLANeXt_wireless 69.65 -- -- -- SLANeXt_wireless.yaml Inference Model/Training Model

Note: The above accuracy metrics are measured from the high-difficulty Chinese table recognition dataset built internally by PaddleX. All model GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision type. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision type.

Table Cell Detection Module

Model Name Model Download Link mAP (%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size (M) Introduction
RT-DETR-L_wired_table_cell_det Inference Model/Training Model 82.7 35.00 / 10.45 495.51 / 495.51 124 M RT-DETR is the first real-time end-to-end object detection model. Using RT-DETR-L as the base model, the Baidu PaddlePaddle Vision Team pretrained it on a self-built table cell detection dataset, achieving good performance for both wired and wireless table cell detection.
RT-DETR-L_wireless_table_cell_det Inference Model/Training Model 82.7 35.00 / 10.45 495.51 / 495.51 124 M

Note: The above accuracy metrics are measured from the internal table cell detection dataset of PaddleX. All model GPU inference times are based on an NVIDIA Tesla T4 machine, with precision type FP32. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads, and precision type FP32.

Table Classification Module

Model Top1 Acc(%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size (M) yaml File Model Download Link
PP-LCNet_x1_0_table_cls -- -- -- -- PP-LCNet_x1_0_table_cls.yaml Inference Model/Training Model

Note: The above accuracy metrics are measured from the internal table classification dataset built by PaddleX. All model GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

Text Image Unwarping Module

Model Name MS-SSIM (%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size yaml File Model Download Link
UVDoc 54.40 16.27 / 7.76 176.97 / 80.60 30.3 M UVDoc.yaml Inference Model/Training Model

Note: The above accuracy metrics are measured from the image unwarping dataset built by PaddleX.

Layout Detection Module

  • Table Layout Detection Model
Model mAP(0.5) (%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size (M) yaml File Model Download Link
PicoDet_layout_1x_table 97.5 8.02 / 3.09 23.70 / 20.41 7.4 M PicoDet_layout_1x_table.yaml Inference Model/Training Model

Note: The evaluation set for the above accuracy metrics is the layout table area detection dataset built by PaddleOCR, which contains 7,835 images of document types with tables in both Chinese and English. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision, and the CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

  • 3-class layout detection model, including tables, images, and seals

Model mAP(0.5) (%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Storage Size (M) yaml File Model Download Link
PicoDet-S_layout_3cls 88.2 8.99 / 2.22 16.11 / 8.73 4.8 PicoDet-S_layout_3cls.yaml Inference Model/Training Model
PicoDet-L_layout_3cls 89.0 13.05 / 4.50 41.30 / 41.30 22.6 PicoDet-L_layout_3cls.yaml Inference Model/Training Model
RT-DETR-H_layout_3cls 95.8 114.93 / 27.71 947.56 / 947.56 470.1 RT-DETR-H_layout_3cls.yaml Inference Model/Training Model

Note: The evaluation dataset for the above accuracy metrics is the layout area detection dataset built by PaddleOCR, which includes 1,154 common types of document images such as Chinese and English papers, magazines, and research reports. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision, and the CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

  • 5-class English document layout detection model, including text, title, table, image, and list
Model mAP(0.5) (%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Size (M) yaml File Model Download Link
PicoDet_layout_1x 97.8 9.03 / 3.10 25.82 / 20.70 7.4 PicoDet_layout_1x.yaml Inference Model/Training Model

Note: The evaluation dataset for the above accuracy metrics is the PubLayNet evaluation dataset, which contains 11,245 images of English documents. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. The CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

  • 17-class layout detection model, including 17 common layout categories: paragraph title, image, text, number, abstract, content, figure title, formula, table, table title, reference, document title, footnote, header, algorithm, footer, and seal
Model mAP(0.5) (%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Size (M) yaml File Model Download Link
PicoDet-S_layout_17cls 87.4 9.11 / 2.12 15.42 / 9.12 4.8 PicoDet-S_layout_17cls.yaml Inference Model/Training Model
PicoDet-L_layout_17cls 89.0 17.2 160.2 22.6 PicoDet-L_layout_17cls.yaml Inference Model/Training Model
RT-DETR-H_layout_17cls 98.3 115.1 3827.2 470.2 RT-DETR-H_layout_17cls.yaml Inference Model/Training Model

Note: The evaluation set for the above accuracy metrics is the layout area detection dataset built by PaddleOCR, which includes 892 images of common document types such as Chinese and English papers, magazines, and research reports. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. The CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

Document Image Orientation Classification Module

Model Top-1 Acc (%) GPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
CPU Inference Time (ms)
[Normal Mode / High-Performance Mode]
Model Size (M) yaml File Model Download Link
PP-LCNet_x1_0_doc_ori 99.06 2.31 / 0.43 3.37 / 1.27 7 PP-LCNet_x1_0_doc_ori.yaml Inference Model/Training Model

Note: The evaluation set for the above accuracy metrics is a self-built dataset covering multiple scenarios such as documents and certificates, with 1000 images. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. The CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

Text Line Orientation Classification Module

Model Top-1 Acc (%) GPU Inference Time (ms)
[Standard Mode / High-Performance Mode]
CPU Inference Time (ms)
[Standard Mode / High-Performance Mode]
Model Storage Size (M) YAML File Model Download Link
PP-LCNet_x0_25_textline_ori 99.06 2.31 / 0.43 3.37 / 1.27 7 PP-LCNet_x0_25_textline_ori.yaml Inference Model/Training Model

Note: The evaluation dataset for the above accuracy metrics is a self-built dataset covering multiple scenarios such as certificates and documents, with 1,000 images. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision, and the CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

Time Series Forecasting Module

Model Name mse mae Model Storage Size yaml File Model Download Link
DLinear 0.382 0.394 72 K DLinear.yaml Inference Model/Training Model
NLinear 0.386 0.392 40 K NLinear.yaml Inference Model/Training Model
Nonstationary 0.600 0.515 55.5 M Nonstationary.yaml Inference Model/Training Model
PatchTST 0.379 0.391 2.0 M PatchTST.yaml Inference Model/Training Model
RLinear 0.385 0.392 40 K RLinear.yaml Inference Model/Training Model
TiDE 0.407 0.414 31.7 M TiDE.yaml Inference Model/Training Model
TimesNet 0.416 0.429 4.9 M TimesNet.yaml Inference Model/Training Model

Note: The above accuracy metrics are measured from the ETTH1 dataset (evaluation results on the test.csv test set).
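
The time series modules consume csv input rather than images. A minimal forecasting sketch, assuming a PaddleX 3.x installation; the csv path is a placeholder and is expected to contain the time column and target columns used in training:

```python
# Time series forecasting sketch: input is a csv of historical observations.
from paddlex import create_model

model = create_model("DLinear")
output = model.predict("history.csv", batch_size=1)  # placeholder csv path
for res in output:
    res.print()
    res.save_to_csv("./output/")  # forecasted horizon written back out as csv
```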

Time Series Anomaly Detection Module

Model Name Precision Recall F1 Score Model Storage Size YAML File Model Download Link
AutoEncoder_ad 99.36 84.36 91.25 52 K AutoEncoder_ad.yaml Inference Model/Training Model
DLinear_ad 98.98 93.96 96.41 112 K DLinear_ad.yaml Inference Model/Training Model
Nonstationary_ad 98.55 88.95 93.51 1.8 M Nonstationary_ad.yaml Inference Model/Training Model
PatchTST_ad 98.78 90.70 94.57 320 K PatchTST_ad.yaml Inference Model/Training Model

Note: The above precision metrics are measured from the PSM dataset.

Time Series Classification Module

Model Name acc(%) Model Storage Size yaml File Model Download Link
TimesNet_cls 87.5 792 K TimesNet_cls.yaml Inference Model/Training Model

Note: The above accuracy metrics are measured from the UWaveGestureLibrary dataset.

Note: The GPU inference time for all models above is based on an NVIDIA Tesla T4 machine with FP32 precision. The CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

Multilingual Speech Recognition Module

Model Training Data Model Size Word Error Rate YAML File Model Download Link
whisper_large 680kh 5.8G 2.7 (Librispeech) whisper_large.yaml Inference Model
whisper_medium 680kh 2.9G - whisper_medium.yaml Inference Model
whisper_small 680kh 923M - whisper_small.yaml Inference Model
whisper_base 680kh 277M - whisper_base.yaml Inference Model
whisper_tiny 680kh 145M - whisper_tiny.yaml Inference Model

Video Classification Module

Model Top1 Acc(%) Model Storage Size (M) yaml File Model Download Link
PP-TSM-R50_8frames_uniform 74.36 93.4 M PP-TSM-R50_8frames_uniform.yaml Inference Model/Training Model
PP-TSMv2-LCNetV2_8frames_uniform 71.71 22.5 M PP-TSMv2-LCNetV2_8frames_uniform.yaml Inference Model/Training Model
PP-TSMv2-LCNetV2_16frames_uniform 73.11 22.5 M PP-TSMv2-LCNetV2_16frames_uniform.yaml Inference Model/Training Model

Note: The above accuracy metrics are based on the K400 validation set Top1 Acc.

Video Detection Module

Model Frame-mAP(@ IoU 0.5) Model Storage Size (M) yaml File Model Download Link
YOWO 80.94 462.891 M YOWO.yaml Inference Model/Training Model

Note: The above accuracy metrics are based on the test dataset UCF101-24, using the Frame-mAP (@ IoU 0.5) metric. All model GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.
