PaddleX Model List (CPU/GPU)¶
PaddleX includes multiple pipelines, each pipeline contains several modules, and each module includes several models. You can choose which models to use based on the benchmark data below: if you prioritize accuracy, choose a model with higher accuracy; if you prioritize inference speed, choose a model with faster inference; if you prioritize storage footprint, choose a model with a smaller storage size.
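Once a model is chosen from the tables below, it can typically be loaded by name through PaddleX's Python API and used for inference directly. The snippet below is a minimal sketch, assuming a PaddleX 3.x installation; the model name is taken from the image classification table, and `demo.jpg` is a placeholder input path:

```python
from paddlex import create_model

# Load a module-level model by the name listed in the tables below.
model = create_model("PP-LCNet_x1_0")

# Run inference on a local image (placeholder path) and persist the results.
output = model.predict("demo.jpg", batch_size=1)
for res in output:
    res.print()                            # print the prediction to stdout
    res.save_to_img("./output/")           # save a visualized copy of the input
    res.save_to_json("./output/res.json")  # save the raw prediction as JSON
```

The yaml file listed for each model is its training and evaluation configuration, and the download links provide the pretrained inference and training weights.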
Image Classification Module¶
Model Name | Top1 Acc (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
CLIP_vit_base_patch16_224 | 85.36 | 12.84 / 2.82 | 60.52 / 60.52 | 306.5 M | CLIP_vit_base_patch16_224.yaml | Inference Model/Training Model |
CLIP_vit_large_patch14_224 | 88.1 | 51.72 / 11.13 | 238.07 / 238.07 | 1.04 G | CLIP_vit_large_patch14_224.yaml | Inference Model/Training Model |
ConvNeXt_base_224 | 83.84 | 13.18 / 12.14 | 128.39 / 81.78 | 313.9 M | ConvNeXt_base_224.yaml | Inference Model/Training Model |
ConvNeXt_base_384 | 84.90 | 32.15 / 30.52 | 279.36 / 220.35 | 313.9 M | ConvNeXt_base_384.yaml | Inference Model/Training Model |
ConvNeXt_large_224 | 84.26 | 26.51 / 7.21 | 213.32 / 157.22 | 700.7 M | ConvNeXt_large_224.yaml | Inference Model/Training Model |
ConvNeXt_large_384 | 85.27 | 67.07 / 65.26 | 494.04 / 438.97 | 700.7 M | ConvNeXt_large_384.yaml | Inference Model/Training Model |
ConvNeXt_small | 83.13 | 9.05 / 8.21 | 97.94 / 55.29 | 178.0 M | ConvNeXt_small.yaml | Inference Model/Training Model |
ConvNeXt_tiny | 82.03 | 5.12 / 2.06 | 63.96 / 29.77 | 101.4 M | ConvNeXt_tiny.yaml | Inference Model/Training Model |
FasterNet-L | 83.5 | 15.67 / 3.10 | 52.24 / 52.24 | 357.1 M | FasterNet-L.yaml | Inference Model/Training Model |
FasterNet-M | 83.0 | 9.72 / 2.30 | 35.29 / 35.29 | 204.6 M | FasterNet-M.yaml | Inference Model/Training Model |
FasterNet-S | 81.3 | 5.46 / 1.27 | 20.46 / 18.03 | 119.3 M | FasterNet-S.yaml | Inference Model/Training Model |
FasterNet-T0 | 71.9 | 4.18 / 0.60 | 6.34 / 3.44 | 15.1 M | FasterNet-T0.yaml | Inference Model/Training Model |
FasterNet-T1 | 75.9 | 4.24 / 0.64 | 9.57 / 5.20 | 29.2 M | FasterNet-T1.yaml | Inference Model/Training Model |
FasterNet-T2 | 79.1 | 3.87 / 0.78 | 11.14 / 9.98 | 57.4 M | FasterNet-T2.yaml | Inference Model/Training Model |
MobileNetV1_x0_5 | 63.5 | 1.39 / 0.28 | 2.74 / 1.02 | 4.8 M | MobileNetV1_x0_5.yaml | Inference Model/Training Model |
MobileNetV1_x0_25 | 51.4 | 1.32 / 0.30 | 2.04 / 0.58 | 1.8 M | MobileNetV1_x0_25.yaml | Inference Model/Training Model |
MobileNetV1_x0_75 | 68.8 | 1.75 / 0.33 | 3.41 / 1.57 | 9.3 M | MobileNetV1_x0_75.yaml | Inference Model/Training Model |
MobileNetV1_x1_0 | 71.0 | 1.89 / 0.34 | 4.01 / 2.17 | 15.2 M | MobileNetV1_x1_0.yaml | Inference Model/Training Model |
MobileNetV2_x0_5 | 65.0 | 3.17 / 0.48 | 4.52 / 1.35 | 7.1 M | MobileNetV2_x0_5.yaml | Inference Model/Training Model |
MobileNetV2_x0_25 | 53.2 | 2.80 / 0.46 | 3.92 / 0.98 | 5.5 M | MobileNetV2_x0_25.yaml | Inference Model/Training Model |
MobileNetV2_x1_0 | 72.2 | 3.57 / 0.49 | 5.63 / 2.51 | 12.6 M | MobileNetV2_x1_0.yaml | Inference Model/Training Model |
MobileNetV2_x1_5 | 74.1 | 3.58 / 0.62 | 8.02 / 4.49 | 25.0 M | MobileNetV2_x1_5.yaml | Inference Model/Training Model |
MobileNetV2_x2_0 | 75.2 | 3.56 / 0.74 | 10.24 / 6.83 | 41.2 M | MobileNetV2_x2_0.yaml | Inference Model/Training Model |
MobileNetV3_large_x0_5 | 69.2 | 3.79 / 0.62 | 6.76 / 1.61 | 9.6 M | MobileNetV3_large_x0_5.yaml | Inference Model/Training Model |
MobileNetV3_large_x0_35 | 64.3 | 3.70 / 0.60 | 5.54 / 1.41 | 7.5 M | MobileNetV3_large_x0_35.yaml | Inference Model/Training Model |
MobileNetV3_large_x0_75 | 73.1 | 4.82 / 0.66 | 7.45 / 2.00 | 14.0 M | MobileNetV3_large_x0_75.yaml | Inference Model/Training Model |
MobileNetV3_large_x1_0 | 75.3 | 4.86 / 0.68 | 6.88 / 2.61 | 19.5 M | MobileNetV3_large_x1_0.yaml | Inference Model/Training Model |
MobileNetV3_large_x1_25 | 76.4 | 5.08 / 0.71 | 7.37 / 3.58 | 26.5 M | MobileNetV3_large_x1_25.yaml | Inference Model/Training Model |
MobileNetV3_small_x0_5 | 59.2 | 3.41 / 0.57 | 5.60 / 1.14 | 6.8 M | MobileNetV3_small_x0_5.yaml | Inference Model/Training Model |
MobileNetV3_small_x0_35 | 53.0 | 3.49 / 0.60 | 4.63 / 1.07 | 6.0 M | MobileNetV3_small_x0_35.yaml | Inference Model/Training Model |
MobileNetV3_small_x0_75 | 66.0 | 3.49 / 0.60 | 5.19 / 1.28 | 8.5 M | MobileNetV3_small_x0_75.yaml | Inference Model/Training Model |
MobileNetV3_small_x1_0 | 68.2 | 3.76 / 0.53 | 5.11 / 1.43 | 10.5 M | MobileNetV3_small_x1_0.yaml | Inference Model/Training Model |
MobileNetV3_small_x1_25 | 70.7 | 4.23 / 0.58 | 6.48 / 1.68 | 13.0 M | MobileNetV3_small_x1_25.yaml | Inference Model/Training Model |
MobileNetV4_conv_large | 83.4 | 8.33 / 2.24 | 33.56 / 23.70 | 125.2 M | MobileNetV4_conv_large.yaml | Inference Model/Training Model |
MobileNetV4_conv_medium | 79.9 | 6.81 / 0.92 | 12.47 / 6.27 | 37.6 M | MobileNetV4_conv_medium.yaml | Inference Model/Training Model |
MobileNetV4_conv_small | 74.6 | 3.25 / 0.46 | 4.42 / 1.54 | 14.7 M | MobileNetV4_conv_small.yaml | Inference Model/Training Model |
MobileNetV4_hybrid_large | 83.8 | 12.27 / 4.18 | 58.64 / 58.64 | 145.1 M | MobileNetV4_hybrid_large.yaml | Inference Model/Training Model |
MobileNetV4_hybrid_medium | 80.5 | 12.08 / 1.34 | 24.69 / 8.10 | 42.9 M | MobileNetV4_hybrid_medium.yaml | Inference Model/Training Model |
PP-HGNet_base | 85.0 | 14.10 / 4.19 | 68.92 / 68.92 | 249.4 M | PP-HGNet_base.yaml | Inference Model/Training Model |
PP-HGNet_small | 81.51 | 5.12 / 1.73 | 25.01 / 25.01 | 86.5 M | PP-HGNet_small.yaml | Inference Model/Training Model |
PP-HGNet_tiny | 79.83 | 3.28 / 1.29 | 16.40 / 15.97 | 52.4 M | PP-HGNet_tiny.yaml | Inference Model/Training Model |
PP-HGNetV2-B0 | 77.77 | 3.83 / 0.57 | 9.95 / 2.37 | 21.4 M | PP-HGNetV2-B0.yaml | Inference Model/Training Model |
PP-HGNetV2-B1 | 79.18 | 3.87 / 0.62 | 8.77 / 3.79 | 22.6 M | PP-HGNetV2-B1.yaml | Inference Model/Training Model |
PP-HGNetV2-B2 | 81.74 | 5.73 / 0.86 | 15.11 / 7.05 | 39.9 M | PP-HGNetV2-B2.yaml | Inference Model/Training Model |
PP-HGNetV2-B3 | 82.98 | 6.26 / 1.01 | 18.47 / 10.34 | 57.9 M | PP-HGNetV2-B3.yaml | Inference Model/Training Model |
PP-HGNetV2-B4 | 83.57 | 5.47 / 1.10 | 14.42 / 9.89 | 70.4 M | PP-HGNetV2-B4.yaml | Inference Model/Training Model |
PP-HGNetV2-B5 | 84.75 | 10.24 / 1.96 | 29.71 / 29.71 | 140.8 M | PP-HGNetV2-B5.yaml | Inference Model/Training Model |
PP-HGNetV2-B6 | 86.30 | 12.25 / 3.76 | 62.29 / 62.29 | 268.4 M | PP-HGNetV2-B6.yaml | Inference Model/Training Model |
PP-LCNet_x0_5 | 63.14 | 2.28 / 0.42 | 2.86 / 0.83 | 6.7 M | PP-LCNet_x0_5.yaml | Inference Model/Training Model |
PP-LCNet_x0_25 | 51.86 | 1.89 / 0.45 | 2.49 / 0.68 | 5.5 M | PP-LCNet_x0_25.yaml | Inference Model/Training Model |
PP-LCNet_x0_35 | 58.09 | 1.94 / 0.41 | 2.73 / 0.77 | 5.9 M | PP-LCNet_x0_35.yaml | Inference Model/Training Model |
PP-LCNet_x0_75 | 68.18 | 2.30 / 0.41 | 2.95 / 1.07 | 8.4 M | PP-LCNet_x0_75.yaml | Inference Model/Training Model |
PP-LCNet_x1_0 | 71.32 | 2.35 / 0.47 | 4.03 / 1.35 | 10.5 M | PP-LCNet_x1_0.yaml | Inference Model/Training Model |
PP-LCNet_x1_5 | 73.71 | 2.33 / 0.53 | 4.17 / 2.29 | 16.0 M | PP-LCNet_x1_5.yaml | Inference Model/Training Model |
PP-LCNet_x2_0 | 75.18 | 2.40 / 0.51 | 5.37 / 3.46 | 23.2 M | PP-LCNet_x2_0.yaml | Inference Model/Training Model |
PP-LCNet_x2_5 | 76.60 | 2.36 / 0.61 | 6.29 / 5.05 | 32.1 M | PP-LCNet_x2_5.yaml | Inference Model/Training Model |
PP-LCNetV2_base | 77.05 | 3.33 / 0.55 | 6.86 / 3.77 | 23.7 M | PP-LCNetV2_base.yaml | Inference Model/Training Model |
PP-LCNetV2_large | 78.51 | 4.37 / 0.71 | 9.43 / 8.07 | 37.3 M | PP-LCNetV2_large.yaml | Inference Model/Training Model |
PP-LCNetV2_small | 73.97 | 2.53 / 0.41 | 5.14 / 1.98 | 14.6 M | PP-LCNetV2_small.yaml | Inference Model/Training Model |
ResNet18_vd | 72.3 | 2.47 / 0.61 | 6.97 / 5.15 | 41.5 M | ResNet18_vd.yaml | Inference Model/Training Model |
ResNet18 | 71.0 | 2.35 / 0.67 | 6.35 / 4.61 | 41.5 M | ResNet18.yaml | Inference Model/Training Model |
ResNet34_vd | 76.0 | 4.01 / 1.03 | 11.99 / 9.86 | 77.3 M | ResNet34_vd.yaml | Inference Model/Training Model |
ResNet34 | 74.6 | 3.99 / 1.02 | 12.42 / 9.81 | 77.3 M | ResNet34.yaml | Inference Model/Training Model |
ResNet50_vd | 79.1 | 6.04 / 1.16 | 16.08 / 12.07 | 90.8 M | ResNet50_vd.yaml | Inference Model/Training Model |
ResNet50 | 76.5 | 6.44 / 1.16 | 15.04 / 11.63 | 90.8 M | ResNet50.yaml | Inference Model/Training Model |
ResNet101_vd | 80.2 | 11.16 / 2.07 | 32.14 / 32.14 | 158.4 M | ResNet101_vd.yaml | Inference Model/Training Model |
ResNet101 | 77.6 | 10.91 / 2.06 | 31.14 / 22.93 | 158.7 M | ResNet101.yaml | Inference Model/Training Model |
ResNet152_vd | 80.6 | 15.96 / 2.99 | 49.33 / 49.33 | 214.3 M | ResNet152_vd.yaml | Inference Model/Training Model |
ResNet152 | 78.3 | 15.61 / 2.90 | 47.33 / 36.60 | 214.2 M | ResNet152.yaml | Inference Model/Training Model |
ResNet200_vd | 80.9 | 24.20 / 3.69 | 62.62 / 62.62 | 266.0 M | ResNet200_vd.yaml | Inference Model/Training Model |
StarNet-S1 | 73.6 | 6.33 / 1.98 | 7.56 / 3.26 | 11.2 M | StarNet-S1.yaml | Inference Model/Training Model |
StarNet-S2 | 74.8 | 4.49 / 1.55 | 7.38 / 3.38 | 14.3 M | StarNet-S2.yaml | Inference Model/Training Model |
StarNet-S3 | 77.0 | 6.70 / 1.62 | 11.05 / 4.76 | 22.2 M | StarNet-S3.yaml | Inference Model/Training Model |
StarNet-S4 | 79.0 | 8.50 / 2.86 | 15.40 / 6.76 | 28.9 M | StarNet-S4.yaml | Inference Model/Training Model |
SwinTransformer_base_patch4_window7_224 | 83.37 | 14.29 / 5.13 | 130.89 / 130.89 | 310.5 M | SwinTransformer_base_patch4_window7_224.yaml | Inference Model/Training Model |
SwinTransformer_base_patch4_window12_384 | 84.17 | 37.74 / 10.10 | 362.56 / 362.56 | 311.4 M | SwinTransformer_base_patch4_window12_384.yaml | Inference Model/Training Model |
SwinTransformer_large_patch4_window7_224 | 86.19 | 26.48 / 7.94 | 228.23 / 228.23 | 694.8 M | SwinTransformer_large_patch4_window7_224.yaml | Inference Model/Training Model |
SwinTransformer_large_patch4_window12_384 | 87.06 | 74.72 / 18.16 | 652.04 / 652.04 | 696.1 M | SwinTransformer_large_patch4_window12_384.yaml | Inference Model/Training Model |
SwinTransformer_small_patch4_window7_224 | 83.21 | 10.37 / 3.90 | 94.20 / 94.20 | 175.6 M | SwinTransformer_small_patch4_window7_224.yaml | Inference Model/Training Model |
SwinTransformer_tiny_patch4_window7_224 | 81.10 | 6.66 / 2.15 | 60.45 / 60.45 | 100.1 M | SwinTransformer_tiny_patch4_window7_224.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the ImageNet-1k validation set Top1 Acc.
Image Multi-label Classification Module¶
Model Name | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
CLIP_vit_base_patch16_448_ML | 89.15 | 54.75 / 14.30 | 280.23 / 280.23 | 325.6 M | CLIP_vit_base_patch16_448_ML.yaml | Inference Model/Training Model |
PP-HGNetV2-B0_ML | 80.98 | 6.47 / 1.38 | 21.56 / 13.69 | 39.6 M | PP-HGNetV2-B0_ML.yaml | Inference Model/Training Model |
PP-HGNetV2-B4_ML | 87.96 | 9.63 / 2.79 | 43.98 / 36.63 | 88.5 M | PP-HGNetV2-B4_ML.yaml | Inference Model/Training Model |
PP-HGNetV2-B6_ML | 91.06 | 37.07 / 9.43 | 188.58 / 188.58 | 286.5 M | PP-HGNetV2-B6_ML.yaml | Inference Model/Training Model |
PP-LCNet_x1_0_ML | 77.96 | 4.04 / 1.15 | 11.76 / 8.32 | 29.4 M | PP-LCNet_x1_0_ML.yaml | Inference Model/Training Model |
ResNet50_ML | 83.42 | 12.12 / 3.27 | 51.79 / 44.36 | 108.9 M | ResNet50_ML.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are the multi-label classification mAP on COCO2017.
Pedestrian Attribute Module¶
Model Name | mA (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-LCNet_x1_0_pedestrian_attribute | 92.2 | 2.35 / 0.49 | 3.17 / 1.25 | 6.7 M | PP-LCNet_x1_0_pedestrian_attribute.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are the mA on an internal PaddleX dataset.
Vehicle Attribute Module¶
Model Name | mA (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-LCNet_x1_0_vehicle_attribute | 91.7 | 2.32 / 2.32 | 3.22 / 1.26 | 6.7 M | PP-LCNet_x1_0_vehicle_attribute.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the VeRi dataset mA.
Image Feature Module¶
Model Name | recall@1 (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-ShiTuV2_rec | 84.2 | 3.48 / 0.55 | 8.04 / 4.04 | 16.3 M | PP-ShiTuV2_rec.yaml | Inference Model/Training Model |
PP-ShiTuV2_rec_CLIP_vit_base | 88.69 | 12.94 / 2.88 | 58.36 / 58.36 | 306.6 M | PP-ShiTuV2_rec_CLIP_vit_base.yaml | Inference Model/Training Model |
PP-ShiTuV2_rec_CLIP_vit_large | 91.03 | 51.65 / 11.18 | 255.78 / 255.78 | 1.05 G | PP-ShiTuV2_rec_CLIP_vit_large.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the AliProducts recall@1.
Document Orientation Classification Module¶
Model Name | Top-1 Acc (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-LCNet_x1_0_doc_ori | 99.06 | 2.31 / 0.43 | 3.37 / 1.27 | 7 | PP-LCNet_x1_0_doc_ori.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the Top-1 Acc of the internal dataset of PaddleX.
Face Feature Module¶
Model Name | Output Feature Dimension | Acc (%) AgeDB-30/CFP-FP/LFW | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|---|
MobileFaceNet | 128 | 96.28/96.71/99.58 | 3.16 / 0.48 | 6.49 / 6.49 | 4.1 | MobileFaceNet.yaml | Inference Model/Training Model |
ResNet50_face | 512 | 98.12/98.56/99.77 | 5.68 / 1.09 | 14.96 / 11.90 | 87.2 | ResNet50_face.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are measured on the AgeDB-30, CFP-FP, and LFW datasets.
Main Body Detection Module¶
Model Name | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-ShiTuV2_det | 41.5 | 12.79 / 4.51 | 44.14 / 44.14 | 27.54 | PP-ShiTuV2_det.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the PaddleClas Main Body Detection Dataset mAP(0.5:0.95).
Object Detection Module¶
Model Name | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
Cascade-FasterRCNN-ResNet50-FPN | 41.1 | 135.92 / 135.92 | - | 245.4 M | Cascade-FasterRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
Cascade-FasterRCNN-ResNet50-vd-SSLDv2-FPN | 45.0 | 138.23 / 138.23 | - | 246.2 M | Cascade-FasterRCNN-ResNet50-vd-SSLDv2-FPN.yaml | Inference Model/Training Model |
CenterNet-DLA-34 | 37.6 | - | - | 75.4 M | CenterNet-DLA-34.yaml | Inference Model/Training Model |
CenterNet-ResNet50 | 38.9 | - | - | 319.7 M | CenterNet-ResNet50.yaml | Inference Model/Training Model |
DETR-R50 | 42.3 | 62.91 / 17.33 | 392.63 / 392.63 | 159.3 M | DETR-R50.yaml | Inference Model/Training Model |
FasterRCNN-ResNet34-FPN | 37.8 | 83.33 / 31.64 | - | 137.5 M | FasterRCNN-ResNet34-FPN.yaml | Inference Model/Training Model |
FasterRCNN-ResNet50-FPN | 38.4 | 107.08 / 35.40 | - | 148.1 M | FasterRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
FasterRCNN-ResNet50-vd-FPN | 39.5 | 109.36 / 36.00 | - | 148.1 M | FasterRCNN-ResNet50-vd-FPN.yaml | Inference Model/Training Model |
FasterRCNN-ResNet50-vd-SSLDv2-FPN | 41.4 | 109.06 / 36.19 | - | 148.1 M | FasterRCNN-ResNet50-vd-SSLDv2-FPN.yaml | Inference Model/Training Model |
FasterRCNN-ResNet50 | 36.7 | 496.33 / 109.12 | - | 120.2 M | FasterRCNN-ResNet50.yaml | Inference Model/Training Model |
FasterRCNN-ResNet101-FPN | 41.4 | 148.21 / 42.21 | - | 216.3 M | FasterRCNN-ResNet101-FPN.yaml | Inference Model/Training Model |
FasterRCNN-ResNet101 | 39.0 | 538.58 / 120.88 | - | 188.1 M | FasterRCNN-ResNet101.yaml | Inference Model/Training Model |
FasterRCNN-ResNeXt101-vd-FPN | 43.4 | 258.01 / 58.25 | - | 360.6 M | FasterRCNN-ResNeXt101-vd-FPN.yaml | Inference Model/Training Model |
FasterRCNN-Swin-Tiny-FPN | 42.6 | - | - | 159.8 M | FasterRCNN-Swin-Tiny-FPN.yaml | Inference Model/Training Model |
FCOS-ResNet50 | 39.6 | 106.13 / 28.32 | 721.79 / 721.79 | 124.2 M | FCOS-ResNet50.yaml | Inference Model/Training Model |
PicoDet-L | 42.6 | 14.68 / 5.81 | 47.32 / 47.32 | 20.9 M | PicoDet-L.yaml | Inference Model/Training Model |
PicoDet-M | 37.5 | 9.62 / 3.23 | 23.75 / 14.88 | 16.8 M | PicoDet-M.yaml | Inference Model/Training Model |
PicoDet-S | 29.1 | 7.98 / 2.33 | 14.82 / 5.60 | 4.4 M | PicoDet-S.yaml | Inference Model/Training Model |
PicoDet-XS | 26.2 | 9.66 / 2.75 | 19.15 / 7.24 | 5.7 M | PicoDet-XS.yaml | Inference Model/Training Model |
PP-YOLOE_plus-L | 52.9 | 33.55 / 10.46 | 189.05 / 189.05 | 185.3 M | PP-YOLOE_plus-L.yaml | Inference Model/Training Model |
PP-YOLOE_plus-M | 49.8 | 19.52 / 7.46 | 113.36 / 113.36 | 83.2 M | PP-YOLOE_plus-M.yaml | Inference Model/Training Model |
PP-YOLOE_plus-S | 43.7 | 12.16 / 4.58 | 73.86 / 52.90 | 28.3 M | PP-YOLOE_plus-S.yaml | Inference Model/Training Model |
PP-YOLOE_plus-X | 54.7 | 58.87 / 15.84 | 292.93 / 292.93 | 349.4 M | PP-YOLOE_plus-X.yaml | Inference Model/Training Model |
RT-DETR-H | 56.3 | 115.92 / 28.16 | 971.32 / 971.32 | 435.8 M | RT-DETR-H.yaml | Inference Model/Training Model |
RT-DETR-L | 53.0 | 35.00 / 10.45 | 495.51 / 495.51 | 113.7 M | RT-DETR-L.yaml | Inference Model/Training Model |
RT-DETR-R18 | 46.5 | 20.21 / 6.23 | 266.01 / 266.01 | 70.7 M | RT-DETR-R18.yaml | Inference Model/Training Model |
RT-DETR-R50 | 53.1 | 42.14 / 11.31 | 523.97 / 523.97 | 149.1 M | RT-DETR-R50.yaml | Inference Model/Training Model |
RT-DETR-X | 54.8 | 61.24 / 15.83 | 647.08 / 647.08 | 232.9 M | RT-DETR-X.yaml | Inference Model/Training Model |
YOLOv3-DarkNet53 | 39.1 | 41.58 / 10.10 | 158.78 / 158.78 | 219.7 M | YOLOv3-DarkNet53.yaml | Inference Model/Training Model |
YOLOv3-MobileNetV3 | 31.4 | 16.53 / 5.70 | 60.44 / 60.44 | 83.8 M | YOLOv3-MobileNetV3.yaml | Inference Model/Training Model |
YOLOv3-ResNet50_vd_DCN | 40.6 | 32.91 / 10.07 | 225.72 / 224.32 | 163.0 M | YOLOv3-ResNet50_vd_DCN.yaml | Inference Model/Training Model |
YOLOX-L | 50.1 | 121.19 / 13.55 | 295.38 / 274.15 | 192.5 M | YOLOX-L.yaml | Inference Model/Training Model |
YOLOX-M | 46.9 | 87.19 / 10.09 | 183.95 / 172.67 | 90.0 M | YOLOX-M.yaml | Inference Model/Training Model |
YOLOX-N | 26.1 | 53.31 / 45.02 | 69.69 / 59.18 | 3.4 M | YOLOX-N.yaml | Inference Model/Training Model |
YOLOX-S | 40.4 | 129.52 / 13.19 | 181.39 / 179.01 | 32.0 M | YOLOX-S.yaml | Inference Model/Training Model |
YOLOX-T | 32.9 | 66.81 / 61.31 | 92.30 / 83.90 | 18.1 M | YOLOX-T.yaml | Inference Model/Training Model |
YOLOX-X | 51.8 | 156.40 / 20.17 | 480.14 / 454.35 | 351.5 M | YOLOX-X.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the COCO2017 validation set mAP(0.5:0.95).
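The detection modules follow the same name-based loading pattern as classification; only the result payload differs (bounding boxes, labels, and scores). A short sketch under the same assumptions, using a model name from the table above and a placeholder image path:

```python
from paddlex import create_model

# Any detector from the table above can be swapped in by name.
model = create_model("PicoDet-S")

for res in model.predict("street_scene.jpg", batch_size=1):  # placeholder path
    res.print()                   # bounding boxes, class labels, and scores
    res.save_to_img("./output/")  # input image with detection boxes drawn
```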
Small Object Detection Module¶
Model Name | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-YOLOE_plus_SOD-S | 25.1 | 135.68 / 122.94 | 188.09 / 107.74 | 77.3 M | PP-YOLOE_plus_SOD-S.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the validation set mAP(0.5:0.95) of VisDrone-DET.
Open-Vocabulary Object Detection Module¶
Model | mAP(0.5:0.95) | mAP(0.5) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | Model Download Link |
---|---|---|---|---|---|---|
GroundingDINO-T | 49.4 | 64.4 | 253.72 | 1807.4 | 658.3 | Inference Model |
Note: The above accuracy metrics are based on the COCO val2017 validation set mAP(0.5:0.95). All models' GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speeds are based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.
Open Vocabulary Segmentation Module¶
Model | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | Model Download Link |
---|---|---|---|---|
SAM-H_box | 144.9 | 33920.7 | 2433.7 | Inference Model |
SAM-H_point | 144.9 | 33920.7 | 2433.7 | Inference Model |
Note: All model GPU inference times are based on NVIDIA Tesla T4, with precision type FP32. CPU inference speed is based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads, and precision type FP32.
Rotated Object Detection Module¶
Model | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-YOLOE-R-L | 78.14 | 20.7039 | 157.942 | 211.0 M | PP-YOLOE-R.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the DOTA validation set mAP(0.5:0.95). All model GPU inference times are based on an NVIDIA RTX 2080 Ti, with precision type FP16. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads, and precision type FP32.
Pedestrian Detection Module¶
Model Name | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-YOLOE-L_human | 48.0 | 33.27 / 9.19 | 173.72 / 173.72 | 196.1 M | PP-YOLOE-L_human.yaml | Inference Model/Training Model |
PP-YOLOE-S_human | 42.5 | 9.94 / 3.42 | 54.48 / 46.52 | 28.8 M | PP-YOLOE-S_human.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the validation set mAP(0.5:0.95) of CrowdHuman.
Vehicle Detection Module¶
Model Name | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-YOLOE-L_vehicle | 63.9 | 32.84 / 9.03 | 176.60 / 176.60 | 196.1 M | PP-YOLOE-L_vehicle.yaml | Inference Model/Training Model |
PP-YOLOE-S_vehicle | 61.3 | 9.79 / 3.48 | 54.14 / 46.69 | 28.8 M | PP-YOLOE-S_vehicle.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the validation set mAP(0.5:0.95) of the PPVehicle dataset.
Face Detection Module¶
Model Name | AP (%) Easy/Medium/Hard | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
BlazeFace | 77.7/73.4/49.5 | 60.34 / 54.76 | 84.18 / 84.18 | 0.447 M | BlazeFace.yaml | Inference Model/Training Model |
BlazeFace-FPN-SSH | 83.2/80.5/60.5 | 69.29 / 63.42 | 86.96 / 86.96 | 0.606 M | BlazeFace-FPN-SSH.yaml | Inference Model/Training Model |
PicoDet_LCNet_x2_5_face | 93.7/90.7/68.1 | 35.37 / 12.88 | 126.24 / 126.24 | 28.9 M | PicoDet_LCNet_x2_5_face.yaml | Inference Model/Training Model |
PP-YOLOE_plus-S_face | 93.9/91.8/79.8 | 22.54 / 8.33 | 138.67 / 138.67 | 26.5 M | PP-YOLOE_plus-S_face.yaml | Inference Model/Training Model |
Note: The above precision metrics are evaluated on the WIDER-FACE validation set with an input size of 640x640.
Anomaly Detection Module¶
Model Name | mIoU | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
STFPM | 0.9901 | 2.97 / 1.57 | 38.86 / 13.24 | 22.5 M | STFPM.yaml | Inference Model/Training Model |
Note: The above precision metrics are the average anomaly scores on the validation set of MVTec AD.
Human Keypoint Detection Module¶
Model | Scheme | Input Size | AP(0.5:0.95) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|---|---|
PP-TinyPose_128x96 | Top-Down | 128*96 | 58.4 | -- | -- | 4.9 | PP-TinyPose_128x96.yaml | Inference Model/Training Model |
PP-TinyPose_256x192 | Top-Down | 256*192 | 68.3 | -- | -- | 4.9 | PP-TinyPose_256x192.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the COCO dataset AP(0.5:0.95), with detection boxes obtained from ground truth annotations. All GPU inference times are based on an NVIDIA Tesla T4 machine, with precision type FP32. CPU inference speeds are based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads and precision type FP32.
3D Multi-modal Fusion Detection Module¶
Model | mAP (%) | NDS | yaml File | Model Download Link |
---|---|---|---|---|
BEVFusion | 53.9 | 60.9 | BEVFusion.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the nuScenes validation set (mAP(0.5:0.95) and NDS), with precision type FP32.
Semantic Segmentation Module¶
Model Name | mIoU (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
Deeplabv3_Plus-R50 | 80.36 | 503.51 / 122.30 | 3543.91 / 3543.91 | 94.9 M | Deeplabv3_Plus-R50.yaml | Inference Model/Training Model |
Deeplabv3_Plus-R101 | 81.10 | 803.79 / 175.45 | 5136.21 / 5136.21 | 162.5 M | Deeplabv3_Plus-R101.yaml | Inference Model/Training Model |
Deeplabv3-R50 | 79.90 | 647.56 / 121.67 | 3803.09 / 3803.09 | 138.3 M | Deeplabv3-R50.yaml | Inference Model/Training Model |
Deeplabv3-R101 | 80.85 | 950.43 / 178.50 | 5517.14 / 5517.14 | 205.9 M | Deeplabv3-R101.yaml | Inference Model/Training Model |
OCRNet_HRNet-W18 | 80.67 | 286.12 / 80.76 | 1794.03 / 1794.03 | 43.1 M | OCRNet_HRNet-W18.yaml | Inference Model/Training Model |
OCRNet_HRNet-W48 | 82.15 | 627.36 / 170.76 | 3531.61 / 3531.61 | 249.8 M | OCRNet_HRNet-W48.yaml | Inference Model/Training Model |
PP-LiteSeg-T | 73.10 | 30.16 / 14.03 | 420.07 / 235.01 | 28.5 M | PP-LiteSeg-T.yaml | Inference Model/Training Model |
PP-LiteSeg-B | 75.25 | 40.92 / 20.18 | 494.32 / 310.34 | 47.0 M | PP-LiteSeg-B.yaml | Inference Model/Training Model |
SegFormer-B0 (slice) | 76.73 | 11.1946 | 268.929 | 13.2 M | SegFormer-B0.yaml | Inference Model/Training Model |
SegFormer-B1 (slice) | 78.35 | 17.9998 | 403.393 | 48.5 M | SegFormer-B1.yaml | Inference Model/Training Model |
SegFormer-B2 (slice) | 81.60 | 48.0371 | 1248.52 | 96.9 M | SegFormer-B2.yaml | Inference Model/Training Model |
SegFormer-B3 (slice) | 82.47 | 64.341 | 1666.35 | 167.3 M | SegFormer-B3.yaml | Inference Model/Training Model |
SegFormer-B4 (slice) | 82.38 | 82.4336 | 1995.42 | 226.7 M | SegFormer-B4.yaml | Inference Model/Training Model |
SegFormer-B5 (slice) | 82.58 | 97.3717 | 2420.19 | 229.7 M | SegFormer-B5.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the Cityscapes dataset mIoU.
Model Name | mIoU (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
SeaFormer_base (slice) | 40.92 | 24.4073 | 397.574 | 30.8 M | SeaFormer_base.yaml | Inference Model/Training Model |
SeaFormer_large (slice) | 43.66 | 27.8123 | 550.464 | 49.8 M | SeaFormer_large.yaml | Inference Model/Training Model |
SeaFormer_small (slice) | 38.73 | 19.2295 | 358.343 | 14.3 M | SeaFormer_small.yaml | Inference Model/Training Model |
SeaFormer_tiny (slice) | 34.58 | 13.9496 | 330.132 | 6.1 M | SeaFormer_tiny.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the ADE20k dataset. "Slice" indicates that the input images have been cropped.
Instance Segmentation Module¶
Model Name | Mask AP | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
Mask-RT-DETR-H | 50.6 | 172.36 / 172.36 | 1615.75 / 1615.75 | 449.9 M | Mask-RT-DETR-H.yaml | Inference Model/Training Model |
Mask-RT-DETR-L | 45.7 | 88.18 / 88.18 | 1090.84 / 1090.84 | 113.6 M | Mask-RT-DETR-L.yaml | Inference Model/Training Model |
Mask-RT-DETR-M | 42.7 | 78.69 / 78.69 | - | 66.6 M | Mask-RT-DETR-M.yaml | Inference Model/Training Model |
Mask-RT-DETR-S | 41.0 | 33.5007 | - | 51.8 M | Mask-RT-DETR-S.yaml | Inference Model/Training Model |
Mask-RT-DETR-X | 47.5 | 114.16 / 114.16 | 1240.92 / 1240.92 | 237.5 M | Mask-RT-DETR-X.yaml | Inference Model/Training Model |
Cascade-MaskRCNN-ResNet50-FPN | 36.3 | 141.69 / 141.69 | - | 254.8 M | Cascade-MaskRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
Cascade-MaskRCNN-ResNet50-vd-SSLDv2-FPN | 39.1 | 147.62 / 147.62 | - | 254.7 M | Cascade-MaskRCNN-ResNet50-vd-SSLDv2-FPN.yaml | Inference Model/Training Model |
MaskRCNN-ResNet50-FPN | 35.6 | 118.30 / 118.30 | - | 157.5 M | MaskRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
MaskRCNN-ResNet50-vd-FPN | 36.4 | 118.34 / 118.34 | - | 157.5 M | MaskRCNN-ResNet50-vd-FPN.yaml | Inference Model/Training Model |
MaskRCNN-ResNet50 | 32.8 | 228.83 / 228.83 | - | 127.8 M | MaskRCNN-ResNet50.yaml | Inference Model/Training Model |
MaskRCNN-ResNet101-FPN | 36.6 | 148.14 / 148.14 | - | 225.4 M | MaskRCNN-ResNet101-FPN.yaml | Inference Model/Training Model |
MaskRCNN-ResNet101-vd-FPN | 38.1 | 151.12 / 151.12 | - | 225.1 M | MaskRCNN-ResNet101-vd-FPN.yaml | Inference Model/Training Model |
MaskRCNN-ResNeXt101-vd-FPN | 39.5 | 237.55 / 237.55 | - | 370.0 M | MaskRCNN-ResNeXt101-vd-FPN.yaml | Inference Model/Training Model |
PP-YOLOE_seg-S | 32.5 | - | - | 31.5 M | PP-YOLOE_seg-S.yaml | Inference Model/Training Model |
SOLOv2 | 35.5 | - | - | 179.1 M | SOLOv2.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the Mask AP(0.5:0.95) on the COCO2017 validation set.
Text Detection Module¶
Model | Detection Hmean (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-OCRv4_server_det | 82.56 | 83.34 / 80.91 | 442.58 / 442.58 | 109 | PP-OCRv4_server_det.yaml | Inference Model/Training Model |
PP-OCRv4_mobile_det | 77.35 | 8.79 / 3.13 | 51.00 / 28.58 | 4.7 | PP-OCRv4_mobile_det.yaml | Inference Model/Training Model |
PP-OCRv3_mobile_det | 78.68 | 8.44 / 2.91 | 27.87 / 27.87 | 2.1 | PP-OCRv3_mobile_det.yaml | Inference Model/Training Model |
PP-OCRv3_server_det | 80.11 | 65.41 / 13.67 | 305.07 / 305.07 | 102.1 | PP-OCRv3_server_det.yaml | Inference Model/Training Model |
Note: The evaluation dataset for the above accuracy metrics is the self-built Chinese and English dataset of PaddleOCR, covering multiple scenarios such as street view, web images, documents, and handwriting, with 593 images for text recognition. The GPU inference time for all models is based on an NVIDIA Tesla T4 machine with FP32 precision type, while the CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision type.
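In practice, the text detection and text recognition models below are usually consumed together through PaddleX's OCR pipeline rather than module by module. A minimal sketch, assuming the standard `OCR` pipeline name and a placeholder image path:

```python
from paddlex import create_pipeline

# The OCR pipeline chains a text detection model with a text recognition model.
pipeline = create_pipeline(pipeline="OCR")

for res in pipeline.predict("doc_page.jpg"):  # placeholder input path
    res.print()                   # detected text regions and recognized strings
    res.save_to_img("./output/")  # visualization with boxes and transcriptions
```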
Seal Text Detection Module¶
Model Name | Detection Hmean (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-OCRv4_mobile_seal_det | 96.47 | 7.82 / 3.09 | 48.28 / 23.97 | 4.7 M | PP-OCRv4_mobile_seal_det.yaml | Inference Model/Training Model |
PP-OCRv4_server_seal_det | 98.21 | 74.75 / 67.72 | 382.55 / 382.55 | 108.3 M | PP-OCRv4_server_seal_det.yaml | Inference Model/Training Model |
Note: The evaluation set for the above precision metrics is the seal dataset built by PaddleX, which includes 500 seal images.
Text Recognition Module¶
- Chinese Text Recognition Models
Model | Recognition Avg Accuracy (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-OCRv4_server_rec_doc | 81.53 | 6.65 / 2.38 | 32.92 / 32.92 | 74.7 M | PP-OCRv4_server_rec_doc.yaml | Inference Model/Training Model |
PP-OCRv4_mobile_rec | 78.74 | 4.82 / 1.20 | 16.74 / 4.64 | 10.6 M | PP-OCRv4_mobile_rec.yaml | Inference Model/Training Model |
PP-OCRv4_server_rec | 80.61 | 6.58 / 2.43 | 33.17 / 33.17 | 71.2 M | PP-OCRv4_server_rec.yaml | Inference Model/Training Model |
PP-OCRv3_mobile_rec | 72.96 | 5.87 / 1.19 | 9.07 / 4.28 | 9.2 M | PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
Note: The evaluation set for the above accuracy metrics is a Chinese dataset built by PaddleOCR, covering multiple scenarios such as street view, web images, documents, and handwriting, with 8367 images for text recognition. All models' GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision, while CPU inference speeds are based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.
Model | Recognition Avg Accuracy (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
ch_SVTRv2_rec | 68.81 | 8.08 / 2.74 | 50.17 / 42.50 | 73.9 M | ch_SVTRv2_rec.yaml | Inference Model/Training Model |
Note: The evaluation dataset for the above accuracy metrics is the PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition Task Leaderboard A. All model GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.
Model | Recognition Avg Accuracy (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
ch_RepSVTR_rec | 65.07 | 5.93 / 1.62 | 20.73 / 7.32 | 22.1 M | ch_RepSVTR_rec.yaml | Inference Model/Training Model |
Note: The evaluation dataset for the above accuracy metrics is the PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition Task Leaderboard B. All model GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.
- English Recognition Model
Model | Recognition Avg Accuracy (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
en_PP-OCRv4_mobile_rec | 70.39 | 4.81 / 0.75 | 16.10 / 5.31 | 6.8 M | en_PP-OCRv4_mobile_rec.yaml | Inference Model/Training Model |
en_PP-OCRv3_mobile_rec | 70.69 | 5.44 / 0.75 | 8.65 / 5.57 | 7.8 M | en_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
Note: The evaluation set for the above accuracy metrics is an English dataset built by PaddleX. All model GPU inference times are based on NVIDIA Tesla T4 machines, with precision type FP32. CPU inference speed is based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads, and precision type FP32.
- Multilingual Recognition Model
Model | Recognition Avg Accuracy (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
korean_PP-OCRv3_mobile_rec | 60.21 | 5.40 / 0.97 | 9.11 / 4.05 | 8.6 M | korean_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
japan_PP-OCRv3_mobile_rec | 45.69 | 5.70 / 1.02 | 8.48 / 4.07 | 8.8 M | japan_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
chinese_cht_PP-OCRv3_mobile_rec | 82.06 | 5.90 / 1.28 | 9.28 / 4.34 | 9.7 M | chinese_cht_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
te_PP-OCRv3_mobile_rec | 95.88 | 5.42 / 0.82 | 8.10 / 6.91 | 7.8 M | te_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
ka_PP-OCRv3_mobile_rec | 96.96 | 5.25 / 0.79 | 9.09 / 3.86 | 8.0 M | ka_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
ta_PP-OCRv3_mobile_rec | 76.83 | 5.23 / 0.75 | 10.13 / 4.30 | 8.0 M | ta_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
latin_PP-OCRv3_mobile_rec | 76.93 | 5.20 / 0.79 | 8.83 / 7.15 | 7.8 M | latin_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
arabic_PP-OCRv3_mobile_rec | 73.55 | 5.35 / 0.79 | 8.80 / 4.56 | 7.8 M | arabic_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
cyrillic_PP-OCRv3_mobile_rec | 94.28 | 5.23 / 0.76 | 8.89 / 3.88 | 7.9 M | cyrillic_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
devanagari_PP-OCRv3_mobile_rec | 96.44 | 5.22 / 0.79 | 8.56 / 4.06 | 7.9 M | devanagari_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
Note: The evaluation set for the above accuracy metrics is a multi-language dataset built by PaddleX. All model GPU inference times are based on NVIDIA Tesla T4 machines, with precision type FP32. CPU inference speed is based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads, and precision type FP32.
Formula Recognition Module¶
Model | Avg-BLEU (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
UniMERNet | 86.13 | 2266.96/- | -/- | 1.4 G | UniMERNet.yaml | Inference Model/Training Model |
PP-FormulaNet-S | 87.12 | 202.25/- | -/- | 167.9 M | PP-FormulaNet-S.yaml | Inference Model/Training Model |
PP-FormulaNet-L | 92.13 | 1976.52/- | -/- | 535.2 M | PP-FormulaNet-L.yaml | Inference Model/Training Model |
LaTeX_OCR_rec | 71.63 | -/- | -/- | 89.7 M | LaTeX_OCR_rec.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are measured from the internal formula recognition test set of PaddleX. The BLEU score of LaTeX_OCR_rec on the LaTeX-OCR formula recognition test set is 0.8821. All model GPU inference times are based on Tesla V100 GPUs, with precision type FP32.
Table Structure Recognition Module¶
Model | Accuracy (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
SLANet | 59.52 | 103.08 / 103.08 | 197.99 / 197.99 | 6.9 M | SLANet.yaml | Inference Model/Training Model |
SLANet_plus | 63.69 | 140.29 / 140.29 | 195.39 / 195.39 | 6.9 M | SLANet_plus.yaml | Inference Model/Training Model |
SLANeXt_wired | 69.65 | -- | -- | -- | SLANeXt_wired.yaml | Inference Model/Training Model |
SLANeXt_wireless | 69.65 | -- | -- | -- | SLANeXt_wireless.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are measured from the high-difficulty Chinese table recognition dataset built internally by PaddleX. All model GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision type. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision type.
Table Cell Detection Module¶
Model | Model Download Link | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | Introduction |
---|---|---|---|---|---|---|
RT-DETR-L_wired_table_cell_det | Inference Model/Training Model | 82.7 | 35.00 / 10.45 | 495.51 / 495.51 | 124M | RT-DETR is the first real-time end-to-end object detection model. The Baidu PaddlePaddle Vision Team, based on RT-DETR-L as the base model, has completed pretraining on a self-built table cell detection dataset, achieving good performance for both wired and wireless table cell detection. |
RT-DETR-L_wireless_table_cell_det | Inference Model/Training Model | 82.7 | 35.00 / 10.45 | 495.51 / 495.51 | 124M | RT-DETR is the first real-time end-to-end object detection model. The Baidu PaddlePaddle Vision Team, based on RT-DETR-L as the base model, has completed pretraining on a self-built table cell detection dataset, achieving good performance for both wired and wireless table cell detection. |
Note: The above accuracy metrics are measured from the internal table cell detection dataset of PaddleX. All model GPU inference times are based on an NVIDIA Tesla T4 machine, with precision type FP32. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads, and precision type FP32.
Table Classification Module¶
Model | Top1 Acc (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-LCNet_x1_0_table_cls | -- | -- | -- | -- | PP-LCNet_x1_0_table_cls.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are measured from the internal table classification dataset built by PaddleX. All model GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.
Text Image Unwarping Module¶
Model Name | MS-SSIM (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
UVDoc | 54.40 | 16.27 / 7.76 | 176.97 / 80.60 | 30.3 M | UVDoc.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are measured from the image unwarping dataset built by PaddleX.
Layout Detection Module¶
- Table Layout Detection Model
Model | mAP(0.5) (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PicoDet_layout_1x_table | 97.5 | 8.02 / 3.09 | 23.70 / 20.41 | 7.4 M | PicoDet_layout_1x_table.yaml | Inference Model/Training Model |
Note: The evaluation set for the above accuracy metrics is the layout table area detection dataset built by PaddleOCR, which contains 7835 images of document types with tables in both Chinese and English. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision, and the CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.
- 3-class layout detection model, including table, image, and seal
Model | mAP(0.5) (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PicoDet-S_layout_3cls | 88.2 | 8.99 / 2.22 | 16.11 / 8.73 | 4.8 | PicoDet-S_layout_3cls.yaml | Inference Model/Training Model |
PicoDet-L_layout_3cls | 89.0 | 13.05 / 4.50 | 41.30 / 41.30 | 22.6 | PicoDet-L_layout_3cls.yaml | Inference Model/Training Model |
RT-DETR-H_layout_3cls | 95.8 | 114.93 / 27.71 | 947.56 / 947.56 | 470.1 | RT-DETR-H_layout_3cls.yaml | Inference Model/Training Model |
Note: The evaluation dataset for the above accuracy metrics is the layout area detection dataset built by PaddleOCR, which includes 1,154 common types of document images such as Chinese and English papers, magazines, and research reports. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision, and the CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.
- 5-class English document layout detection model, including text, title, table, image, and list
Model | mAP(0.5) (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PicoDet_layout_1x | 97.8 | 9.03 / 3.10 | 25.82 / 20.70 | 7.4 | PicoDet_layout_1x.yaml | Inference Model/Training Model |
Note: The evaluation dataset for the above accuracy metrics is the PubLayNet evaluation dataset, which contains 11,245 images of English documents. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. The CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.
- 17-class layout detection model, including 17 common layout categories: paragraph title, image, text, number, abstract, content, figure title, formula, table, table title, reference, document title, footnote, header, algorithm, footer, and seal
Model | mAP(0.5) (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PicoDet-S_layout_17cls | 87.4 | 9.11 / 2.12 | 15.42 / 9.12 | 4.8 | PicoDet-S_layout_17cls.yaml | Inference Model/Training Model |
Note: The evaluation set for the above accuracy metrics is the layout area detection dataset built by PaddleOCR, which includes 892 images of common document types such as Chinese and English papers, magazines, and research reports. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. The CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.
Document Image Orientation Classification Module¶
Model | Top-1 Acc (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-LCNet_x1_0_doc_ori | 99.06 | 2.31 / 0.43 | 3.37 / 1.27 | 7 | PP-LCNet_x1_0_doc_ori.yaml | Inference Model/Training Model |
Note: The evaluation set for the above accuracy metrics is a self-built dataset covering multiple scenarios such as documents and certificates, with 1000 images. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. The CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.
Text Line Orientation Classification Module¶
Model | Top-1 Acc (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-LCNet_x0_25_textline_ori | 99.06 | 2.31 / 0.43 | 3.37 / 1.27 | 7 | PP-LCNet_x0_25_textline_ori.yaml | Inference Model/Training Model |
Note: The evaluation dataset for the above accuracy metrics is a self-built dataset covering multiple scenarios such as certificates and documents, with 1,000 images. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision, and the CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.
Time Series Forecasting Module¶
Model Name | MSE | MAE | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|
DLinear | 0.382 | 0.394 | 72 K | DLinear.yaml | Inference Model/Training Model |
NLinear | 0.386 | 0.392 | 40 K | NLinear.yaml | Inference Model/Training Model |
Nonstationary | 0.600 | 0.515 | 55.5 M | Nonstationary.yaml | Inference Model/Training Model |
PatchTST | 0.379 | 0.391 | 2.0 M | PatchTST.yaml | Inference Model/Training Model |
RLinear | 0.385 | 0.392 | 40 K | RLinear.yaml | Inference Model/Training Model |
TiDE | 0.407 | 0.414 | 31.7 M | TiDE.yaml | Inference Model/Training Model |
TimesNet | 0.416 | 0.429 | 4.9 M | TimesNet.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are measured from the ETTH1 dataset (evaluation results on the test.csv test set).
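The time series modules take a CSV file (or DataFrame) instead of an image, but the calling convention is otherwise the same. A sketch under those assumptions; `ts_data.csv` is a placeholder whose column layout must match the model's training configuration:

```python
from paddlex import create_model

# Load a forecasting model by the name listed in the table above.
model = create_model("DLinear")

for res in model.predict("ts_data.csv"):  # placeholder CSV with time and target columns
    res.print()                   # forecasted values for the configured horizon
    res.save_to_csv("./output/")  # assumed CSV saver, mirroring save_to_img above
```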
Time Series Anomaly Detection Module¶
Model Name | Precision | Recall | F1 Score | Model Storage Size | YAML File | Model Download Link |
---|---|---|---|---|---|---|
AutoEncoder_ad | 99.36 | 84.36 | 91.25 | 52 K | AutoEncoder_ad.yaml | Inference Model/Training Model |
DLinear_ad | 98.98 | 93.96 | 96.41 | 112 K | DLinear_ad.yaml | Inference Model/Training Model |
Nonstationary_ad | 98.55 | 88.95 | 93.51 | 1.8 M | Nonstationary_ad.yaml | Inference Model/Training Model |
PatchTST_ad | 98.78 | 90.70 | 94.57 | 320 K | PatchTST_ad.yaml | Inference Model/Training Model |
Note: The above precision metrics are measured from the PSM dataset.
Time Series Classification Module¶
Model Name | Acc (%) | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|
TimesNet_cls | 87.5 | 792 K | TimesNet_cls.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are measured from the UWaveGestureLibrary dataset.
Note: The GPU inference time for all models above is based on an NVIDIA Tesla T4 machine with FP32 precision. The CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.
Multilingual Speech Recognition Module¶
Model | Training Data | Model Size | Word Error Rate | YAML File | Model Download Link |
---|---|---|---|---|---|
whisper_large | 680kh | 5.8G | 2.7 (Librispeech) | whisper_large.yaml | Inference Model |
whisper_medium | 680kh | 2.9G | - | whisper_medium.yaml | Inference Model |
whisper_small | 680kh | 923M | - | whisper_small.yaml | Inference Model |
whisper_base | 680kh | 277M | - | whisper_base.yaml | Inference Model |
whisper_tiny | 680kh | 145M | - | whisper_tiny.yaml | Inference Model |
Video Classification Module¶
Model | Top1 Acc(%) | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|
PP-TSM-R50_8frames_uniform | 74.36 | 93.4 M | PP-TSM-R50_8frames_uniform.yaml | Inference Model/Training Model |
PP-TSMv2-LCNetV2_8frames_uniform | 71.71 | 22.5 M | PP-TSMv2-LCNetV2_8frames_uniform.yaml | Inference Model/Training Model |
PP-TSMv2-LCNetV2_16frames_uniform | 73.11 | 22.5 M | PP-TSMv2-LCNetV2_16frames_uniform.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the K400 validation set Top1 Acc.
Video Detection Module¶
Model | Frame-mAP(@ IoU 0.5) | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|
YOWO | 80.94 | 462.891 M | YOWO.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the test dataset UCF101-24, using the Frame-mAP (@ IoU 0.5) metric. All model GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.