PaddleX Model List (CPU/GPU)¶
PaddleX includes multiple pipelines, each containing several modules, and each module includes several models. Choose models based on the benchmark data below: pick higher-accuracy models if accuracy matters most, faster models if inference speed matters most, and smaller models if storage size matters most.
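All of the modules below share the same single-model Python API, so any model name in these tables can be tried directly. A minimal sketch, assuming a PaddleX 3.x installation (the model name, input image, and output directory are placeholders):

```python
# Minimal sketch: run one of the models listed below through the PaddleX
# Python API. "PP-LCNet_x1_0" and "example.jpg" are placeholders; substitute
# any model name from the tables and your own input.
from paddlex import create_model

model = create_model("PP-LCNet_x1_0")  # downloads the inference model on first use
output = model.predict("example.jpg", batch_size=1)
for res in output:
    res.print()                    # human-readable prediction
    res.save_to_json("./output/")  # structured result on disk
```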
Image Classification Module¶
Model Name | Top1 Acc (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
CLIP_vit_base_patch16_224 | 85.36 | 12.84 / 2.82 | 60.52 / 60.52 | 306.5 M | CLIP_vit_base_patch16_224.yaml | Inference Model/Training Model |
CLIP_vit_large_patch14_224 | 88.1 | 51.72 / 11.13 | 238.07 / 238.07 | 1.04 G | CLIP_vit_large_patch14_224.yaml | Inference Model/Training Model |
ConvNeXt_base_224 | 83.84 | 13.18 / 12.14 | 128.39 / 81.78 | 313.9 M | ConvNeXt_base_224.yaml | Inference Model/Training Model |
ConvNeXt_base_384 | 84.90 | 32.15 / 30.52 | 279.36 / 220.35 | 313.9 M | ConvNeXt_base_384.yaml | Inference Model/Training Model |
ConvNeXt_large_224 | 84.26 | 26.51 / 7.21 | 213.32 / 157.22 | 700.7 M | ConvNeXt_large_224.yaml | Inference Model/Training Model |
ConvNeXt_large_384 | 85.27 | 67.07 / 65.26 | 494.04 / 438.97 | 700.7 M | ConvNeXt_large_384.yaml | Inference Model/Training Model |
ConvNeXt_small | 83.13 | 9.05 / 8.21 | 97.94 / 55.29 | 178.0 M | ConvNeXt_small.yaml | Inference Model/Training Model |
ConvNeXt_tiny | 82.03 | 5.12 / 2.06 | 63.96 / 29.77 | 101.4 M | ConvNeXt_tiny.yaml | Inference Model/Training Model |
FasterNet-L | 83.5 | 15.67 / 3.10 | 52.24 / 52.24 | 357.1 M | FasterNet-L.yaml | Inference Model/Training Model |
FasterNet-M | 83.0 | 9.72 / 2.30 | 35.29 / 35.29 | 204.6 M | FasterNet-M.yaml | Inference Model/Training Model |
FasterNet-S | 81.3 | 5.46 / 1.27 | 20.46 / 18.03 | 119.3 M | FasterNet-S.yaml | Inference Model/Training Model |
FasterNet-T0 | 71.9 | 4.18 / 0.60 | 6.34 / 3.44 | 15.1 M | FasterNet-T0.yaml | Inference Model/Training Model |
FasterNet-T1 | 75.9 | 4.24 / 0.64 | 9.57 / 5.20 | 29.2 M | FasterNet-T1.yaml | Inference Model/Training Model |
FasterNet-T2 | 79.1 | 3.87 / 0.78 | 11.14 / 9.98 | 57.4 M | FasterNet-T2.yaml | Inference Model/Training Model |
MobileNetV1_x0_5 | 63.5 | 1.39 / 0.28 | 2.74 / 1.02 | 4.8 M | MobileNetV1_x0_5.yaml | Inference Model/Training Model |
MobileNetV1_x0_25 | 51.4 | 1.32 / 0.30 | 2.04 / 0.58 | 1.8 M | MobileNetV1_x0_25.yaml | Inference Model/Training Model |
MobileNetV1_x0_75 | 68.8 | 1.75 / 0.33 | 3.41 / 1.57 | 9.3 M | MobileNetV1_x0_75.yaml | Inference Model/Training Model |
MobileNetV1_x1_0 | 71.0 | 1.89 / 0.34 | 4.01 / 2.17 | 15.2 M | MobileNetV1_x1_0.yaml | Inference Model/Training Model |
MobileNetV2_x0_5 | 65.0 | 3.17 / 0.48 | 4.52 / 1.35 | 7.1 M | MobileNetV2_x0_5.yaml | Inference Model/Training Model |
MobileNetV2_x0_25 | 53.2 | 2.80 / 0.46 | 3.92 / 0.98 | 5.5 M | MobileNetV2_x0_25.yaml | Inference Model/Training Model |
MobileNetV2_x1_0 | 72.2 | 3.57 / 0.49 | 5.63 / 2.51 | 12.6 M | MobileNetV2_x1_0.yaml | Inference Model/Training Model |
MobileNetV2_x1_5 | 74.1 | 3.58 / 0.62 | 8.02 / 4.49 | 25.0 M | MobileNetV2_x1_5.yaml | Inference Model/Training Model |
MobileNetV2_x2_0 | 75.2 | 3.56 / 0.74 | 10.24 / 6.83 | 41.2 M | MobileNetV2_x2_0.yaml | Inference Model/Training Model |
MobileNetV3_large_x0_5 | 69.2 | 3.79 / 0.62 | 6.76 / 1.61 | 9.6 M | MobileNetV3_large_x0_5.yaml | Inference Model/Training Model |
MobileNetV3_large_x0_35 | 64.3 | 3.70 / 0.60 | 5.54 / 1.41 | 7.5 M | MobileNetV3_large_x0_35.yaml | Inference Model/Training Model |
MobileNetV3_large_x0_75 | 73.1 | 4.82 / 0.66 | 7.45 / 2.00 | 14.0 M | MobileNetV3_large_x0_75.yaml | Inference Model/Training Model |
MobileNetV3_large_x1_0 | 75.3 | 4.86 / 0.68 | 6.88 / 2.61 | 19.5 M | MobileNetV3_large_x1_0.yaml | Inference Model/Training Model |
MobileNetV3_large_x1_25 | 76.4 | 5.08 / 0.71 | 7.37 / 3.58 | 26.5 M | MobileNetV3_large_x1_25.yaml | Inference Model/Training Model |
MobileNetV3_small_x0_5 | 59.2 | 3.41 / 0.57 | 5.60 / 1.14 | 6.8 M | MobileNetV3_small_x0_5.yaml | Inference Model/Training Model |
MobileNetV3_small_x0_35 | 53.0 | 3.49 / 0.60 | 4.63 / 1.07 | 6.0 M | MobileNetV3_small_x0_35.yaml | Inference Model/Training Model |
MobileNetV3_small_x0_75 | 66.0 | 3.49 / 0.60 | 5.19 / 1.28 | 8.5 M | MobileNetV3_small_x0_75.yaml | Inference Model/Training Model |
MobileNetV3_small_x1_0 | 68.2 | 3.76 / 0.53 | 5.11 / 1.43 | 10.5 M | MobileNetV3_small_x1_0.yaml | Inference Model/Training Model |
MobileNetV3_small_x1_25 | 70.7 | 4.23 / 0.58 | 6.48 / 1.68 | 13.0 M | MobileNetV3_small_x1_25.yaml | Inference Model/Training Model |
MobileNetV4_conv_large | 83.4 | 8.33 / 2.24 | 33.56 / 23.70 | 125.2 M | MobileNetV4_conv_large.yaml | Inference Model/Training Model |
MobileNetV4_conv_medium | 79.9 | 6.81 / 0.92 | 12.47 / 6.27 | 37.6 M | MobileNetV4_conv_medium.yaml | Inference Model/Training Model |
MobileNetV4_conv_small | 74.6 | 3.25 / 0.46 | 4.42 / 1.54 | 14.7 M | MobileNetV4_conv_small.yaml | Inference Model/Training Model |
MobileNetV4_hybrid_large | 83.8 | 12.27 / 4.18 | 58.64 / 58.64 | 145.1 M | MobileNetV4_hybrid_large.yaml | Inference Model/Training Model |
MobileNetV4_hybrid_medium | 80.5 | 12.08 / 1.34 | 24.69 / 8.10 | 42.9 M | MobileNetV4_hybrid_medium.yaml | Inference Model/Training Model |
PP-HGNet_base | 85.0 | 14.10 / 4.19 | 68.92 / 68.92 | 249.4 M | PP-HGNet_base.yaml | Inference Model/Training Model |
PP-HGNet_small | 81.51 | 5.12 / 1.73 | 25.01 / 25.01 | 86.5 M | PP-HGNet_small.yaml | Inference Model/Training Model |
PP-HGNet_tiny | 79.83 | 3.28 / 1.29 | 16.40 / 15.97 | 52.4 M | PP-HGNet_tiny.yaml | Inference Model/Training Model |
PP-HGNetV2-B0 | 77.77 | 3.83 / 0.57 | 9.95 / 2.37 | 21.4 M | PP-HGNetV2-B0.yaml | Inference Model/Training Model |
PP-HGNetV2-B1 | 79.18 | 3.87 / 0.62 | 8.77 / 3.79 | 22.6 M | PP-HGNetV2-B1.yaml | Inference Model/Training Model |
PP-HGNetV2-B2 | 81.74 | 5.73 / 0.86 | 15.11 / 7.05 | 39.9 M | PP-HGNetV2-B2.yaml | Inference Model/Training Model |
PP-HGNetV2-B3 | 82.98 | 6.26 / 1.01 | 18.47 / 10.34 | 57.9 M | PP-HGNetV2-B3.yaml | Inference Model/Training Model |
PP-HGNetV2-B4 | 83.57 | 5.47 / 1.10 | 14.42 / 9.89 | 70.4 M | PP-HGNetV2-B4.yaml | Inference Model/Training Model |
PP-HGNetV2-B5 | 84.75 | 10.24 / 1.96 | 29.71 / 29.71 | 140.8 M | PP-HGNetV2-B5.yaml | Inference Model/Training Model |
PP-HGNetV2-B6 | 86.30 | 12.25 / 3.76 | 62.29 / 62.29 | 268.4 M | PP-HGNetV2-B6.yaml | Inference Model/Training Model |
PP-LCNet_x0_5 | 63.14 | 2.28 / 0.42 | 2.86 / 0.83 | 6.7 M | PP-LCNet_x0_5.yaml | Inference Model/Training Model |
PP-LCNet_x0_25 | 51.86 | 1.89 / 0.45 | 2.49 / 0.68 | 5.5 M | PP-LCNet_x0_25.yaml | Inference Model/Training Model |
PP-LCNet_x0_35 | 58.09 | 1.94 / 0.41 | 2.73 / 0.77 | 5.9 M | PP-LCNet_x0_35.yaml | Inference Model/Training Model |
PP-LCNet_x0_75 | 68.18 | 2.30 / 0.41 | 2.95 / 1.07 | 8.4 M | PP-LCNet_x0_75.yaml | Inference Model/Training Model |
PP-LCNet_x1_0 | 71.32 | 2.35 / 0.47 | 4.03 / 1.35 | 10.5 M | PP-LCNet_x1_0.yaml | Inference Model/Training Model |
PP-LCNet_x1_5 | 73.71 | 2.33 / 0.53 | 4.17 / 2.29 | 16.0 M | PP-LCNet_x1_5.yaml | Inference Model/Training Model |
PP-LCNet_x2_0 | 75.18 | 2.40 / 0.51 | 5.37 / 3.46 | 23.2 M | PP-LCNet_x2_0.yaml | Inference Model/Training Model |
PP-LCNet_x2_5 | 76.60 | 2.36 / 0.61 | 6.29 / 5.05 | 32.1 M | PP-LCNet_x2_5.yaml | Inference Model/Training Model |
PP-LCNetV2_base | 77.05 | 3.33 / 0.55 | 6.86 / 3.77 | 23.7 M | PP-LCNetV2_base.yaml | Inference Model/Training Model |
PP-LCNetV2_large | 78.51 | 4.37 / 0.71 | 9.43 / 8.07 | 37.3 M | PP-LCNetV2_large.yaml | Inference Model/Training Model |
PP-LCNetV2_small | 73.97 | 2.53 / 0.41 | 5.14 / 1.98 | 14.6 M | PP-LCNetV2_small.yaml | Inference Model/Training Model |
ResNet18_vd | 72.3 | 2.47 / 0.61 | 6.97 / 5.15 | 41.5 M | ResNet18_vd.yaml | Inference Model/Training Model |
ResNet18 | 71.0 | 2.35 / 0.67 | 6.35 / 4.61 | 41.5 M | ResNet18.yaml | Inference Model/Training Model |
ResNet34_vd | 76.0 | 4.01 / 1.03 | 11.99 / 9.86 | 77.3 M | ResNet34_vd.yaml | Inference Model/Training Model |
ResNet34 | 74.6 | 3.99 / 1.02 | 12.42 / 9.81 | 77.3 M | ResNet34.yaml | Inference Model/Training Model |
ResNet50_vd | 79.1 | 6.04 / 1.16 | 16.08 / 12.07 | 90.8 M | ResNet50_vd.yaml | Inference Model/Training Model |
ResNet50 | 76.5 | 6.44 / 1.16 | 15.04 / 11.63 | 90.8 M | ResNet50.yaml | Inference Model/Training Model |
ResNet101_vd | 80.2 | 11.16 / 2.07 | 32.14 / 32.14 | 158.4 M | ResNet101_vd.yaml | Inference Model/Training Model |
ResNet101 | 77.6 | 10.91 / 2.06 | 31.14 / 22.93 | 158.7 M | ResNet101.yaml | Inference Model/Training Model |
ResNet152_vd | 80.6 | 15.96 / 2.99 | 49.33 / 49.33 | 214.3 M | ResNet152_vd.yaml | Inference Model/Training Model |
ResNet152 | 78.3 | 15.61 / 2.90 | 47.33 / 36.60 | 214.2 M | ResNet152.yaml | Inference Model/Training Model |
ResNet200_vd | 80.9 | 24.20 / 3.69 | 62.62 / 62.62 | 266.0 M | ResNet200_vd.yaml | Inference Model/Training Model |
StarNet-S1 | 73.6 | 6.33 / 1.98 | 7.56 / 3.26 | 11.2 M | StarNet-S1.yaml | Inference Model/Training Model |
StarNet-S2 | 74.8 | 4.49 / 1.55 | 7.38 / 3.38 | 14.3 M | StarNet-S2.yaml | Inference Model/Training Model |
StarNet-S3 | 77.0 | 6.70 / 1.62 | 11.05 / 4.76 | 22.2 M | StarNet-S3.yaml | Inference Model/Training Model |
StarNet-S4 | 79.0 | 8.50 / 2.86 | 15.40 / 6.76 | 28.9 M | StarNet-S4.yaml | Inference Model/Training Model |
SwinTransformer_base_patch4_window7_224 | 83.37 | 14.29 / 5.13 | 130.89 / 130.89 | 310.5 M | SwinTransformer_base_patch4_window7_224.yaml | Inference Model/Training Model |
SwinTransformer_base_patch4_window12_384 | 84.17 | 37.74 / 10.10 | 362.56 / 362.56 | 311.4 M | SwinTransformer_base_patch4_window12_384.yaml | Inference Model/Training Model |
SwinTransformer_large_patch4_window7_224 | 86.19 | 26.48 / 7.94 | 228.23 / 228.23 | 694.8 M | SwinTransformer_large_patch4_window7_224.yaml | Inference Model/Training Model |
SwinTransformer_large_patch4_window12_384 | 87.06 | 74.72 / 18.16 | 652.04 / 652.04 | 696.1 M | SwinTransformer_large_patch4_window12_384.yaml | Inference Model/Training Model |
SwinTransformer_small_patch4_window7_224 | 83.21 | 10.37 / 3.90 | 94.20 / 94.20 | 175.6 M | SwinTransformer_small_patch4_window7_224.yaml | Inference Model/Training Model |
SwinTransformer_tiny_patch4_window7_224 | 81.10 | 6.66 / 2.15 | 60.45 / 60.45 | 100.1 M | SwinTransformer_tiny_patch4_window7_224.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the ImageNet-1k validation set Top1 Acc.
Image Multi-label Classification Module¶
Model Name | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
CLIP_vit_base_patch16_448_ML | 89.15 | 54.75 / 14.30 | 280.23 / 280.23 | 325.6 M | CLIP_vit_base_patch16_448_ML.yaml | Inference Model/Training Model |
PP-HGNetV2-B0_ML | 80.98 | 6.47 / 1.38 | 21.56 / 13.69 | 39.6 M | PP-HGNetV2-B0_ML.yaml | Inference Model/Training Model |
PP-HGNetV2-B4_ML | 87.96 | 9.63 / 2.79 | 43.98 / 36.63 | 88.5 M | PP-HGNetV2-B4_ML.yaml | Inference Model/Training Model |
PP-HGNetV2-B6_ML | 91.06 | 37.07 / 9.43 | 188.58 / 188.58 | 286.5 M | PP-HGNetV2-B6_ML.yaml | Inference Model/Training Model |
PP-LCNet_x1_0_ML | 77.96 | 4.04 / 1.15 | 11.76 / 8.32 | 29.4 M | PP-LCNet_x1_0_ML.yaml | Inference Model/Training Model |
ResNet50_ML | 83.42 | 12.12 / 3.27 | 51.79 / 44.36 | 108.9 M | ResNet50_ML.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are mAP for the multi-label classification task on COCO2017.
Pedestrian Attribute Module¶
Model Name | mA (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-LCNet_x1_0_pedestrian_attribute | 92.2 | 2.35 / 0.49 | 3.17 / 1.25 | 6.7 M | PP-LCNet_x1_0_pedestrian_attribute.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are mA on PaddleX's internal dataset.
Vehicle Attribute Module¶
Model Name | mA (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-LCNet_x1_0_vehicle_attribute | 91.7 | 2.32 / 2.32 | 3.22 / 1.26 | 6.7 M | PP-LCNet_x1_0_vehicle_attribute.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the VeRi dataset mA.
Image Feature Module¶
Model Name | recall@1 (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-ShiTuV2_rec | 84.2 | 3.48 / 0.55 | 8.04 / 4.04 | 16.3 M | PP-ShiTuV2_rec.yaml | Inference Model/Training Model |
PP-ShiTuV2_rec_CLIP_vit_base | 88.69 | 12.94 / 2.88 | 58.36 / 58.36 | 306.6 M | PP-ShiTuV2_rec_CLIP_vit_base.yaml | Inference Model/Training Model |
PP-ShiTuV2_rec_CLIP_vit_large | 91.03 | 51.65 / 11.18 | 255.78 / 255.78 | 1.05 G | PP-ShiTuV2_rec_CLIP_vit_large.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the AliProducts recall@1.
Document Orientation Classification Module¶
Model Name | Top-1 Acc (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-LCNet_x1_0_doc_ori | 99.06 | 2.31 / 0.43 | 3.37 / 1.27 | 7 M | PP-LCNet_x1_0_doc_ori.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the Top-1 Acc of the internal dataset of PaddleX.
Face Feature Module¶
Model Name | Output Feature Dimension | Acc (%) AgeDB-30/CFP-FP/LFW | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|---|
MobileFaceNet | 128 | 96.28/96.71/99.58 | 3.16 / 0.48 | 6.49 / 6.49 | 4.1 | MobileFaceNet.yaml | Inference Model/Training Model |
ResNet50_face | 512 | 98.12/98.56/99.77 | 5.68 / 1.09 | 14.96 / 11.90 | 87.2 | ResNet50_face.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are measured on the AgeDB-30, CFP-FP, and LFW datasets.
Main Body Detection Module¶
Model Name | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-ShiTuV2_det | 41.5 | 12.79 / 4.51 | 44.14 / 44.14 | 27.54 M | PP-ShiTuV2_det.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the PaddleClas Main Body Detection Dataset mAP(0.5:0.95).
Object Detection Module¶
Model Name | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
Cascade-FasterRCNN-ResNet50-FPN | 41.1 | 135.92 / 135.92 | - | 245.4 M | Cascade-FasterRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
Cascade-FasterRCNN-ResNet50-vd-SSLDv2-FPN | 45.0 | 138.23 / 138.23 | - | 246.2 M | Cascade-FasterRCNN-ResNet50-vd-SSLDv2-FPN.yaml | Inference Model/Training Model |
CenterNet-DLA-34 | 37.6 | - | - | 75.4 M | CenterNet-DLA-34.yaml | Inference Model/Training Model |
CenterNet-ResNet50 | 38.9 | - | - | 319.7 M | CenterNet-ResNet50.yaml | Inference Model/Training Model |
DETR-R50 | 42.3 | 62.91 / 17.33 | 392.63 / 392.63 | 159.3 M | DETR-R50.yaml | Inference Model/Training Model |
FasterRCNN-ResNet34-FPN | 37.8 | 83.33 / 31.64 | - | 137.5 M | FasterRCNN-ResNet34-FPN.yaml | Inference Model/Training Model |
FasterRCNN-ResNet50-FPN | 38.4 | 107.08 / 35.40 | - | 148.1 M | FasterRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
FasterRCNN-ResNet50-vd-FPN | 39.5 | 109.36 / 36.00 | - | 148.1 M | FasterRCNN-ResNet50-vd-FPN.yaml | Inference Model/Training Model |
FasterRCNN-ResNet50-vd-SSLDv2-FPN | 41.4 | 109.06 / 36.19 | - | 148.1 M | FasterRCNN-ResNet50-vd-SSLDv2-FPN.yaml | Inference Model/Training Model |
FasterRCNN-ResNet50 | 36.7 | 496.33 / 109.12 | - | 120.2 M | FasterRCNN-ResNet50.yaml | Inference Model/Training Model |
FasterRCNN-ResNet101-FPN | 41.4 | 148.21 / 42.21 | - | 216.3 M | FasterRCNN-ResNet101-FPN.yaml | Inference Model/Training Model |
FasterRCNN-ResNet101 | 39.0 | 538.58 / 120.88 | - | 188.1 M | FasterRCNN-ResNet101.yaml | Inference Model/Training Model |
FasterRCNN-ResNeXt101-vd-FPN | 43.4 | 258.01 / 58.25 | - | 360.6 M | FasterRCNN-ResNeXt101-vd-FPN.yaml | Inference Model/Training Model |
FasterRCNN-Swin-Tiny-FPN | 42.6 | - | - | 159.8 M | FasterRCNN-Swin-Tiny-FPN.yaml | Inference Model/Training Model |
FCOS-ResNet50 | 39.6 | 106.13 / 28.32 | 721.79 / 721.79 | 124.2 M | FCOS-ResNet50.yaml | Inference Model/Training Model |
PicoDet-L | 42.6 | 14.68 / 5.81 | 47.32 / 47.32 | 20.9 M | PicoDet-L.yaml | Inference Model/Training Model |
PicoDet-M | 37.5 | 9.62 / 3.23 | 23.75 / 14.88 | 16.8 M | PicoDet-M.yaml | Inference Model/Training Model |
PicoDet-S | 29.1 | 7.98 / 2.33 | 14.82 / 5.60 | 4.4 M | PicoDet-S.yaml | Inference Model/Training Model |
PicoDet-XS | 26.2 | 9.66 / 2.75 | 19.15 / 7.24 | 5.7 M | PicoDet-XS.yaml | Inference Model/Training Model |
PP-YOLOE_plus-L | 52.9 | 33.55 / 10.46 | 189.05 / 189.05 | 185.3 M | PP-YOLOE_plus-L.yaml | Inference Model/Training Model |
PP-YOLOE_plus-M | 49.8 | 19.52 / 7.46 | 113.36 / 113.36 | 83.2 M | PP-YOLOE_plus-M.yaml | Inference Model/Training Model |
PP-YOLOE_plus-S | 43.7 | 12.16 / 4.58 | 73.86 / 52.90 | 28.3 M | PP-YOLOE_plus-S.yaml | Inference Model/Training Model |
PP-YOLOE_plus-X | 54.7 | 58.87 / 15.84 | 292.93 / 292.93 | 349.4 M | PP-YOLOE_plus-X.yaml | Inference Model/Training Model |
RT-DETR-H | 56.3 | 115.92 / 28.16 | 971.32 / 971.32 | 435.8 M | RT-DETR-H.yaml | Inference Model/Training Model |
RT-DETR-L | 53.0 | 35.00 / 10.45 | 495.51 / 495.51 | 113.7 M | RT-DETR-L.yaml | Inference Model/Training Model |
RT-DETR-R18 | 46.5 | 20.21 / 6.23 | 266.01 / 266.01 | 70.7 M | RT-DETR-R18.yaml | Inference Model/Training Model |
RT-DETR-R50 | 53.1 | 42.14 / 11.31 | 523.97 / 523.97 | 149.1 M | RT-DETR-R50.yaml | Inference Model/Training Model |
RT-DETR-X | 54.8 | 61.24 / 15.83 | 647.08 / 647.08 | 232.9 M | RT-DETR-X.yaml | Inference Model/Training Model |
YOLOv3-DarkNet53 | 39.1 | 41.58 / 10.10 | 158.78 / 158.78 | 219.7 M | YOLOv3-DarkNet53.yaml | Inference Model/Training Model |
YOLOv3-MobileNetV3 | 31.4 | 16.53 / 5.70 | 60.44 / 60.44 | 83.8 M | YOLOv3-MobileNetV3.yaml | Inference Model/Training Model |
YOLOv3-ResNet50_vd_DCN | 40.6 | 32.91 / 10.07 | 225.72 / 224.32 | 163.0 M | YOLOv3-ResNet50_vd_DCN.yaml | Inference Model/Training Model |
YOLOX-L | 50.1 | 121.19 / 13.55 | 295.38 / 274.15 | 192.5 M | YOLOX-L.yaml | Inference Model/Training Model |
YOLOX-M | 46.9 | 87.19 / 10.09 | 183.95 / 172.67 | 90.0 M | YOLOX-M.yaml | Inference Model/Training Model |
YOLOX-N | 26.1 | 53.31 / 45.02 | 69.69 / 59.18 | 3.4 M | YOLOX-N.yaml | Inference Model/Training Model |
YOLOX-S | 40.4 | 129.52 / 13.19 | 181.39 / 179.01 | 32.0 M | YOLOX-S.yaml | Inference Model/Training Model |
YOLOX-T | 32.9 | 66.81 / 61.31 | 92.30 / 83.90 | 18.1 M | YOLOX-T.yaml | Inference Model/Training Model |
YOLOX-X | 51.8 | 156.40 / 20.17 | 480.14 / 454.35 | 351.5 M | YOLOX-X.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the COCO2017 validation set mAP(0.5:0.95).
Small Object Detection Module¶
Model Name | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-YOLOE_plus_SOD-S | 25.1 | 135.68 / 122.94 | 188.09 / 107.74 | 77.3 M | PP-YOLOE_plus_SOD-S.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the validation set mAP(0.5:0.95) of VisDrone-DET.
Open-Vocabulary Object Detection Module¶
Model | mAP(0.5:0.95) | mAP(0.5) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Size (M) | Model Download Link |
---|---|---|---|---|---|---|
GroundingDINO-T | 49.4 | 64.4 | 253.72 | 1807.4 | 658.3 | Inference Model |
Note: The above accuracy metrics are based on the COCO val2017 validation set mAP(0.5:0.95).
Open Vocabulary Segmentation Module¶
Model | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | Model Download Link |
---|---|---|---|---|
SAM-H_box | 144.9 | 33920.7 | 2433.7 | Inference Model |
SAM-H_point | 144.9 | 33920.7 | 2433.7 | Inference Model |
Rotated Object Detection Module¶
Model | mAP(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-YOLOE-R-L | 78.14 | 20.7039 | 157.942 | 211.0 M | PP-YOLOE-R.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the DOTA validation set mAP(0.5:0.95).
Pedestrian Detection Module¶
Model Name | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-YOLOE-L_human | 48.0 | 33.27 / 9.19 | 173.72 / 173.72 | 196.1 M | PP-YOLOE-L_human.yaml | Inference Model/Training Model |
PP-YOLOE-S_human | 42.5 | 9.94 / 3.42 | 54.48 / 46.52 | 28.8 M | PP-YOLOE-S_human.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the validation set mAP(0.5:0.95) of CrowdHuman.
Vehicle Detection Module¶
Model Name | mAP (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-YOLOE-L_vehicle | 63.9 | 32.84 / 9.03 | 176.60 / 176.60 | 196.1 M | PP-YOLOE-L_vehicle.yaml | Inference Model/Training Model |
PP-YOLOE-S_vehicle | 61.3 | 9.79 / 3.48 | 54.14 / 46.69 | 28.8 M | PP-YOLOE-S_vehicle.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the validation set mAP(0.5:0.95) of PPVehicle.
Face Detection Module¶
Model Name | AP (%) Easy/Medium/Hard | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
BlazeFace | 77.7/73.4/49.5 | 60.34 / 54.76 | 84.18 / 84.18 | 0.447 M | BlazeFace.yaml | Inference Model/Training Model |
BlazeFace-FPN-SSH | 83.2/80.5/60.5 | 69.29 / 63.42 | 86.96 / 86.96 | 0.606 M | BlazeFace-FPN-SSH.yaml | Inference Model/Training Model |
PicoDet_LCNet_x2_5_face | 93.7/90.7/68.1 | 35.37 / 12.88 | 126.24 / 126.24 | 28.9 M | PicoDet_LCNet_x2_5_face.yaml | Inference Model/Training Model |
PP-YOLOE_plus-S_face | 93.9/91.8/79.8 | 22.54 / 8.33 | 138.67 / 138.67 | 26.5 M | PP-YOLOE_plus-S_face.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are evaluated on the WIDER-FACE validation set with an input size of 640x640.
Anomaly Detection Module¶
Model Name | mIoU | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
STFPM | 0.9901 | 2.97 / 1.57 | 38.86 / 13.24 | 22.5 M | STFPM.yaml | Inference Model/Training Model |
Note: The above metrics are the average anomaly scores on the MVTec AD validation set.
Human Keypoint Detection Module¶
Model | Scheme | Input Size | AP(0.5:0.95) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|---|---|
PP-TinyPose_128x96 | Top-Down | 128*96 | 58.4 | - | - | 4.9 | PP-TinyPose_128x96.yaml | Inference Model/Training Model |
PP-TinyPose_256x192 | Top-Down | 256*192 | 68.3 | - | - | 4.9 | PP-TinyPose_256x192.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the COCO dataset AP(0.5:0.95), with detection boxes obtained from ground truth annotations.
3D Multi-modal Fusion Detection Module¶
Model | mAP(%) | NDS | yaml File | Model Download Link |
---|---|---|---|---|
BEVFusion | 53.9 | 60.9 | BEVFusion.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the nuScenes validation set, reported as mAP(0.5:0.95) and NDS, with precision type FP32.
Semantic Segmentation Module¶
Model Name | mIoU (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
Deeplabv3_Plus-R50 | 80.36 | 503.51 / 122.30 | 3543.91 / 3543.91 | 94.9 M | Deeplabv3_Plus-R50.yaml | Inference Model/Training Model |
Deeplabv3_Plus-R101 | 81.10 | 803.79 / 175.45 | 5136.21 / 5136.21 | 162.5 M | Deeplabv3_Plus-R101.yaml | Inference Model/Training Model |
Deeplabv3-R50 | 79.90 | 647.56 / 121.67 | 3803.09 / 3803.09 | 138.3 M | Deeplabv3-R50.yaml | Inference Model/Training Model |
Deeplabv3-R101 | 80.85 | 950.43 / 178.50 | 5517.14 / 5517.14 | 205.9 M | Deeplabv3-R101.yaml | Inference Model/Training Model |
OCRNet_HRNet-W18 | 80.67 | 286.12 / 80.76 | 1794.03 / 1794.03 | 43.1 M | OCRNet_HRNet-W18.yaml | Inference Model/Training Model |
OCRNet_HRNet-W48 | 82.15 | 627.36 / 170.76 | 3531.61 / 3531.61 | 249.8 M | OCRNet_HRNet-W48.yaml | Inference Model/Training Model |
PP-LiteSeg-T | 73.10 | 30.16 / 14.03 | 420.07 / 235.01 | 28.5 M | PP-LiteSeg-T.yaml | Inference Model/Training Model |
PP-LiteSeg-B | 75.25 | 40.92 / 20.18 | 494.32 / 310.34 | 47.0 M | PP-LiteSeg-B.yaml | Inference Model/Training Model |
SegFormer-B0 (slice) | 76.73 | 11.1946 | 268.929 | 13.2 M | SegFormer-B0.yaml | Inference Model/Training Model |
SegFormer-B1 (slice) | 78.35 | 17.9998 | 403.393 | 48.5 M | SegFormer-B1.yaml | Inference Model/Training Model |
SegFormer-B2 (slice) | 81.60 | 48.0371 | 1248.52 | 96.9 M | SegFormer-B2.yaml | Inference Model/Training Model |
SegFormer-B3 (slice) | 82.47 | 64.341 | 1666.35 | 167.3 M | SegFormer-B3.yaml | Inference Model/Training Model |
SegFormer-B4 (slice) | 82.38 | 82.4336 | 1995.42 | 226.7 M | SegFormer-B4.yaml | Inference Model/Training Model |
SegFormer-B5 (slice) | 82.58 | 97.3717 | 2420.19 | 229.7 M | SegFormer-B5.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the Cityscapes dataset mIoU.
Model Name | mIoU (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
SeaFormer_base (slice) | 40.92 | 24.4073 | 397.574 | 30.8 M | SeaFormer_base.yaml | Inference Model/Training Model |
SeaFormer_large (slice) | 43.66 | 27.8123 | 550.464 | 49.8 M | SeaFormer_large.yaml | Inference Model/Training Model |
SeaFormer_small (slice) | 38.73 | 19.2295 | 358.343 | 14.3 M | SeaFormer_small.yaml | Inference Model/Training Model |
SeaFormer_tiny (slice) | 34.58 | 13.9496 | 330.132 | 6.1 M | SeaFormer_tiny.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the ADE20k dataset. "Slice" indicates that the input images have been cropped.
Instance Segmentation Module¶
Model Name | Mask AP | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
Mask-RT-DETR-H | 50.6 | 172.36 / 172.36 | 1615.75 / 1615.75 | 449.9 M | Mask-RT-DETR-H.yaml | Inference Model/Training Model |
Mask-RT-DETR-L | 45.7 | 88.18 / 88.18 | 1090.84 / 1090.84 | 113.6 M | Mask-RT-DETR-L.yaml | Inference Model/Training Model |
Mask-RT-DETR-M | 42.7 | 78.69 / 78.69 | - | 66.6 M | Mask-RT-DETR-M.yaml | Inference Model/Training Model |
Mask-RT-DETR-S | 41.0 | 33.5007 | - | 51.8 M | Mask-RT-DETR-S.yaml | Inference Model/Training Model |
Mask-RT-DETR-X | 47.5 | 114.16 / 114.16 | 1240.92 / 1240.92 | 237.5 M | Mask-RT-DETR-X.yaml | Inference Model/Training Model |
Cascade-MaskRCNN-ResNet50-FPN | 36.3 | 141.69 / 141.69 | - | 254.8 M | Cascade-MaskRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
Cascade-MaskRCNN-ResNet50-vd-SSLDv2-FPN | 39.1 | 147.62 / 147.62 | - | 254.7 M | Cascade-MaskRCNN-ResNet50-vd-SSLDv2-FPN.yaml | Inference Model/Training Model |
MaskRCNN-ResNet50-FPN | 35.6 | 118.30 / 118.30 | - | 157.5 M | MaskRCNN-ResNet50-FPN.yaml | Inference Model/Training Model |
MaskRCNN-ResNet50-vd-FPN | 36.4 | 118.34 / 118.34 | - | 157.5 M | MaskRCNN-ResNet50-vd-FPN.yaml | Inference Model/Training Model |
MaskRCNN-ResNet50 | 32.8 | 228.83 / 228.83 | - | 127.8 M | MaskRCNN-ResNet50.yaml | Inference Model/Training Model |
MaskRCNN-ResNet101-FPN | 36.6 | 148.14 / 148.14 | - | 225.4 M | MaskRCNN-ResNet101-FPN.yaml | Inference Model/Training Model |
MaskRCNN-ResNet101-vd-FPN | 38.1 | 151.12 / 151.12 | - | 225.1 M | MaskRCNN-ResNet101-vd-FPN.yaml | Inference Model/Training Model |
MaskRCNN-ResNeXt101-vd-FPN | 39.5 | 237.55 / 237.55 | - | 370.0 M | MaskRCNN-ResNeXt101-vd-FPN.yaml | Inference Model/Training Model |
PP-YOLOE_seg-S | 32.5 | - | - | 31.5 M | PP-YOLOE_seg-S.yaml | Inference Model/Training Model |
SOLOv2 | 35.5 | - | - | 179.1 M | SOLOv2.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the Mask AP(0.5:0.95) on the COCO2017 validation set.
Text Detection Module¶
Model | Detection Hmean (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-OCRv4_server_det | 82.56 | 83.34 / 80.91 | 442.58 / 442.58 | 109 | PP-OCRv4_server_det.yaml | Inference Model/Training Model |
PP-OCRv4_mobile_det | 77.35 | 8.79 / 3.13 | 51.00 / 28.58 | 4.7 | PP-OCRv4_mobile_det.yaml | Inference Model/Training Model |
PP-OCRv3_mobile_det | 78.68 | 8.44 / 2.91 | 27.87 / 27.87 | 2.1 | PP-OCRv3_mobile_det.yaml | Inference Model/Training Model |
PP-OCRv3_server_det | 80.11 | 65.41 / 13.67 | 305.07 / 305.07 | 102.1 | PP-OCRv3_server_det.yaml | Inference Model/Training Model |
Note: The evaluation dataset for the above accuracy metrics is PaddleOCR's self-built Chinese and English dataset, covering multiple scenarios such as street view, web images, documents, and handwriting, with 593 images for text detection.
Seal Text Detection Module¶
Model Name | Detection Hmean (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-OCRv4_mobile_seal_det | 96.47 | 7.82 / 3.09 | 48.28 / 23.97 | 4.7 M | PP-OCRv4_mobile_seal_det.yaml | Inference Model/Training Model |
PP-OCRv4_server_seal_det | 98.21 | 74.75 / 67.72 | 382.55 / 382.55 | 108.3 M | PP-OCRv4_server_seal_det.yaml | Inference Model/Training Model |
Note: The evaluation set for the above accuracy metrics is a seal dataset built by PaddleX, containing 500 seal images.
Text Recognition Module¶
- Chinese Text Recognition Models
Model | Recognition Avg Accuracy(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-OCRv4_server_rec_doc | 81.53 | 6.65 / 2.38 | 32.92 / 32.92 | 74.7 M | PP-OCRv4_server_rec_doc.yaml | Inference Model/Training Model |
PP-OCRv4_mobile_rec | 78.74 | 4.82 / 1.20 | 16.74 / 4.64 | 10.6 M | PP-OCRv4_mobile_rec.yaml | Inference Model/Training Model |
PP-OCRv4_server_rec | 80.61 | 6.58 / 2.43 | 33.17 / 33.17 | 71.2 M | PP-OCRv4_server_rec.yaml | Inference Model/Training Model |
PP-OCRv3_mobile_rec | 72.96 | 5.87 / 1.19 | 9.07 / 4.28 | 9.2 M | PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
Note: The evaluation set for the above accuracy metrics is a Chinese dataset built by PaddleOCR, covering multiple scenarios such as street view, web images, documents, and handwriting, with 8367 images for text recognition.
Model | Recognition Avg Accuracy(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
ch_SVTRv2_rec | 68.81 | 8.08 / 2.74 | 50.17 / 42.50 | 73.9 M | ch_SVTRv2_rec.yaml | Inference Model/Training Model |
Note: The evaluation dataset for the above accuracy metrics is the PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition Task Leaderboard A.
Model | Recognition Avg Accuracy(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
ch_RepSVTR_rec | 65.07 | 5.93 / 1.62 | 20.73 / 7.32 | 22.1 M | ch_RepSVTR_rec.yaml | Inference Model/Training Model |
Note: The evaluation dataset for the above accuracy metrics is the PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition Task Leaderboard B.
- English Recognition Model
Model | Recognition Avg Accuracy(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
en_PP-OCRv4_mobile_rec | 70.39 | 4.81 / 0.75 | 16.10 / 5.31 | 6.8 M | en_PP-OCRv4_mobile_rec.yaml | Inference Model/Training Model |
en_PP-OCRv3_mobile_rec | 70.69 | 5.44 / 0.75 | 8.65 / 5.57 | 7.8 M | en_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
Note: The evaluation set for the above accuracy metrics is an English dataset built by PaddleX.
- Multilingual Recognition Model
Model | Recognition Avg Accuracy(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
korean_PP-OCRv3_mobile_rec | 60.21 | 5.40 / 0.97 | 9.11 / 4.05 | 8.6 M | korean_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
japan_PP-OCRv3_mobile_rec | 45.69 | 5.70 / 1.02 | 8.48 / 4.07 | 8.8 M | japan_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
chinese_cht_PP-OCRv3_mobile_rec | 82.06 | 5.90 / 1.28 | 9.28 / 4.34 | 9.7 M | chinese_cht_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
te_PP-OCRv3_mobile_rec | 95.88 | 5.42 / 0.82 | 8.10 / 6.91 | 7.8 M | te_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
ka_PP-OCRv3_mobile_rec | 96.96 | 5.25 / 0.79 | 9.09 / 3.86 | 8.0 M | ka_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
ta_PP-OCRv3_mobile_rec | 76.83 | 5.23 / 0.75 | 10.13 / 4.30 | 8.0 M | ta_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
latin_PP-OCRv3_mobile_rec | 76.93 | 5.20 / 0.79 | 8.83 / 7.15 | 7.8 M | latin_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
arabic_PP-OCRv3_mobile_rec | 73.55 | 5.35 / 0.79 | 8.80 / 4.56 | 7.8 M | arabic_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
cyrillic_PP-OCRv3_mobile_rec | 94.28 | 5.23 / 0.76 | 8.89 / 3.88 | 7.9 M | cyrillic_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
devanagari_PP-OCRv3_mobile_rec | 96.44 | 5.22 / 0.79 | 8.56 / 4.06 | 7.9 M | devanagari_PP-OCRv3_mobile_rec.yaml | Inference Model/Training Model |
Note: The evaluation set for the above accuracy metrics is a multi-language dataset built by PaddleX.
Formula Recognition Module¶
Model | Avg-BLEU(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
UniMERNet | 86.13 | 2266.96/- | -/- | 1.4 G | UniMERNet.yaml | Inference Model/Training Model |
PP-FormulaNet-S | 87.12 | 202.25/- | -/- | 167.9 M | PP-FormulaNet-S.yaml | Inference Model/Training Model |
PP-FormulaNet-L | 92.13 | 1976.52/- | -/- | 535.2 M | PP-FormulaNet-L.yaml | Inference Model/Training Model |
LaTeX_OCR_rec | 71.63 | -/- | -/- | 89.7 M | LaTeX_OCR_rec.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are measured from the internal formula recognition test set of PaddleX. The BLEU score of LaTeX_OCR_rec on the LaTeX-OCR formula recognition test set is 0.8821. All model GPU inference times are based on Tesla V100 GPUs, with precision type FP32.
Table Structure Recognition Module¶
Model | Accuracy (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
SLANet | 59.52 | 103.08 / 103.08 | 197.99 / 197.99 | 6.9 M | SLANet.yaml | Inference Model/Training Model |
SLANet_plus | 63.69 | 140.29 / 140.29 | 195.39 / 195.39 | 6.9 M | SLANet_plus.yaml | Inference Model/Training Model |
SLANeXt_wired | 69.65 | -- | -- | -- | SLANeXt_wired.yaml | Inference Model/Training Model |
SLANeXt_wireless | 69.65 | -- | -- | -- | SLANeXt_wireless.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are measured from the high-difficulty Chinese table recognition dataset built internally by PaddleX.
Table Cell Detection Module¶
Model | Model Download Link | mAP(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | Introduction |
---|---|---|---|---|---|---|
RT-DETR-L_wired_table_cell_det | Inference Model/Training Model | 82.7 | 35.00 / 10.45 | 495.51 / 495.51 | 124M | RT-DETR is the first real-time end-to-end object detection model. The Baidu PaddlePaddle Vision Team, based on RT-DETR-L as the base model, has completed pretraining on a self-built table cell detection dataset, achieving good performance for both wired and wireless table cell detection. |
RT-DETR-L_wireless_table_cell_det | Inference Model/Training Model | 82.7 | 35.00 / 10.45 | 495.51 / 495.51 | 124M | |
Note: The above accuracy metrics are measured from the internal table cell detection dataset of PaddleX.
Table Classification Module¶
Model | Top1 Acc(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-LCNet_x1_0_table_cls | -- | -- | -- | -- | PP-LCNet_x1_0_table_cls.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are measured from the internal table classification dataset built by PaddleX.
Text Image Unwarping Module¶
Model Name | MS-SSIM (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|---|
UVDoc | 54.40 | 16.27 / 7.76 | 176.97 / 80.60 | 30.3 M | UVDoc.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are measured from the image unwarping dataset built by PaddleX.
Layout Detection Module¶
- Table Layout Detection Model
Model | mAP(0.5) (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PicoDet_layout_1x_table | 97.5 | 8.02 / 3.09 | 23.70 / 20.41 | 7.4 M | PicoDet_layout_1x_table.yaml | Inference Model/Training Model |
Note: The evaluation set for the above accuracy metrics is the layout table area detection dataset built by PaddleOCR, which contains 7,835 images of Chinese and English document types with tables.
- 3-class layout detection model, including table, image, and seal
Model | mAP(0.5) (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PicoDet-S_layout_3cls | 88.2 | 8.99 / 2.22 | 16.11 / 8.73 | 4.8 | PicoDet-S_layout_3cls.yaml | Inference Model/Training Model |
PicoDet-L_layout_3cls | 89.0 | 13.05 / 4.50 | 41.30 / 41.30 | 22.6 | PicoDet-L_layout_3cls.yaml | Inference Model/Training Model |
RT-DETR-H_layout_3cls | 95.8 | 114.93 / 27.71 | 947.56 / 947.56 | 470.1 | RT-DETR-H_layout_3cls.yaml | Inference Model/Training Model |
Note: The evaluation dataset for the above accuracy metrics is the layout area detection dataset built by PaddleOCR, which includes 1,154 common types of document images such as Chinese and English papers, magazines, and research reports.
- 5-class English document layout detection model, including text, title, table, image, and list
Model | mAP(0.5) (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PicoDet_layout_1x | 97.8 | 9.03 / 3.10 | 25.82 / 20.70 | 7.4 | PicoDet_layout_1x.yaml | Inference Model/Training Model |
Note: The evaluation dataset for the above accuracy metrics is the PubLayNet evaluation dataset, which contains 11,245 images of English documents.
- 17-class layout detection model, including 17 common layout categories: paragraph title, image, text, number, abstract, content, figure title, formula, table, table title, reference, document title, footnote, header, algorithm, footer, and seal
Model | mAP(0.5) (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PicoDet-S_layout_17cls | 87.4 | 9.11 / 2.12 | 15.42 / 9.12 | 4.8 | PicoDet-S_layout_17cls.yaml | Inference Model/Training Model |
Note: The evaluation set for the above accuracy metrics is the layout area detection dataset built by PaddleOCR, which includes 892 images of common document types such as Chinese and English papers, magazines, and research reports.
Document Image Orientation Classification Module¶
Model | Top-1 Acc (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Size (M) | yaml File | Model Download Link |
---|---|---|---|---|---|---|
PP-LCNet_x1_0_doc_ori | 99.06 | 2.31 / 0.43 | 3.37 / 1.27 | 7 | PP-LCNet_x1_0_doc_ori.yaml | Inference Model/Training Model |
Note: The evaluation set for the above accuracy metrics is a self-built dataset covering multiple scenarios such as documents and certificates, with 1000 images.
Text Line Orientation Classification Module¶
Model | Top-1 Acc (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | YAML File | Model Download Link |
---|---|---|---|---|---|---|
PP-LCNet_x0_25_textline_ori | 99.06 | 2.31 / 0.43 | 3.37 / 1.27 | 7 | PP-LCNet_x0_25_textline_ori.yaml | Inference Model/Training Model |
Note: The evaluation dataset for the above accuracy metrics is a self-built dataset covering multiple scenarios such as certificates and documents, with 1,000 images.
Time Series Forecasting Module¶
Model Name | mse | mae | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|---|
DLinear | 0.382 | 0.394 | 72 K | DLinear.yaml | Inference Model/Training Model |
NLinear | 0.386 | 0.392 | 40 K | NLinear.yaml | Inference Model/Training Model |
Nonstationary | 0.600 | 0.515 | 55.5 M | Nonstationary.yaml | Inference Model/Training Model |
PatchTST | 0.379 | 0.391 | 2.0 M | PatchTST.yaml | Inference Model/Training Model |
RLinear | 0.385 | 0.392 | 40 K | RLinear.yaml | Inference Model/Training Model |
TiDE | 0.407 | 0.414 | 31.7 M | TiDE.yaml | Inference Model/Training Model |
TimesNet | 0.416 | 0.429 | 4.9 M | TimesNet.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are measured from the ETTH1 dataset (evaluation results on the test.csv test set).
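The time series modules use the same `create_model` API with CSV input. A minimal sketch, assuming the CSV follows the time-series format the module expects (the file paths are placeholders):

```python
# Hedged sketch: time series forecasting with a model from the table above.
# "ts_data.csv" is a placeholder for a CSV in the module's expected format
# (e.g., ETTH1-style demo data).
from paddlex import create_model

model = create_model("DLinear")
output = model.predict("ts_data.csv", batch_size=1)
for res in output:
    res.print()
    res.save_to_csv("./output/")  # assumption: forecast results support CSV export
```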
Time Series Anomaly Detection Module¶
Model Name | Precision | Recall | F1 Score | Model Storage Size | YAML File | Model Download Link |
---|---|---|---|---|---|---|
AutoEncoder_ad | 99.36 | 84.36 | 91.25 | 52 K | AutoEncoder_ad.yaml | Inference Model/Training Model |
DLinear_ad | 98.98 | 93.96 | 96.41 | 112 K | DLinear_ad.yaml | Inference Model/Training Model |
Nonstationary_ad | 98.55 | 88.95 | 93.51 | 1.8 M | Nonstationary_ad.yaml | Inference Model/Training Model |
PatchTST_ad | 98.78 | 90.70 | 94.57 | 320 K | PatchTST_ad.yaml | Inference Model/Training Model |
Note: The above metrics are measured on the PSM dataset.
Time Series Classification Module¶
Model Name | acc(%) | Model Storage Size | yaml File | Model Download Link |
---|---|---|---|---|
TimesNet_cls | 87.5 | 792 K | TimesNet_cls.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are measured from the UWaveGestureLibrary dataset.
Multilingual Speech Recognition Module¶
Model | Training Data | Model Size | Word Error Rate | YAML File | Model Download Link |
---|---|---|---|---|---|
whisper_large | 680kh | 5.8G | 2.7 (Librispeech) | whisper_large.yaml | Inference Model |
whisper_medium | 680kh | 2.9G | - | whisper_medium.yaml | Inference Model |
whisper_small | 680kh | 923M | - | whisper_small.yaml | Inference Model |
whisper_base | 680kh | 277M | - | whisper_base.yaml | Inference Model |
whisper_tiny | 680kh | 145M | - | whisper_tiny.yaml | Inference Model |
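These speech models follow the same API, with an audio file as input. A minimal sketch (the wav path is a placeholder):

```python
# Hedged sketch: multilingual speech recognition via PaddleX.
# "example.wav" is a placeholder audio file; model names come from the table above.
from paddlex import create_model

model = create_model("whisper_large")
output = model.predict("example.wav", batch_size=1)
for res in output:
    res.print()  # recognized text, as returned by the module
```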
Video Classification Module¶
Model | Top1 Acc(%) | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|
PP-TSM-R50_8frames_uniform | 74.36 | 93.4 M | PP-TSM-R50_8frames_uniform.yaml | Inference Model/Training Model |
PP-TSMv2-LCNetV2_8frames_uniform | 71.71 | 22.5 M | PP-TSMv2-LCNetV2_8frames_uniform.yaml | Inference Model/Training Model |
PP-TSMv2-LCNetV2_16frames_uniform | 73.11 | 22.5 M | PP-TSMv2-LCNetV2_16frames_uniform.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the K400 validation set Top1 Acc.
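Video models likewise take a video file as input. A minimal sketch (the mp4 path is a placeholder):

```python
# Hedged sketch: video classification with a model from the table above.
# "example.mp4" is a placeholder video file.
from paddlex import create_model

model = create_model("PP-TSMv2-LCNetV2_8frames_uniform")
output = model.predict("example.mp4", batch_size=1)
for res in output:
    res.print()
```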
Video Detection Module¶
Model | Frame-mAP(@ IoU 0.5) | Model Storage Size (M) | yaml File | Model Download Link |
---|---|---|---|---|
YOWO | 80.94 | 462.891 M | YOWO.yaml | Inference Model/Training Model |
Note: The above accuracy metrics are based on the test dataset UCF101-24, using the Frame-mAP (@ IoU 0.5) metric.
Test Environment Description:
- Performance Test Environment
- Inference Mode Description
Mode | GPU Configuration | CPU Configuration | Acceleration Technology Combination |
---|---|---|---|
Normal Mode | FP32 Precision / No TRT Acceleration | FP32 Precision / 8 Threads | PaddleInference |
High-Performance Mode | Optimal combination of pre-selected precision types and acceleration strategies | FP32 Precision / 8 Threads | Pre-selected optimal backend (Paddle/OpenVINO/TRT, etc.) |
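High-Performance Mode corresponds to PaddleX's high-performance inference plugin. A minimal sketch of enabling it at the pipeline level, assuming the plugin is installed (the pipeline name and input path are placeholders):

```python
# Hedged sketch: turning on high-performance inference for a pipeline.
# Requires the PaddleX high-performance inference plugin to be installed;
# "OCR" and "example.png" are placeholders.
from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="OCR", use_hpip=True)
output = pipeline.predict("example.png")
for res in output:
    res.print()
```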