PaddleX Model List (CPU/GPU)¶

PaddleX includes multiple pipelines, each containing several modules, and each module includes several models. You can choose which models to use based on the benchmark data below. If you prioritize model accuracy, choose models with higher accuracy. If you prioritize model inference speed, choose models with faster inference speed. If you prioritize model storage size, choose models with smaller storage size.

Image Classification Module ¶

Model Name	Top1 Acc (%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
CLIP_vit_base_patch16_224	85.36	12.03 / 2.49	60.86 / 42.69	331	CLIP_vit_base_patch16_224.yaml	Inference Model/Training Model
CLIP_vit_large_patch14_224	88.1	49.15 / 9.75	223.16 / 206.49	1040	CLIP_vit_large_patch14_224.yaml	Inference Model/Training Model
ConvNeXt_base_224	83.84	11.37 / 5.65	143.98 / 52.31	313.9	ConvNeXt_base_224.yaml	Inference Model/Training Model
ConvNeXt_base_384	84.90	29.48 / 11.17	293.76 / 134.27	313.9	ConvNeXt_base_384.yaml	Inference Model/Training Model
ConvNeXt_large_224	84.26	22.99 / 12.73	220.79 / 113.24	700.7	ConvNeXt_large_224.yaml	Inference Model/Training Model
ConvNeXt_large_384	85.27	58.90 / 24.63	509.48 / 260.27	700.7	ConvNeXt_large_384.yaml	Inference Model/Training Model
ConvNeXt_small	83.13	7.72 / 4.35	95.92 / 33.34	178.0	ConvNeXt_small.yaml	Inference Model/Training Model
ConvNeXt_tiny	82.03	6.00 / 2.47	63.59 / 18.23	101.4	ConvNeXt_tiny.yaml	Inference Model/Training Model
FasterNet-L	83.5	11.96 / 2.68	51.93 / 35.33	357.1	FasterNet-L.yaml	Inference Model/Training Model
FasterNet-M	83.0	11.17 / 2.16	38.49 / 21.17	204.6	FasterNet-M.yaml	Inference Model/Training Model
FasterNet-S	81.3	7.70 / 1.24	19.51 / 11.22	119.3	FasterNet-S.yaml	Inference Model/Training Model
FasterNet-T0	71.9	4.73 / 0.82	6.40 / 1.96	15.1	FasterNet-T0.yaml	Inference Model/Training Model
FasterNet-T1	75.9	4.80 / 0.80	8.14 / 3.13	29.2	FasterNet-T1.yaml	Inference Model/Training Model
FasterNet-T2	79.1	6.10 / 0.88	12.71 / 5.35	57.4	FasterNet-T2.yaml	Inference Model/Training Model
MobileNetV1_x0_5	63.5	1.98 / 0.51	2.50 / 1.04	4.8	MobileNetV1_x0_5.yaml	Inference Model/Training Model
MobileNetV1_x0_25	51.4	1.99 / 0.45	1.82 / 0.73	1.8	MobileNetV1_x0_25.yaml	Inference Model/Training Model
MobileNetV1_x0_75	68.8	2.33 / 0.41	3.33 / 1.34	9.3	MobileNetV1_x0_75.yaml	Inference Model/Training Model
MobileNetV1_x1_0	71.0	2.31 / 0.45	3.91 / 1.89	15.2	MobileNetV1_x1_0.yaml	Inference Model/Training Model
MobileNetV2_x0_5	65.0	3.58 / 0.62	3.86 / 1.23	7.1	MobileNetV2_x0_5.yaml	Inference Model/Training Model
MobileNetV2_x0_25	53.2	3.05 / 0.66	3.30 / 0.98	5.5	MobileNetV2_x0_25.yaml	Inference Model/Training Model
MobileNetV2_x1_0	72.2	3.85 / 0.63	5.50 / 1.87	12.6	MobileNetV2_x1_0.yaml	Inference Model/Training Model
MobileNetV2_x1_5	74.1	3.93 / 0.73	8.84 / 3.12	25.0	MobileNetV2_x1_5.yaml	Inference Model/Training Model
MobileNetV2_x2_0	75.2	3.89 / 0.79	10.36 / 4.50	41.2	MobileNetV2_x2_0.yaml	Inference Model/Training Model
MobileNetV3_large_x0_5	69.2	4.60 / 0.77	5.32 / 1.58	9.6	MobileNetV3_large_x0_5.yaml	Inference Model/Training Model
MobileNetV3_large_x0_35	64.3	4.44 / 0.75	5.20 / 1.50	7.5	MobileNetV3_large_x0_35.yaml	Inference Model/Training Model
MobileNetV3_large_x0_75	73.1	5.30 / 0.85	6.02 / 1.93	14.0	MobileNetV3_large_x0_75.yaml	Inference Model/Training Model
MobileNetV3_large_x1_0	75.3	5.38 / 0.81	7.16 / 2.19	19.5	MobileNetV3_large_x1_0.yaml	Inference Model/Training Model
MobileNetV3_large_x1_25	76.4	5.54 / 0.84	7.06 / 2.84	26.5	MobileNetV3_large_x1_25.yaml	Inference Model/Training Model
MobileNetV3_small_x0_5	59.2	3.87 / 0.77	4.90 / 1.32	6.8	MobileNetV3_small_x0_5.yaml	Inference Model/Training Model
MobileNetV3_small_x0_35	53.0	3.68 / 0.77	3.94 / 1.27	6.0	MobileNetV3_small_x0_35.yaml	Inference Model/Training Model
MobileNetV3_small_x0_75	66.0	3.92 / 0.77	4.68 / 1.39	8.5	MobileNetV3_small_x0_75.yaml	Inference Model/Training Model
MobileNetV3_small_x1_0	68.2	4.23 / 0.78	5.24 / 1.48	10.5	MobileNetV3_small_x1_0.yaml	Inference Model/Training Model
MobileNetV3_small_x1_25	70.7	4.59 / 0.79	5.36 / 1.63	13.0	MobileNetV3_small_x1_25.yaml	Inference Model/Training Model
MobileNetV4_conv_large	83.4	9.04 / 2.28	34.34 / 22.01	125.2	MobileNetV4_conv_large.yaml	Inference Model/Training Model
MobileNetV4_conv_medium	79.9	5.70 / 1.05	13.78 / 5.64	37.6	MobileNetV4_conv_medium.yaml	Inference Model/Training Model
MobileNetV4_conv_small	74.6	3.81 / 0.55	5.24 / 1.50	14.7	MobileNetV4_conv_small.yaml	Inference Model/Training Model
MobileNetV4_hybrid_large	83.8	13.43 / 4.28	61.16 / 31.06	145.1	MobileNetV4_hybrid_large.yaml	Inference Model/Training Model
MobileNetV4_hybrid_medium	80.5	11.82 / 1.30	22.01 / 6.06	42.9	MobileNetV4_hybrid_medium.yaml	Inference Model/Training Model
PP-HGNet_base	85.0	13.43 / 3.81	71.24 / 51.48	249.4	PP-HGNet_base.yaml	Inference Model/Training Model
PP-HGNet_small	81.51	5.87 / 1.68	25.58 / 18.50	86.5	PP-HGNet_small.yaml	Inference Model/Training Model
PP-HGNet_tiny	79.83	5.84 / 1.38	17.03 / 10.58	52.4	PP-HGNet_tiny.yaml	Inference Model/Training Model
PP-HGNetV2-B0	77.77	4.41 / 0.87	10.58 / 1.87	21.4	PP-HGNetV2-B0.yaml	Inference Model/Training Model
PP-HGNetV2-B1	79.18	4.52 / 0.73	11.98 / 2.28	22.6	PP-HGNetV2-B1.yaml	Inference Model/Training Model
PP-HGNetV2-B2	81.74	6.67 / 0.96	14.22 / 4.04	39.9	PP-HGNetV2-B2.yaml	Inference Model/Training Model
PP-HGNetV2-B3	82.98	7.47 / 1.94	17.73 / 5.63	57.9	PP-HGNetV2-B3.yaml	Inference Model/Training Model
PP-HGNetV2-B4	83.57	7.05 / 1.16	16.23 / 7.55	70.4	PP-HGNetV2-B4.yaml	Inference Model/Training Model
PP-HGNetV2-B5	84.75	10.38 / 1.95	31.53 / 18.02	140.8	PP-HGNetV2-B5.yaml	Inference Model/Training Model
PP-HGNetV2-B6	86.30	13.86 / 3.28	67.25 / 56.70	268.4	PP-HGNetV2-B6.yaml	Inference Model/Training Model
PP-LCNet_x0_5	63.14	2.41 / 0.60	2.54 / 0.90	6.7	PP-LCNet_x0_5.yaml	Inference Model/Training Model
PP-LCNet_x0_25	51.86	2.16 / 0.60	2.73 / 0.77	5.5	PP-LCNet_x0_25.yaml	Inference Model/Training Model
PP-LCNet_x0_35	58.09	2.18 / 0.60	2.32 / 0.89	5.9	PP-LCNet_x0_35.yaml	Inference Model/Training Model
PP-LCNet_x0_75	68.18	2.61 / 0.58	3.00 / 1.09	8.4	PP-LCNet_x0_75.yaml	Inference Model/Training Model
PP-LCNet_x1_0	71.32	2.59 / 0.68	3.18 / 1.19	10.5	PP-LCNet_x1_0.yaml	Inference Model/Training Model
PP-LCNet_x1_5	73.71	2.60 / 0.68	3.98 / 1.66	16.0	PP-LCNet_x1_5.yaml	Inference Model/Training Model
PP-LCNet_x2_0	75.18	2.53 / 0.68	5.21 / 2.24	23.2	PP-LCNet_x2_0.yaml	Inference Model/Training Model
PP-LCNet_x2_5	76.60	2.76 / 0.67	6.78 / 3.20	32.1	PP-LCNet_x2_5.yaml	Inference Model/Training Model
PP-LCNetV2_base	77.05	4.04 / 0.62	6.80 / 2.67	23.7	PP-LCNetV2_base.yaml	Inference Model/Training Model
PP-LCNetV2_large	78.51	4.91 / 0.85	10.30 / 5.38	37.3	PP-LCNetV2_large.yaml	Inference Model/Training Model
PP-LCNetV2_small	73.97	3.07 / 0.60	4.28 / 1.58	14.6	PP-LCNetV2_small.yaml	Inference Model/Training Model
ResNet18_vd	72.3	2.87 / 0.77	7.91 / 4.64	41.5	ResNet18_vd.yaml	Inference Model/Training Model
ResNet18	71.0	2.63 / 0.74	6.30 / 4.16	41.5	ResNet18.yaml	Inference Model/Training Model
ResNet34_vd	76.0	4.47 / 1.09	14.30 / 8.33	77.3	ResNet34_vd.yaml	Inference Model/Training Model
ResNet34	74.6	4.20 / 1.07	12.53 / 7.83	77.3	ResNet34.yaml	Inference Model/Training Model
ResNet50_vd	79.1	6.66 / 1.23	16.34 / 10.00	90.8	ResNet50_vd.yaml	Inference Model/Training Model
ResNet50	76.5	6.25 / 1.17	15.93 / 9.72	90.8	ResNet50.yaml	Inference Model/Training Model
ResNet101_vd	80.2	11.93 / 2.07	32.47 / 23.62	158.4	ResNet101_vd.yaml	Inference Model/Training Model
ResNet101	77.6	13.73 / 2.06	29.69 / 17.72	158.7	ResNet101.yaml	Inference Model/Training Model
ResNet152_vd	80.6	20.70 / 2.82	43.90 / 27.91	214.3	ResNet152_vd.yaml	Inference Model/Training Model
ResNet152	78.3	17.86 / 2.79	46.19 / 26.00	214.2	ResNet152.yaml	Inference Model/Training Model
ResNet200_vd	80.9	22.55 / 3.54	58.54 / 35.70	266.0	ResNet200_vd.yaml	Inference Model/Training Model
StarNet-S1	73.6	6.24 / 0.96	8.78 / 2.44	11.2	StarNet-S1.yaml	Inference Model/Training Model
StarNet-S2	74.8	4.78 / 0.85	7.24 / 2.48	14.3	StarNet-S2.yaml	Inference Model/Training Model
StarNet-S3	77.0	6.77 / 1.07	9.69 / 3.35	22.2	StarNet-S3.yaml	Inference Model/Training Model
StarNet-S4	79.0	9.01 / 1.48	14.79 / 4.58	28.9	StarNet-S4.yaml	Inference Model/Training Model
SwinTransformer_base_patch4_window7_224	83.37	13.04 / 10.77	133.79 / 118.45	340	SwinTransformer_base_patch4_window7_224.yaml	Inference Model/Training Model
SwinTransformer_base_patch4_window12_384	84.17	33.99 / 28.42	400.19 / 317.36	311.4	SwinTransformer_base_patch4_window12_384.yaml	Inference Model/Training Model
SwinTransformer_large_patch4_window7_224	86.19	23.69 / 6.18	198.60 / 177.18	694.8	SwinTransformer_large_patch4_window7_224.yaml	Inference Model/Training Model
SwinTransformer_large_patch4_window12_384	87.06	68.07 / 14.84	609.07 / 525.72	696.1	SwinTransformer_large_patch4_window12_384.yaml	Inference Model/Training Model
SwinTransformer_small_patch4_window7_224	83.21	12.17 / 3.51	111.03 / 92.51	175.6	SwinTransformer_small_patch4_window7_224.yaml	Inference Model/Training Model
SwinTransformer_tiny_patch4_window7_224	81.10	7.11 / 2.01	62.72 / 47.35	100.1	SwinTransformer_tiny_patch4_window7_224.yaml	Inference Model/Training Model

Note: The above accuracy metrics are based on the ImageNet-1k validation set Top1 Acc.

Image Multi-label Classification Module ¶

Model Name	mAP (%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
CLIP_vit_base_patch16_448_ML	89.15	48.87 / 8.10	275.33 / 188.48	325.6	CLIP_vit_base_patch16_448_ML.yaml	Inference Model/Training Model
PP-HGNetV2-B0_ML	80.98	7.15 / 1.77	21.35 / 8.19	39.6	PP-HGNetV2-B0_ML.yaml	Inference Model/Training Model
PP-HGNetV2-B4_ML	87.96	8.11 / 2.82	44.76 / 29.38	88.5	PP-HGNetV2-B4_ML.yaml	Inference Model/Training Model
PP-HGNetV2-B6_ML	91.06	34.54 / 8.22	189.17 / 189.17	286.5	PP-HGNetV2-B6_ML.yaml	Inference Model/Training Model
PP-LCNet_x1_0_ML	77.96	5.28 / 1.62	13.16 / 5.61	29.4	PP-LCNet_x1_0_ML.yaml	Inference Model/Training Model
ResNet50_ML	83.42	10.54 / 2.97	55.39 / 35.52	108.9	ResNet50_ML.yaml	Inference Model/Training Model

Note: The above accuracy metrics are for the multi-label classification task mAP on COCO2017.

Pedestrian Attribute Module ¶

Model Name	mA (%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
PP-LCNet_x1_0_pedestrian_attribute	92.2	2.52 / 0.66	2.60 / 1.07	6.7	PP-LCNet_x1_0_pedestrian_attribute.yaml	Inference Model/Training Model

Note: The above accuracy metrics are for the internal PaddleX dataset mA.

Vehicle Attribute Module ¶

Model Name	mA (%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
PP-LCNet_x1_0_vehicle_attribute	91.7	2.53 / 0.67	2.73 / 1.10	6.7	PP-LCNet_x1_0_vehicle_attribute.yaml	Inference Model/Training Model

Note: The above accuracy metrics are based on the VeRi dataset mA.

Image Feature Module ¶

Model Name	recall@1 (%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
PP-ShiTuV2_rec	84.2	3.91 / 1.06	6.82 / 2.89	16.3	PP-ShiTuV2_rec.yaml	Inference Model/Training Model
PP-ShiTuV2_rec_CLIP_vit_base	88.69	12.57 / 11.62	67.09 / 67.09	306.6	PP-ShiTuV2_rec_CLIP_vit_base.yaml	Inference Model/Training Model
PP-ShiTuV2_rec_CLIP_vit_large	91.03	49.85 / 49.85	229.14 / 229.14	1050	PP-ShiTuV2_rec_CLIP_vit_large.yaml	Inference Model/Training Model

Note: The above accuracy metrics are based on the AliProducts recall@1.

Face Feature Module ¶

Model Name	Output Feature Dimension	Acc (%) AgeDB-30/CFP-FP/LFW	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
MobileFaceNet	128	96.28/96.71/99.58	3.31 / 0.73	5.93 / 1.30	4.1	MobileFaceNet.yaml	Inference Model/Training Model
ResNet50_face	512	98.12/98.56/99.77	6.12 / 3.11	15.85 / 9.44	87.2	ResNet50_face.yaml	Inference Model/Training Model

Note: The above accuracy metrics are measured on the AgeDB-30, CFP-FP, and LFW datasets.

Main Body Detection Module ¶

Model Name	mAP (%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
PP-ShiTuV2_det	41.5	11.81 / 4.53	43.03 / 25.31	27.54	PP-ShiTuV2_det.yaml	Inference Model/Training Model

Note: The above accuracy metrics are based on the PaddleClas Main Body Detection Dataset mAP(0.5:0.95).

Object Detection Module ¶

Model Name	mAP (%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
Cascade-FasterRCNN-ResNet50-FPN	41.1	120.28 / 120.28	- / 6514.61	245.4	Cascade-FasterRCNN-ResNet50-FPN.yaml	Inference Model/Training Model
Cascade-FasterRCNN-ResNet50-vd-SSLDv2-FPN	45.0	124.10 / 124.10	- / 6709.52	246.2	Cascade-FasterRCNN-ResNet50-vd-SSLDv2-FPN.yaml	Inference Model/Training Model
CenterNet-DLA-34	37.6	67.19 / 67.19	6622.61 / 6622.61	75.4	CenterNet-DLA-34.yaml	Inference Model/Training Model
CenterNet-ResNet50	38.9	216.06 / 216.06	2545.79 / 2545.79	319.7	CenterNet-ResNet50.yaml	Inference Model/Training Model
DETR-R50	42.3	58.80 / 26.90	370.96 / 208.77	159.3	DETR-R50.yaml	Inference Model/Training Model
FasterRCNN-ResNet34-FPN	37.8	76.90 / 76.90	- / 4136.79	137.5	FasterRCNN-ResNet34-FPN.yaml	Inference Model/Training Model
FasterRCNN-ResNet50-FPN	38.4	95.48 / 95.48	- / 3693.90	148.1	FasterRCNN-ResNet50-FPN.yaml	Inference Model/Training Model
FasterRCNN-ResNet50-vd-FPN	39.5	98.03 / 98.03	- / 4278.36	148.1	FasterRCNN-ResNet50-vd-FPN.yaml	Inference Model/Training Model
FasterRCNN-ResNet50-vd-SSLDv2-FPN	41.4	99.23 / 99.23	- / 4415.68	148.1	FasterRCNN-ResNet50-vd-SSLDv2-FPN.yaml	Inference Model/Training Model
FasterRCNN-ResNet50	36.7	129.10 / 129.10	- / 3868.44	120.2	FasterRCNN-ResNet50.yaml	Inference Model/Training Model
FasterRCNN-ResNet101-FPN	41.4	131.48 / 131.48	- / 4380.00	216.3	FasterRCNN-ResNet101-FPN.yaml	Inference Model/Training Model
FasterRCNN-ResNet101	39.0	216.71 / 216.71	- / 5376.45	188.1	FasterRCNN-ResNet101.yaml	Inference Model/Training Model
FasterRCNN-ResNeXt101-vd-FPN	43.4	234.38 / 234.38	- / 6154.61	360.6	FasterRCNN-ResNeXt101-vd-FPN.yaml	Inference Model/Training Model
FasterRCNN-Swin-Tiny-FPN	42.6	65.92 / 65.92	- / 2468.98	159.8	FasterRCNN-Swin-Tiny-FPN.yaml	Inference Model/Training Model
FCOS-ResNet50	39.6	101.02 / 34.42	752.15 / 752.15	124.2	FCOS-ResNet50.yaml	Inference Model/Training Model
PicoDet-L	42.6	14.31 / 11.06	45.95 / 25.06	20.9	PicoDet-L.yaml	Inference Model/Training Model
PicoDet-M	37.5	10.48 / 5.00	22.88 / 9.03	16.8	PicoDet-M.yaml	Inference Model/Training Model
PicoDet-S	29.1	9.15 / 3.26	16.06 / 4.04	4.4	PicoDet-S.yaml	Inference Model/Training Model
PicoDet-XS	26.2	9.54 / 3.52	17.96 / 5.38	5.7	PicoDet-XS.yaml	Inference Model/Training Model
PP-YOLOE_plus-L	52.9	32.06 / 28.00	185.32 / 116.21	185.3	PP-YOLOE_plus-L.yaml	Inference Model/Training Model
PP-YOLOE_plus-M	49.8	18.37 / 15.04	108.77 / 63.48	83.2	PP-YOLOE_plus-M.yaml	Inference Model/Training Model
PP-YOLOE_plus-S	43.7	11.43 / 7.52	60.16 / 26.94	28.3	PP-YOLOE_plus-S.yaml	Inference Model/Training Model
PP-YOLOE_plus-X	54.7	56.28 / 50.60	292.08 / 212.24	349.4	PP-YOLOE_plus-X.yaml	Inference Model/Training Model
RT-DETR-H	56.3	114.57 / 101.56	938.20 / 938.20	435.8	RT-DETR-H.yaml	Inference Model/Training Model
RT-DETR-L	53.0	34.76 / 27.60	495.39 / 247.68	113.7	RT-DETR-L.yaml	Inference Model/Training Model
RT-DETR-R18	46.5	19.11 / 14.82	263.13 / 143.05	70.7	RT-DETR-R18.yaml	Inference Model/Training Model
RT-DETR-R50	53.1	41.11 / 10.12	536.20 / 482.86	149.1	RT-DETR-R50.yaml	Inference Model/Training Model
RT-DETR-X	54.8	61.91 / 51.41	639.79 / 639.79	232.9	RT-DETR-X.yaml	Inference Model/Training Model
YOLOv3-DarkNet53	39.1	39.62 / 35.54	166.57 / 136.34	219.7	YOLOv3-DarkNet53.yaml	Inference Model/Training Model
YOLOv3-MobileNetV3	31.4	16.54 / 6.21	64.37 / 45.55	83.8	YOLOv3-MobileNetV3.yaml	Inference Model/Training Model
YOLOv3-ResNet50_vd_DCN	40.6	31.64 / 26.72	226.75 / 226.75	163.0	YOLOv3-ResNet50_vd_DCN.yaml	Inference Model/Training Model
YOLOX-L	50.1	49.68 / 45.03	232.52 / 156.24	192.5	YOLOX-L.yaml	Inference Model/Training Model
YOLOX-M	46.9	43.46 / 29.52	147.64 / 80.06	90.0	YOLOX-M.yaml	Inference Model/Training Model
YOLOX-N	26.1	42.94 / 17.79	64.15 / 7.19	3.4	YOLOX-N.yaml	Inference Model/Training Model
YOLOX-S	40.4	46.53 / 29.34	98.37 / 35.02	32.0	YOLOX-S.yaml	Inference Model/Training Model
YOLOX-T	32.9	31.81 / 18.91	55.34 / 11.63	18.1	YOLOX-T.yaml	Inference Model/Training Model
YOLOX-X	51.8	84.06 / 77.28	390.38 / 272.88	351.5	YOLOX-X.yaml	Inference Model/Training Model
Co-Deformable-DETR-R50	49.7	259.62 / 259.62	32413.76 / 32413.76	184	Co-Deformable-DETR-R50.yaml.yaml	Inference Model/Training Model
Co-Deformable-DETR-Swin-T	48.0 (@640x640 input shape)	120.17 / 120.17	- / 15620.29	187	Co-Deformable-Swin-T.yaml	Inference Model/Training Model
Co-DINO-R50	52.0	1123.23 / 1123.23	- / -	186	Co-DINO-R50.yaml	Inference Model/Training Model
Co-DINO-Swin-L	55.9 (@640x640 input shape)	- / -	- / -	840	Co-DINO-Swin-L.yaml	Inference Model/Training Model

Note: The above accuracy metrics are based on the COCO2017 validation set mAP(0.5:0.95).

Small Object Detection Module ¶

Model Name	mAP (%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Links
PP-YOLOE_plus_SOD-S	25.1	116.07 / 20.10	176.44 / 40.21	77.3	PP-YOLOE_plus_SOD-S.yaml	Inference Model/Training Model
PP-YOLOE_plus_SOD-L	31.9	100.02 / 48.33	271.29 / 151.20	325.0	PP-YOLOE_plus_SOD-L.yaml	Inference Model/Training Model
PP-YOLOE_plus_SOD-largesize-L	42.7	515.69 / 460.17	2816.08 / 1736.00	340.5	PP-YOLOE_plus_SOD-largesize-L.yaml	Inference Model/Training Model

Note: The above accuracy metrics are based on the validation set mAP(0.5:0.95) of VisDrone-DET.

Open-Vocabulary Object Detection ¶

Model	mAP(0.5:0.95)	mAP(0.5)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
GroundingDINO-T	49.4	64.4	- / -	- / -	658.3	GroundingDINO-T.yaml	Inference Model
YOLO-Worldv2-L	44.4	59.8	- / -	292.14 / 292.14	421.4	YOLO-Worldv2-L.yaml	Inference Model

Note: The above accuracy metrics are based on the COCO val2017 validation set mAP(0.5:0.95).

Open Vocabulary Segmentation ¶

Model	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
SAM-H_box	- / -	- / -	2433.7	SAM-H_box.yaml	Inference Model
SAM-H_point	- / -	- / -	2433.7	SAM-H_point.yaml	Inference Model

Rotated Object Detection ¶

Model	mAP(%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
PP-YOLOE-R-L	78.14	67.50 / 61.15	414.79 / 414.79	211.0	PP-YOLOE-R-L.yaml	Inference Model/Training Model

Note: The above accuracy metrics are based on the DOTA validation set mAP(0.5:0.95).

Pedestrian Detection Module ¶

Model Name	mAP (%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
PP-YOLOE-L_human	48.0	30.59 / 26.64	180.05 / 112.70	196.1	PP-YOLOE-L_human.yaml	Inference Model/Training Model
PP-YOLOE-S_human	42.5	10.26 / 6.66	54.01 / 23.48	28.8	PP-YOLOE-S_human.yaml	Inference Model/Training Model

Note: The above accuracy metrics are based on the validation set mAP(0.5:0.95) of CrowdHuman.

Vehicle Detection Module ¶

Model Name	mAP (%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
PP-YOLOE-L_vehicle	63.9	30.30 / 26.27	169.28 / 111.88	196.1	PP-YOLOE-L_vehicle.yaml	Inference Model/Training Model
PP-YOLOE-S_vehicle	61.3	10.54 / 6.69	52.73 / 23.58	28.8	PP-YOLOE-S_vehicle.yaml	Inference Model/Training Model

Note: The above precision metrics are based on the validation set mAP(0.5:0.95) of PPVehicle

Face Detection Module ¶

Model Name	AP (%) Easy/Medium/Hard	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
BlazeFace	77.7/73.4/49.5	50.90 / 45.74	71.92 / 71.92	0.447	BlazeFace.yaml	Inference Model/Training Model
BlazeFace-FPN-SSH	83.2/80.5/60.5	58.99 / 51.75	87.39 / 87.39	0.606	BlazeFace-FPN-SSH.yaml	Inference Model/Training Model
PicoDet_LCNet_x2_5_face	93.7/90.7/68.1	33.91 / 26.53	153.56 / 79.21	28.9	PicoDet_LCNet_x2_5_face.yaml	Inference Model/Training Model
PP-YOLOE_plus-S_face	93.9/91.8/79.8	21.28 / 11.09	137.26 / 72.09	26.5	PP-YOLOE_plus-S_face.yaml	Inference Model/Training Model

Note: The above precision metrics are evaluated on the WIDER-FACE validation set with an input size of 640x640.

Anomaly Detection Module ¶

Model Name	mIoU	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
STFPM	0.9901	4.94 / 1.63	34.88 / 34.88	22.5	STFPM.yaml	Inference Model/Training Model

Note: The above precision metrics are the average anomaly scores on the validation set of MVTec AD.

Human Keypoint Detection Module ¶

Model	Scheme	Input Size	AP(0.5:0.95)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
PP-TinyPose_128x96	Top-Down	58.4	24.22 / 4.34	- / 6.19	4.9	PP-TinyPose_128x96.yaml	Inference Model/Training Model
PP-TinyPose_256x192	Top-Down	68.3	21.73 / 3.59	- / 10.18	4.9	PP-TinyPose_256x192.yaml	Inference Model/Training Model

Note: The above accuracy metrics are based on the COCO dataset AP(0.5:0.95), with detection boxes obtained from ground truth annotations.

Model	mAP(%)	NDS	yaml File	Model Download Link
BEVFusion	53.9	60.9	BEVFusion.yaml	Inference Model/Training Model

Note: The above accuracy metrics are based on the nuscenes validation set with mAP(0.5:0.95) and NDS 60.9, and the precision type is FP32.

Semantic Segmentation Module ¶

Model Name	mloU（%）	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
Deeplabv3_Plus-R50	80.36	481.33 / 446.18	2952.95 / 1907.07	94.9	Deeplabv3_Plus-R50.yaml	Inference Model/Training Model
Deeplabv3_Plus-R101	81.10	766.70 / 194.42	4441.56 / 2984.19	162.5	Deeplabv3_Plus-R101.yaml	Inference Model/Training Model
Deeplabv3-R50	79.90	681.65 / 602.10	3786.41 / 3093.10	138.3	Deeplabv3-R50.yaml	Inference Model/Training Model
Deeplabv3-R101	80.85	974.62 / 896.99	5222.60 / 4230.79	205.9	Deeplabv3-R101.yaml	Inference Model/Training Model
OCRNet_HRNet-W18	80.67	271.02 / 221.38	1791.52 / 1061.62	43.1	OCRNet_HRNet-W18.yaml	Inference Model/Training Model
OCRNet_HRNet-W48	82.15	582.92 / 536.28	3513.72 / 2543.10	270	OCRNet_HRNet-W48.yaml	Inference Model/Training Model
PP-LiteSeg-T	73.10	28.12 / 23.84	398.31 / 398.31	28.5	PP-LiteSeg-T.yaml	Inference Model/Training Model
PP-LiteSeg-B	75.25	35.69 / 35.69	485.10 / 485.10	47.0	PP-LiteSeg-B.yaml	Inference Model/Training Model
SegFormer-B0 (slice)	76.73	11.1946	268.929	13.2	SegFormer-B0.yaml	Inference Model/Training Model
SegFormer-B1 (slice)	78.35	17.9998	403.393	48.5	SegFormer-B1.yaml	Inference Model/Training Model
SegFormer-B2 (slice)	81.60	48.0371	1248.52	96.9	SegFormer-B2.yaml	Inference Model/Training Model
SegFormer-B3 (slice)	82.47	64.341	1666.35	167.3	SegFormer-B3.yaml	Inference Model/Training Model
SegFormer-B4 (slice)	82.38	82.4336	1995.42	226.7	SegFormer-B4.yaml	Inference Model/Training Model
SegFormer-B5 (slice)	82.58	97.3717	2420.19	229.7	SegFormer-B5.yaml	Inference Model/Training Model

Note: The above accuracy metrics are based on the Cityscapes dataset mIoU.

Model Name	mIoU (%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
SeaFormer_base(slice)	40.92	24.4073	397.574	30.8	SeaFormer_base.yaml	Inference Model/Training Model
SeaFormer_large (slice)	43.66	27.8123	550.464	49.8	SeaFormer_large.yaml	Inference Model/Training Model
SeaFormer_small (slice)	38.73	19.2295	358.343	14.3	SeaFormer_small.yaml	Inference Model/Training Model
SeaFormer_tiny (slice)	34.58	13.9496	330.132	6.1	SeaFormer_tiny.yaml	Inference Model/Training Model
MaskFormer_small	49.70	65.21 / 65.21	- / 629.85	242.5	MaskFormer_small.yaml	Inference Model/Training Model
MaskFormer_tiny	46.69	47.95 / 47.95	- / 492.67	160.5	MaskFormer_tiny.yaml	Inference Model/Training Model

Note: The above accuracy metrics are based on the ADE20k dataset. "Slice" indicates that the input images have been cropped.

Instance Segmentation Module ¶

Model Name	Mask AP	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
Mask-RT-DETR-H	50.6	180.83 / 180.83	1711.24 / 1711.24	449.9	Mask-RT-DETR-H.yaml	Inference Model/Training Model
Mask-RT-DETR-L	45.7	113.20 / 113.20	1179.56 / 1179.56	113.6	Mask-RT-DETR-L.yaml	Inference Model/Training Model
Mask-RT-DETR-M	42.7	87.08 / 87.08	- / 2090.73	66.6	Mask-RT-DETR-M.yaml	Inference Model/Training Model
Mask-RT-DETR-S	41.0	120.86 / 120.86	- / 2163.07	51.8	Mask-RT-DETR-S.yaml	Inference Model/Training Model
Mask-RT-DETR-X	47.5	141.43 / 141.43	1379.14 / 1379.14	237.5	Mask-RT-DETR-X.yaml	Inference Model/Training Model
Cascade-MaskRCNN-ResNet50-FPN	36.3	136.79 / 136.79	- / 5935.41	254.8	Cascade-MaskRCNN-ResNet50-FPN.yaml	Inference Model/Training Model
Cascade-MaskRCNN-ResNet50-vd-SSLDv2-FPN	39.1	137.40 / 137.40	- / 6816.68	254.7	Cascade-MaskRCNN-ResNet50-vd-SSLDv2-FPN.yaml	Inference Model/Training Model
MaskRCNN-ResNet50-FPN	35.6	112.79 / 112.79	- / 4912.37	157.5	MaskRCNN-ResNet50-FPN.yaml	Inference Model/Training Model
MaskRCNN-ResNet50-vd-FPN	36.4	112.88 / 112.88	- / 5204.97	157.5	MaskRCNN-ResNet50-vd-FPN.yaml	Inference Model/Training Model
MaskRCNN-ResNet50	32.8	181.60 / 181.60	- / 5523.45	127.8	MaskRCNN-ResNet50.yaml	Inference Model/Training Model
MaskRCNN-ResNet101-FPN	36.6	138.84 / 138.84	- / 5107.74	225.4	MaskRCNN-ResNet101-FPN.yaml	Inference Model/Training Model
MaskRCNN-ResNet101-vd-FPN	38.1	141.73 / 141.73	- / 5592.76	225.1	MaskRCNN-ResNet101-vd-FPN.yaml	Inference Model/Training Model
MaskRCNN-ResNeXt101-vd-FPN	39.5	220.83 / 220.83	- / 5932.59	370.0	MaskRCNN-ResNeXt101-vd-FPN.yaml	Inference Model/Training Model
PP-YOLOE_seg-S	32.5	243.41 / 222.30	2507.70 / 1282.35	31.5	PP-YOLOE_seg-S.yaml	Inference Model/Training Model
SOLOv2	35.5	131.99 / 131.99	- / 2369.98	179.1	SOLOv2.yaml	Inference Model/Training Model

Note: The above accuracy metrics are based on the Mask AP(0.5:0.95) on the COCO2017 validation set.

Text Detection Module ¶

Model	Detection Hmean (%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
PP-OCRv5_server_det	83.8	89.55 / 70.19	383.15 / 383.15	84.3	PP-OCRv5_server_det.yaml	Inference Model/Training Model
PP-OCRv5_mobile_det	79.0	10.67 / 6.36	57.77 / 28.15	4.7	PP-OCRv5_mobile_det.yaml	Inference Model/Training Model
PP-OCRv4_server_det	82.56	127.82 / 98.87	585.95 / 489.77	109	PP-OCRv4_server_det.yaml	Inference Model/Training Model
PP-OCRv4_mobile_det	63.8	9.87 / 4.17	56.60 / 20.79	4.7	PP-OCRv4_mobile_det.yaml	Inference Model/Training Model
PP-OCRv3_mobile_det	Accuracy comparable to PP-OCRv4_mobile_det	9.90 / 3.60	41.93 / 20.76	2.1	PP-OCRv3_mobile_det.yaml	Inference Model/Training Model
PP-OCRv3_server_det	80.11	119.50 / 75.00	379.35 / 318.35	102.1	PP-OCRv3_server_det.yaml	Inference Model/Training Model

Note: The evaluation dataset for the above accuracy metrics is the self-built Chinese and English dataset of PaddleOCR, covering multiple scenarios such as street view, web images, documents, and handwriting, with 593 images for text recognition.

Seal Text Detection Module ¶

Model Name	Detection Hmean (%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
PP-OCRv4_mobile_seal_det	96.36	9.70 / 3.56	50.38 / 19.64	4.7	PP-OCRv4_mobile_seal_det.yaml	Inference Model/Training Model
PP-OCRv4_server_seal_det	98.40	124.64 / 91.57	545.68 / 439.86	109	PP-OCRv4_server_seal_det.yaml	Inference Model/Training Model

Note: The evaluation set for the above precision metrics is the seal dataset built by PaddleX, which includes 500 seal images.

Text Recognition Module ¶

Chinese Text Recognition Models

Model	Recognition Avg Accuracy(%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
PP-OCRv5_server_rec	86.38	8.46 / 2.36	31.21 / 31.21	81	PP-OCRv5_server_rec.yaml	Inference Model/Training Model
PP-OCRv5_mobile_rec	81.29	5.43 / 1.46	21.20 / 5.32	16	PP-OCRv5_mobile_rec.yaml	Inference Model/Training Model
PP-OCRv4_server_rec_doc	86.58	8.69 / 2.78	37.93 / 37.93	182	PP-OCRv4_server_rec_doc.yaml	Inference Model/Training Model
PP-OCRv4_mobile_rec	78.74	5.26 / 1.12	17.48 / 3.61	10.5	PP-OCRv4_mobile_rec.yaml	Inference Model/Training Model
PP-OCRv4_server_rec	85.19	8.75 / 2.49	36.93 / 36.93	173	PP-OCRv4_server_rec.yaml	Inference Model/Training Model
PP-OCRv3_mobile_rec	72.96	3.89 / 1.16	8.72 / 3.56	10.3	PP-OCRv3_mobile_rec.yaml	Inference Model/Training Model

Note: The evaluation set for the above accuracy metrics is a Chinese dataset built by PaddleOCR, covering multiple scenarios such as street view, web images, documents, and handwriting, with 8367 images for text recognition.

Model	Recognition Avg Accuracy(%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
ch_SVTRv2_rec	68.81	10.38 / 8.31	66.52 / 30.83	80.5	ch_SVTRv2_rec.yaml	Inference Model/Training Model

Note: The evaluation dataset for the above accuracy metrics is the PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition Task Leaderboard A.

Model	Recognition Avg Accuracy(%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
ch_RepSVTR_rec	65.07	6.29 / 1.57	20.64 / 5.40	48.8	ch_RepSVTR_rec.yaml	Inference Model/Training Model

Note: The evaluation dataset for the above accuracy metrics is the PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition Task Leaderboard B.

English Recognition Model

Model	Recognition Avg Accuracy(%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
en_PP-OCRv4_mobile_rec	70.39	4.81 / 1.23	17.20 / 4.18	7.5	en_PP-OCRv4_mobile_rec.yaml	Inference Model/Training Model
en_PP-OCRv3_mobile_rec	70.69	3.56 / 0.78	8.44 / 5.78	17.3	en_PP-OCRv3_mobile_rec.yaml	Inference Model/Training Model

Note: The evaluation set for the above accuracy metrics is an English dataset built by PaddleX.

Multilingual Recognition Model

Model	Recognition Avg Accuracy(%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
korean_PP-OCRv5_mobile_rec	90.45	5.43 / 1.46	21.20 / 5.32	14	korean_PP-OCRv5_mobile_rec.yaml	Inference Model/Pre-trained Model
latin_PP-OCRv5_mobile_rec	84.7	5.43 / 1.46	21.20 / 5.32	14	latin_PP-OCRv5_mobile_rec.yaml	Inference Model/Pre-trained Model
eslav_PP-OCRv5_mobile_rec	85.8	5.43 / 1.46	21.20 / 5.32	14	eslav_PP-OCRv5_mobile_rec.yaml	Inference Model/Pre-trained Model
korean_PP-OCRv3_mobile_rec	60.21	3.73 / 0.98	8.76 / 2.91	9.6	korean_PP-OCRv3_mobile_rec.yaml	Inference Model/Training Model
japan_PP-OCRv3_mobile_rec	45.69	3.86 / 1.01	8.62 / 2.92	8.8	japan_PP-OCRv3_mobile_rec.yaml	Inference Model/Training Model
chinese_cht_PP-OCRv3_mobile_rec	82.06	3.90 / 1.16	9.24 / 3.18	9.7	chinese_cht_PP-OCRv3_mobile_rec.yaml	Inference Model/Training Model
te_PP-OCRv3_mobile_rec	95.88	3.59 / 0.81	8.28 / 6.21	7.8	te_PP-OCRv3_mobile_rec.yaml	Inference Model/Training Model
ka_PP-OCRv3_mobile_rec	96.96	3.49 / 0.89	8.63 / 2.77	17.4	ka_PP-OCRv3_mobile_rec.yaml	Inference Model/Training Model
ta_PP-OCRv3_mobile_rec	76.83	3.49 / 0.86	8.35 / 3.41	8.7	ta_PP-OCRv3_mobile_rec.yaml	Inference Model/Training Model
latin_PP-OCRv3_mobile_rec	76.93	3.53 / 0.78	8.50 / 6.83	8.7	latin_PP-OCRv3_mobile_rec.yaml	Inference Model/Training Model
arabic_PP-OCRv3_mobile_rec	73.55	3.60 / 0.83	8.44 / 4.69	17.3	arabic_PP-OCRv3_mobile_rec.yaml	Inference Model/Training Model
cyrillic_PP-OCRv3_mobile_rec	94.28	3.56 / 0.79	8.22 / 2.76	8.7	cyrillic_PP-OCRv3_mobile_rec.yaml	Inference Model/Training Model
devanagari_PP-OCRv3_mobile_rec	96.44	3.60 / 0.78	6.95 / 2.87	8.7	devanagari_PP-OCRv3_mobile_rec.yaml	Inference Model/Training Model

Note: The evaluation set for the above accuracy metrics is a multi-language dataset built by PaddleX.

Formula Recognition Module ¶

Model	En-BLEU(%)	Zh-BLEU(%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
UniMERNet	85.91	43.50	1311.84 / 1311.84	- / 8288.07	1530	UniMERNet.yaml	Inference Model/Training Model
PP-FormulaNet-S	87.00	45.71	182.25 / 182.25	- / 254.39	224	PP-FormulaNet-S.yaml	Inference Model/Training Model
PP-FormulaNet-L	90.36	45.78	1482.03 / 1482.03	- / 3131.54	695
PP-FormulaNet-L.yaml	Inference Model/Training Model
PP-FormulaNet_plus-S	88.71	53.32	179.20 / 179.20	- / 260.99	248	PP-FormulaNet_plus-S.yaml	Inference Model/Training Model
PP-FormulaNet_plus-M	91.45	89.76	1040.27 / 1040.27	- / 1615.80	592	PP-FormulaNet_plus-M.yaml	Inference Model/Training Model
PP-FormulaNet_plus-L	92.22	90.64	1476.07 / 1476.07	- / 3125.58	698	PP-FormulaNet_plus-L.yaml	Inference Model/Training Model
LaTeX_OCR_rec	74.55	39.96	1088.89 / 1088.89	- / -	99	LaTeX_OCR_rec.yaml	Inference Model/Training Model

Note: The above accuracy metrics are measured from the internal formula recognition test set of PaddleX. The BLEU score of LaTeX_OCR_rec on the LaTeX-OCR formula recognition test set is 0.8821. All model GPU inference times are based on Tesla V100 GPUs, with precision type FP32.

Table Structure Recognition Module ¶

Model	Accuracy (%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
SLANet	59.52	23.96 / 21.75	- / 43.12	6.9	SLANet.yaml	Inference Model/Training Model
SLANet_plus	63.69	23.43 / 22.16	- / 41.80	6.9	SLANet_plus.yaml	Inference Model/Training Model
SLANeXt_wired	69.65	85.92 / 85.92	- / 501.66	351	SLANeXt_wired.yaml	Inference Model/Training Model
SLANeXt_wireless	69.65	85.92 / 85.92	- / 501.66	351	SLANeXt_wireless.yaml	Inference Model/Training Model

Note: The above accuracy metrics are measured from the high-difficulty Chinese table recognition dataset built internally by PaddleX.

Table Cell Detection Module ¶

Model	mAP(%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
RT-DETR-L_wired_table_cell_det	82.7	33.47 / 27.02	402.55 / 256.56	124M	RT-DETR-L_wired_table_cell_det.yaml	Inference Model/Training Model
RT-DETR-L_wireless_table_cell_det	82.7	33.47 / 27.02	402.55 / 256.56	124M	RT-DETR-L_wireless_table_cell_det.yaml	Inference Model/Training Model

Note: The above accuracy metrics are measured from the internal table cell detection dataset of PaddleX.

Table Classification Module ¶

Model	Top1 Acc(%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
PP-LCNet_x1_0_table_cls	94.2	2.62 / 0.60	3.17 / 1.14	6.6	PP-LCNet_x1_0_table_cls.yaml	Inference Model/Training Model

Note: The above accuracy metrics are measured from the internal table classification dataset built by PaddleX.

Text Image Unwarping Module ¶

Model Name	CER	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
UVDoc	0.179	19.05 / 19.05	- / 869.82	30.3	UVDoc.yaml	Inference Model/Training Model

Note: The above accuracy metrics are measured from the image unwarping dataset built by PaddleX.

Layout Detection Module ¶

Layout detection model, including 20 common categories: document title, section title, text, page number, abstract, table of contents, references, footnote, header, footer, algorithm, formula, formula number, image, table, figure and table captions (figure caption, table caption, and chart caption), stamp, chart, sidebar text, and reference content.

Model Name	mAP(0.5)（%）	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
PP-DocLayout_plus-L	83.2	53.03 / 17.23	634.62 / 378.32	126.01	PP-DocLayout_plus-L.yaml	Inference Model/Training Model

Note: The evaluation set for the accuracy metrics mentioned above is a custom-built layout detection dataset, which includes 1,300 document-type images such as Chinese and English papers, magazines, newspapers, research reports, PPTs, exam papers, and textbooks.

Layout detection model, including 1 category: block.

Model Name	mAP(0.5)（%）	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
PP-DocBlockLayout	95.9	34.60 / 28.54	506.43 / 256.83	123.92	PP-DocBlockLayout.yaml	Inference Model/Training Model

Note: The evaluation set for the accuracy metrics mentioned above is a custom-built layout block detection dataset, which includes 1,000 document-type images such as Chinese and English papers, magazines, newspapers, research reports, PPTs, exam papers, and textbooks.

The layout detection model includes 23 common categories: document title, paragraph title, text, page number, abstract, table of contents, references, footnotes, header, footer, algorithm, formula, formula number, image, figure caption, table, table caption, seal, figure title, figure, header image, footer image, and sidebar text.

Model Name	mAP(0.5)（%）	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
PP-DocLayout-L	90.4	33.59 / 33.59	503.01 / 251.08	123.76	PP-DocLayout-L.yaml	Inference Model/Training Model
PP-DocLayout-M	75.2	13.03 / 4.72	43.39 / 24.44	22.578	PP-DocLayout-M.yaml	Inference Model/Training Model
PP-DocLayout-S	70.9	11.54 / 3.86	18.53 / 6.29	4.834	PP-DocLayout-S.yaml	Inference Model/Training Model

Note: The evaluation set for the accuracy metrics mentioned above is a custom-built layout region detection dataset, which includes 500 common document-type images such as Chinese and English papers, magazines, and research reports.

Table Layout Detection Model

Model	mAP(0.5) (%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
PicoDet_layout_1x_table	97.5	9.57 / 6.63	27.66 / 16.75	7.4	PicoDet_layout_1x_table.yaml	Inference Model/Training Model

Note: The evaluation set for the above accuracy metrics is the layout table area detection dataset built by PaddleOCR, which contains 7835 images of document types with tables in both Chinese and English. 3 types of layout detection models, including tables, images, and seals

Model	mAP(0.5) (%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
PicoDet-S_layout_3cls	88.2	8.43 / 3.44	17.60 / 6.51	4.8	PicoDet-S_layout_3cls.yaml	Inference Model/Training Model
PicoDet-L_layout_3cls	89.0	12.80 / 9.57	45.04 / 23.86	22.6	PicoDet-L_layout_3cls.yaml	Inference Model/Training Model
RT-DETR-H_layout_3cls	95.8	114.80 / 25.65	924.38 / 924.38	470.1	RT-DETR-H_layout_3cls.yaml	Inference Model/Training Model

Note: The evaluation dataset for the above accuracy metrics is the layout area detection dataset built by PaddleOCR, which includes 1,154 common types of document images such as Chinese and English papers, magazines, and research reports.

5-class English document layout detection model, including text, title, table, image, and list

Model	mAP(0.5) (%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
PicoDet_layout_1x	97.8	9.62 / 6.75	26.96 / 12.77	7.4	PicoDet_layout_1x.yaml	Inference Model/Training Model

Note: The evaluation dataset for the above accuracy metrics is the PubLayNet evaluation dataset, which contains 11,245 images of English documents.

17-class layout detection model, including 17 common layout categories: paragraph title, image, text, number, abstract, content, figure title, formula, table, table title, reference, document title, footnote, header, algorithm, footer, and seal

Model	mAP(0.5) (%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
PicoDet-S_layout_17cls	87.4	8.80 / 3.62	17.51 / 6.35	4.8	PicoDet-S_layout_17cls.yaml	Inference Model/Training Model
PicoDet-L_layout_17cls	89.0	12.60 / 10.27	43.70 / 24.42	22.6	PicoDet-L_layout_17cls.yaml	Inference Model/Training Model
RT-DETR-H_layout_17cls	98.3	115.29 / 101.18	964.75 / 964.75	470.2	RT-DETR-H_layout_17cls.yaml	Inference Model/Training Model

Note: The evaluation set for the above accuracy metrics is the layout area detection dataset built by PaddleOCR, which includes 892 images of common document types such as Chinese and English papers, magazines, and research reports.

Document Image Orientation Classification Module ¶

Model	Top-1 Acc (%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
PP-LCNet_x1_0_doc_ori	99.06	2.62 / 0.59	3.24 / 1.19	7	PP-LCNet_x1_0_doc_ori.yaml	Inference Model/Training Model

Note: The evaluation set for the above accuracy metrics is a self-built dataset covering multiple scenarios such as documents and certificates, with 1000 images.

Text Line Orientation Classification Module ¶

Model	Top-1 Acc (%)	GPU Inference Time (ms) [Standard Mode / High-Performance Mode]	CPU Inference Time (ms) [Standard Mode / High-Performance Mode]	Model Storage Size (MB)	YAML File	Model Download Link
PP-LCNet_x0_25_textline_ori	99.06	2.16 / 0.41	2.37 / 0.73	7	PP-LCNet_x0_25_textline_ori.yaml	推理模型/训练模型
PP-LCNet_x1_0_textline_ori	99.42	- / -	2.98 / 2.98	7	PP-LCNet_x1_0_textline_ori.yaml	推理模型/训练模型

Note: The evaluation dataset for the above accuracy metrics is a self-built dataset covering multiple scenarios such as certificates and documents, with 1,000 images.

Time Series Forecasting Module ¶

Model Name	mse	mae	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	CPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Storage Size (MB)	yaml File	Model Download Link
DLinear	0.382	0.394	0.34 / 0.12	0.64 / 0.06	0.072	DLinear.yaml	Inference Model/Training Model
NLinear	0.386	0.392	0.27 / 0.10	0.49 / 0.08	0.04	NLinear.yaml	Inference Model/Training Model
Nonstationary	0.600	0.515	3.92 / 2.59	18.09 / 13.36	55.5	Nonstationary.yaml	Inference Model/Training Model
PatchTST	0.379	0.391	1.81 / 0.45	5.79 / 0.77	2.0	PatchTST.yaml	Inference Model/Training Model
RLinear	0.385	0.392	0.39 / 0.18	0.82 / 0.08	0.04	RLinear.yaml	Inference Model/Training Model
TiDE	0.407	0.414	- / -	4.54 / 1.09	31.7	TiDE.yaml	Inference Model/Training Model
TimesNet	0.416	0.429	15.19 / 13.77	23.14 / 12.42	4.9	TimesNet.yaml	Inference Model/Training Model

Note: The above accuracy metrics are measured from the ETTH1 dataset (evaluation results on the test.csv test set).

Time Series Anomaly Detection Module ¶

Model Name	Precision	Recall	F1 Score	Model Storage Size (MB)	YAML File	Model Download Link
AutoEncoder_ad	99.36	0.24 / 0.13	0.41 / 0.05	0.052	AutoEncoder_ad.yaml	Inference Model/Training Model
DLinear_ad	98.98	0.39 / 0.16	0.69 / 0.08	0.112	DLinear_ad.yaml	Inference Model/Training Model
Nonstationary_ad	98.55	1.94 / 1.16	5.31 / 1.66	1.8	Nonstationary_ad.yaml	Inference Model/Training Model
PatchTST_ad	98.78	2.10 / 0.55	6.98 / 0.63	0.32	PatchTST_ad.yaml	Inference Model/Training Model

Note: The above precision metrics are measured from the PSM dataset.

Time Series Classification Module ¶

Model Name	acc(%)	Model Storage Size (MB)	yaml File	Model Download Link
TimesNet_cls	87.5	0.792	TimesNet_cls.yaml	Inference Model/Training Model

Note: The above accuracy metrics are measured from the UWaveGestureLibrary dataset.

Multilingual Speech Recognition Module ¶

Model	Training Data	Model Storage Size (MB)	Word Error Rate	YAML File	Model Download Link
whisper_large	680kh	5800	2.7 (Librispeech)	whisper_large.yaml	Inference Model
whisper_medium	680kh	2900	-	whisper_medium.yaml	Inference Model
whisper_small	680kh	923	-	whisper_small.yaml	Inference Model
whisper_base	680kh	277	-	whisper_base.yaml	Inference Model
whisper_tiny	680kh	145	-	whisper_tiny.yaml	Inference Model

Video Classification Module ¶

Model	Top1 Acc(%)	Model Storage Size (MB)	yaml File	Model Download Link
PP-TSM-R50_8frames_uniform	74.36	93.4	PP-TSM-R50_8frames_uniform.yaml	Inference Model/Training Model
PP-TSMv2-LCNetV2_8frames_uniform	71.71	22.5	PP-TSMv2-LCNetV2_8frames_uniform.yaml	Inference Model/Training Model
PP-TSMv2-LCNetV2_16frames_uniform	73.11	22.5	PP-TSMv2-LCNetV2_16frames_uniform.yaml	Inference Model/Training Model

Note: The above accuracy metrics are based on the K400 validation set Top1 Acc.

Video Detection Module ¶

Model	Frame-mAP(@ IoU 0.5)	Model Storage Size (MB)	yaml File	Model Download Link
YOWO	80.94	462.891	YOWO.yaml	Inference Model/Training Model

Note: The above accuracy metrics are based on the test dataset UCF101-24, using the Frame-mAP (@ IoU 0.5) metric.

Document Vision-Language Model Module ¶

Model	Model Parameter Size（B）	Model Storage Size（GB）	yaml File	Model Download Lin
PP-DocBee-2B	2	4.2	PP-DocBee-2B.yaml	Inference Model
PP-DocBee-7B	7	15.8	PP-DocBee-7B.yaml	Inference Model
PP-DocBee2-3B	3	7.6	PP-DocBee2-3B.yaml	Inference Model

Chart Parsing Model Module ¶

Model	Model Parameter Size（B）	Model Storage Size（GB）	yaml File	Model Download Lin
PP-Chart2Table	0.58	1.4	PP-Chart2Table.yaml	Inference Model

Test Environment Description:

Performance Test Environment
Inference Mode Description

Mode	GPU Configuration	CPU Configuration	Acceleration Technology Combination
Normal Mode	FP32 Precision / No TRT Acceleration	FP32 Precision / 8 Threads	PaddleInference
High-Performance Mode	Optimal combination of pre-selected precision types and acceleration strategies	FP32 Precision / 8 Threads	Pre-selected optimal backend (Paddle/OpenVINO/TRT, etc.)

PaddleX Model List (CPU/GPU)¶

Comments