PP-DocTranslation Pipeline Usage Tutorial

1. Introduction to PP-DocTranslation Pipeline

PP-DocTranslation is an intelligent document translation solution provided by PaddlePaddle. It combines general layout analysis with large language model (LLM) capabilities to translate documents efficiently. The solution identifies and extracts the elements of a document, including text blocks, headings, paragraphs, images, tables, and other complex layout structures, and translates them into the target language on that basis. PP-DocTranslation supports translation among multiple mainstream languages and is particularly suited to documents with complex layouts and strong contextual dependencies, aiming for precise, natural, and fluent results. The pipeline also provides flexible serving options that can be called from multiple programming languages on various hardware. It supports secondary development as well: you can train and fine-tune models on your own datasets, and the trained models can be integrated back into the pipeline.

The PP-DocTranslation pipeline uses PP-StructureV3 as a sub-pipeline and therefore provides all the functions of the PP-StructureV3 pipeline. For more information on those functions and their usage, see the PP-StructureV3 Pipeline Documentation.

In this pipeline, you can select the model to use based on the benchmark data below.

👉Model List Details

Document Image Orientation Classification Module:

Model	Download Link	Top-1 Acc (%)	GPU Inference Time (ms) [Standard / High Performance]	CPU Inference Time (ms) [Standard / High Performance]	Model Size (M)	Description
PP-LCNet_x1_0_doc_ori	Inference Model/Pretrained Model	99.06	2.62 / 0.59	3.24 / 1.19	7	A document image classification model based on PP-LCNet_x1_0 with four classes: 0°, 90°, 180°, and 270°

Text Image Unwarping Module:

Model	Download Link	CER	Model Size (M)	Description
UVDoc	Inference Model/Pretrained Model	0.179	30.3	High-accuracy text image unwarping model

Layout Detection Module Models:

Model	Download Link	mAP(0.5) (%)	GPU Inference Time (ms) [Standard / High Performance]	CPU Inference Time (ms) [Standard / High Performance]	Model Size (M)	Description
PP-DocLayout_plus-L	Inference Model/Pretrained Model	83.2	53.03 / 17.23	634.62 / 378.32	126.01	High-accuracy layout detection model based on RT-DETR-L, trained on a custom dataset covering scenarios like Chinese/English papers, multi-column magazines, newspapers, PPTs, contracts, books, exams, research reports, ancient books, Japanese documents, and vertical text documents
PP-DocLayout-L	Inference Model/Pretrained Model	90.4	33.59 / 33.59	503.01 / 251.08	123.76	High-accuracy layout detection model based on RT-DETR-L, trained on a custom dataset covering papers, magazines, contracts, books, exams, and research reports
PP-DocLayout-M	Inference Model/Pretrained Model	75.2	13.03 / 4.72	43.39 / 24.44	22.578	Balanced accuracy-efficiency layout detection model based on PicoDet-L, trained on a custom dataset covering papers, magazines, contracts, books, exams, and research reports
PP-DocLayout-S	Inference Model/Pretrained Model	70.9	11.54 / 3.86	18.53 / 6.29	4.834	High-efficiency layout detection model based on PicoDet-S, trained on a custom dataset for papers, magazines, contracts, books, exams, and research reports

Table Structure Recognition Module:

Model	Download Link	Accuracy (%)	GPU Inference Time (ms) [Standard / High Performance]	CPU Inference Time (ms) [Standard / High Performance]	Model Size (M)	Description
SLANeXt_wired	Inference Model/Pretrained Model	69.65	85.92 / 85.92	- / 501.66	351	The SLANeXt series is a next-generation family of table structure recognition models developed by the Baidu PaddlePaddle Vision Team. Compared with SLANet and SLANet_plus, SLANeXt focuses on recognizing table structures, with dedicated weights for wired and wireless tables, significantly improving performance, especially for wired tables.
SLANeXt_wireless	Inference Model/Pretrained Model	69.65	85.92 / 85.92	- / 501.66	351

Table Classification Module Models:

Model	Download Link	Top-1 Acc (%)	GPU Inference Time (ms) [Standard / High Performance]	CPU Inference Time (ms) [Standard / High Performance]	Model Size (M)
PP-LCNet_x1_0_table_cls	Inference Model/Pretrained Model	94.2	2.62 / 0.60	3.17 / 1.14	6.6

Table Cell Detection Module Models:

Model	Download Link	mAP (%)	GPU Inference Time (ms) [Standard / High Performance]	CPU Inference Time (ms) [Standard / High Performance]	Model Size (M)	Description
RT-DETR-L_wired_table_cell_det	Inference Model/Pretrained Model	82.7	33.47 / 27.02	402.55 / 256.56	124	RT-DETR is the first real-time end-to-end object detection model. The Baidu PaddlePaddle Vision Team used RT-DETR-L as the base model and pre-trained it on a custom table cell detection dataset, achieving strong performance on both wired and wireless tables.
RT-DETR-L_wireless_table_cell_det	Inference Model/Pretrained Model	82.7	33.47 / 27.02	402.55 / 256.56	124

Text Detection Module:

Model	Download Link	Detection Hmean (%)	GPU Inference Time (ms) [Standard / High Performance]	CPU Inference Time (ms) [Standard / High Performance]	Model Size (M)	Description
PP-OCRv5_server_det	Inference Model/Pretrained Model	83.8	89.55 / 70.19	383.15 / 383.15	84.3	PP-OCRv5 server-side text detection model, higher accuracy, suitable for deployment on high-performance servers
PP-OCRv5_mobile_det	Inference Model/Pretrained Model	79.0	10.67 / 6.36	57.77 / 28.15	4.7	PP-OCRv5 mobile-side text detection model, more efficient, suitable for edge device deployment
PP-OCRv4_server_det	Inference Model/Pretrained Model	69.2	127.82 / 98.87	585.95 / 489.77	109	PP-OCRv4 server-side text detection model, higher accuracy, suitable for deployment on high-performance servers
PP-OCRv4_mobile_det	Inference Model/Pretrained Model	63.8	9.87 / 4.17	56.60 / 20.79	4.7	PP-OCRv4 mobile-side text detection model, more efficient, suitable for edge device deployment
PP-OCRv3_mobile_det	Inference Model/Pretrained Model	Accuracy similar to PP-OCRv4_mobile_det	9.90 / 3.60	41.93 / 20.76	2.1	PP-OCRv3 mobile-side text detection model, more efficient, suitable for edge device deployment
PP-OCRv3_server_det	Inference Model/Pretrained Model	Accuracy similar to PP-OCRv4_server_det	119.50 / 75.00	379.35 / 318.35	102.1	PP-OCRv3 server-side text detection model, higher accuracy, suitable for deployment on high-performance servers

Text Recognition Module Models:

* Chinese Recognition Models

Model	Download Link	Recognition Avg Accuracy(%)	GPU Inference Time (ms) [Regular Mode / High-Performance Mode]	CPU Inference Time (ms) [Regular Mode / High-Performance Mode]	Model Size (M)	Description
PP-OCRv5_server_rec	Inference Model/Training Model	86.38	8.46 / 2.36	31.21 / 31.21	81	PP-OCRv5_rec is a next-generation text recognition model. It aims to efficiently and accurately support four major languages—Simplified Chinese, Traditional Chinese, English, and Japanese—as well as complex text scenarios such as handwriting, vertical text, pinyin, and rare characters. While maintaining recognition performance, it balances inference speed and model robustness, providing efficient and precise technical support for document understanding in various scenarios.
PP-OCRv5_mobile_rec	Inference Model/Training Model	81.29	5.43 / 1.46	21.20 / 5.32	16
PP-OCRv4_server_rec_doc	Inference Model/Training Model	86.58	8.69 / 2.78	37.93 / 37.93	74.7	PP-OCRv4_server_rec_doc is trained on a mix of more Chinese document data and PP-OCR training data, based on PP-OCRv4_server_rec. It enhances recognition capabilities for Traditional Chinese, Japanese, and special characters, supporting 15,000+ characters. In addition to improving document-related text recognition, it also enhances general text recognition.
PP-OCRv4_mobile_rec	Inference Model/Training Model	78.74	5.26 / 1.12	17.48 / 3.61	10.6	The lightweight recognition model of PP-OCRv4, with high inference efficiency, deployable on various hardware devices including edge devices.
PP-OCRv4_server_rec	Inference Model/Training Model	80.61	8.75 / 2.49	36.93 / 36.93	71.2	The server-side model of PP-OCRv4, with high inference accuracy, deployable on various servers.
PP-OCRv3_mobile_rec	Inference Model/Training Model	72.96	3.89 / 1.16	8.72 / 3.56	9.2	The lightweight recognition model of PP-OCRv3, with high inference efficiency, deployable on various hardware devices including edge devices.

Model	Download Link	Recognition Avg Accuracy(%)	GPU Inference Time (ms) [Regular Mode / High-Performance Mode]	CPU Inference Time (ms) [Regular Mode / High-Performance Mode]	Model Size (M)	Description
ch_SVTRv2_rec	Inference Model/Training Model	68.81	10.38 / 8.31	66.52 / 30.83	73.9	SVTRv2 is a server-side text recognition model developed by the OpenOCR team from Fudan University's Vision and Learning Lab (FVL). It won first prize in the PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition, achieving a 6% improvement in end-to-end recognition accuracy over PP-OCRv4 on Leaderboard A.

Model	Download Link	Recognition Avg Accuracy(%)	GPU Inference Time (ms) [Regular Mode / High-Performance Mode]	CPU Inference Time (ms) [Regular Mode / High-Performance Mode]	Model Size (M)	Description
ch_RepSVTR_rec	Inference Model/Training Model	65.07	6.29 / 1.57	20.64 / 5.40	22.1	RepSVTR is a mobile text recognition model based on SVTRv2. It won first prize in the PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition, achieving a 2.5% improvement in end-to-end recognition accuracy over PP-OCRv4 on Leaderboard B, with comparable inference speed.

* English Recognition Models

Model	Download Link	Recognition Avg Accuracy(%)	GPU Inference Time (ms) [Regular Mode / High-Performance Mode]	CPU Inference Time (ms) [Regular Mode / High-Performance Mode]	Model Size (M)	Description
en_PP-OCRv4_mobile_rec	Inference Model/Training Model	70.39	4.81 / 1.23	17.20 / 4.18	6.8	An ultra-lightweight English recognition model trained based on the PP-OCRv4 recognition model, supporting English and numeric recognition.
en_PP-OCRv3_mobile_rec	Inference Model/Training Model	70.69	3.56 / 0.78	8.44 / 5.78	7.8	An ultra-lightweight English recognition model trained based on the PP-OCRv3 recognition model, supporting English and numeric recognition.

* Multilingual Recognition Models

Model	Download Link	Recognition Avg Accuracy(%)	GPU Inference Time (ms) [Regular Mode / High-Performance Mode]	CPU Inference Time (ms) [Regular Mode / High-Performance Mode]	Model Size (M)	Description
korean_PP-OCRv3_mobile_rec	Inference Model/Training Model	60.21	3.73 / 0.98	8.76 / 2.91	8.6	An ultra-lightweight Korean recognition model trained based on the PP-OCRv3 recognition model, supporting Korean and numeric recognition.
japan_PP-OCRv3_mobile_rec	Inference Model/Training Model	45.69	3.86 / 1.01	8.62 / 2.92	8.8	An ultra-lightweight Japanese recognition model trained based on the PP-OCRv3 recognition model, supporting Japanese and numeric recognition.
chinese_cht_PP-OCRv3_mobile_rec	Inference Model/Training Model	82.06	3.90 / 1.16	9.24 / 3.18	9.7	An ultra-lightweight Traditional Chinese recognition model trained based on the PP-OCRv3 recognition model, supporting Traditional Chinese and numeric recognition.
te_PP-OCRv3_mobile_rec	Inference Model/Training Model	95.88	3.59 / 0.81	8.28 / 6.21	7.8	An ultra-lightweight Telugu recognition model trained based on the PP-OCRv3 recognition model, supporting Telugu and numeric recognition.
ka_PP-OCRv3_mobile_rec	Inference Model/Training Model	96.96	3.49 / 0.89	8.63 / 2.77	8.0	An ultra-lightweight Kannada recognition model trained based on the PP-OCRv3 recognition model, supporting Kannada and numeric recognition.
ta_PP-OCRv3_mobile_rec	Inference Model/Training Model	76.83	3.49 / 0.86	8.35 / 3.41	8.0	An ultra-lightweight Tamil recognition model trained based on the PP-OCRv3 recognition model, supporting Tamil and numeric recognition.
latin_PP-OCRv3_mobile_rec	Inference Model/Training Model	76.93	3.53 / 0.78	8.50 / 6.83	7.8	An ultra-lightweight Latin recognition model trained based on the PP-OCRv3 recognition model, supporting Latin and numeric recognition.
arabic_PP-OCRv3_mobile_rec	Inference Model/Training Model	73.55	3.60 / 0.83	8.44 / 4.69	7.8	An ultra-lightweight Arabic script recognition model trained based on the PP-OCRv3 recognition model, supporting Arabic script and numeric recognition.
cyrillic_PP-OCRv3_mobile_rec	Inference Model/Training Model	94.28	3.56 / 0.79	8.22 / 2.76	7.9	An ultra-lightweight Cyrillic script recognition model trained based on the PP-OCRv3 recognition model, supporting Cyrillic script and numeric recognition.
devanagari_PP-OCRv3_mobile_rec	Inference Model/Training Model	96.44	3.60 / 0.78	6.95 / 2.87	7.9	An ultra-lightweight Devanagari script recognition model trained based on the PP-OCRv3 recognition model, supporting Devanagari script and numeric recognition.

Text Line Orientation Classification Module (Optional):

Model	Download Link	Top-1 Acc(%)	GPU Inference Time (ms) [Regular Mode / High-Performance Mode]	CPU Inference Time (ms) [Regular Mode / High-Performance Mode]	Model Size (M)	Description
PP-LCNet_x0_25_textline_ori	Inference Model/Training Model	95.54	2.16 / 0.41	2.37 / 0.73	0.32	A text line classification model based on PP-LCNet_x0_25, with two classes: 0 degrees and 180 degrees.

Formula Recognition Module:

Model	Download Link	Avg-BLEU(%)	GPU Inference Time (ms) [Regular Mode / High-Performance Mode]	CPU Inference Time (ms) [Regular Mode / High-Performance Mode]	Model Size (M)	Description
UniMERNet	Inference Model/Training Model	86.13	2266.96 / -	- / -	1.4 G	UniMERNet is a formula recognition model developed by Shanghai AI Lab. It uses Donut Swin as the encoder and MBartDecoder as the decoder. Trained on a dataset of one million samples, including simple formulas, complex formulas, scanned formulas, and handwritten formulas, it significantly improves recognition accuracy for real-world scenarios.
PP-FormulaNet-S	Inference Model/Training Model	87.12	1311.84 / 1311.84	- / 8288.07	167.9	PP-FormulaNet is an advanced formula recognition model developed by Baidu's PaddlePaddle Vision team, supporting 50,000 common LaTeX vocabulary items. The PP-FormulaNet-S version uses PP-HGNetV2-B4 as its backbone and employs techniques like parallel masking and model distillation to significantly improve inference speed while maintaining high recognition accuracy, suitable for simple printed formulas, cross-line simple printed formulas, etc. The PP-FormulaNet-L version is based on Vary_VIT_B as its backbone and is trained on a large-scale formula dataset, showing significant improvement in complex formula recognition compared to PP-FormulaNet-S, suitable for simple printed formulas, complex printed formulas, handwritten formulas, etc.
PP-FormulaNet-L	Inference Model/Training Model	92.13	1976.52 / -	- / -	535.2
LaTeX_OCR_rec	Inference Model/Training Model	71.63	1088.89 / 1088.89	- / -	89.7	LaTeX-OCR is a formula recognition algorithm based on an autoregressive large model. By using Hybrid ViT as the backbone and transformer as the decoder, it significantly improves the accuracy of formula recognition.

Seal Text Recognition Module:

Model	Download Link	Detection Hmean(%)	GPU Inference Time (ms) [Regular Mode / High-Performance Mode]	CPU Inference Time (ms) [Regular Mode / High-Performance Mode]	Model Size (M)	Description
PP-OCRv4_server_seal_det	Inference Model/Training Model	98.21	124.64 / 91.57	545.68 / 439.86	109	The server-side seal text detection model of PP-OCRv4, with higher accuracy, suitable for deployment on high-performance servers.
PP-OCRv4_mobile_seal_det	Inference Model/Training Model	96.47	9.70 / 3.56	50.38 / 19.64	4.6	The mobile-side seal text detection model of PP-OCRv4, with higher efficiency, suitable for deployment on edge devices.

Testing Environment Description:

Performance Testing Environment
- Test Datasets:
  - Document Image Orientation Classification Model: A dataset built by PaddleX, covering multiple scenarios such as IDs and documents, containing 1,000 images.
  - Text Image Unwarping Model: DocUNet.
  - Layout Detection Model: A layout analysis dataset built by PaddleOCR, containing 10,000 common document-type images such as Chinese and English papers, magazines, and reports.
  - PP-DocLayout_plus-L: A layout detection dataset built by PaddleOCR, containing 1,300 document-type images such as Chinese and English papers, magazines, newspapers, reports, PPTs, exams, and textbooks.
  - Table Structure Recognition Model: An internal English table recognition dataset built by PaddleX.
  - Text Detection Model: A Chinese dataset built by PaddleOCR, covering street views, web images, documents, and handwriting, with 500 images for detection.
  - Chinese Recognition Model: A Chinese dataset built by PaddleOCR, covering street views, web images, documents, and handwriting, with 11,000 images for text recognition.
  - ch_SVTRv2_rec: PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition, Leaderboard A evaluation set.
  - ch_RepSVTR_rec: PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition, Leaderboard B evaluation set.
  - English Recognition Model: An English dataset built by PaddleX.
  - Multilingual Recognition Model: A multilingual dataset built by PaddleX.
  - Text Line Orientation Classification Model: A dataset built by PaddleX, covering multiple scenarios such as IDs and documents, containing 1,000 images.
  - Seal Text Recognition Model: A dataset built by PaddleX, containing 500 circular seal images.
- Hardware Configuration:
  - GPU: NVIDIA Tesla T4
  - CPU: Intel Xeon Gold 6271C @ 2.60GHz
- Software Environment:
  - Ubuntu 20.04 / CUDA 11.8 / cuDNN 8.9 / TensorRT 8.6.1.6
  - paddlepaddle 3.0.0 / paddleocr 3.0.3

Inference Mode Description

Mode	GPU Configuration	CPU Configuration	Acceleration Technology Combination
Regular Mode	FP32 Precision / No TRT Acceleration	FP32 Precision / 8 Threads	PaddleInference
High-Performance Mode	Optimal combination of precision types and acceleration strategies	FP32 Precision / 8 Threads	Optimal backend selection (Paddle/OpenVINO/TRT, etc.)
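
As one way to apply the benchmark data above when choosing a model, the following minimal sketch picks the most accurate layout detection model whose regular-mode GPU latency fits a budget. The figures are copied from the layout detection table; the `pick_layout_model` helper and the latency budgets are illustrative assumptions, not part of PaddleOCR. PP-DocLayout_plus-L is left out because its mAP was measured on a different test set, so its score is not directly comparable.

```python
# Figures copied from the layout detection table above:
# (model name, mAP(0.5) in %, regular-mode GPU inference time in ms)
LAYOUT_MODELS = [
    ("PP-DocLayout-L", 90.4, 33.59),
    ("PP-DocLayout-M", 75.2, 13.03),
    ("PP-DocLayout-S", 70.9, 11.54),
]


def pick_layout_model(max_gpu_ms):
    """Hypothetical helper: most accurate model within the latency budget."""
    candidates = [m for m in LAYOUT_MODELS if m[2] <= max_gpu_ms]
    if not candidates:
        return None  # no model is fast enough for this budget
    # Among the models that fit, prefer the highest mAP.
    return max(candidates, key=lambda m: m[1])[0]


print(pick_layout_model(20.0))  # PP-DocLayout-M
print(pick_layout_model(50.0))  # PP-DocLayout-L
```

The selected name can then be supplied through the `layout_detection_model_name` parameter described in section 2.1.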

2. Quick Start

Before using the PP-DocTranslation pipeline locally, make sure you have installed the wheel package by following the Installation Tutorial.

Please note: If you encounter issues such as the program becoming unresponsive, unexpected program termination, running out of memory resources, or extremely slow inference during execution, please try adjusting the configuration according to the documentation, such as disabling unnecessary features or using lighter-weight models.

You also need an API key for a large language model service. The pipeline supports the Baidu Cloud Qianfan Platform as well as local large model services that comply with the OpenAI interface standard.
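
A minimal sketch of preparing the large language model configuration without hard-coding the key is shown here. The dictionary keys mirror the `chat_bot_config` used in the Python example in section 2.2; the `build_chat_bot_config` helper and the `QIANFAN_API_KEY` environment variable name are assumptions made for illustration.

```python
import os


def build_chat_bot_config(base_url="https://qianfan.baidubce.com/v2",
                          model_name="ernie-3.5-8k",
                          env_var="QIANFAN_API_KEY"):
    """Hypothetical helper: build the LLM config, reading the key from the environment."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "module_name": "chat_bot",
        "model_name": model_name,
        "base_url": base_url,
        "api_type": "openai",  # OpenAI-compatible interface
        "api_key": api_key,
    }
```

For a local OpenAI-compatible service, pass its endpoint and model name instead, for example `build_chat_bot_config(base_url="http://localhost:8000/v1", model_name="my-local-model")`; both values are placeholders.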

2.1 Experience via Command Line

You can download the test file and try the pipeline with a single command:

paddleocr pp_doctranslation -i vehicle_certificate-1.png --target_language en --qianfan_api_key your_api_key
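
If the command needs to be launched from a script, the sketch below assembles the same call as an argument list. The `build_command` helper and the `QIANFAN_API_KEY` environment variable are hypothetical, and it assumes the `save_path` parameter is exposed as `--save_path` in the same way as the flags shown in the command above.

```python
import os


def build_command(input_path, target_language="zh", save_path=None,
                  key_env="QIANFAN_API_KEY"):
    """Hypothetical helper: assemble the CLI call as an argument list."""
    argv = [
        "paddleocr", "pp_doctranslation",
        "-i", input_path,
        "--target_language", target_language,
        # Falls back to the placeholder from the command above if unset.
        "--qianfan_api_key", os.environ.get(key_env, "your_api_key"),
    ]
    if save_path is not None:
        # Assumption: save_path maps to --save_path like the other flags.
        argv += ["--save_path", save_path]
    return argv


argv = build_command("vehicle_certificate-1.png", target_language="en",
                     save_path="./output")
# To execute: import subprocess; subprocess.run(argv, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.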

The command line supports additional parameters, described in the table below:

Parameter	Description	Type	Default Value
`input`	Data to be predicted, required. For example, local path of image file or PDF file: `/root/data/img.jpg`; URL link, such as network URL of image or PDF file: example; local directory, the directory must contain images to be predicted, such as local path: `/root/data/` (currently does not support PDF files in the directory, PDF files need to specify the exact file path).	`str`
`save_path`	Specifies the path to save the inference result files. If not set, inference results will not be saved locally.	`str`
`target_language`	Target language (ISO 639-1 language code).	`str`	`zh`
`layout_detection_model_name`	Model name for layout detection. If not set, the pipeline default model will be used.	`str`
`layout_detection_model_dir`	Directory path of the layout detection model. If not set, the official model will be downloaded.	`str`
`layout_threshold`	Score threshold for layout model. Any float between `0-1`. If not set, the pipeline initialized value will be used, default initialized as `0.5`.	`float`
`layout_nms`	Whether to use post-processing NMS in layout detection. If not set, the pipeline initialized value will be used, default initialized as `True`.	`bool`
`layout_unclip_ratio`	Expansion coefficient for detection boxes in layout detection model. Any float greater than `0`. If not set, the pipeline initialized value will be used, default initialized as `1.0`.	`float`
`layout_merge_bboxes_mode`	Mode for merging detection boxes output by the layout detection model. large: when set to large, among overlapping boxes, only the largest outer box is kept and the overlapping inner boxes are deleted; small: when set to small, among overlapping boxes, only the smaller inner boxes are kept and the overlapping outer boxes are deleted; union: no box filtering, both inner and outer boxes are kept; If not set, the pipeline initialized value will be used, default initialized as `large`.	`str`
`chart_recognition_model_name`	Model name for chart parsing. If not set, the pipeline default model will be used.	`str`
`chart_recognition_model_dir`	Directory path for chart parsing model. If not set, the official model will be downloaded.	`str`
`chart_recognition_batch_size`	Batch size for chart parsing model. If not set, batch size defaults to `1`.	`int`
`region_detection_model_name`	Model name for region detection. If not set, the pipeline default model will be used.	`str`
`region_detection_model_dir`	Directory path for region detection model. If not set, the official model will be downloaded.	`str`
`doc_orientation_classify_model_name`	Model name for document orientation classification. If not set, the pipeline default model will be used.	`str`
`doc_orientation_classify_model_dir`	Directory path for document orientation classification model. If not set, the official model will be downloaded.	`str`
`doc_unwarping_model_name`	Model name for text image unwarping. If not set, the pipeline default model will be used.	`str`
`doc_unwarping_model_dir`	Directory path for text image unwarping model. If not set, the official model will be downloaded.	`str`
`text_detection_model_name`	Model name for text detection. If not set, the pipeline default model will be used.	`str`
`text_detection_model_dir`	Directory path for text detection model. If not set, the official model will be downloaded.	`str`
`text_det_limit_side_len`	Image side length limit for text detection. Any integer greater than `0`. If not set, the pipeline initialized value will be used, default initialized as `960`.	`int`
`text_det_limit_type`	Type of image side length limit for text detection. Supports `min` and `max`. `min` ensures that the shortest side of the image is not less than `text_det_limit_side_len`; `max` ensures that the longest side of the image is not greater than `text_det_limit_side_len`. If not set, the pipeline initialized value will be used, default initialized as `max`.	`str`
`text_det_thresh`	Detection pixel threshold. In the output probability map, pixels with score greater than this threshold are considered text pixels. Any float greater than `0`. If not set, the pipeline initialized value `0.3` will be used by default.	`float`
`text_det_box_thresh`	Detection box threshold. If the average score of all pixels within the detected bounding box is greater than this threshold, the result is considered a text region. Any float greater than `0`. If not set, the pipeline initialized value `0.6` will be used by default.	`float`
`text_det_unclip_ratio`	Text detection expansion coefficient, used to expand text regions. The larger the value, the larger the expansion area. Any float greater than `0`. If not set, the pipeline initialized value `2.0` will be used by default.	`float`
`textline_orientation_model_name`	Model name for textline orientation. If not set, the pipeline default model will be used.	`str`
`textline_orientation_model_dir`	Directory path for textline orientation model. If not set, the official model will be downloaded.	`str`
`textline_orientation_batch_size`	Batch size for textline orientation model. If not set, batch size defaults to `1`.	`int`
`text_recognition_model_name`	Model name for text recognition. If not set, the pipeline default model will be used.	`str`
`text_recognition_model_dir`	Directory path for text recognition model. If not set, the official model will be downloaded.	`str`
`text_recognition_batch_size`	Batch size for text recognition model. If not set, batch size defaults to `1`.	`int`
`text_rec_score_thresh`	Text recognition threshold. Text results with scores greater than this threshold will be kept. Any float greater than `0`. If not set, the pipeline initialized value `0.0` will be used, meaning no threshold.	`float`
`table_classification_model_name`	Model name for table classification. If not set, the pipeline default model will be used.	`str`
`table_classification_model_dir`	Directory path for table classification model. If not set, the official model will be downloaded.	`str`
`wired_table_structure_recognition_model_name`	Model name for wired table structure recognition. If not set, the pipeline default model will be used.	`str`
`wired_table_structure_recognition_model_dir`	Directory path for wired table structure recognition model. If not set, the official model will be downloaded.	`str`
`wireless_table_structure_recognition_model_name`	Model name for wireless table structure recognition. If not set, the pipeline default model will be used.	`str`
`wireless_table_structure_recognition_model_dir`	Directory path for wireless table structure recognition model. If not set, the official model will be downloaded.	`str`
`wired_table_cells_detection_model_name`	Model name for wired table cells detection. If not set, the pipeline default model will be used.	`str`
`wired_table_cells_detection_model_dir`	Directory path for wired table cells detection model. If not set, the official model will be downloaded.	`str`
`wireless_table_cells_detection_model_name`	Model name for wireless table cells detection. If not set, the pipeline default model will be used.	`str`
`wireless_table_cells_detection_model_dir`	Directory path for wireless table cells detection model. If not set, the official model will be downloaded.	`str`
`table_orientation_classify_model_name`	Model name for table orientation classification. If not set, the pipeline default model will be used.	`str`
`table_orientation_classify_model_dir`	Directory path for table orientation classification model. If not set, the official model will be downloaded.	`str`
`seal_text_detection_model_name`	Model name for seal text detection. If not set, the pipeline default model will be used.	`str`
`seal_text_detection_model_dir`	Directory path for seal text detection model. If not set, the official model will be downloaded.	`str`
`seal_det_limit_side_len`	Image side length limit for seal text detection. Any integer greater than `0`. If not set, the pipeline initialized value will be used, default initialized as `736`.	`int`
`seal_det_limit_type`	Type of image side length limit for seal text detection. Supports `min` and `max`. `min` ensures that the shortest side of the image is not less than `seal_det_limit_side_len`; `max` ensures that the longest side is not greater than `seal_det_limit_side_len`. If not set, the pipeline initialized value will be used, default initialized as `min`.	`str`
`seal_det_thresh`	Detection pixel threshold. In the output probability map, pixels with score greater than this threshold are considered text pixels. Any float greater than `0`. If not set, the pipeline initialized value `0.2` will be used by default.	`float`
`seal_det_box_thresh`	Detection box threshold. If the average score of all pixels within the detected bounding box is greater than this threshold, the result is considered a text region. Any float greater than `0`. If not set, the pipeline initialized value `0.6` will be used by default.	`float`
`seal_det_unclip_ratio`	Expansion coefficient for seal text detection. This method is used to expand the text region; the larger the value, the larger the expansion area. Any float greater than `0`. If not set, the pipeline initialized value `0.5` will be used by default.	`float`
`seal_text_recognition_model_name`	Model name for seal text recognition. If not set, the pipeline default model will be used.	`str`
`seal_text_recognition_model_dir`	Directory path for seal text recognition model. If not set, the official model will be downloaded.	`str`
`seal_text_recognition_batch_size`	Batch size for seal text recognition model. If not set, batch size defaults to `1`.	`int`
`seal_rec_score_thresh`	Text recognition threshold. Text results with scores greater than this threshold will be kept. Any float greater than `0`. If not set, the pipeline initialized value `0.0` will be used, meaning no threshold.	`float`
`formula_recognition_model_name`	Model name for formula recognition. If not set, the pipeline default model will be used.	`str`
`formula_recognition_model_dir`	Directory path for formula recognition model. If not set, the official model will be downloaded.	`str`
`formula_recognition_batch_size`	Batch size of the formula recognition model. If not set, the batch size defaults to `1`.	`int`
`use_doc_orientation_classify`	Whether to load and use the document orientation classification module. If not set, the pipeline initialized value will be used, default is `False`.	`bool`
`use_doc_unwarping`	Whether to load and use the text image unwarping module. If not set, the pipeline initialized value will be used, default is `False`.	`bool`
`use_textline_orientation`	Whether to load and use the text line orientation classification module. If not set, the pipeline initialized value will be used, default is `True`.	`bool`
`use_seal_recognition`	Whether to load and use the seal text recognition sub-pipeline. If not set, the pipeline initialized value will be used, default is `True`.	`bool`
`use_table_recognition`	Whether to load and use the table recognition sub-pipeline. If not set, the pipeline initialized value will be used, default is `True`.	`bool`
`use_formula_recognition`	Whether to load and use the formula recognition sub-pipeline. If not set, the pipeline initialized value will be used, default is `True`.	`bool`
`use_chart_recognition`	Whether to load and use the chart parsing module. If not set, the pipeline initialized value will be used, default is `False`.	`bool`
`use_region_detection`	Whether to load and use the region detection module. If not set, the pipeline initialized value will be used, default is `True`.	`bool`
`qianfan_api_key`	API key for the Qianfan platform.	`str`
`device`	Device used for inference. A specific card number can be given: CPU: e.g. `cpu` means using the CPU for inference; GPU: e.g. `gpu:0` means using the first GPU (device 0) for inference; NPU: e.g. `npu:0` means using the first NPU (device 0) for inference; XPU: e.g. `xpu:0` means using the first XPU (device 0) for inference; MLU: e.g. `mlu:0` means using the first MLU (device 0) for inference; DCU: e.g. `dcu:0` means using the first DCU (device 0) for inference. If not set, the pipeline initialized value will be used. At initialization, local GPU device 0 is preferred; if no GPU is available, the CPU is used.	`str`
`enable_hpi`	Whether to enable high-performance inference.	`bool`	`False`
`use_tensorrt`	Whether to enable the TensorRT subgraph engine of Paddle Inference. If the model does not support acceleration by TensorRT, enabling this flag will not enable acceleration. For PaddlePaddle with CUDA 11.8, compatible TensorRT version is 8.x (x≥6), recommended TensorRT version is 8.6.1.6.	`bool`	`False`
`precision`	Computation precision, e.g. fp32, fp16.	`str`	`fp32`
`enable_mkldnn`	Whether to enable MKL-DNN accelerated inference. If MKL-DNN is unavailable or the model does not support acceleration via MKL-DNN, enabling this flag will not enable acceleration.	`bool`	`True`
`mkldnn_cache_capacity`	MKL-DNN cache capacity.	`int`	`10`
`cpu_threads`	Number of threads used for inference on CPU.	`int`	`8`
`paddlex_config`	Path to PaddleX pipeline configuration file.	`str`

The execution results will be printed to the terminal.

2.2 Integration via Python Script

The command line is convenient for quickly trying the pipeline and viewing its results. In real projects, you will usually integrate the pipeline through code. You can download the test file and run the following sample code for inference:

from paddleocr import PPDocTranslation

# Create a translation pipeline
pipeline = PPDocTranslation()

# Document path
input_path = "document_sample.pdf"

# Output directory
output_path = "./output"

# Large model configuration
chat_bot_config = {
    "module_name": "chat_bot",
    "model_name": "ernie-3.5-8k",
    "base_url": "https://qianfan.baidubce.com/v2",
    "api_type": "openai",
    "api_key": "api_key",  # your api_key
}

if input_path.lower().endswith(".md"):
    # Read markdown documents, supporting passing in directories and url links with the .md suffix
    ori_md_info_list = pipeline.load_from_markdown(input_path)
else:
    # Use PP-StructureV3 to perform layout parsing on PDF/image documents to obtain markdown information
    visual_predict_res = pipeline.visual_predict(
        input_path,
        use_doc_orientation_classify=False,
        use_doc_unwarping=False,
        use_common_ocr=True,
        use_seal_recognition=True,
        use_table_recognition=True,
    )

    ori_md_info_list = []
    for res in visual_predict_res:
        layout_parsing_result = res["layout_parsing_result"]
        ori_md_info_list.append(layout_parsing_result.markdown)
        layout_parsing_result.save_to_img(output_path)
        layout_parsing_result.save_to_markdown(output_path)

    # Concatenate the markdown information of multi-page documents into a single markdown file, and save the merged original markdown text
    if input_path.lower().endswith(".pdf"):
        ori_md_info = pipeline.concatenate_markdown_pages(ori_md_info_list)
        ori_md_info.save_to_markdown(output_path)

# Perform document translation (target language: English)
tgt_md_info_list = pipeline.translate(
    ori_md_info_list=ori_md_info_list,
    target_language="en",
    chunk_size=5000,
    chat_bot_config=chat_bot_config,
)
# Save the translation results
for tgt_md_info in tgt_md_info_list:
    tgt_md_info.save_to_markdown(output_path)

After executing the above code, you will obtain the parsed results of the original document to be translated, the Markdown file of the original text to be translated, and the Markdown file of the translated document, all saved in the output directory.
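
The sample above hardcodes `api_key` as a placeholder string. A minimal sketch of building the same `chat_bot_config` from an environment variable follows; the helper name `build_chat_bot_config` and the variable name `QIANFAN_API_KEY` are assumptions made for illustration and are not part of PaddleOCR.

```python
import os


def build_chat_bot_config(env_var="QIANFAN_API_KEY"):
    """Return a chat_bot_config dict whose api_key comes from the environment.

    `env_var` is a hypothetical variable name chosen for this sketch.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    # Same fields as the sample configuration above; only api_key differs.
    return {
        "module_name": "chat_bot",
        "model_name": "ernie-3.5-8k",
        "base_url": "https://qianfan.baidubce.com/v2",
        "api_type": "openai",
        "api_key": api_key,
    }
```

The returned dict can then be passed as `chat_bot_config=build_chat_bot_config()` in the `pipeline.translate(...)` call, which keeps the key out of source control.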

The process, API description, and output description of PP-DocTranslation prediction are as follows:

(1) Instantiate the PP-DocTranslation pipeline object by calling PPDocTranslation.

Relevant parameter descriptions are as follows:

Parameter	Description	Type	Default Value
`layout_detection_model_name`	The model name for layout detection. If set to `None`, the pipeline's default model will be used.	`str\|None`	`None`
`layout_detection_model_dir`	The directory path of the layout detection model. If set to `None`, the official model will be downloaded.	`str\|None`	`None`
`layout_threshold`	Score threshold for the layout model. float: Any float between `0-1`; dict: `{0:0.1}`, where the key is the class ID and the value is the threshold for that class; None: If set to `None`, the pipeline's initialized value will be used, defaulting to `0.5`.	`float\|dict\|None`	`None`
`layout_nms`	Whether to use post-processing NMS for layout detection. If set to `None`, the pipeline's initialized value will be used, defaulting to `True`.	`bool\|None`	`None`
`layout_unclip_ratio`	Expansion coefficient for detection boxes in the layout detection model. float: Any float greater than `0`; Tuple[float,float]: Expansion coefficients in horizontal and vertical directions respectively; dict: Keys are int representing `cls_id`, values are tuple, e.g. `{0: (1.1, 2.0)}`, meaning for class 0 detection boxes, center remains unchanged, width expanded by 1.1 times, height expanded by 2.0 times; None: If set to `None`, the pipeline's initialized value will be used, defaulting to `1.0`.	`float\|Tuple[float,float]\|dict\|None`	`None`
`layout_merge_bboxes_mode`	Overlap box filtering method for layout detection. str: `large`, `small`, `union`, indicating whether to keep the larger box, smaller box, or both during overlap filtering; dict: Keys are int `cls_id`, values are str, e.g. `{0: "large", 2: "small"}`, meaning use "large" mode for class 0 boxes and "small" mode for class 2 boxes; None: If set to `None`, the pipeline's initialized value will be used, defaulting to `large`.	`str\|dict\|None`	`None`
`chart_recognition_model_name`	The model name for chart parsing. If set to `None`, the pipeline's default model will be used.	`str\|None`	`None`
`chart_recognition_model_dir`	The directory path of the chart parsing model. If set to `None`, the official model will be downloaded.	`str\|None`	`None`
`chart_recognition_batch_size`	Batch size for the chart parsing model. If set to `None`, batch size defaults to `1`.	`int\|None`	`None`
`region_detection_model_name`	The model name for region detection. If set to `None`, the pipeline's default model will be used.	`str\|None`	`None`
`region_detection_model_dir`	The directory path of the region detection model. If set to `None`, the official model will be downloaded.	`str\|None`	`None`
`doc_orientation_classify_model_name`	The model name for document orientation classification. If set to `None`, the pipeline's default model will be used.	`str\|None`	`None`
`doc_orientation_classify_model_dir`	The directory path of the document orientation classification model. If set to `None`, the official model will be downloaded.	`str\|None`	`None`
`doc_unwarping_model_name`	The model name for text image unwarping. If set to `None`, the pipeline's default model will be used.	`str\|None`	`None`
`doc_unwarping_model_dir`	The directory path of the text image unwarping model. If set to `None`, the official model will be downloaded.	`str\|None`	`None`
`text_detection_model_name`	The model name for text detection. If set to `None`, the pipeline's default model will be used.	`str\|None`	`None`
`text_detection_model_dir`	The directory path of the text detection model. If set to `None`, the official model will be downloaded.	`str\|None`	`None`
`text_det_limit_side_len`	Image side length limit for text detection. int: Any integer greater than `0`; None: If set to `None`, the pipeline's initialized value will be used, defaulting to `960`.	`int\|None`	`None`
`text_det_limit_type`	Type of image side length limit for text detection. str: Supports `min` and `max`, where `min` means ensuring the shortest side of the image is not less than `text_det_limit_side_len`, and `max` means ensuring the longest side is not greater than `text_det_limit_side_len`; None: If set to `None`, the pipeline's initialized value will be used, defaulting to `max`.	`str\|None`	`None`
`text_det_thresh`	Pixel threshold for detection; pixels in the output probability map with scores above this threshold are considered text pixels. float: Any float greater than `0`; None: If set to `None`, the pipeline's initialized value of `0.3` will be used.	`float\|None`	`None`
`text_det_box_thresh`	Detection box threshold; when the average score of all pixels inside a detected box exceeds this threshold, it is considered a text region. float: Any float greater than `0`; None: If set to `None`, the pipeline's initialized value of `0.6` will be used.	`float\|None`	`None`
`text_det_unclip_ratio`	Expansion coefficient for text detection; this method expands the text region, and the larger the value, the larger the expansion area. float: Any float greater than `0`; None: If set to `None`, the pipeline's initialized value of `2.0` will be used.	`float\|None`	`None`
`textline_orientation_model_name`	The model name for text line orientation classification. If set to `None`, the pipeline's default model will be used.	`str\|None`	`None`
`textline_orientation_model_dir`	The directory path of the text line orientation model. If set to `None`, the official model will be downloaded.	`str\|None`	`None`
`textline_orientation_batch_size`	Batch size for the text line orientation model. If set to `None`, batch size defaults to `1`.	`int\|None`	`None`
`text_recognition_model_name`	The model name for text recognition. If set to `None`, the pipeline's default model will be used.	`str\|None`	`None`
`text_recognition_model_dir`	The directory path of the text recognition model. If set to `None`, the official model will be downloaded.	`str\|None`	`None`
`text_recognition_batch_size`	Batch size for the text recognition model. If set to `None`, batch size defaults to `1`.	`int\|None`	`None`
`text_rec_score_thresh`	Text recognition threshold; text results with scores greater than this threshold will be retained. float: Any float greater than `0`; None: If set to `None`, the pipeline's initialized value of `0.0` (no threshold) will be used.	`float\|None`	`None`
`table_classification_model_name`	The model name for table classification. If set to `None`, the pipeline's default model will be used.	`str\|None`	`None`
`table_classification_model_dir`	The directory path of the table classification model. If set to `None`, the official model will be downloaded.	`str\|None`	`None`
`wired_table_structure_recognition_model_name`	The model name for wired table structure recognition. If set to `None`, the pipeline's default model will be used.	`str\|None`	`None`
`wired_table_structure_recognition_model_dir`	The directory path of the wired table structure recognition model. If set to `None`, the official model will be downloaded.	`str\|None`	`None`
`wireless_table_structure_recognition_model_name`	The model name for wireless table structure recognition. If set to `None`, the pipeline's default model will be used.	`str\|None`	`None`
`wireless_table_structure_recognition_model_dir`	The directory path of the wireless table structure recognition model. If set to `None`, the official model will be downloaded.	`str\|None`	`None`
`wired_table_cells_detection_model_name`	The model name for wired table cell detection. If set to `None`, the pipeline's default model will be used.	`str\|None`	`None`
`wired_table_cells_detection_model_dir`	The directory path of the wired table cell detection model. If set to `None`, the official model will be downloaded.	`str\|None`	`None`
`wireless_table_cells_detection_model_name`	The model name for wireless table cell detection. If set to `None`, the pipeline's default model will be used.	`str\|None`	`None`
`wireless_table_cells_detection_model_dir`	The directory path of the wireless table cell detection model. If set to `None`, the official model will be downloaded.	`str\|None`	`None`
`table_orientation_classify_model_name`	The model name for table orientation classification. If set to `None`, the pipeline's default model will be used.	`str\|None`	`None`
`table_orientation_classify_model_dir`	The directory path of the table orientation classification model. If set to `None`, the official model will be downloaded.	`str\|None`	`None`
`seal_text_detection_model_name`	The model name for seal text detection. If set to `None`, the pipeline's default model will be used.	`str\|None`	`None`
`seal_text_detection_model_dir`	The directory path of the seal text detection model. If set to `None`, the official model will be downloaded.	`str\|None`	`None`
`seal_det_limit_side_len`	Image side length limit for seal text detection. int: any integer greater than `0`; None: if set to `None`, the parameter value initialized by the pipeline will be used, with a default initialization of `736`.	`int\|None`	`None`
`seal_det_limit_type`	Type of image side length limit for seal text detection. str: supports `min` and `max`, where `min` ensures the shortest image side is not less than `seal_det_limit_side_len`, and `max` ensures the longest image side is not greater than `seal_det_limit_side_len`; None: if set to `None`, the parameter value initialized by the pipeline will be used, with a default initialization of `min`.	`str\|None`	`None`
`seal_det_thresh`	Detection pixel threshold. In the output probability map, pixels with scores above this threshold are considered text pixels. float: any floating number greater than `0`; None: if set to `None`, the pipeline default parameter value `0.2` will be used.	`float\|None`	`None`
`seal_det_box_thresh`	Detection box threshold. When the average score of all pixels within the detected bounding box is greater than this threshold, the result is considered a text region. float: any floating number greater than `0`; None: if set to `None`, the pipeline default parameter value `0.6` will be used.	`float\|None`	`None`
`seal_det_unclip_ratio`	Expansion coefficient for seal text detection. This method expands the text region; the larger the value, the larger the expansion area. float: any floating number greater than `0`; None: if set to `None`, the pipeline default parameter value `0.5` will be used.	`float\|None`	`None`
`seal_text_recognition_model_name`	Name of the seal text recognition model. If set to `None`, the pipeline default model will be used.	`str\|None`	`None`
`seal_text_recognition_model_dir`	Directory path for the seal text recognition model. If set to `None`, the official model will be downloaded.	`str\|None`	`None`
`seal_text_recognition_batch_size`	Batch size for the seal text recognition model. If set to `None`, the batch size defaults to `1`.	`int\|None`	`None`
`seal_rec_score_thresh`	Seal text recognition threshold. Text results with scores above this threshold will be retained. float: any floating number greater than `0`; None: if set to `None`, the pipeline default parameter value `0.0` will be used, meaning no threshold is set.	`float\|None`	`None`
`formula_recognition_model_name`	Name of the formula recognition model. If set to `None`, the pipeline default model will be used.	`str\|None`	`None`
`formula_recognition_model_dir`	Directory path for the formula recognition model. If set to `None`, the official model will be downloaded.	`str\|None`	`None`
`formula_recognition_batch_size`	Batch size for the formula recognition model. If set to `None`, the batch size defaults to `1`.	`int\|None`	`None`
`use_doc_orientation_classify`	Whether to load and use the document orientation classification module. If set to `None`, the pipeline initialized parameter value will be used, defaulting to `False`.	`bool\|None`	`None`
`use_doc_unwarping`	Whether to load and use the text image unwarping module. If set to `None`, the pipeline initialized parameter value will be used, defaulting to `False`.	`bool\|None`	`None`
`use_textline_orientation`	Whether to load and use the text line orientation classification module. If set to `None`, the pipeline initialized parameter value will be used, defaulting to `True`.	`bool\|None`	`None`
`use_seal_recognition`	Whether to load and use the seal text recognition sub-pipeline. If set to `None`, the pipeline initialized parameter value will be used, defaulting to `True`.	`bool\|None`	`None`
`use_table_recognition`	Whether to load and use the table recognition sub-pipeline. If set to `None`, the pipeline initialized parameter value will be used, defaulting to `True`.	`bool\|None`	`None`
`use_formula_recognition`	Whether to load and use the formula recognition sub-pipeline. If set to `None`, the pipeline initialized parameter value will be used, defaulting to `True`.	`bool\|None`	`None`
`use_chart_recognition`	Whether to load and use the chart parsing module. If set to `None`, the pipeline initialized parameter value will be used, defaulting to `False`.	`bool\|None`	`None`
`use_region_detection`	Whether to load and use the document region detection module. If set to `None`, the pipeline initialized parameter value will be used, defaulting to `True`.	`bool\|None`	`None`
`chat_bot_config`	Large language model configuration information. The configuration content is the following dict: `{ "module_name": "chat_bot", "model_name": "ernie-3.5-8k", "base_url": "https://qianfan.baidubce.com/v2", "api_type": "openai", "api_key": "api_key" # Please set this to the actual API key }`	`dict\|None`	`None`
`device`	Device used for inference. Supports specifying a specific card number: CPU: e.g. `cpu` means using CPU for inference; GPU: e.g. `gpu:0` means using the first GPU for inference; NPU: e.g. `npu:0` means using the first NPU for inference; XPU: e.g. `xpu:0` means using the first XPU for inference; MLU: e.g. `mlu:0` means using the first MLU for inference; DCU: e.g. `dcu:0` means using the first DCU for inference; None: if set to `None`, initialization will prioritize using the local GPU device 0; if unavailable, CPU will be used.	`str\|None`	`None`
`enable_hpi`	Whether to enable high-performance inference.	`bool`	`False`
`use_tensorrt`	Whether to enable Paddle Inference’s TensorRT subgraph engine. If the model does not support acceleration via TensorRT, enabling this flag will have no effect. For Paddle with CUDA 11.8, the compatible TensorRT version is 8.x (x≥6), recommended installation is TensorRT 8.6.1.6.	`bool`	`False`
`precision`	Computation precision, such as fp32, fp16.	`str`	`"fp32"`
`enable_mkldnn`	Whether to enable MKL-DNN accelerated inference. If MKL-DNN is unavailable or the model does not support acceleration via MKL-DNN, enabling this flag will have no effect.	`bool`	`True`
`mkldnn_cache_capacity`	MKL-DNN cache capacity.	`int`	`10`
`cpu_threads`	Number of threads used during inference on CPU.	`int`	`8`
`paddlex_config`	Path to the PaddleX pipeline configuration file.	`str\|None`	`None`
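
To make the `layout_unclip_ratio` description concrete, the sketch below expands a box about its center by a width factor and a height factor. `expand_box` is a hypothetical helper written for illustration only; it is not the pipeline's implementation, and the `(x_min, y_min, x_max, y_max)` box layout is an assumption.

```python
def expand_box(box, ratio):
    """Expand a box about its center.

    box:   (x_min, y_min, x_max, y_max), an assumed layout for this sketch.
    ratio: a single float applied to both sides, or a (width, height) tuple.
    """
    if isinstance(ratio, (int, float)):
        w_ratio = h_ratio = float(ratio)
    else:
        w_ratio, h_ratio = ratio
    x_min, y_min, x_max, y_max = box
    # The center stays fixed; only the half-width and half-height are scaled.
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    half_w = (x_max - x_min) * w_ratio / 2
    half_h = (y_max - y_min) * h_ratio / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


# Per-class setting in the documented dict form: class 0 uses (1.1, 2.0).
unclip_by_class = {0: (1.1, 2.0)}
print(expand_box((0, 0, 10, 10), unclip_by_class[0]))
```

With the documented example `{0: (1.1, 2.0)}`, a 10 x 10 box of class 0 becomes roughly 11 wide and 20 tall around the same center.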

(2) Call the visual_predict() method of the PP-DocTranslation pipeline object to obtain visual prediction results. This method returns a list of results. The pipeline also provides a visual_predict_iter() method, which accepts the same parameters and produces the same results but returns a generator that yields them one at a time, which suits large datasets or memory-constrained scenarios. Choose either method according to your needs. The parameters of the visual_predict() method are described below:

Parameter	Description	Type	Default
`input`	Data to be predicted, supports multiple input types, required. Python Var: image data such as `numpy.ndarray`; str: local path of an image or PDF file, e.g. `/root/data/img.jpg`; URL link: network URL of an image or PDF file; local directory: directory containing images to be predicted, e.g. `/root/data/` (PDFs in directories are currently not supported; a PDF must be specified by its exact file path); list: list elements must be one of the above types, e.g. `[numpy.ndarray, numpy.ndarray]`, `["/root/data/img1.jpg", "/root/data/img2.jpg"]`, `["/root/data1", "/root/data2"]`.	`Python Var\|str\|list`
`use_doc_orientation_classify`	Whether to use the document orientation classification module during inference. Setting to `None` means using the instantiated parameter, otherwise this parameter has higher priority.	`bool\|None`	`None`
`use_doc_unwarping`	Whether to use the text image unwarping module during inference. Setting to `None` means using the instantiated parameter, otherwise this parameter has higher priority.	`bool\|None`	`None`
`use_textline_orientation`	Whether to use the text line orientation classification module during inference. Setting to `None` means using the instantiated parameter, otherwise this parameter has higher priority.	`bool\|None`	`None`
`use_seal_recognition`	Whether to use the seal text recognition sub-pipeline during inference. Setting to `None` means using the instantiated parameter, otherwise this parameter has higher priority.	`bool\|None`	`None`
`use_table_recognition`	Whether to use the table recognition sub-pipeline during inference. Setting to `None` means using the instantiated parameter, otherwise this parameter has higher priority.	`bool\|None`	`None`
`use_formula_recognition`	Whether to use the formula recognition sub-pipeline during inference. Setting to `None` means using the instantiated parameter, otherwise this parameter has higher priority.	`bool\|None`	`None`
`use_chart_recognition`	Whether to use the chart parsing module during inference. Setting to `None` means using the instantiated parameter, otherwise this parameter has higher priority.	`bool\|None`	`None`
`use_region_detection`	Whether to use the document region detection module during inference. Setting to `None` means using the instantiated parameter, otherwise this parameter has higher priority.	`bool\|None`	`None`
`layout_threshold`	Parameter meaning is basically the same as the instantiated parameter. Setting to `None` means using the instantiated parameter, otherwise this parameter has higher priority.	`float\|dict\|None`	`None`
`layout_nms`	Parameter meaning is basically the same as the instantiated parameter. Setting to `None` means using the instantiated parameter, otherwise this parameter has higher priority.	`bool\|None`	`None`
`layout_unclip_ratio`	Parameter meaning is basically the same as the instantiated parameter. Setting to `None` means using the instantiated parameter, otherwise this parameter has higher priority.	`float\|Tuple[float,float]\|dict\|None`	`None`
`layout_merge_bboxes_mode`	Parameter meaning is basically the same as the instantiated parameter. Setting to `None` means using the instantiated parameter, otherwise this parameter has higher priority.	`str\|dict\|None`	`None`
`text_det_limit_side_len`	Parameter meaning is basically the same as the instantiated parameter. Setting to `None` means using the instantiated parameter, otherwise this parameter has higher priority.	`int\|None`	`None`
`text_det_limit_type`	Parameter meaning is basically the same as the instantiated parameter. Setting to `None` means using the instantiated parameter, otherwise this parameter has higher priority.	`str\|None`	`None`
`text_det_thresh`	Parameter meaning is basically the same as the instantiated parameter. Setting to `None` means using the instantiated parameter, otherwise this parameter has higher priority.	`float\|None`	`None`
`text_det_box_thresh`	Parameter meaning is basically the same as the instantiated parameter. Setting to `None` means using the instantiated parameter, otherwise this parameter has higher priority.	`float\|None`	`None`
`text_det_unclip_ratio`	Parameter meaning is basically the same as the instantiated parameter. Setting to `None` means using the instantiated parameter, otherwise this parameter has higher priority.	`float\|None`	`None`
`text_rec_score_thresh`	Parameter meaning is basically the same as the instantiated parameter. Setting to `None` means using the instantiated parameter, otherwise this parameter has higher priority.	`float\|None`	`None`
`seal_det_limit_side_len`	Parameter meaning is basically the same as the instantiated parameter. Setting to `None` means using the instantiated parameter, otherwise this parameter has higher priority.	`int\|None`	`None`
`seal_det_limit_type`	Parameter meaning is basically the same as the instantiated parameter. Setting to `None` means using the instantiated parameter, otherwise this parameter has higher priority.	`str\|None`	`None`
`seal_det_thresh`	Parameter meaning is basically the same as the instantiated parameter. Setting to `None` means using the instantiated parameter, otherwise this parameter has higher priority.	`float\|None`	`None`
`seal_det_box_thresh`	Parameter meaning is basically the same as the instantiated parameter. Setting to `None` means using the instantiated parameter, otherwise this parameter has higher priority.	`float\|None`	`None`
`seal_det_unclip_ratio`	Parameter meaning is basically the same as the instantiated parameter. Setting to `None` means using the instantiated parameter, otherwise this parameter has higher priority.	`float\|None`	`None`
`seal_rec_score_thresh`	Parameter meaning is basically the same as the instantiated parameter. Setting to `None` means using the instantiated parameter, otherwise this parameter has higher priority.	`float\|None`	`None`
`use_wired_table_cells_trans_to_html`	Whether to enable direct conversion of wired table cell detection results to HTML. When enabled, HTML is constructed directly based on the geometric relations of wired table cell detection results.	`bool`	`False`
`use_wireless_table_cells_trans_to_html`	Whether to enable direct conversion of wireless table cell detection results to HTML. When enabled, HTML is constructed directly based on the geometric relations of wireless table cell detection results.	`bool`	`False`
`use_table_orientation_classify`	Whether to enable table orientation classification. When enabled, tables with 90/180/270 degree rotations in images can be corrected in orientation and correctly recognized.	`bool`	`True`
`use_ocr_results_with_table_cells`	Whether to enable OCR segmentation by table cells. When enabled, OCR detection results are segmented and re-recognized based on cell prediction results to avoid missing text.	`bool`	`True`
`use_e2e_wired_table_rec_model`	Whether to enable end-to-end wired table recognition mode. When enabled, the cell detection model is not used, only the table structure recognition model is used.	`bool`	`False`
`use_e2e_wireless_table_rec_model`	Whether to enable end-to-end wireless table recognition mode. When enabled, the cell detection model is not used, only the table structure recognition model is used.	`bool`	`True`
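
When a document has many pages, `visual_predict_iter()` lets you handle one result at a time. The following is a minimal sketch that reuses only the calls from the sample code in this section; the helper name `stream_markdown` is an assumption, and `pipeline` is expected to be an already created `PPDocTranslation` object.

```python
def stream_markdown(pipeline, input_path, output_path, **predict_kwargs):
    """Process results one by one via visual_predict_iter().

    Returns the per-page markdown info list that can be passed to translate().
    """
    ori_md_info_list = []
    # The generator yields results lazily, so each page is saved as soon as
    # it is parsed; only the markdown info is kept for the translation step.
    for res in pipeline.visual_predict_iter(input_path, **predict_kwargs):
        layout_parsing_result = res["layout_parsing_result"]
        ori_md_info_list.append(layout_parsing_result.markdown)
        layout_parsing_result.save_to_markdown(output_path)
    return ori_md_info_list
```

A call such as `stream_markdown(pipeline, "document_sample.pdf", "./output", use_doc_unwarping=False)` forwards any keyword arguments from the table above to `visual_predict_iter()`.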

(3) Processing visual prediction results: Each sample's prediction result is a corresponding Result object, supporting operations such as printing, saving as images, and saving as json files:

Method	Description	Parameter	Parameter Type	Parameter Description	Default
`print()`	Print results to terminal	`format_json`	`bool`	Whether to format the output content using `JSON` indentation	`True`
		`indent`	`int`	Specify indentation level to beautify output `JSON` data for better readability, effective only when `format_json` is `True`	4
		`ensure_ascii`	`bool`	Control whether non-`ASCII` characters are escaped as `Unicode`. When set to `True`, all non-`ASCII` characters will be escaped; if `False`, original characters are preserved. Effective only when `format_json` is `True`	`False`
`save_to_json()`	Save results as a JSON file	`save_path`	`str`	File path for saving. If a directory is specified, the saved file is named after the input file	`None`
		`indent`	`int`	Specify indentation level to beautify output `JSON` data for better readability, effective only when `format_json` is `True`	4
		`ensure_ascii`	`bool`	Control whether non-`ASCII` characters are escaped as `Unicode`. When set to `True`, all non-`ASCII` characters will be escaped; if `False`, original characters are preserved. Effective only when `format_json` is `True`	`False`
`save_to_img()`	Save visualized images from intermediate modules as PNG format images	`save_path`	`str`	File path for saving, supports directory or file path	None
`save_to_markdown()`	Save each page of image or PDF files as separate markdown files	`save_path`	`str`	File path for saving, supports directory or file path	None
`save_to_html()`	Save tables in the file as HTML format files	`save_path`	`str`	File path for saving, supports directory or file path	None
`save_to_xlsx()`	Save tables in the file as XLSX format files	`save_path`	`str`	File path for saving, supports directory or file path	None
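
The methods in the table can be applied to each result in turn. Below is a minimal sketch, assuming `result` is a visual prediction Result object such as the `layout_parsing_result` in the sample code; the helper name `export_result` and its `with_tables` switch are hypothetical.

```python
def export_result(result, save_path, with_tables=True):
    """Print one Result object and save it in the formats listed in the table."""
    result.print(format_json=True, indent=4, ensure_ascii=False)
    result.save_to_json(save_path=save_path)
    # Pass a directory here: a result usually produces several images, and a
    # single file path would keep only the last one.
    result.save_to_img(save_path=save_path)
    result.save_to_markdown(save_path=save_path)
    if with_tables:
        # Tables found in the document are exported separately.
        result.save_to_html(save_path=save_path)
        result.save_to_xlsx(save_path=save_path)
```

Passing a directory as `save_path` keeps the outputs of multi-page inputs from overwriting each other.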

- Calling the `print()` method will print the results to the terminal, with the following explanation of printed content:
    - `input_path`: `(str)` Input path of the image or PDF to be predicted
    - `page_index`: `(Union[int, None])` If the input is a PDF, this indicates the current page number; otherwise `None`
    - `model_settings`: `(Dict[str, bool])` Model parameters configured for the pipeline
        - `use_doc_preprocessor`: `(bool)` Controls whether to enable the document preprocessing sub-pipeline
        - `use_general_ocr`: `(bool)` Controls whether to enable the OCR sub-pipeline
        - `use_seal_recognition`: `(bool)` Controls whether to enable the seal text recognition sub-pipeline
        - `use_table_recognition`: `(bool)` Controls whether to enable the table recognition sub-pipeline
        - `use_formula_recognition`: `(bool)` Controls whether to enable the formula recognition sub-pipeline
    - `doc_preprocessor_res`: `(Dict[str, Union[List[float], str]])` Document preprocessing result dictionary, present only when `use_doc_preprocessor=True`
        - `input_path`: `(str)` Image path accepted by the document preprocessing sub-pipeline; if input is `numpy.ndarray`, saved as `None`, here it is `None`
        - `page_index`: `None`, here input is `numpy.ndarray`, so value is `None`
        - `model_settings`: `(Dict[str, bool])` Model configuration parameters of the document preprocessing sub-pipeline
            - `use_doc_orientation_classify`: `(bool)` Controls whether to enable the document image orientation classification sub-module
            - `use_doc_unwarping`: `(bool)` Controls whether to enable the text image unwarping sub-module
        - `angle`: `(int)` Prediction result of the document image orientation classification sub-module, returns actual angle value if enabled
    - `parsing_res_list`: `(List[Dict])` List of parsing results, each element is a dictionary; list order corresponds to reading order after parsing
        - `block_bbox`: `(np.ndarray)` Bounding box of layout detection
        - `block_label`: `(str)` Label of the layout region, e.g. `text`, `table`, etc.
        - `block_content`: `(str)` Content within the layout region
        - `seg_start_flag`: `(bool)` Indicates whether this layout region is the start of a paragraph
        - `seg_end_flag`: `(bool)` Indicates whether this layout region is the end of a paragraph
        - `sub_label`: `(str)` Sub-label of the layout region, e.g. sub-label of `text` could be `title_text`
        - `sub_index`: `(int)` Sub-index of the layout region, used for restoring Markdown
        - `index`: `(int)` Index of the layout region, used to display layout sorting results
    - `overall_ocr_res`: `(Dict[str, Union[List[str], List[float], numpy.ndarray]])` Global OCR result dictionary
        - `input_path`: `(Union[str, None])` Image path accepted by the image OCR sub-pipeline; if input is `numpy.ndarray`, saved as `None`
        - `page_index`: `None`, here input is `numpy.ndarray`, so value is `None`
        - `model_settings`: `(Dict)` Model configuration parameters of the OCR sub-pipeline
        - `dt_polys`: `(List[numpy.ndarray])` List of text detection polygons; each detection box is a numpy array with 4 vertex coordinates, shape (4, 2), dtype int16
        - `dt_scores`: `(List[float])` Confidence scores of text detection boxes
        - `text_det_params`: `(Dict[str, Dict[str, int, float]])` Configuration parameters of the text detection module
            - `limit_side_len`: `(int)` Length limit for image preprocessing
            - `limit_type`: `(str)` Processing method for length limit
            - `thresh`: `(float)` Confidence threshold for text pixel classification
            - `box_thresh`: `(float)` Confidence threshold for text detection boxes
            - `unclip_ratio`: `(float)` Expansion factor for text detection boxes
            - `text_type`: `(str)` Type of text detection, currently fixed as "general"
        - `text_type`: `(str)` Type of text detection, currently fixed as "general"
        - `textline_orientation_angles`: `(List[int])` Prediction results of text line orientation classification; returns actual angle values when enabled (e.g. [0,0,1])
        - `text_rec_score_thresh`: `(float)` Filtering threshold for text recognition results
        - `rec_texts`: `(List[str])` List of text recognition results, only including texts exceeding the `text_rec_score_thresh`
        - `rec_scores`: `(List[float])` Confidence scores of text recognition, filtered by `text_rec_score_thresh`
        - `rec_polys`: `(List[numpy.ndarray])` List of text detection boxes filtered by confidence, format same as `dt_polys`
    - `formula_res_list`: `(List[Dict[str, Union[numpy.ndarray, List[float], str]]])` List of formula recognition results, each element is a dictionary
        - `rec_formula`: `(str)` Formula recognition result
        - `rec_polys`: `(numpy.ndarray)` Formula detection boxes, shape (4, 2), dtype int16
        - `formula_region_id`: `(int)` Region ID where the formula is located
    - `seal_res_list`: `(List[Dict[str, Union[numpy.ndarray, List[float], str]]])` List of seal recognition results, each element is a dictionary
        - `input_path`: `(str)` Input path of seal image
        - `page_index`: `None`, here input is `numpy.ndarray`, so value is `None`
        - `model_settings`: `(Dict)` Model configuration parameters of the seal text recognition sub-pipeline
        - `dt_polys`: `(List[numpy.ndarray])` List of seal detection boxes, format same as `dt_polys`
        - `text_det_params`: `(Dict[str, Dict[str, int, float]])` Configuration parameters of the seal detection module, meanings same as above
        - `text_type`: `(str)` Type of seal detection, currently fixed as "seal"
        - `text_rec_score_thresh`: `(float)` Filtering threshold for seal recognition results
        - `rec_texts`: `(List[str])` List of seal recognition results, only including texts exceeding the `text_rec_score_thresh`
        - `rec_scores`: `(List[float])` Confidence scores of seal recognition, filtered by `text_rec_score_thresh`
        - `rec_polys`: `(List[numpy.ndarray])` List of seal detection boxes filtered by confidence, format same as `dt_polys`
        - `rec_boxes`: `(numpy.ndarray)` Rectangular bounding box array of detection boxes, shape (n, 4), dtype int16; each row represents one rectangle
    - `table_res_list`: `(List[Dict[str, Union[numpy.ndarray, List[float], str]]])` List of table recognition results, each element is a dictionary
        - `cell_box_list`: `(List[numpy.ndarray])` List of table cell bounding boxes
        - `pred_html`: `(str)` Table in HTML format string
        - `table_ocr_pred`: `(dict)` OCR recognition results of the table
            - `rec_polys`: `(List[numpy.ndarray])` List of cell detection boxes
            - `rec_texts`: `(List[str])` Recognition results of cells
            - `rec_scores`: `(List[float])` Recognition confidence scores of cells
            - `rec_boxes`: `(numpy.ndarray)` Rectangular bounding box array of detection boxes, shape (n, 4), dtype int16; each row represents one rectangle
- Calling the `save_to_json()` method will save the above content to the specified `save_path`. If a directory is specified, the saved path will be `save_path/{your_img_basename}_res.json`. If a file is specified, it will be saved directly to that file. Since JSON files do not support saving numpy arrays, all `numpy.array` types will be converted to list format.
- Calling the `save_to_img()` method will save visualization results to the specified `save_path`. If a directory is specified, it will save layout detection visual images, global OCR visual images, layout reading order visual images, etc. If a file is specified, it will be saved directly to that file. (The pipeline usually contains many result images, so it is not recommended to specify a specific file path directly, or multiple images will be overwritten, leaving only the last image.)
- Calling the `save_to_markdown()` method will save the converted Markdown files to the specified `save_path`. The saved file path will be `save_path/{your_img_basename}.md`. If the input is a PDF file, it is recommended to specify a directory directly, otherwise multiple markdown files will be overwritten.
- Calling the `concatenate_markdown_pages()` method merges the multi-page Markdown contents (`markdown_list`) output by the PP-DocTranslation pipeline into a single complete document and returns the merged Markdown content.

(4) Call the translate() method to perform document translation. This method returns the original and translated markdown content as a markdown object, which can be saved locally by executing the save_to_markdown() method for the desired parts. Below are the relevant parameters of the translate() method:

Parameter	Description	Type	Default
`ori_md_info_list`	List of original Markdown data containing content to be translated. Must be a list of dictionaries, each representing a document block	`List[Dict]`
`target_language`	Target language (ISO 639-1 language code, e.g. `"en"`/`"ja"`/`"fr"`)	`str`	`"zh"`
`chunk_size`	Character count threshold for chunked translation processing	`int`	`5000`
`task_description`	Custom task description prompt	`str\|None`	`None`
`output_format`	Specified output format requirements, e.g. "preserve original Markdown structure"	`str\|None`	`None`
`rules_str`	Custom translation rule description	`str\|None`	`None`
`few_shot_demo_text_content`	Few-shot learning example text content	`str\|None`	`None`
`few_shot_demo_key_value_list`	Structured few-shot example data in key-value pairs, can include professional terminology glossary	`str\|None`	`None`
`glossary`	Professional terminology glossary for translation	`dict\|None`	`None`
`llm_request_interval`	Interval in seconds between requests to the large language model. This parameter helps prevent too frequent calls to the LLM.	`float`	`0.0`
`chat_bot_config`	Large language model configuration. Setting to `None` uses instantiation parameters; otherwise, this parameter takes priority.	`dict\|None`	`None`
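
To illustrate what a character-count threshold such as `chunk_size` means in practice, the sketch below packs Markdown paragraphs into chunks that stay within the threshold. It is only an assumption about chunking in general: `split_markdown_into_chunks` is a hypothetical helper, not PaddleOCR's actual implementation, and the real pipeline may split on different boundaries.

```python
from typing import List


def split_markdown_into_chunks(text: str, chunk_size: int = 5000) -> List[str]:
    """Greedily pack paragraphs into chunks of at most chunk_size characters.

    Hypothetical helper for illustration only. A single paragraph longer
    than chunk_size is kept whole rather than cut mid-sentence.
    """
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if current and len(candidate) > chunk_size:
            # Adding this paragraph would exceed the threshold: close the chunk
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Under this scheme, a smaller `chunk_size` produces more chunks and therefore more LLM requests, which is where a non-zero `llm_request_interval` would become relevant.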

3. Development Integration/Deployment¶

If the pipeline can meet your requirements for inference speed and accuracy, you can proceed directly with development integration/deployment.

If you need to directly apply the pipeline in your Python project, you can refer to the sample code in 2.2 Python Script Approach.

In addition, PaddleOCR also offers two other deployment methods, detailed as follows:

🚀 High-Performance Inference: In real-world production environments, many applications have stringent performance criteria (especially response speed) for deployment strategies to ensure efficient system operation and a smooth user experience. To this end, PaddleOCR provides high-performance inference capabilities, aiming to deeply optimize model inference and pre/post-processing, achieving significant acceleration in the end-to-end process. For detailed information on the high-performance inference process, please refer to High-Performance Inference.

☁️ Serving: Serving is a common deployment form in real-world production environments. By encapsulating inference functions as services, clients can access these services through network requests to obtain inference results. For detailed information on the pipeline serving process, please refer to Serving.

Below are the API references for basic serving and examples of multi-language service invocation:

API Reference

For the main operations provided by the service:

- The HTTP request method is POST.
- Both the request body and the response body are JSON data (JSON objects).
- When the request is processed successfully, the response status code is 200, and the response body has the following properties:

Name	Type	Meaning
`logId`	`string`	Request UUID.
`errorCode`	`integer`	Error code. Fixed as `0`.
`errorMsg`	`string`	Error message. Fixed as `"Success"`.
`result`	`object`	Operation result.

When the request is not successful, the response body has the following properties:

Name	Type	Meaning
`logId`	`string`	Request UUID.
`errorCode`	`integer`	Error code. Same as response status code.
`errorMsg`	`string`	Error message.
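
The two response shapes above can be handled by one small client-side check. The following is a minimal sketch; `ServingError` and `unwrap_result` are hypothetical names introduced for illustration and are not part of PaddleOCR.

```python
class ServingError(RuntimeError):
    """Raised when the response body reports a failure (hypothetical name)."""


def unwrap_result(status_code: int, body: dict) -> dict:
    """Return body["result"] on success, otherwise raise ServingError."""
    # Success is signalled by HTTP 200 together with errorCode == 0
    if status_code == 200 and body.get("errorCode") == 0:
        return body["result"]
    # Failure bodies carry logId, errorCode and errorMsg but no result
    raise ServingError(
        f"logId={body.get('logId')} errorCode={body.get('errorCode')} "
        f"errorMsg={body.get('errorMsg')}"
    )
```

With `requests`, this could be called as `unwrap_result(resp.status_code, resp.json())`; including `logId` in the message makes it easier to match a failure against server-side logs.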

Main operations provided by the serving are as follows:

analyzeImages

Use computer vision models to analyze images, obtaining OCR, table recognition results, etc.

POST /doctrans-visual

Request body properties are as follows:

Name	Type	Meaning	Required
`file`	`string`	URL of image or PDF file accessible by the server, or Base64 encoding of such file contents. By default, for PDF files over 10 pages, only the first 10 pages are processed. To remove the page limit, add the following configuration in the pipeline config file: `Serving: extra: max_num_input_imgs: null`	Yes
`fileType`	`integer` \| `null`	File type. `0` means PDF, `1` means image file. If not present in the request, the file type will be inferred from the URL.	No
`useDocOrientationClassify`	`boolean` \| `null`	See the `use_doc_orientation_classify` parameter description in the pipeline object's `visual_predict` method.	No
`useDocUnwarping`	`boolean` \| `null`	See the `use_doc_unwarping` parameter description in the pipeline object's `visual_predict` method.	No
`useTextlineOrientation`	`boolean` \| `null`	See the `use_textline_orientation` parameter description in the pipeline object's `visual_predict` method.	No
`useSealRecognition`	`boolean` \| `null`	See the `use_seal_recognition` parameter description in the pipeline object's `visual_predict` method.	No
`useTableRecognition`	`boolean` \| `null`	See the `use_table_recognition` parameter description in the pipeline object's `visual_predict` method.	No
`useFormulaRecognition`	`boolean` \| `null`	See the `use_formula_recognition` parameter description in the pipeline object's `visual_predict` method.	No
`useChartRecognition`	`boolean` \| `null`	See the `use_chart_recognition` parameter description in the pipeline object's `visual_predict` method.	No
`useRegionDetection`	`boolean` \| `null`	See the `use_region_detection` parameter description in the pipeline object's `visual_predict` method.	No
`layoutThreshold`	`number` \| `object` \| `null`	See the `layout_threshold` parameter description in the pipeline object's `visual_predict` method.	No
`layoutNms`	`boolean` \| `null`	See the `layout_nms` parameter description in the pipeline object's `visual_predict` method.	No
`layoutUnclipRatio`	`number` \| `array` \| `object` \| `null`	See the `layout_unclip_ratio` parameter description in the pipeline object's `visual_predict` method.	No
`layoutMergeBboxesMode`	`string` \| `object` \| `null`	See the `layout_merge_bboxes_mode` parameter description in the pipeline object's `visual_predict` method.	No
`textDetLimitSideLen`	`integer` \| `null`	See the `text_det_limit_side_len` parameter description in the pipeline object's `visual_predict` method.	No
`textDetLimitType`	`string` \| `null`	See the `text_det_limit_type` parameter description in the pipeline object's `visual_predict` method.	No
`textDetThresh`	`number` \| `null`	See the `text_det_thresh` parameter description in the pipeline object's `visual_predict` method.	No
`textDetBoxThresh`	`number` \| `null`	See the `text_det_box_thresh` parameter description in the pipeline object's `visual_predict` method.	No
`textDetUnclipRatio`	`number` \| `null`	See the `text_det_unclip_ratio` parameter description in the pipeline object's `visual_predict` method.	No
`textRecScoreThresh`	`number` \| `null`	See the `text_rec_score_thresh` parameter description in the pipeline object's `visual_predict` method.	No
`sealDetLimitSideLen`	`integer` \| `null`	See the `seal_det_limit_side_len` parameter description in the pipeline object's `visual_predict` method.	No
`sealDetLimitType`	`string` \| `null`	See the `seal_det_limit_type` parameter description in the pipeline object's `visual_predict` method.	No
`sealDetThresh`	`number` \| `null`	See the `seal_det_thresh` parameter description in the pipeline object's `visual_predict` method.	No
`sealDetBoxThresh`	`number` \| `null`	See the `seal_det_box_thresh` parameter description in the pipeline object's `visual_predict` method.	No
`sealDetUnclipRatio`	`number` \| `null`	See the `seal_det_unclip_ratio` parameter description in the pipeline object's `visual_predict` method.	No
`sealRecScoreThresh`	`number` \| `null`	See the `seal_rec_score_thresh` parameter description in the pipeline object's `visual_predict` method.	No
`useWiredTableCellsTransToHtml`	`boolean`	See the `use_wired_table_cells_trans_to_html` parameter description in the pipeline object's `visual_predict` method.	No
`useWirelessTableCellsTransToHtml`	`boolean`	See the `use_wireless_table_cells_trans_to_html` parameter description in the pipeline object's `visual_predict` method.	No
`useTableOrientationClassify`	`boolean`	See the `use_table_orientation_classify` parameter description in the pipeline object's `visual_predict` method.	No
`useOcrResultsWithTableCells`	`boolean`	See the `use_ocr_results_with_table_cells` parameter description in the pipeline object's `visual_predict` method.	No
`useE2eWiredTableRecModel`	`boolean`	See the `use_e2e_wired_table_rec_model` parameter description in the pipeline object's `visual_predict` method.	No
`useE2eWirelessTableRecModel`	`boolean`	See the `use_e2e_wireless_table_rec_model` parameter description in the pipeline object's `visual_predict` method.	No
`visualize`	`boolean` \| `null`	Whether to return visualization result images and intermediate images produced during processing. If `true` is passed, images are returned. If `false` is passed, images are not returned. If this parameter is absent from the request body or `null` is passed, the pipeline config file setting `Serving.visualize` is followed. For example, adding the field `Serving: visualize: False` to the pipeline config file means images are not returned by default; the `visualize` parameter in the request body can override this default. If neither the request body nor the config file sets it (or the request body passes `null` and the config file does not set it), images are returned by default.	No
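
As a rough illustration of how the request properties above fit together, the sketch below assembles a request body for `POST /doctrans-visual`. `build_visual_payload` and its parameter names are hypothetical; only the JSON keys (`file`, `fileType`, and the optional camelCase options) come from the table.

```python
import base64
from typing import Any, Dict, Optional


def build_visual_payload(
    file_bytes: Optional[bytes] = None,
    file_url: Optional[str] = None,
    is_pdf: Optional[bool] = None,
    **options: Any,
) -> Dict[str, Any]:
    """Build a request body for POST /doctrans-visual (hypothetical helper)."""
    if (file_bytes is None) == (file_url is None):
        raise ValueError("Provide exactly one of file_bytes or file_url")
    if file_url is not None:
        file_value = file_url
    else:
        # File contents are sent as a Base64-encoded ASCII string
        file_value = base64.b64encode(file_bytes).decode("ascii")
    payload: Dict[str, Any] = {"file": file_value}
    if is_pdf is not None:
        # fileType: 0 means PDF, 1 means image file
        payload["fileType"] = 0 if is_pdf else 1
    # Options left as None are omitted so the server-side settings apply
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload
```

For example, `build_visual_payload(file_bytes=data, is_pdf=False, useTableRecognition=True)` yields a body with `file`, `fileType`, and `useTableRecognition` only.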

When the request is processed successfully, the response body's result has the following properties:

Name	Type	Meaning
`layoutParsingResults`	`array`	Layout parsing results. The array length is 1 (for image input) or equals the actual number of processed pages (for PDF input). For PDF input, each element corresponds to the result of each processed page in order.
`dataInfo`	`object`	Input data information.

Each element in layoutParsingResults is an object with the following properties:

Name	Type	Meaning
`prunedResult`	`object`	Simplified version of the `res` field in the JSON representation of the `layout_parsing_result` generated by the pipeline object's `visual_predict` method, with `input_path` and `page_index` fields removed.
`markdown`	`object`	Markdown result.
`outputImages`	`object` \| `null`	See the `img` property description in the pipeline prediction results. Images are in JPEG format and Base64 encoded.
`inputImage`	`string` \| `null`	Input image. JPEG format, Base64 encoded.

markdown is an object with the following properties:

Name	Type	Meaning
`text`	`string`	Markdown text.
`images`	`object`	Key-value pairs of Markdown image relative paths and Base64 encoded images.
`isStart`	`boolean`	Whether the first element on the current page is the start of a paragraph.
`isEnd`	`boolean`	Whether the last element on the current page is the end of a paragraph.
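
The `isStart` and `isEnd` flags make it possible to rejoin a paragraph that was split by a page break. The sketch below shows one way a client might do this; `merge_markdown_pages` is a hypothetical helper, and joining with a single space is an assumption that suits space-delimited languages (it is not necessarily what `concatenate_markdown_pages()` does).

```python
from typing import Dict, List


def merge_markdown_pages(pages: List[Dict]) -> str:
    """Concatenate per-page markdown objects into one document (sketch)."""
    merged = ""
    prev_is_end = True
    for page in pages:
        text = page["text"]
        if not merged:
            merged = text
        elif not prev_is_end and not page["isStart"]:
            # The paragraph continues across the page break: join inline
            merged = merged.rstrip() + " " + text.lstrip()
        else:
            # A new paragraph starts on this page: separate with a blank line
            merged = merged.rstrip() + "\n\n" + text.lstrip()
        prev_is_end = page["isEnd"]
    return merged
```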

translate

Use a large model to translate documents.

POST /doctrans-translate

Request body properties are as follows:

Name	Type	Meaning	Required
`markdownList`	`array`	List of Markdown to be translated. Can be obtained from the results of the `analyzeImages` operation. The `images` attribute will not be used.	Yes
`targetLanguage`	`string`	Please refer to the `target_language` parameter description in the `translate` method of the pipeline object.	No
`chunkSize`	`integer`	Please refer to the `chunk_size` parameter description in the `translate` method of the pipeline object.	No
`taskDescription`	`string` \| `null`	Please refer to the `task_description` parameter description in the `translate` method of the pipeline object.	No
`outputFormat`	`string` \| `null`	Please refer to the `output_format` parameter description in the `translate` method of the pipeline object.	No
`rulesStr`	`string` \| `null`	Please refer to the `rules_str` parameter description in the `translate` method of the pipeline object.	No
`fewShotDemoTextContent`	`string` \| `null`	Please refer to the `few_shot_demo_text_content` parameter description in the `translate` method of the pipeline object.	No
`fewShotDemoKeyValueList`	`string` \| `null`	Please refer to the `few_shot_demo_key_value_list` parameter description in the `translate` method of the pipeline object.	No
`glossary`	`object` \| `null`	Please refer to the `glossary` parameter description in the `translate` method of the pipeline object.	No
`llmRequestInterval`	`number` \| `null`	Please refer to the `llm_request_interval` parameter description in the `translate` method of the pipeline object.	No
`chatBotConfig`	`object` \| `null`	Please refer to the `chat_bot_config` parameter description in the `translate` method of the pipeline object.	No

When the request is successfully processed, the result in the response body has the following attributes:

Name	Type	Meaning
`translationResults`	`array`	Translation results.

Each element in translationResults is an object with the following attributes:

Name	Type	Meaning
`language`	`string`	Target language.
`markdown`	`object`	Markdown result. Object definition is consistent with the `markdown` returned by the `analyzeImages` operation.

Note:

Including sensitive parameters such as the API key for large model calls in the request body may pose security risks. If not necessary, set these parameters in the configuration file and do not pass them during the request.

Examples of multi-language service invocation

Python

import base64
import pathlib
import pprint
import sys

import requests


API_BASE_URL = "http://127.0.0.1:8080"

file_path = "./demo.jpg"
target_language = "en"

with open(file_path, "rb") as file:
    file_bytes = file.read()
    file_data = base64.b64encode(file_bytes).decode("ascii")

payload = {
    "file": file_data,
    "fileType": 1,
}
resp_visual = requests.post(url=f"{API_BASE_URL}/doctrans-visual", json=payload)
if resp_visual.status_code != 200:
    print(
        f"Request to doctrans-visual failed with status code {resp_visual.status_code}."
    )
    pprint.pp(resp_visual.json())
    sys.exit(1)
result_visual = resp_visual.json()["result"]

markdown_list = []
for i, res in enumerate(result_visual["layoutParsingResults"]):
    md_dir = pathlib.Path(f"markdown_{i}")
    md_dir.mkdir(exist_ok=True)
    (md_dir / "doc.md").write_text(res["markdown"]["text"])
    for img_path, img in res["markdown"]["images"].items():
        img_path = md_dir / img_path
        img_path.parent.mkdir(parents=True, exist_ok=True)
        img_path.write_bytes(base64.b64decode(img))
    print(f"The Markdown document to be translated is saved at {md_dir / 'doc.md'}")
    del res["markdown"]["images"]
    markdown_list.append(res["markdown"])
    for img_name, img in res["outputImages"].items():
        img_path = f"{img_name}_{i}.jpg"
        with open(img_path, "wb") as f:
            f.write(base64.b64decode(img))
        print(f"Output image saved at {img_path}")

payload = {
    "markdownList": markdown_list,
    "targetLanguage": target_language,
}
resp_translate = requests.post(url=f"{API_BASE_URL}/doctrans-translate", json=payload)
if resp_translate.status_code != 200:
    print(
        f"Request to doctrans-translate failed with status code {resp_translate.status_code}."
    )
    pprint.pprint(resp_translate.json())
    sys.exit(1)
result_translate = resp_translate.json()["result"]

for i, res in enumerate(result_translate["translationResults"]):
    md_dir = pathlib.Path(f"markdown_{i}")
    (md_dir / "doc_translated.md").write_text(res["markdown"]["text"])
    print(f"Translated markdown document saved at {md_dir / 'doc_translated.md'}")
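
The response tables above declare `outputImages` as `object | null`, so the loop over `res["outputImages"].items()` in the example would presumably fail when the server is configured not to return images. A minimal sketch of a null-tolerant decoder follows; `decode_output_images` is a hypothetical helper, and the `{name}_{page_index}.jpg` naming simply mirrors the example.

```python
import base64
from typing import Dict, Optional


def decode_output_images(
    output_images: Optional[Dict[str, str]], page_index: int
) -> Dict[str, bytes]:
    """Map output file names to decoded JPEG bytes; empty if images are absent."""
    if not output_images:
        # outputImages is null when visualization images are not returned
        return {}
    return {
        f"{name}_{page_index}.jpg": base64.b64decode(data)
        for name, data in output_images.items()
    }
```

The returned mapping can then be written to disk with `pathlib.Path(name).write_bytes(data)` for each entry.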

4. Secondary Development¶

If the default model weights provided by the PP-DocTranslation pipeline do not meet your accuracy or speed requirements in your scenario, you can try to use your own data from specific domains or application scenarios to further fine-tune the existing model and improve the recognition effect in your scenario.

4.1 Model Fine-tuning¶

Since the PP-DocTranslation pipeline contains several modules, if the performance of the model pipeline does not meet expectations, the issue may originate from any one of these modules. You can analyze cases with poor extraction results, use visualized images to determine which module has the problem, and refer to the corresponding fine-tuning tutorial links in the following table to fine-tune the model.

Scenario	Fine-tuning module	Fine-tuning reference link
Inaccurate detection of layout areas, such as failure to detect seals and tables	Layout detection module	Link
Inaccurate recognition of table structures	Table structure recognition module	Link
Inaccurate recognition of formulas	Formula recognition module	Link
Omission in detecting seal texts	Seal text detection module	Link
Omission in detecting texts	Text detection module	Link
Inaccurate text content	Text recognition module	Link
Inaccurate correction of vertical or rotated text lines	Text line orientation classification module	Link
Inaccurate correction of whole image rotation	Document image orientation classification module	Link
Inaccurate correction of image distortion	Text image unwarping module	Fine-tuning is temporarily not supported

4.2 Model Application¶

After completing fine-tuning training with your private dataset, you can obtain a local model weight file. Then, you can use the fine-tuned model weights by customizing the pipeline configuration file.

Obtain the pipeline configuration file

You can call the export_paddlex_config_to_yaml method of the PP-DocTranslation pipeline object in PaddleOCR to export the current pipeline configuration to a YAML file:

from paddleocr import PPDocTranslation

pipeline = PPDocTranslation()
pipeline.export_paddlex_config_to_yaml("PP-DocTranslation.yaml")

Modify the configuration file

After obtaining the default pipeline configuration file, fill in the local path of the fine-tuned model weights at the corresponding location in the pipeline configuration file. For example,

......
SubModules:
    TextDetection:
        module_name: text_detection
        model_name: PP-OCRv5_server_det
        model_dir: null # Replace with the path to the weights of the fine-tuned text detection model
        limit_side_len: 960
        limit_type: max
        thresh: 0.3
        box_thresh: 0.6
        unclip_ratio: 1.5

    TextRecognition:
        module_name: text_recognition
        model_name: PP-OCRv5_server_rec
        model_dir: null # Replace with the path to the weights of the fine-tuned text recognition model
        batch_size: 1
        score_thresh: 0
......

The pipeline configuration file not only includes parameters supported by PaddleOCR CLI and Python API but also allows for more advanced configurations. Detailed information can be found in the corresponding pipeline usage tutorial in the Overview of PaddleX Model Pipeline Usage. Refer to the detailed instructions therein and adjust the configurations according to your needs.

Load the pipeline configuration file in CLI

After modifying the configuration file, specify the path to the modified pipeline configuration file using the --paddlex_config parameter in the command line. PaddleOCR will then read its contents as the pipeline configuration. Here is an example:

paddleocr pp_doctranslation --paddlex_config PP-DocTranslation.yaml ...

Load the pipeline configuration file in the Python API

When initializing the pipeline object, you can pass the path of the PaddleX pipeline configuration file or a configuration dict through the paddlex_config parameter, and PaddleOCR will read its content as the pipeline configuration. The example is as follows:

from paddleocr import PPDocTranslation

pipeline = PPDocTranslation(paddlex_config="PP-DocTranslation.yaml")
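
Since `paddlex_config` also accepts a configuration dict, the `model_dir` replacement described above could be done in code instead of by hand. The sketch below is a hypothetical helper that operates on an already-loaded config dict (for example, one parsed from the exported YAML). It searches recursively because the full nesting of the exported file is not shown in this section.

```python
from typing import Any, Dict


def set_model_dir(config: Dict[str, Any], module_key: str, model_dir: str) -> int:
    """Set model_dir on every SubModules entry named module_key.

    Hypothetical helper: modifies config in place and returns how many
    entries were updated, so a result of 0 signals a wrong module_key.
    """
    updated = 0
    for key, value in config.items():
        if not isinstance(value, dict):
            continue
        if key == "SubModules" and isinstance(value.get(module_key), dict):
            value[module_key]["model_dir"] = model_dir
            updated += 1
        # Descend further, since sub-pipelines may hold their own SubModules
        updated += set_model_dir(value, module_key, model_dir)
    return updated
```

After a call such as `set_model_dir(config, "TextDetection", "./my_finetuned_det")` (the path is a placeholder), the dict can be passed as `PPDocTranslation(paddlex_config=config)`.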