
PP-DocTranslation Pipeline Usage Tutorial

1. Introduction to PP-DocTranslation Pipeline

PP-DocTranslation is an intelligent document translation solution provided by PaddlePaddle. It integrates advanced general layout analysis technology with large language model (LLM) capabilities to offer efficient document translation services. The solution accurately identifies and extracts elements within documents, including text blocks, headings, paragraphs, images, tables, and other complex layout structures, and on this basis delivers high-quality multilingual translation. PP-DocTranslation supports mutual translation among multiple mainstream languages and excels particularly at documents with complex layouts and strong contextual dependencies, aiming for precise, natural, fluent, and professional translation results. The pipeline also provides flexible serving options, supporting multiple programming languages on various hardware, and offers secondary development capabilities: you can train and fine-tune models on your own datasets based on this pipeline, and the trained models can be seamlessly integrated.

The PP-DocTranslation pipeline uses the PP-StructureV3 sub-pipeline and therefore provides all of PP-StructureV3's functionality. For more information on the functions and usage of PP-StructureV3, see the PP-StructureV3 Pipeline Documentation.

In this pipeline, you can select the model to use based on the benchmark data below.

Details of the model list:

Document image orientation classification module:

Model | Model download link | Top-1 Acc (%) | GPU inference time (ms) [Normal mode / High-performance mode] | CPU inference time (ms) [Normal mode / High-performance mode] | Model storage size (M) | Introduction
PP-LCNet_x1_0_doc_ori Inference model/Training model 99.06 2.62 / 0.59 3.24 / 1.19 7 A document image classification model based on PP-LCNet_x1_0, with four categories: 0 degrees, 90 degrees, 180 degrees, and 270 degrees

Text image unwarping module:

Model | Model download link | CER | Model storage size (M) | Introduction
UVDoc Inference model/Training model 0.179 30.3 M A high-precision text image unwarping model

Layout region detection module:

Model | Model download link | mAP(0.5) (%) | GPU inference time (ms) [Normal mode / High-performance mode] | CPU inference time (ms) [Normal mode / High-performance mode] | Model storage size (M) | Introduction
PP-DocLayout_plus-L Inference model/Training model 83.2 53.03 / 17.23 634.62 / 378.32 126.01 M A higher-precision layout region localization model trained on a self-built dataset based on RT-DETR-L, covering scenarios such as Chinese and English papers, multi-column magazines, newspapers, PPTs, contracts, books, examination papers, research reports, ancient books, Japanese documents, and documents with vertical text.
PP-DocLayout-L Inference model/Training model 90.4 33.59 / 33.59 503.01 / 251.08 123.76 M A high-precision layout region localization model trained on a self-built dataset based on RT-DETR-L, covering scenarios such as Chinese and English papers, magazines, contracts, books, examination papers, and research reports.
PP-DocLayout-M Inference model/Training model 75.2 13.03 / 4.72 43.39 / 24.44 22.578 A layout region localization model with balanced precision and efficiency trained on a self-built dataset based on PicoDet-L, covering scenarios such as Chinese and English papers, magazines, contracts, books, examination papers, and research reports.
PP-DocLayout-S Inference model/Training model 70.9 11.54 / 3.86 18.53 / 6.29 4.834 A highly efficient layout region localization model trained on a self-built dataset based on PicoDet-S, covering scenarios such as Chinese and English papers, magazines, contracts, books, examination papers, and research reports.

Table structure recognition module:

Model | Model download link | Accuracy (%) | GPU inference time (ms) [Normal mode / High-performance mode] | CPU inference time (ms) [Normal mode / High-performance mode] | Model storage size (M) | Introduction
SLANeXt_wired Inference model/Training model 69.65 85.92 / 85.92 - / 501.66 351M The SLANeXt series is a new generation of table structure recognition models independently developed by Baidu PaddlePaddle's vision team. Compared to SLANet and SLANet_plus, SLANeXt focuses on recognizing table structures and has trained dedicated weights for wired and wireless tables separately. This has significantly improved its ability to recognize various types of tables, especially wired tables.
SLANeXt_wireless Inference model/Training model

Table classification module:

Model | Model download link | Top-1 Acc (%) | GPU inference time (ms) [Normal mode / High-performance mode] | CPU inference time (ms) [Normal mode / High-performance mode] | Model storage size (M)
PP-LCNet_x1_0_table_cls Inference model/Training model 94.2 2.62 / 0.60 3.17 / 1.14 6.6M

Table cell detection module:

Model | Model download link | mAP (%) | GPU inference time (ms) [Normal mode / High-performance mode] | CPU inference time (ms) [Normal mode / High-performance mode] | Model storage size (M) | Introduction
RT-DETR-L_wired_table_cell_det Inference model/Training model 82.7 33.47 / 27.02 402.55 / 256.56 124M RT-DETR is the first real-time end-to-end object detection model. Based on RT-DETR-L as the base model, Baidu PaddlePaddle's vision team completed pre-training on a self-built table cell detection dataset, achieving table cell detection with good performance for both wired and wireless tables.
RT-DETR-L_wireless_table_cell_det Inference model/Training model

Text detection module:

Model | Model download link | Detection Hmean (%) | GPU inference time (ms) [Normal mode / High-performance mode] | CPU inference time (ms) [Normal mode / High-performance mode] | Model storage size (M) | Introduction
PP-OCRv5_server_det Inference model/Training model 83.8 89.55 / 70.19 383.15 / 383.15 84.3 The server-side text detection model of PP-OCRv5, with higher accuracy, suitable for deployment on servers with better performance
PP-OCRv5_mobile_det Inference model/Training model 79.0 10.67 / 6.36 57.77 / 28.15 4.7 PP-OCRv5's mobile-end text detection model, with higher efficiency, suitable for deployment on edge devices
PP-OCRv4_server_det Inference model/Training model 69.2 127.82 / 98.87 585.95 / 489.77 109 PP-OCRv4's server-end text detection model, with higher accuracy, suitable for deployment on servers with better performance
PP-OCRv4_mobile_det Inference model/Training model 63.8 9.87 / 4.17 56.60 / 20.79 4.7 PP-OCRv4's mobile-end text detection model, with higher efficiency, suitable for deployment on edge devices
PP-OCRv3_mobile_det Inference model/Training model Accuracy is close to PP-OCRv4_mobile_det 9.90 / 3.60 41.93 / 20.76 2.1 PP-OCRv3's mobile-end text detection model, with higher efficiency, suitable for deployment on edge devices
PP-OCRv3_server_det Inference model/Training model Accuracy is close to PP-OCRv4_server_det 119.50 / 75.00 379.35 / 318.35 102.1 Server-side text detection model of PP-OCRv3, with higher accuracy, suitable for deployment on servers with better performance

Text recognition module:

* Chinese recognition model
Model | Model download link | Recognition Avg Accuracy (%) | GPU inference time (ms) [Normal mode / High-performance mode] | CPU inference time (ms) [Normal mode / High-performance mode] | Model storage size (M) | Introduction
PP-OCRv5_server_rec Inference model/Training model 86.38 8.46 / 2.36 31.21 / 31.21 81 M PP-OCRv5_rec is a new generation of text recognition model. This model is committed to efficiently and accurately supporting four major languages, namely Simplified Chinese, Traditional Chinese, English, and Japanese, as well as complex text scenarios such as handwriting, vertical text, pinyin, and rare characters with a single model. While maintaining recognition effectiveness, it also takes into account inference speed and model robustness, providing efficient and accurate technical support for document understanding in various scenarios.
PP-OCRv5_mobile_rec Inference model/Training model 81.29 5.43 / 1.46 21.20 / 5.32 16 M
PP-OCRv4_server_rec_doc Inference model/Training model 86.58 8.69 / 2.78 37.93 / 37.93 74.7 M PP-OCRv4_server_rec_doc is trained on a mixed dataset of more Chinese document data and PP-OCR training data based on PP-OCRv4_server_rec. It has enhanced the ability to recognize some traditional Chinese characters, Japanese characters, and special characters, and can support the recognition of over 15,000 characters. In addition to improving the document-related text recognition ability, it has also enhanced the general text recognition ability.
PP-OCRv4_mobile_rec Inference model/Training model 78.74 5.26 / 1.12 17.48 / 3.61 10.6 M A lightweight recognition model of PP-OCRv4 with high inference efficiency, which can be deployed on various hardware devices including edge devices.
PP-OCRv4_server_rec Inference model/Training model 80.61 8.75 / 2.49 36.93 / 36.93 71.2 M A server-side model of PP-OCRv4 with high inference accuracy, which can be deployed on various servers.
PP-OCRv3_mobile_rec Inference model/Training model 72.96 3.89 / 1.16 8.72 / 3.56 9.2 M A lightweight recognition model of PP-OCRv3 with high inference efficiency, which can be deployed on various hardware devices including edge devices.
Model | Model download link | Recognition Avg Accuracy (%) | GPU inference time (ms) [Normal mode / High-performance mode] | CPU inference time (ms) [Normal mode / High-performance mode] | Model storage size (M) | Introduction
ch_SVTRv2_rec Inference model/Training model 68.81 10.38 / 8.31 66.52 / 30.83 73.9 M SVTRv2 is a server-side text recognition model developed by the OpenOCR team of the Vision and Learning Lab (FVL) at Fudan University. It won the first prize in the PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition Task, with a 6% improvement in end-to-end recognition accuracy on Leaderboard A compared to PP-OCRv4.
Model | Model download link | Recognition Avg Accuracy (%) | GPU inference time (ms) [Normal mode / High-performance mode] | CPU inference time (ms) [Normal mode / High-performance mode] | Model storage size (M) | Introduction
ch_RepSVTR_rec Inference model/Training model 65.07 6.29 / 1.57 20.64 / 5.40 22.1 M RepSVTR is a mobile-side text recognition model based on SVTRv2. It won the first prize in the PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition Task, with a 2.5% improvement in end-to-end recognition accuracy on Leaderboard B compared to PP-OCRv4, while maintaining the same inference speed.
* English recognition model
Model | Model download link | Recognition Avg Accuracy (%) | GPU inference time (ms) [Normal mode / High-performance mode] | CPU inference time (ms) [Normal mode / High-performance mode] | Model storage size (M) | Introduction
en_PP-OCRv4_mobile_rec Inference model/Training model 70.39 4.81 / 1.23 17.20 / 4.18 6.8 M An ultra-lightweight English recognition model trained based on the PP-OCRv4 recognition model, supporting English and number recognition
en_PP-OCRv3_mobile_rec Inference model/Training model 70.69 3.56 / 0.78 8.44 / 5.78 7.8 M An ultra-lightweight English recognition model trained based on the PP-OCRv3 recognition model, supporting English and number recognition
* Multilingual recognition model
Model | Model download link | Recognition Avg Accuracy (%) | GPU inference time (ms) [Normal mode / High-performance mode] | CPU inference time (ms) [Normal mode / High-performance mode] | Model storage size (M) | Introduction
korean_PP-OCRv3_mobile_rec Inference model/Training model 60.21 3.73 / 0.98 8.76 / 2.91 8.6 M An ultra-lightweight Korean recognition model trained based on the PP-OCRv3 recognition model, supporting Korean and digit recognition
japan_PP-OCRv3_mobile_rec Inference model/Training model 45.69 3.86 / 1.01 8.62 / 2.92 8.8 M An ultra-lightweight Japanese recognition model trained based on the PP-OCRv3 recognition model, supporting Japanese and digit recognition
chinese_cht_PP-OCRv3_mobile_rec Inference model/Training model 82.06 3.90 / 1.16 9.24 / 3.18 9.7 M An ultra-lightweight traditional Chinese recognition model trained based on the PP-OCRv3 recognition model, supporting traditional Chinese and digit recognition
te_PP-OCRv3_mobile_rec Inference model/Training model 95.88 3.59 / 0.81 8.28 / 6.21 7.8 M An ultra-lightweight Telugu recognition model trained based on the PP-OCRv3 recognition model, supporting Telugu and digit recognition
ka_PP-OCRv3_mobile_rec Inference model/Training model 96.96 3.49 / 0.89 8.63 / 2.77 8.0 M An ultra-lightweight Kannada recognition model trained based on the PP-OCRv3 recognition model, supporting Kannada and digit recognition
ta_PP-OCRv3_mobile_rec Inference model/Training model 76.83 3.49 / 0.86 8.35 / 3.41 8.0 M An ultra-lightweight Tamil recognition model trained based on the PP-OCRv3 recognition model, supporting Tamil and digit recognition
latin_PP-OCRv3_mobile_rec Inference model/Training model 76.93 3.53 / 0.78 8.50 / 6.83 7.8 M An ultra-lightweight Latin recognition model trained based on the PP-OCRv3 recognition model, supporting Latin and digit recognition
arabic_PP-OCRv3_mobile_rec Inference model/Training model 73.55 3.60 / 0.83 8.44 / 4.69 7.8 M An ultra-lightweight Arabic alphabet recognition model trained based on the PP-OCRv3 recognition model, supporting Arabic alphabet and digit recognition
cyrillic_PP-OCRv3_mobile_rec Inference model/Training model 94.28 3.56 / 0.79 8.22 / 2.76 7.9 M An ultra-lightweight Slavic alphabet recognition model trained based on the PP-OCRv3 recognition model, supporting Slavic alphabet and digit recognition
devanagari_PP-OCRv3_mobile_rec Inference model/Training model 96.44 3.60 / 0.78 6.95 / 2.87 7.9 M An ultra-lightweight Sanskrit alphabet recognition model trained based on the PP-OCRv3 recognition model, supporting Sanskrit alphabet and digit recognition

Text line direction classification module (optional):

Model | Model download link | Top-1 Acc (%) | GPU inference time (ms) [Normal mode / High-performance mode] | CPU inference time (ms) [Normal mode / High-performance mode] | Model storage size (M) | Introduction
PP-LCNet_x0_25_textline_ori Inference model/Training model 95.54 2.16 / 0.41 2.37 / 0.73 0.32 A text line classification model based on PP-LCNet_x0_25, with two categories, namely 0 degrees and 180 degrees

Formula recognition module:

Model | Model download link | Avg-BLEU (%) | GPU inference time (ms) [Normal mode / High-performance mode] | CPU inference time (ms) [Normal mode / High-performance mode] | Model storage size (M) | Introduction
UniMERNet Inference model/Training model 86.13 2266.96 / - - / - 1.4 G UniMERNet is a formula recognition model developed by Shanghai AI Lab. It uses Donut Swin as the encoder and MBartDecoder as the decoder. By training on a dataset of one million entries that includes simple, complex, scanned, and handwritten formulas, the model significantly improves its recognition accuracy for formulas in real-world scenarios.
PP-FormulaNet-S Inference model/Training model 87.12 1311.84 / 1311.84 - / 8288.07 167.9 M PP-FormulaNet is an advanced formula recognition model developed by Baidu PaddlePaddle's vision team, supporting the recognition of 50,000 common LaTeX source-code tokens. The PP-FormulaNet-S version employs PP-HGNetV2-B4 as its backbone network. Through techniques such as parallel masking and model distillation, it significantly improves inference speed while maintaining high recognition accuracy, and is suitable for simple printed formulas and simple multi-line printed formulas.
PP-FormulaNet-L Inference model/Training model 92.13 1976.52 / - - / - 535.2 M The PP-FormulaNet-L version is based on Vary_VIT_B as its backbone network and has undergone in-depth training on a large-scale formula dataset. It shows significant improvement in recognizing complex formulas compared to PP-FormulaNet-S and is suitable for simple printed formulas, complex printed formulas, and handwritten formulas.
LaTeX_OCR_rec Inference model/Training model 71.63 1088.89 / 1088.89 - / - 89.7 M LaTeX-OCR is a formula recognition algorithm based on an autoregressive large model. By adopting Hybrid ViT as the backbone network and a transformer as the decoder, it significantly improves the accuracy of formula recognition.

Seal text detection module:

Model | Model download link | Detection Hmean (%) | GPU inference time (ms) [Normal mode / High-performance mode] | CPU inference time (ms) [Normal mode / High-performance mode] | Model storage size (M) | Introduction
PP-OCRv4_server_seal_det Inference model/Training model 98.21 124.64 / 91.57 545.68 / 439.86 109 PP-OCRv4's server-side seal text detection model with higher accuracy, suitable for deployment on better servers
PP-OCRv4_mobile_seal_det Inference model/Training model 96.47 9.70 / 3.56 50.38 / 19.64 4.6 PP-OCRv4's mobile-side seal text detection model with higher efficiency, suitable for deployment on the end side
Test environment description:
  • Performance test environment
    • Test dataset:
      • Document image orientation classification model: A self-built dataset by PaddleX, covering multiple scenarios such as certificates and documents, containing 1000 images.
      • Text image unwarping model: DocUNet.
      • Layout area detection model: The self-built layout area analysis dataset of PaddleOCR, which includes 10,000 common document images such as Chinese and English papers, magazines, and research reports.
      • PP-DocLayout_plus-L: The self-built layout area detection dataset of PaddleOCR, which includes 1,300 document images such as Chinese and English papers, magazines, newspapers, research reports, PPTs, examination papers, and textbooks.
      • Table structure recognition model: The self-built English table recognition dataset within PaddleX.
      • Text detection model: The self-built Chinese dataset of PaddleOCR, covering multiple scenarios such as street views, web images, documents, and handwriting, with 500 images for detection.
      • Chinese recognition model: The self-built Chinese dataset of PaddleOCR, covering multiple scenarios such as street views, web images, documents, and handwriting, with 11,000 images for text recognition.
      • ch_SVTRv2_rec: the evaluation set for Leaderboard A of the PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition Task.
      • ch_RepSVTR_rec: the evaluation set for Leaderboard B of the PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition Task.
      • English recognition model: The self-built English dataset of PaddleX.
      • Multilingual recognition model: The self-built multilingual dataset of PaddleX.
      • Text line direction classification model: The self-built dataset of PaddleX, covering multiple scenarios such as certificates and documents, with 1,000 images.
      • Seal text detection model: The self-built dataset of PaddleX, which includes 500 images of round seals.
    • Hardware configuration:
      • GPU: NVIDIA Tesla T4
      • CPU: Intel Xeon Gold 6271C @ 2.60GHz
      • Other environments: Ubuntu 20.04 / CUDA 11.8 / cuDNN 8.9 / TensorRT 8.6.1.6
  • Description of inference modes
Mode | GPU configuration | CPU configuration | Combination of acceleration technologies
Normal mode | FP32 precision / no TRT acceleration | FP32 precision / 8 threads | PaddleInference
High-performance mode | Optimal combination of precision type and acceleration strategy selected from prior knowledge | FP32 precision / 8 threads | Optimal backend selected from prior knowledge (Paddle/OpenVINO/TRT, etc.)

2. Quick Start

Before using the PP-DocTranslation pipeline locally, please ensure that you have completed the installation of the wheel package according to the Installation Tutorial.

Please note: If you encounter issues such as the program becoming unresponsive, unexpected program termination, running out of memory resources, or extremely slow inference during execution, please try adjusting the configuration according to the documentation, such as disabling unnecessary features or using lighter-weight models.

Before use, you need to prepare the API key for a large language model, which supports the Baidu Cloud Qianfan Platform or local large model services that comply with the OpenAI interface standards.
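For reference, both options use the same chat_bot_config structure shown later in this tutorial; only model_name, base_url, and api_key change. Below is a minimal sketch, in which the local service URL and model name are illustrative placeholders rather than fixed values:

# Option 1: Baidu Cloud Qianfan Platform
qianfan_chat_bot_config = {
    "module_name": "chat_bot",
    "model_name": "ernie-3.5-8k",
    "base_url": "https://qianfan.baidubce.com/v2",
    "api_type": "openai",
    "api_key": "your_qianfan_api_key",  # your actual Qianfan API key
}

# Option 2: a local LLM service that implements the OpenAI interface standard
local_chat_bot_config = {
    "module_name": "chat_bot",
    "model_name": "your_local_model_name",   # placeholder: the model served locally
    "base_url": "http://127.0.0.1:8000/v1",  # placeholder: your service address
    "api_type": "openai",
    "api_key": "your_local_api_key",         # set as required by your local service
}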

2.1 Experience via Command Line

You can download the test file and quickly experience the pipeline effect with a single command:

paddleocr pp_doctranslation -i vehicle_certificate-1.png --target_language en --qianfan_api_key your_api_key
The command line supports more parameter settings; detailed descriptions of the command-line parameters are given below.
Parameter | Description | Parameter Type | Default Value
input Data to be predicted, required. For example, the local path of an image or PDF file: /root/data/img.jpg; or a URL link, such as the network URL of an image or PDF file: Example; or a local directory, which should contain the images to be predicted, such as /root/data/ (currently, prediction of PDF files within a directory is not supported; PDF files must be specified by their exact file path). str
save_path Specify the path where the inference result file will be saved. If not set, the inference result will not be saved locally. str
target_language Target language (ISO 639-1 language code). str zh
layout_detection_model_name The model name for layout area detection. If not set, the default model of the pipeline will be used. str
layout_detection_model_dir The directory path of the layout area detection model. If not set, the official model will be downloaded. str
layout_threshold The score threshold for the layout model. Any floating-point number between 0 and 1. If not set, the parameter value initialized by the pipeline will be used, which is initialized to 0.5 by default. float
layout_nms Whether to use post-processing NMS for layout detection. If not set, the parameter value initialized by the pipeline will be used, and the default initialization is True. bool
layout_unclip_ratio The expansion coefficient of the detection boxes for the layout area detection model. Any floating-point number greater than 0. If not set, the parameter value initialized by the pipeline will be used, and the default initialization is 1.0. float
layout_merge_bboxes_mode The merging mode for the detection boxes output by the model in layout detection:
  • large: among detection boxes that overlap or contain each other, only the largest outer box is retained and the overlapping inner boxes are deleted;
  • small: among detection boxes that overlap or contain each other, only the inner box that is contained is retained and the overlapping outer boxes are deleted;
  • union: no filtering is performed, and both inner and outer boxes are retained.
If not set, the parameter value initialized by the pipeline will be used, and the default initialization is large. str
chart_recognition_model_name The model name for chart parsing. If not set, the default model of the pipeline will be used. str
chart_recognition_model_dir The directory path of the chart parsing model. If not set, the official model will be downloaded. str
chart_recognition_batch_size The batch size of the chart parsing model. If not set, the batch size will be set to 1 by default. int
region_detection_model_name Name of the model for detecting submodules of document image layout. If not set, the default model in the pipeline will be used. str
region_detection_model_dir Directory path of the model for detecting submodules of document image layout. If not set, the official model will be downloaded. str
doc_orientation_classify_model_name Name of the model for document orientation classification. If not set, the default model in the pipeline will be used. str
doc_orientation_classify_model_dir Directory path of the model for document orientation classification. If not set, the official model will be downloaded. str
doc_unwarping_model_name Name of the model for text image unwarping. If not set, the default model in the pipeline will be used. str
doc_unwarping_model_dir Directory path of the model for text image unwarping. If not set, the official model will be downloaded. str
text_detection_model_name Name of the model for text detection. If not set, the default model in the pipeline will be used. str
text_detection_model_dir Directory path of the model for text detection. If not set, the official model will be downloaded. str
text_det_limit_side_len Limit on the side length of the image for text detection. Any integer greater than 0. If not set, the parameter value initialized in the pipeline will be used, and the default initialization value is 960. int
text_det_limit_type Type of image side length limit for text detection. It supports min and max: min means ensuring that the shortest side of the image is not less than det_limit_side_len, and max means ensuring that the longest side of the image is not greater than limit_side_len. If not set, the parameter value initialized by the pipeline will be used, and the default initialization is max. str
text_det_thresh Detection pixel threshold. Only pixels with scores greater than this threshold in the output probability map will be considered text pixels. Any floating-point number greater than 0. If not set, the parameter value initialized by the pipeline, 0.3, will be used by default. float
text_det_box_thresh Detection box threshold. When the average score of all pixels within the detection box is greater than this threshold, the result is considered a text area. Any floating-point number greater than 0. If not set, the parameter value initialized by the pipeline, 0.6, will be used by default. float
text_det_unclip_ratio Text detection expansion coefficient, used to expand the text area; the larger the value, the larger the expanded area. Any floating-point number greater than 0. If not set, the parameter value initialized by the pipeline, 2.0, will be used by default. float
textline_orientation_model_name Name of the text line orientation model. If not set, the default model in the pipeline will be used. str
textline_orientation_model_dir Directory path of the text line orientation model. If not set, the official model will be downloaded. str
textline_orientation_batch_size Batch size of the text line orientation model. If not set, the batch size will be set to 1 by default. int
text_recognition_model_name Name of the text recognition model. If not set, the default model in the pipeline will be used. str
text_recognition_model_dir Directory path of the text recognition model. If not set, the official model will be downloaded. str
text_recognition_batch_size Batch size of the text recognition model. If not set, the batch size will be set to 1 by default. int
text_rec_score_thresh Text recognition threshold. Text results with scores greater than this threshold will be retained. Any floating-point number greater than 0. If not set, the parameter value initialized in the pipeline, 0.0, will be used by default; that is, no threshold is set. float
table_classification_model_name Name of the table classification model. If not set, the default model in the pipeline will be used. str
table_classification_model_dir The directory path of the table classification model. If not set, the official model will be downloaded. str
wired_table_structure_recognition_model_name The name of the wired table structure recognition model. If not set, the default model in the pipeline will be used. str
wired_table_structure_recognition_model_dir The directory path of the wired table structure recognition model. If not set, the official model will be downloaded. str
wireless_table_structure_recognition_model_name The name of the wireless table structure recognition model. If not set, the default model in the pipeline will be used. str
wireless_table_structure_recognition_model_dir The directory path of the wireless table structure recognition model. If not set, the official model will be downloaded. str
wired_table_cells_detection_model_name The name of the wired table cells detection model. If not set, the default model in the pipeline will be used. str
wired_table_cells_detection_model_dir The directory path of the wired table cells detection model. If not set, the official model will be downloaded. str
wireless_table_cells_detection_model_name The name of the wireless table cells detection model. If not set, the default model in the pipeline will be used. str
wireless_table_cells_detection_model_dir Directory path of the wireless table cell detection model. If not set, the official model will be downloaded. str
table_orientation_classify_model_name Name of the table orientation classification model. If not set, the default model in the pipeline will be used. str
table_orientation_classify_model_dir Directory path of the table orientation classification model. If not set, the official model will be downloaded. str
seal_text_detection_model_name Name of the seal text detection model. If not set, the default model in the pipeline will be used. str
seal_text_detection_model_dir Directory path of the seal text detection model. If not set, the official model will be downloaded. str
seal_det_limit_side_len Limit on the side length of the image for seal text detection. Any integer greater than 0. If not set, the parameter value initialized in the pipeline will be used, which is initialized to 736 by default. int
seal_det_limit_type Type of the side length limit for the seal text detection image. Supports min and max: min means ensuring that the shortest side of the image is not less than det_limit_side_len, and max means ensuring that the longest side is not greater than limit_side_len. If not set, the parameter value initialized by the pipeline will be used, and the default initialization is min. str
seal_det_thresh Detection pixel threshold. Only pixels with scores greater than this threshold in the output probability map will be considered text pixels. Any floating-point number greater than 0. If not set, the parameter value initialized by the pipeline will be used by default, which is 0.2. float
seal_det_box_thresh Detection box threshold. When the average score of all pixels within the bounding box of the detection result is greater than this threshold, the result will be considered as a text region. Any floating-point number greater than 0. If not set, the parameter value initialized by the pipeline will be used by default, which is 0.6. float
seal_det_unclip_ratio Expansion coefficient for seal text detection. This method is used to expand the text region. The larger the value, the larger the expanded area. Any floating-point number greater than 0. If not set, the parameter value initialized by the pipeline will be used by default, which is 0.5. float
seal_text_recognition_model_name Name of the seal text recognition model. If not set, the default model of the pipeline will be used. str
seal_text_recognition_model_dir Directory path of the seal text recognition model. If not set, the official model will be downloaded. str
seal_text_recognition_batch_size The batch size of the seal text recognition model. If not set, the batch size will be set to 1 by default. int
seal_rec_score_thresh Seal text recognition threshold. Text results with scores greater than this threshold will be retained. Any floating-point number greater than 0. If not set, the parameter value initialized by the pipeline, 0.0, will be used by default; that is, no threshold is set. float
formula_recognition_model_name The name of the formula recognition model. If not set, the default model of the pipeline will be used. str
formula_recognition_model_dir The directory path of the formula recognition model. If not set, the official model will be downloaded. str
formula_recognition_batch_size The batch size of the formula recognition model. If not set, the batch size will be set to 1 by default. int
use_doc_orientation_classify Whether to use the document orientation classification module. bool False
use_doc_unwarping Whether to use the text image unwarping module. bool False
use_textline_orientation Whether to load and use the text line orientation classification module. If not set, the parameter value initialized by the pipeline will be used, which is initialized to True by default. bool
use_seal_recognition Whether to load and use the seal text recognition sub-pipeline. If not set, the parameter value initialized by the pipeline will be used, and the default initialization is True. bool
use_table_recognition Whether to load and use the table recognition sub-pipeline. If not set, the parameter value initialized by the pipeline will be used, and the default initialization is True. bool
use_formula_recognition Whether to load and use the formula recognition sub-pipeline. If not set, the parameter value initialized by the pipeline will be used, and the default initialization is True. bool
use_chart_recognition Whether to use the chart parsing module. bool False
use_region_detection Whether to load and use the document region detection sub-pipeline. If not set, the parameter value initialized by the pipeline will be used, and the default initialization is True. bool
device The device used for inference. It supports specifying a specific card number:
  • CPU: For example, cpu means using CPU for inference;
  • GPU: For example, gpu:0 means using the first GPU for inference;
  • NPU: For example, npu:0 means using the first NPU for inference;
  • XPU: For example, xpu:0 means using the first XPU for inference;
  • MLU: For example, mlu:0 means using the first MLU for inference;
  • DCU: For example, dcu:0 means using the first DCU for inference;
If not set, the parameter value initialized by the pipeline will be used by default. During initialization, the local GPU device 0 will be used preferentially. If not available, the CPU device will be used.
str
enable_hpi Whether to enable high-performance inference. bool False
use_tensorrt Whether to enable the TensorRT subgraph engine of Paddle Inference. If the model does not support acceleration via TensorRT, acceleration will not be used even if this flag is set.
For PaddlePaddle with CUDA 11.8, the compatible TensorRT version is 8.x (x>=6), and it is recommended to install TensorRT 8.6.1.6.
For PaddlePaddle with CUDA 12.6, the compatible TensorRT version is 10.x (x>=5), and it is recommended to install TensorRT 10.5.0.18.
bool False
precision Computational precision, such as fp32, fp16. str fp32
enable_mkldnn Whether to enable MKL-DNN accelerated inference. If MKL-DNN is not available or the model does not support acceleration via MKL-DNN, acceleration will not be used even if this flag is set. bool True
mkldnn_cache_capacity MKL-DNN cache capacity. int 10
cpu_threads Number of threads used for inference on CPU. int 8
paddlex_config Path to the PaddleX pipeline configuration file. str


The execution results will be printed to the terminal.
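For example, a fuller invocation that translates a PDF, saves the results, and disables the optional preprocessing modules could look like the following; the input path is a placeholder, and all flags are those listed in the parameter table above:

paddleocr pp_doctranslation \
    -i ./document_sample.pdf \
    --target_language en \
    --save_path ./output \
    --use_doc_orientation_classify False \
    --use_doc_unwarping False \
    --qianfan_api_key your_api_key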

2.2 Integration via Python Script

The command-line method is for quickly experiencing and viewing results; in projects, integration via code is usually required. You can download the test file and use the following sample code for inference:

from paddlex import create_pipeline
# Create a translation pipeline
pipeline = create_pipeline(pipeline="PP-DocTranslation")

# Document path
input_path = "document_sample.pdf"

# Output directory
output_path = "./output"

# Large model configuration
chat_bot_config = {
    "module_name": "chat_bot",
    "model_name": "ernie-3.5-8k",
    "base_url": "https://qianfan.baidubce.com/v2",
    "api_type": "openai",
    "api_key": "api_key",  # your api_key
}

if input_path.lower().endswith(".md"):
    # Read Markdown documents; directories and URL links with the .md suffix can also be passed in
    ori_md_info_list = pipeline.load_from_markdown(input_path)
else:
    # Use PP-StructureV3 to perform layout parsing on PDF/image documents to obtain markdown information
    visual_predict_res = pipeline.visual_predict(
        input_path,
        use_doc_orientation_classify=False,
        use_doc_unwarping=False,
        use_common_ocr=True,
        use_seal_recognition=True,
        use_table_recognition=True,
    )

    ori_md_info_list = []
    for res in visual_predict_res:
        layout_parsing_result = res["layout_parsing_result"]
        ori_md_info_list.append(layout_parsing_result.markdown)
        layout_parsing_result.save_to_img(output_path)
        layout_parsing_result.save_to_markdown(output_path)

    # Concatenate the markdown information of multi-page documents into a single markdown file, and save the merged original markdown text
    if input_path.lower().endswith(".pdf"):
        ori_md_info = pipeline.concatenate_markdown_pages(ori_md_info_list)
        ori_md_info.save_to_markdown(output_path)

# Perform document translation (target language: English)
tgt_md_info_list = pipeline.translate(
    ori_md_info_list=ori_md_info_list,
    target_language="en",
    chunk_size=5000,
    chat_bot_config=chat_bot_config,
)
# Save the translation results
for tgt_md_info in tgt_md_info_list:
    tgt_md_info.save_to_markdown(output_path)

After executing the above code, you will obtain the parsed results of the original document to be translated, the Markdown file of the original text to be translated, and the Markdown file of the translated document, all saved in the output directory.

The process, API description, and output description of PP-DocTranslation prediction are as follows:

(1) Call PPDocTranslation to instantiate a PP-DocTranslation pipeline object. The relevant parameters are described as follows:
Parameter | Description | Parameter Type | Default Value
layout_detection_model_name The model name for layout area detection. If set to None, the default model of the pipeline will be used. str|None None
layout_detection_model_dir The directory path of the layout area detection model. If set to None, the official model will be downloaded. str|None None
layout_threshold The score threshold for the layout model.
  • float: any floating-point number between 0 and 1;
  • dict: e.g. {0: 0.1}, where the key is the class ID and the value is the threshold for that class;
  • None: if set to None, the parameter value initialized by the pipeline will be used, which is initialized to 0.5 by default.
float|dict|None None
layout_nms Whether to use post-processing NMS for layout detection. If set to None, the parameter value initialized by the pipeline will be used, which is initialized to True by default. bool|None None
layout_unclip_ratio Expansion coefficient of the detection box for the layout area detection model.
  • float: any floating-point number greater than 0;
  • Tuple[float,float]: expansion coefficients in the horizontal and vertical directions respectively;
  • dict, where the key of the dict is of int type, representing cls_id, and the value is of tuple type, such as {0: (1.1, 2.0)}, indicating that the center of the detection box for category 0 output by the model remains unchanged, with the width expanded by 1.1 times and the height expanded by 2.0 times;
  • None: if set to None, the parameter value initialized by the pipeline will be used, which is initialized to 1.0 by default.
float|Tuple[float,float]|dict|None None
layout_merge_bboxes_mode Filtering method for overlapping boxes in layout area detection.
  • str: large, small, union, indicating whether to retain the large box, small box, or both during overlapping box filtering, respectively;
  • dict: the key of the dict is of int type, representing cls_id, and the value is of str type, such as {0: "large", 2: "small"}, which means using the large mode for detection boxes of category 0 and the small mode for detection boxes of category 2;
  • None: If set to None, the parameter value initialized by the pipeline will be used, and the default initialization is large.
str|dict|None None
chart_recognition_model_name The model name for chart parsing. If set to None, the default model of the pipeline will be used. str|None None
chart_recognition_model_dir The directory path of the model for chart parsing. If set to None, the official model will be downloaded. str|None None
chart_recognition_batch_size The batch size of the model for chart parsing. If set to None, the batch size will be set to 1 by default. int|None None
region_detection_model_name The model name for detecting submodules of document image layout. If set to None, the default model of the pipeline will be used. str|None None
region_detection_model_dir The directory path of the model for detecting submodules of document image layout. If set to None, the official model will be downloaded. str|None None
doc_orientation_classify_model_name Name of the document orientation classification model. If set to None, the default model in the pipeline will be used. str|None None
doc_orientation_classify_model_dir Directory path of the document orientation classification model. If set to None, the official model will be downloaded. str|None None
doc_unwarping_model_name Name of the text image unwarping model. If set to None, the default model in the pipeline will be used. str|None None
doc_unwarping_model_dir Directory path of the text image unwarping model. If set to None, the official model will be downloaded. str|None None
text_detection_model_name Name of the text detection model. If set to None, the default model in the pipeline will be used. str|None None
text_detection_model_dir Directory path of the text detection model. If set to None, the official model will be downloaded. str|None None
text_det_limit_side_len Limit on the side length of the image for text detection.
  • int: any integer greater than 0;
  • None: if set to None, the parameter value initialized by the pipeline will be used, and the default initialization value is 960.
int|None None
text_det_limit_type The type of image side length limit for text detection.
  • str: supports min and max, where min means ensuring that the shortest side of the image is not less than det_limit_side_len, and max means ensuring that the longest side of the image is not greater than limit_side_len;
  • None: if set to None, the parameter value initialized by the pipeline will be used, and the default initialization value is max.
str|None None
text_det_thresh Detection pixel threshold. Only pixels with scores greater than this threshold in the output probability map will be considered as text pixels.
  • float: any floating-point number greater than 0;
  • None: if set to None, the parameter value initialized by the pipeline, 0.3, will be used by default.
float|None None
text_det_box_thresh Detection box threshold: When the average score of all pixels within the detected bounding box is greater than this threshold, the result is considered a text region.
  • float: any floating-point number greater than 0;
  • None: if set to None, the parameter value initialized by the pipeline, 0.6, will be used by default.
float|None None
text_det_unclip_ratio Text detection expansion coefficient. This method is used to expand the text region. The larger the value, the larger the expanded area.
  • float: any floating-point number greater than 0;
  • None: if set to None, the parameter value initialized by the pipeline, 2.0, will be used by default.
float|None None
textline_orientation_model_name Name of the text line orientation model. If set to None, the default model of the pipeline will be used. str|None None
textline_orientation_model_dir Directory path of the text line orientation model. If set to None, the official model will be downloaded. str|None None
textline_orientation_batch_size Batch size of the text line orientation model. If set to None, the batch size will be set to 1 by default. int|None None
text_recognition_model_name The name of the text recognition model. If set to None, the default model in the pipeline will be used. str|None None
text_recognition_model_dir The directory path of the text recognition model. If set to None, the official model will be downloaded. str|None None
text_recognition_batch_size The batch size of the text recognition model. If set to None, the default batch size will be set to 1. int|None None
text_rec_score_thresh The threshold for text recognition. Text results with scores higher than this threshold will be retained.
  • float: Any floating-point number greater than 0;
  • None: If set to None, the parameter value initialized by the pipeline, 0.0, will be used by default, meaning no threshold will be set.
float|None None
table_classification_model_name The name of the table classification model. If set to None, the default model in the pipeline will be used. str|None None
table_classification_model_dir The directory path of the table classification model. If set to None, the official model will be downloaded. str|None None
wired_table_structure_recognition_model_name The name of the wired table structure recognition model. If set to None, the default model in the pipeline will be used. str|None None
wired_table_structure_recognition_model_dir The directory path of the wired table structure recognition model. If set to None, the official model will be downloaded. str|None None
wireless_table_structure_recognition_model_name The name of the wireless table structure recognition model. If set to None, the default model in the pipeline will be used. str|None None
wireless_table_structure_recognition_model_dir The directory path of the wireless table structure recognition model. If set to None, the official model will be downloaded. str|None None
wired_table_cells_detection_model_name The name of the wired table cell detection model. If set to None, the default model in the pipeline will be used. str|None None
wired_table_cells_detection_model_dir The directory path of the wired table cell detection model. If set to None, the official model will be downloaded. str|None None
wireless_table_cells_detection_model_name The name of the wireless table cell detection model. If set to None, the default model in the pipeline will be used. str|None None
wireless_table_cells_detection_model_dir The directory path of the wireless table cell detection model. If set to None, the official model will be downloaded. str|None None
table_orientation_classify_model_name The name of the table orientation classification model. If set to None, the default model in the pipeline will be used. str|None None
table_orientation_classify_model_dir The directory path of the table orientation classification model. If set to None, the official model will be downloaded. str|None None
seal_text_detection_model_name The name of the seal text detection model. If set to None, the default model in the pipeline will be used. str|None None
seal_text_detection_model_dir The directory path of the seal text detection model. If set to None, the official model will be downloaded. str|None None
seal_det_limit_side_len The image side length limit for seal text detection.
  • int: any integer greater than 0;
  • None: If set to None, the parameter value initialized by the pipeline will be used, and the default initialization value is 736.
int|None None
seal_det_limit_type The image side length limit type for seal text detection.
  • str: supports min and max, where min indicates that the shortest side of the image is guaranteed to be no less than det_limit_side_len, and max indicates that the longest side of the image is guaranteed to be no greater than limit_side_len;
  • None: If set to None, the parameter value initialized by the pipeline will be used, and the default initialization value is min.
str|None None
seal_det_thresh The detection pixel threshold. Only pixels with scores greater than this threshold in the output probability map will be considered as text pixels.
  • float: any floating-point number greater than 0;
  • None: if set to None, the parameter value initialized by the pipeline will be used by default, which is 0.2.
float|None None
seal_det_box_thresh Detection box threshold. When the average score of all pixels within the detected bounding box is greater than this threshold, the result is considered a text region.
  • float: any floating-point number greater than 0;
  • None: if set to None, the parameter value initialized by the pipeline will be used by default, which is 0.6.
float|None None
seal_det_unclip_ratio Expansion coefficient for seal text detection. This method is used to expand the text region. The larger the value, the larger the expanded area.
  • float: any floating-point number greater than 0;
  • None: if set to None, the parameter value initialized by the pipeline will be used by default, which is 0.5.
float|None None
seal_text_recognition_model_name Name of the seal text recognition model. If set to None, the default model of the pipeline will be used. str|None None
seal_text_recognition_model_dir Directory path of the seal text recognition model. If set to None, the official model will be downloaded. str|None None
seal_text_recognition_batch_size Batch size of the seal text recognition model. If set to None, the batch size will be set to 1 by default. int|None None
seal_rec_score_thresh Threshold for seal text recognition. Text results with scores higher than this threshold will be retained.
  • float: any floating-point number greater than 0;
  • None: if set to None, the parameter value initialized by the pipeline, 0.0, will be used by default, meaning no threshold is set.
float|None None
formula_recognition_model_name Name of the formula recognition model. If set to None, the default model of the pipeline will be used. str|None None
formula_recognition_model_dir Directory path of the formula recognition model. If set to None, the official model will be downloaded. str|None None
formula_recognition_batch_size The batch size of the formula recognition model. If set to None, the batch size will be set to 1 by default. int|None None
use_doc_orientation_classify Whether to load and use the document orientation classification module. If set to None, the parameter value initialized by the pipeline will be used, and the default initialization is True. bool|None None
use_doc_unwarping Whether to load and use the text image unwarping module. If set to None, the parameter value initialized by the pipeline will be used, and the default initialization is True. bool|None None
use_textline_orientation Whether to load and use the text line orientation classification module. If set to None, the parameter value initialized by the pipeline will be used, and the default initialization is True. bool|None None
use_seal_recognition Whether to load and use the sub-pipeline for seal text recognition. If set to None, the parameter value initialized by the pipeline will be used, and the default initialization is True. bool|None None
use_table_recognition Whether to load and use the sub-pipeline for table recognition. If set to None, the parameter value initialized by the pipeline will be used, and the default initialization is True. bool|None None
use_formula_recognition Whether to load and use the sub-pipeline for formula recognition. If set to None, the parameter value initialized by the pipeline will be used, and the default initialization is True. bool|None None
use_chart_recognition Whether to load and use the chart parsing module. If set to None, the parameter value initialized by the pipeline will be used, and the default initialization is True. bool|None None
use_region_detection Whether to load and use the sub-pipeline for document region detection. If set to None, the parameter value initialized by the pipeline will be used, and the default initialization is True. bool|None None
chat_bot_config Configuration information for the large language model. The configuration content is the following dict:
{
"module_name": "chat_bot",
"model_name": "ernie-3.5-8k",
"base_url": "https://qianfan.baidubce.com/v2",
"api_type": "openai",
"api_key": "api_key"  # Please set this to the actual API key
}
dict|None None
device Device for inference. Supports specifying a specific card number:
  • CPU: e.g., cpu means using CPU for inference;
  • GPU: e.g., gpu:0 means using the 1st GPU for inference;
  • NPU: e.g., npu:0 means using the 1st NPU for inference;
  • XPU: e.g., xpu:0 means using the 1st XPU for inference;
  • MLU: e.g., mlu:0 means using the 1st MLU for inference;
  • DCU: e.g., dcu:0 means using the 1st DCU for inference;
  • None: if set to None, the local GPU device 0 will be used preferentially during initialization; if unavailable, the CPU device will be used.
str|None None
enable_hpi Whether to enable high-performance inference. bool False
use_tensorrt Whether to enable the TensorRT subgraph engine of Paddle Inference. If the model does not support acceleration via TensorRT, acceleration will not be used even if this flag is set.
For PaddlePaddle with CUDA 11.8, the compatible TensorRT version is 8.x (x>=6), and it is recommended to install TensorRT 8.6.1.6.
For PaddlePaddle with CUDA 12.6, the compatible TensorRT version is 10.x (x>=5), and it is recommended to install TensorRT 10.5.0.18.
bool False
precision Computational precision, such as fp32, fp16. str "fp32"
enable_mkldnn Whether to enable MKL-DNN for accelerated inference. If MKL-DNN is not available or the model does not support acceleration via MKL-DNN, acceleration will not be used even if this flag is set. bool True
mkldnn_cache_capacity MKL-DNN cache capacity. int 10
cpu_threads The number of threads used for inference on the CPU. int 8
paddlex_config Path to the PaddleX pipeline configuration file. str|None None
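Putting a few of these initialization parameters together, instantiation might look like the following minimal sketch. It assumes the PPDocTranslation class is imported from paddleocr; the model choice and device are examples, and the API key is a placeholder:

from paddleocr import PPDocTranslation

pipeline = PPDocTranslation(
    layout_detection_model_name="PP-DocLayout_plus-L",
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    device="gpu:0",  # set to None to prefer GPU 0 and fall back to CPU
    chat_bot_config={
        "module_name": "chat_bot",
        "model_name": "ernie-3.5-8k",
        "base_url": "https://qianfan.baidubce.com/v2",
        "api_type": "openai",
        "api_key": "your_api_key",  # your actual API key
    },
)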
(2) Call the visual_predict() method of the PP-DocTranslation pipeline object to obtain visual prediction results. This method returns a list of results. The pipeline also provides a visual_predict_iter() method; the two accept the same parameters and return the same results, except that visual_predict_iter() returns a generator, which processes and yields prediction results step by step and is suitable for large datasets or scenarios where memory conservation matters. Choose either method based on actual needs; a minimal sketch of the generator variant follows.
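The sketch below reuses the pipeline object and the result handling from the script above:

for res in pipeline.visual_predict_iter("document_sample.pdf"):
    # Results are yielded one page at a time, keeping peak memory low
    layout_parsing_result = res["layout_parsing_result"]
    layout_parsing_result.save_to_markdown("./output")

The parameters of the visual_predict() method and their descriptions are as follows: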
Parameter | Description | Parameter Type | Default Value
input Data to be predicted, supporting multiple input types, required.
  • Python Var: such as numpy.ndarray representing image data;
  • str: such as the local path of an image or PDF file: /root/data/img.jpg; or a URL link, such as the network URL of an image or PDF file: Example; or a local directory, which should contain the images to be predicted, such as /root/data/ (currently, prediction of PDF files within directories is not supported; PDF files must be specified by their exact file paths);
  • list: list elements should be of the aforementioned data types, such as [numpy.ndarray, numpy.ndarray], ["/root/data/img1.jpg", "/root/data/img2.jpg"], or ["/root/data1", "/root/data2"].
Python Var|str|list
use_doc_orientation_classify Whether to use the document orientation classification module during inference. Set to None to use the instantiation parameter; otherwise, this parameter takes precedence. bool|None False
use_doc_unwarping Whether to use the text image unwarping module during inference. Set to None to use the instantiated parameter; otherwise, this parameter takes precedence. bool|None False
use_textline_orientation Whether to use the text line orientation classification module during inference. Set to None to use the instantiated parameter; otherwise, this parameter takes precedence. bool|None None
use_seal_recognition Whether to use the seal text recognition sub-pipeline during inference. Set to None to use the instantiated parameter; otherwise, this parameter takes precedence. bool|None None
use_table_recognition Whether to use the table recognition sub-pipeline during inference. Set to None to use the instantiated parameter; otherwise, this parameter takes precedence. bool|None None
use_formula_recognition Whether to use the formula recognition sub-pipeline during inference. Set to None to use the instantiated parameter; otherwise, this parameter takes precedence. bool|None None
use_chart_recognition Whether to use the chart parsing module. Set to None to use the instantiated parameter; otherwise, this parameter takes precedence. bool|None False
use_region_detection Whether to use the sub-pipeline for document region detection. Set to None to use the instantiation parameter; otherwise, this parameter takes precedence. bool|None None
layout_threshold The parameter meaning is basically the same as the instantiation parameter. Set to None to use the instantiation parameter; otherwise, this parameter takes precedence. float|dict|None None
layout_nms The parameter meaning is basically the same as the instantiation parameter. Set to None to use the instantiation parameter; otherwise, this parameter takes precedence. bool|None None
layout_unclip_ratio The parameter meaning is basically the same as the instantiation parameter. Set to None to use the instantiation parameter; otherwise, this parameter takes precedence. float|Tuple[float,float]|dict|None None
layout_merge_bboxes_mode The parameter meaning is basically the same as the instantiation parameter. Set to None to use the instantiation parameter; otherwise, this parameter takes precedence. str|dict|None None
text_det_limit_side_len The parameter meaning is basically the same as the instantiation parameter. Set to None to use the instantiation parameter; otherwise, this parameter takes precedence. int|None None
text_det_limit_type The parameter meaning is basically the same as the instantiation parameter. Set to None to use the instantiation parameter; otherwise, this parameter takes precedence. str|None None
text_det_thresh The parameter meaning is basically the same as the instantiation parameter. Set to None to use the instantiation parameter; otherwise, this parameter takes precedence. float|None None
text_det_box_thresh The parameter meaning is basically the same as the instantiation parameter. Set to None to use the instantiation parameter; otherwise, this parameter takes precedence. float|None None
text_det_unclip_ratio The parameter meaning is basically the same as the instantiation parameter. Set to None to use the instantiation parameter; otherwise, this parameter takes precedence. float|None None
text_rec_score_thresh The parameter meaning is basically the same as the instantiation parameter. Set to None to use the instantiation parameter; otherwise, this parameter takes precedence. float|None None
seal_det_limit_side_len The parameter meaning is basically the same as the instantiation parameter. Set to None to use the instantiation parameter; otherwise, this parameter takes precedence. int|None None
seal_det_limit_type The parameter meaning is basically the same as the instantiation parameter. Set to None to use the instantiation parameter; otherwise, this parameter takes precedence. str|None None
seal_det_thresh The parameter meaning is basically the same as the instantiation parameter. Set to None to use the instantiation parameter; otherwise, this parameter takes precedence. float|None None
seal_det_box_thresh The parameter meaning is basically the same as the instantiation parameter. Set to None to use the instantiation parameter; otherwise, this parameter takes precedence. float|None None
seal_det_unclip_ratio The parameter meaning is basically the same as the instantiation parameter. Set to None to use the instantiation parameter; otherwise, this parameter takes precedence. float|None None
seal_rec_score_thresh The parameter meaning is basically the same as the instantiation parameter. Set to None to use the instantiation parameter; otherwise, this parameter takes precedence. float|None None
use_wired_table_cells_trans_to_html Whether to enable direct conversion of wired table cell detection results to HTML. If enabled, HTML is constructed directly based on the geometric relationships of wired table cell detection results. bool False
use_wireless_table_cells_trans_to_html Whether to enable direct conversion of wireless table cell detection results to HTML. If enabled, HTML is constructed directly based on the geometric relationships of wireless table cell detection results. bool False
use_table_orientation_classify Whether to enable table orientation classification. When enabled, if the table in the image is rotated by 90/180/270 degrees, the orientation can be corrected and table recognition can be completed correctly. bool True
use_ocr_results_with_table_cells Whether to enable cell-segmented OCR. When enabled, OCR detection results will be segmented and re-recognized based on cell prediction results to avoid missing text. bool True
use_e2e_wired_table_rec_model Whether to enable the end-to-end wired table recognition mode. If enabled, the cell detection model will not be used, and only the table structure recognition model will be used. bool False
use_e2e_wireless_table_rec_model Whether to enable the end-to-end wireless table recognition mode. If enabled, the cell detection model will not be used, and only the table structure recognition model will be used. bool True
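As mentioned above, here is a minimal sketch of a `visual_predict()` call (the input path and the overridden parameters are illustrative):

    # visual_predict() returns a list of results; visual_predict_iter() accepts
    # the same parameters but returns a generator, which is friendlier to memory.
    visual_results = pipeline.visual_predict(
        "/root/data/img.jpg",
        use_doc_orientation_classify=False,
        use_doc_unwarping=False,
    )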
(3) Processing visual prediction results: The prediction result for each sample is a corresponding Result object, and it supports operations such as printing, saving as an image, and saving as a json file:
Method Method Description Parameter Parameter Type Parameter Description Default Value
print() Print the result to the terminal format_json bool Whether to use indentation formatting for the output content in JSON format True
indent int Specifies the indentation level to beautify the output JSON data and make it more readable; valid only when format_json is True. 4
ensure_ascii bool Controls whether non-ASCII characters are escaped to Unicode. When set to True, all non-ASCII characters will be escaped; False retains the original characters. Valid only when format_json is True. False
save_to_json() Saves the result as a file in JSON format save_path str The path where the file is saved. When it is a directory, the saved file name is consistent with the input file name. None
indent int Specifies the indentation level to beautify the output JSON data and make it more readable; valid only when format_json is True. 4
ensure_ascii bool Controls whether non-ASCII characters are escaped to Unicode. When set to True, all non-ASCII characters will be escaped; False retains the original characters. Valid only when format_json is True. False
save_to_img() Saves the visualized images of each intermediate module in PNG format save_path str The file path for saving, which supports directory or file path None
save_to_markdown() Saves each page of an image or PDF file as a separate file in markdown format save_path str The file path for saving, which supports directory or file path None
save_to_html() Saves the tables in the file in HTML format save_path str The file path for saving, which supports a directory or file path None
save_to_xlsx() Saves the tables in the file in XLSX format save_path str The file path for saving, which supports a directory or file path None
- Calling the `print()` method will print the results to the terminal. The content printed to the terminal is explained as follows:
    - `input_path`: `(str)` The input path of the image or PDF to be predicted
    - `page_index`: `(Union[int, None])` If the input is a PDF file, indicates which page of the PDF it is; otherwise it is `None`
    - `model_settings`: `(Dict[str, bool])` The model parameters configured for the pipeline
        - `use_doc_preprocessor`: `(bool)` Controls whether to enable the document preprocessing sub-pipeline
        - `use_general_ocr`: `(bool)` Controls whether to enable the OCR sub-pipeline
        - `use_seal_recognition`: `(bool)` Controls whether to enable the seal recognition sub-pipeline
        - `use_table_recognition`: `(bool)` Controls whether to enable the table recognition sub-pipeline
        - `use_formula_recognition`: `(bool)` Controls whether to enable the formula recognition sub-pipeline
    - `doc_preprocessor_res`: `(Dict[str, Union[List[float], str]])` A dictionary of document preprocessing results, which only exists when `use_doc_preprocessor=True`
        - `input_path`: `(str)` The image path accepted by the document preprocessing sub-pipeline; saved as `None` when the input is `numpy.ndarray`
        - `page_index`: `None` when the input is `numpy.ndarray`
        - `model_settings`: `(Dict[str, bool])` Model configuration parameters for the document preprocessing sub-pipeline
            - `use_doc_orientation_classify`: `(bool)` Controls whether to enable the document image orientation classification submodule
            - `use_doc_unwarping`: `(bool)` Controls whether to enable the text image unwarping submodule
        - `angle`: `(int)` The prediction result of the document image orientation classification submodule; returns the actual angle value when the submodule is enabled
    - `parsing_res_list`: `(List[Dict])` A list of parsing results in reading order, where each element is a dictionary
        - `block_bbox`: `(np.ndarray)` The bounding box of the layout region
        - `block_label`: `(str)` The label of the layout region, such as `text`, `table`, etc.
        - `block_content`: `(str)` The content within the layout region
        - `seg_start_flag`: `(bool)` Whether this layout region is the start of a paragraph
        - `seg_end_flag`: `(bool)` Whether this layout region is the end of a paragraph
        - `sub_label`: `(str)` The sub-label of the layout region; for example, the sub-label of `text` might be `title_text`
        - `sub_index`: `(int)` The sub-index of the layout region, used for restoring Markdown
        - `index`: `(int)` The index of the layout region, used for displaying the layout sorting results
    - `overall_ocr_res`: `(Dict[str, Union[List[str], List[float], numpy.ndarray]])` A dictionary of global OCR results
        - `input_path`: `(Union[str, None])` The image path accepted by the OCR sub-pipeline; saved as `None` when the input is `numpy.ndarray`
        - `page_index`: `None` when the input is `numpy.ndarray`
        - `model_settings`: `(Dict)` Model configuration parameters for the OCR sub-pipeline
        - `dt_polys`: `(List[numpy.ndarray])` A list of polygon bounding boxes for text detection; each box is represented by a numpy array of 4 vertex coordinates, with shape (4, 2) and dtype int16
        - `dt_scores`: `(List[float])` A list of confidence scores for text detection bounding boxes
        - `text_det_params`: `(Dict[str, Dict[str, int, float]])` Configuration parameters for the text detection module
            - `limit_side_len`: `(int)` The side-length limit applied during image preprocessing
            - `limit_type`: `(str)` The handling method for the side-length limit
            - `thresh`: `(float)` The confidence threshold for text pixel classification
            - `box_thresh`: `(float)` The confidence threshold for text detection bounding boxes
            - `unclip_ratio`: `(float)` The dilation coefficient for text detection bounding boxes
            - `text_type`: `(str)` The type of text detection, currently fixed as "general"
        - `textline_orientation_angles`: `(List[int])` The prediction results for text line orientation classification; returns actual angle values when enabled (e.g., [0,0,1])
        - `text_rec_score_thresh`: `(float)` The filtering threshold for text recognition results
        - `rec_texts`: `(List[str])` A list of text recognition results, containing only texts with confidence scores exceeding `text_rec_score_thresh`
        - `rec_scores`: `(List[float])` A list of text recognition confidence scores, filtered by `text_rec_score_thresh`
        - `rec_polys`: `(List[numpy.ndarray])` A list of text detection bounding boxes after confidence filtering, in the same format as `dt_polys`
    - `formula_res_list`: `(List[Dict[str, Union[numpy.ndarray, List[float], str]]])` A list of formula recognition results, where each element is a dictionary
        - `rec_formula`: `(str)` The recognized formula
        - `rec_polys`: `(numpy.ndarray)` The bounding box of the recognized formula, with shape (4, 2) and dtype int16
        - `formula_region_id`: `(int)` The region number where the formula is located
    - `seal_res_list`: `(List[Dict[str, Union[numpy.ndarray, List[float], str]]])` A list of seal recognition results, where each element is a dictionary
        - `input_path`: `(str)` The input path of the seal image
        - `page_index`: `None` when the input is `numpy.ndarray`
        - `model_settings`: `(Dict)` Model configuration parameters for the seal recognition sub-pipeline
        - `dt_polys`: `(List[numpy.ndarray])` A list of detected seal bounding boxes, in the same format as `dt_polys` above
        - `text_det_params`: `(Dict[str, Dict[str, int, float]])` Configuration parameters for the seal detection module, with the same parameter meanings as above
        - `text_type`: `(str)` The type of seal detection, currently fixed as "seal"
        - `text_rec_score_thresh`: `(float)` The filtering threshold for seal recognition results
        - `rec_texts`: `(List[str])` A list of seal recognition results, containing only texts with confidence scores exceeding `text_rec_score_thresh`
        - `rec_scores`: `(List[float])` A list of seal recognition confidence scores, filtered by `text_rec_score_thresh`
        - `rec_polys`: `(List[numpy.ndarray])` A list of detected seal bounding boxes after confidence filtering, in the same format as `dt_polys`
        - `rec_boxes`: `(numpy.ndarray)` An array of rectangular bounding boxes, with shape (n, 4) and dtype int16; each row represents a rectangle
    - `table_res_list`: `(List[Dict[str, Union[numpy.ndarray, List[float], str]]])` A list of table recognition results, where each element is a dictionary
        - `cell_box_list`: `(List[numpy.ndarray])` A list of bounding boxes for table cells
        - `pred_html`: `(str)` An HTML-formatted string for the table
        - `table_ocr_pred`: `(dict)` OCR recognition results for the table
            - `rec_polys`: `(List[numpy.ndarray])` A list of detection bounding boxes for cells
            - `rec_texts`: `(List[str])` Recognition results for cells
            - `rec_scores`: `(List[float])` Recognition confidence scores for cells
            - `rec_boxes`: `(numpy.ndarray)` An array of rectangular bounding boxes for detection boxes, with shape (n, 4) and dtype int16; each row represents a rectangle
- Calling the `save_to_json()` method will save the above content to the specified `save_path`. If a directory is specified, the saved path will be `save_path/{your_img_basename}_res.json`. If a file is specified, the result will be saved directly to that file. Since JSON files do not support saving numpy arrays, `numpy.array` values are converted to lists.
- Calling the `save_to_img()` method will save the visualization results to the specified `save_path`. If a directory is specified, it saves the visualization images for layout region detection, global OCR, layout reading order, and so on. If a file is specified, the images are saved directly to that file. (The pipeline usually produces many result images, so specifying a single file path is not recommended; otherwise the images will overwrite one another and only the last one will be retained.)
- Calling the `save_to_markdown()` method will save the converted Markdown file to the specified `save_path`, with the saved file path being `save_path/{your_img_basename}.md`. If the input is a PDF file, it is recommended to specify a directory; otherwise, multiple Markdown files will overwrite one another.
- Calling the `concatenate_markdown_pages()` method combines the multi-page Markdown content `markdown_list` output by the PP-DocTranslation pipeline into a single complete document and returns the combined Markdown content.
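A minimal sketch of processing these results follows; it assumes, purely as an illustration, that each item exposes its layout parsing result under a "layout_parsing_result" key and its per-page Markdown via a markdown attribute (adapt these names to your actual results):

    markdown_list = []
    for res in visual_results:
        layout_res = res["layout_parsing_result"]      # assumed key, see lead-in
        layout_res.print()                             # print the result to the terminal
        layout_res.save_to_json(save_path="output")    # save the structured result
        layout_res.save_to_markdown(save_path="output")
        markdown_list.append(layout_res.markdown)      # assumed attribute, see lead-in

    # Combine the per-page Markdown content into one complete document.
    full_md = pipeline.concatenate_markdown_pages(markdown_list)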
(4) Call the `translate()` method to perform document translation. This method returns the original Markdown text and the translated text as Markdown objects, and you can save the required parts locally with the `save_to_markdown()` method. A sketch of a call follows the parameter table below. Below are the parameter descriptions for the `translate()` method:
Parameter Description Parameter Type Default Value
ori_md_info_list A data list in the original Markdown format, containing the content to be translated. It must be a list composed of dictionaries, with each dictionary representing a document block. List[Dict] No default value (required)
target_language Target language (ISO 639-1 language code, such as "en"/"ja"/"fr"). str "zh"
chunk_size The character count threshold for chunking the text to be translated. int 5000
task_description Custom task description prompt. str|None None
output_format Specify the output format requirements, such as "maintain the original Markdown structure". str|None None
rules_str Custom translation rule description. str|None None
few_shot_demo_text_content Example text content for few-shot learning. str|None None
few_shot_demo_key_value_list Structured few-shot example data in key-value pair format, which can include a glossary of technical terms. str|None None
chat_bot_config Large language model configuration. Set to None to use instantiation parameters; otherwise, this parameter takes precedence. dict|None None
llm_request_interval The time interval, in seconds, for sending requests to the large language model. This parameter can be used to prevent overly frequent calls to the large language model. float 0
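To tie these parameters together, here is a minimal sketch of a `translate()` call; the chat_bot_config keys and values are illustrative placeholders rather than a definitive configuration, and the result handling assumes the `save_to_markdown()` behavior described above:

    # Illustrative LLM configuration (placeholder values; replace with your own).
    chat_bot_config = {
        "module_name": "chat_bot",
        "model_name": "ernie-3.5-8k",
        "base_url": "https://qianfan.baidubce.com/v2",
        "api_type": "openai",
        "api_key": "YOUR_API_KEY",
    }

    translate_results = pipeline.translate(
        ori_md_info_list=markdown_list,  # Markdown blocks from the visual step
        target_language="en",            # ISO 639-1 language code
        chunk_size=5000,                 # character threshold for chunking
        chat_bot_config=chat_bot_config,
    )
    for res in translate_results:
        res.save_to_markdown(save_path="output")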

3. Development Integration/Deployment

If the pipeline can meet your requirements for inference speed and accuracy, you can proceed directly with development integration/deployment.

If you need to directly apply the pipeline in your Python project, you can refer to the sample code in 2.2 Python Script Approach.

In addition, PaddleOCR also offers two other deployment methods, detailed as follows:

🚀 High-Performance Inference: In real-world production environments, many applications have stringent performance criteria (especially response speed) for deployment strategies to ensure efficient system operation and a smooth user experience. To this end, PaddleOCR provides high-performance inference capabilities, aiming to deeply optimize model inference and pre/post-processing, achieving significant acceleration in the end-to-end process. For detailed information on the high-performance inference process, please refer to High-Performance Inference.

☁️ Serving: Serving is a common deployment form in real-world production environments. By encapsulating inference functions as services, clients can access these services through network requests to obtain inference results. For detailed information on the pipeline serving process, please refer to Serving.

Below are the API references for basic serving and examples of multilingual service invocation:

API reference

Main operations provided by the service:

  • The HTTP request method is POST.
  • Both the request body and response body are JSON data (JSON objects).
  • When the request is processed successfully, the response status code is 200, and the properties of the response body are as follows:
Name Type Meaning
logId string The UUID of the request.
errorCode integer Error code. Fixed as 0.
errorMsg string Error description. Fixed as "Success".
result object Operation result.
  • When the request is not processed successfully, the properties of the response body are as follows:
Name Type Meaning
logId string The UUID of the request.
errorCode integer Error code. Same as the response status code.
errorMsg string Error description.

The main operations provided by the service are as follows:

  • analyzeImages

Analyze images using computer vision models to obtain OCR, table recognition results, etc.

POST /doctrans-visual

  • The properties of the request body are as follows:
Name Type Meaning Required
file string The URL of an image file or PDF file accessible to the server, or the Base64-encoded result of the content of the aforementioned file types. By default, for PDF files with more than 10 pages, only the first 10 pages will be processed.
To remove the page limit, add the following configuration to the pipeline configuration file:
Serving:
  extra:
    max_num_input_imgs: null
Yes
fileType integer|null File type. 0 indicates a PDF file, 1 indicates an image file. If this property is absent from the request body, the file type will be inferred from the URL. No
useDocOrientationClassify boolean|null Refer to the description of the use_doc_orientation_classify parameter in the predict method of the pipeline object. No
useDocUnwarping boolean|null Refer to the description of the use_doc_unwarping parameter in the predict method of the pipeline object. No
useTextlineOrientation boolean|null Refer to the description of the use_textline_orientation parameter in the predict method of the pipeline object. No
useSealRecognition boolean|null Refer to the parameter description of use_seal_recognition in the predict method of the pipeline object. No
useTableRecognition boolean|null Refer to the parameter description of use_table_recognition in the predict method of the pipeline object. No
useFormulaRecognition boolean|null Refer to the parameter description of use_formula_recognition in the predict method of the pipeline object. No
useChartRecognition boolean|null Refer to the parameter description of use_chart_recognition in the predict method of the pipeline object. No
useRegionDetection boolean|null Refer to the description of the use_region_detection parameter in the predict method of the pipeline object. No
layoutThreshold number|object|null Refer to the parameter description of layout_threshold in the predict method of the pipeline object. No
layoutNms boolean|null Refer to the parameter description of layout_nms in the predict method of the pipeline object. No
layoutUnclipRatio number|array|object|null Refer to the parameter description of layout_unclip_ratio in the predict method of the pipeline object. No
layoutMergeBboxesMode string|object|null Refer to the parameter description of layout_merge_bboxes_mode in the predict method of the pipeline object. No
textDetLimitSideLen integer|null Refer to the description of the predict method's text_det_limit_side_len parameter in the pipeline object. No
textDetLimitType string|null Refer to the description of the predict method's text_det_limit_type parameter in the pipeline object. No
textDetThresh number|null Refer to the description of the predict method's text_det_thresh parameter in the pipeline object. No
textDetBoxThresh number|null Refer to the description of the predict method's text_det_box_thresh parameter in the pipeline object. No
textDetUnclipRatio number|null Refer to the description of the predict method's text_det_unclip_ratio parameter in the pipeline object. No
textRecScoreThresh number|null Refer to the description of the predict method's text_rec_score_thresh parameter in the pipeline object. No
sealDetLimitSideLen integer|null Refer to the description of the predict method's seal_det_limit_side_len parameter in the pipeline object. No
sealDetLimitType string|null Refer to the description of the predict method's seal_det_limit_type parameter in the pipeline object. No
sealDetThresh number|null Refer to the description of the predict method's seal_det_thresh parameter in the pipeline object. No
sealDetBoxThresh number|null Refer to the description of the predict method's seal_det_box_thresh parameter in the pipeline object. No
sealDetUnclipRatio number|null Refer to the description of the predict method's seal_det_unclip_ratio parameter in the pipeline object. No
sealRecScoreThresh number|null Refer to the description of the predict method's seal_rec_score_thresh parameter in the pipeline object. No
useWiredTableCellsTransToHtml boolean Refer to the description of the predict method's use_wired_table_cells_trans_to_html parameter in the pipeline object. No
useWirelessTableCellsTransToHtml boolean Refer to the description of the predict method's use_wireless_table_cells_trans_to_html parameter in the pipeline object. No
useTableOrientationClassify boolean Refer to the description of the predict method's use_table_orientation_classify parameter in the pipeline object. No
useOcrResultsWithTableCells boolean Refer to the description of the use_ocr_results_with_table_cells parameter in the predict method of the pipeline object. No
useE2eWiredTableRecModel boolean Refer to the description of the use_e2e_wired_table_rec_model parameter in the predict method of the pipeline object. No
useE2eWirelessTableRecModel boolean Refer to the description of the use_e2e_wireless_table_rec_model parameter in the predict method of the pipeline object. No
visualize boolean|null Whether to return visualization result images and intermediate images produced during processing.
  • Pass in true: return images.
  • Pass in false: do not return images.
  • If this parameter is not provided in the request body or null is passed in: follow the Serving.visualize setting in the pipeline configuration file.

For example, add the following field in the pipeline configuration file:
Serving:
  visualize: False
With this configuration, images will not be returned by default; the visualize parameter in the request body can still override this behavior. If neither the request body nor the configuration file sets it (or null is passed in the request body and the configuration file does not set it), images are returned by default.
No
  • When the request is processed successfully, the result in the response body has the following properties:
Name Type Meaning
layoutParsingResults array Layout parsing results. The array length is 1 (for image input) or the actual number of processed document pages (for PDF input). For PDF input, each element in the array represents the result of each actual processed page in the PDF file in sequence.
dataInfo object Input data information.

Each element in layoutParsingResults is an object with the following properties:

Name Type Meaning
prunedResult object A simplified version of the res field in the JSON representation of the layout_parsing_result generated by the visual_predict method of the pipeline object, with the input_path and page_index fields removed.
markdown object Markdown results.
outputImages object Refer to the img property description in the pipeline prediction results. The images are in JPEG format and encoded with Base64.
inputImage string|null Input image. The image is in JPEG format and encoded with Base64.

markdown is an object with the following properties:

Name Type Meaning
text string Markdown text.
images object Key-value pairs of relative paths of Markdown images and Base64-encoded images.
isStart boolean Whether the first element on the current page is the start of a paragraph.
isEnd boolean Whether the last element on the current page is the end of a paragraph.
  • translate

Translate documents using a large model.

POST /doctrans-translate

  • The properties of the request body are as follows:
Name Type Meaning Required
markdownList array List of Markdown documents to be translated. Can be obtained from the results of the analyzeImages operation. The images property will not be used. Yes
targetLanguage string Refer to the description of the target_language parameter in the translate method of the pipeline object. No
chunkSize integer Refer to the description of the chunk_size parameter in the translate method of the pipeline object. No
taskDescription string|null Refer to the description of the task_description parameter in the translate method of the pipeline object. No
outputFormat string|null Refer to the description of the output_format parameter in the translate method of the pipeline object. No
rulesStr string|null Refer to the description of the rules_str parameter in the translate method of the pipeline object. No
fewShotDemoTextContent string|null Refer to the description of the few_shot_demo_text_content parameter in the translate method of the pipeline object. No
fewShotDemoKeyValueList string|null Refer to the description of the few_shot_demo_key_value_list parameter in the translate method of the pipeline object. No
chatBotConfig object|null Refer to the description of the chat_bot_config parameter in the translate method of the pipeline object. No
llmRequestInterval number|null Refer to the description of the llm_request_interval parameter in the translate method of the pipeline object. No
  • When the request is processed successfully, the result in the response body has the following properties:
Name Type Meaning
translationResults array Translation results.

Each element in translationResults is an object with the following properties:

Name Type Meaning
language string Target language.
markdown object Markdown results. The object definition is consistent with the markdown returned by the analyzeImages operation.
  • Note: Including sensitive parameters such as the API key for large model calls in the request body may pose security risks. If not necessary, set these parameters in the configuration file and do not pass them in the request.

    Example of multilingual service invocation
    Python
    import base64
    import pathlib
    import pprint
    import sys
    
    import requests
    
    
    API_BASE_URL = "http://127.0.0.1:8080"
    
    file_path = "./demo.jpg"
    target_language = "en"
    
    with open(file_path, "rb") as file:
        file_bytes = file.read()
        file_data = base64.b64encode(file_bytes).decode("ascii")
    
    payload = {
        "file": file_data,
        "fileType": 1,
    }
    resp_visual = requests.post(url=f"{API_BASE_URL}/doctrans-visual", json=payload)
    if resp_visual.status_code != 200:
        print(
            f"Request to doctrans-visual failed with status code {resp_visual.status_code}."
        )
        pprint.pp(resp_visual.json())
        sys.exit(1)
    result_visual = resp_visual.json()["result"]
    
    markdown_list = []
    for i, res in enumerate(result_visual["layoutParsingResults"]):
        md_dir = pathlib.Path(f"markdown_{i}")
        md_dir.mkdir(exist_ok=True)
        (md_dir / "doc.md")
    write_text(res["markdown"]["text"])
        for img_path, img in res["markdown"]["images"].items():
            img_path = md_dir / img_path
            img_path.parent.mkdir(parents=True, exist_ok=True)
            img_path.write_bytes(base64.b64decode(img))
        print(f"The Markdown document to be translated is saved at {md_dir / 'doc.md'}")
        del res["markdown"]["images"]
        markdown_list.append(res["markdown"])
        for img_name, img in res["outputImages"].items():
            img_path = f"{img_name}_{i}.jpg"
            with open(img_path, "wb") as f:
                f.write(base64.b64decode(img))
            print(f"Output image saved at {img_path}")
    
    payload = {
        "markdownList": markdown_list,
    "targetLanguage": target_language,
    }
    resp_translate = requests.post(url=f"{API_BASE_URL}/doctrans-translate", json=payload)
    if resp_translate.status_code != 200:
        print(
            f"Request to doctrans-translate failed with status code {resp_translate.status_code}."
        )
        pprint.pp(resp_translate.json())
        sys.exit(1)
    result_translate = resp_translate.json()["result"]
    
    for i, res in enumerate(result_translate["translationResults"]):
        md_dir = pathlib.Path(f"markdown_{i}")
        (md_dir / "doc_translated.md").write_text(res["markdown"]["text"])
        print(f"Translated markdown document saved at {md_dir / 'doc_translated.md'}")


    4. Secondary Development

    If the default model weights provided by the PP-DocTranslation pipeline do not meet your accuracy or speed requirements in your scenario, you can try to use your own data from specific domains or application scenarios to further fine-tune the existing model and improve recognition performance in your scenario.

    4.1 Model Fine-tuning

    Since the PP-DocTranslation pipeline contains several modules, if the performance of the model pipeline does not meet expectations, the issue may originate from any one of these modules. You can analyze cases with poor extraction results, use visualized images to determine which module has the problem, and refer to the corresponding fine-tuning tutorial links in the following table to fine-tune the model.

    Scenario Fine-tuning module Fine-tuning reference link
    Inaccurate detection of layout areas, such as failure to detect seals and tables Layout area detection module Link
    Inaccurate recognition of table structures Table structure recognition module Link
    Inaccurate recognition of formulas Formula recognition module Link
    Omission in detecting seal texts Seal text detection module Link
    Omission in detecting texts Text detection module Link
    Inaccurate text content Text recognition module Link
    Inaccurate correction of vertical or rotated text lines Text line orientation classification module Link
    Inaccurate correction of whole image rotation Document image orientation classification module Link
    Inaccurate correction of image distortion Text image unwarping module Fine-tuning is temporarily not supported

    4.2 Model Application

    After completing fine-tuning training with your private dataset, you can obtain a local model weight file. Then, you can use the fine-tuned model weights by customizing the pipeline configuration file.

    1. Obtain the pipeline configuration file

    You can call the export_paddlex_config_to_yaml method of the PP-DocTranslation pipeline object in PaddleOCR to export the current pipeline configuration to a YAML file:

    from paddleocr import PPDocTranslation
    
    pipeline = PPDocTranslation()
    pipeline.export_paddlex_config_to_yaml("PP-DocTranslation.yaml")
    
    2. Modify the configuration file

    After obtaining the default pipeline configuration file, replace the local path of the fine-tuned model weights with the corresponding location in the pipeline configuration file. For example,

    ......
    SubModules:
      TextDetection:
        module_name: text_detection
        model_name: PP-OCRv5_server_det
        model_dir: null # Replace with the path to the weights of the fine-tuned text detection model
        limit_side_len: 960
        limit_type: max
        thresh: 0.3
        box_thresh: 0.6
        unclip_ratio: 1.5

      TextRecognition:
        module_name: text_recognition
        model_name: PP-OCRv5_server_rec
        model_dir: null # Replace with the path to the weights of the fine-tuned text recognition model
        batch_size: 1
        score_thresh: 0
    ......
    

    The pipeline configuration file not only includes parameters supported by PaddleOCR CLI and Python API but also allows for more advanced configurations. Detailed information can be found in the corresponding pipeline usage tutorial in the Overview of PaddleX Model Pipeline Usage. Refer to the detailed instructions therein and adjust the configurations according to your needs.

    3. Load the pipeline configuration file in the CLI

    After modifying the configuration file, specify the path to the modified pipeline configuration file using the --paddlex_config parameter in the command line. PaddleOCR will then read its contents as the pipeline configuration. Here is an example:

    paddleocr pp_doctranslation --paddlex_config PP-DocTranslation.yaml ...
    
    4. Load the pipeline configuration file in the Python API

    When initializing the pipeline object, you can pass the path of the PaddleX pipeline configuration file or a configuration dict through the paddlex_config parameter, and PaddleOCR will read its content as the pipeline configuration. The example is as follows:

    from paddleocr import PPDocTranslation
    
    pipeline = PPDocTranslation(paddlex_config="PP-DocTranslation.yaml")
    
