PP-StructureV3 Production Line User Guide¶
1. Introduction to PP-StructureV3 Production Line¶
Layout analysis is a technique for extracting structured information from document images, converting complex document layouts into machine-readable data. It has broad applications in document management, information extraction, and data digitization. Layout analysis combines Optical Character Recognition (OCR), image processing, and machine learning algorithms to identify and extract text blocks, titles, paragraphs, images, tables, and other layout elements from documents. The process generally includes three main steps: layout analysis, element analysis, and data formatting. The final result is structured document data, which improves the efficiency and accuracy of downstream processing.

PP-StructureV3 improves upon the general layout analysis v1 production line by enhancing layout region detection, table recognition, and formula recognition. It also adds capabilities such as multi-column reading order recovery and conversion of results to Markdown files. It performs well across various document types and can handle complex document data.

This production line also provides flexible service deployment options, supporting invocation from multiple programming languages on various hardware. In addition, it offers secondary development capabilities, allowing you to train and fine-tune models on your own dataset and integrate the trained models seamlessly.
PP-StructureV3 includes the following modules and sub-pipelines. Each can be trained and used for inference independently, and each contains multiple models. Click the corresponding entry for more documentation.
- Layout Detection Module
- General OCR Sub-pipeline
- Document Image Preprocessing Sub-pipeline (Optional)
- Table Recognition Sub-pipeline (Optional)
- Seal Recognition Sub-pipeline (Optional)
- Formula Recognition Sub-pipeline (Optional)
- Chart Parsing Module (Optional)
In this pipeline, you can choose the model to use based on the benchmark data below.
Document Image Orientation Classification Module:

| Model | Download Link | Top-1 Acc (%) | GPU Inference Time (ms) [Normal / High-Performance] | CPU Inference Time (ms) [Normal / High-Performance] | Model Size (M) | Description |
| --- | --- | --- | --- | --- | --- | --- |
| PP-LCNet_x1_0_doc_ori | Inference Model / Pretrained Model | 99.06 | 2.31 / 0.43 | 3.37 / 1.27 | 7 | Document image classification model based on PP-LCNet_x1_0, supporting four categories: 0°, 90°, 180°, and 270° |
Text Image Rectification Module (Optional):

| Model | Download Link | CER | Model Size (M) | Description |
| --- | --- | --- | --- | --- |
| UVDoc | Inference Model / Pretrained Model | 0.179 | 30.3 | High-precision text image rectification model |
Layout Detection Module Models:

* The layout detection model includes 20 common categories: document title, paragraph title, text, page number, abstract, table of contents, references, footnotes, header, footer, algorithm, formula, formula number, image, table, seal, figure/table title, chart, sidebar text, and lists of references.

| Model | Model Download Link | mAP(0.5) (%) | GPU Inference Time (ms) [Normal / High-Performance] | CPU Inference Time (ms) [Normal / High-Performance] | Model Size (M) | Description |
| --- | --- | --- | --- | --- | --- | --- |
| PP-DocLayout_plus-L | Inference Model / Training Model | 83.2 | 34.6244 / 10.3945 | 510.57 / - | 126.01 | A higher-precision layout area localization model trained with RT-DETR-L on a self-built dataset containing Chinese and English papers, PPTs, multi-layout magazines, contracts, books, exams, ancient books, and research reports |
| Model | Model Download Link | mAP(0.5) (%) | GPU Inference Time (ms) [Normal / High-Performance] | CPU Inference Time (ms) [Normal / High-Performance] | Model Size (M) | Description |
| --- | --- | --- | --- | --- | --- | --- |
| PP-DocBlockLayout | Inference Model / Training Model | 95.9 | 34.6244 / 10.3945 | 510.57 / - | 123.92 | A layout block localization model trained with RT-DETR-L on a self-built dataset containing Chinese and English papers, PPTs, multi-layout magazines, contracts, books, exams, ancient books, and research reports |
| Model | Download Link | mAP(0.5) (%) | GPU Inference Time (ms) [Normal / High-Performance] | CPU Inference Time (ms) [Normal / High-Performance] | Model Size (M) | Description |
| --- | --- | --- | --- | --- | --- | --- |
| PP-DocLayout-L | Inference Model / Pretrained Model | 90.4 | 34.6244 / 10.3945 | 510.57 / - | 123.76 | A high-precision layout area localization model trained with RT-DETR-L on a self-built dataset containing Chinese and English papers, magazines, contracts, books, exams, and research reports. |
| PP-DocLayout-M | Inference Model / Pretrained Model | 75.2 | 13.3259 / 4.8685 | 44.0680 / 44.0680 | 22.578 | A layout area localization model with balanced precision and efficiency, trained with PicoDet-L on a self-built dataset containing Chinese and English papers, magazines, contracts, books, exams, and research reports. |
| PP-DocLayout-S | Inference Model / Pretrained Model | 70.9 | 8.3008 / 2.3794 | 10.0623 / 9.9296 | 4.834 | A high-efficiency layout area localization model trained with PicoDet-S on a self-built dataset containing Chinese and English papers, magazines, contracts, books, exams, and research reports. |
* Table Layout Detection Model:

| Model | Model Download Link | mAP(0.5) (%) | GPU Inference Time (ms) [Normal / High-Performance] | CPU Inference Time (ms) [Normal / High-Performance] | Model Size (M) | Description |
| --- | --- | --- | --- | --- | --- | --- |
| PicoDet_layout_1x_table | Inference Model / Training Model | 97.5 | 8.02 / 3.09 | 23.70 / 20.41 | 7.4 | A high-efficiency layout area localization model trained with PicoDet-1x on a self-built dataset, capable of detecting table regions. |
* 3-Class Layout Detection Models:

| Model | Model Download Link | mAP(0.5) (%) | GPU Inference Time (ms) [Normal / High-Performance] | CPU Inference Time (ms) [Normal / High-Performance] | Model Size (M) | Description |
| --- | --- | --- | --- | --- | --- | --- |
| PicoDet-S_layout_3cls | Inference Model / Training Model | 88.2 | 8.99 / 2.22 | 16.11 / 8.73 | 4.8 | A high-efficiency layout area localization model trained with PicoDet-S on a self-built dataset of Chinese and English papers, magazines, and research reports. |
| PicoDet-L_layout_3cls | Inference Model / Training Model | 89.0 | 13.05 / 4.50 | 41.30 / 41.30 | 22.6 | A layout area localization model with balanced efficiency and precision, trained with PicoDet-L on a self-built dataset of Chinese and English papers, magazines, and research reports. |
| RT-DETR-H_layout_3cls | Inference Model / Training Model | 95.8 | 114.93 / 27.71 | 947.56 / 947.56 | 470.1 | A high-precision layout area localization model trained with RT-DETR-H on a self-built dataset of Chinese and English papers, magazines, and research reports. |
* English Document Layout Detection Model:

| Model | Model Download Link | mAP(0.5) (%) | GPU Inference Time (ms) [Normal / High-Performance] | CPU Inference Time (ms) [Normal / High-Performance] | Model Size (M) | Description |
| --- | --- | --- | --- | --- | --- | --- |
| PicoDet_layout_1x | Inference Model / Training Model | 97.8 | 9.03 / 3.10 | 25.82 / 20.70 | 7.4 | A high-efficiency English document layout area localization model trained on the PubLayNet dataset using PicoDet-1x. |
* 17-Class Layout Detection Models:

| Model | Model Download Link | mAP(0.5) (%) | GPU Inference Time (ms) [Normal / High-Performance] | CPU Inference Time (ms) [Normal / High-Performance] | Model Size (M) | Description |
| --- | --- | --- | --- | --- | --- | --- |
| PicoDet-S_layout_17cls | Inference Model / Training Model | 87.4 | 9.11 / 2.12 | 15.42 / 9.12 | 4.8 | A high-efficiency layout area localization model trained with PicoDet-S on a self-built dataset of Chinese and English papers, magazines, and research reports. |
| PicoDet-L_layout_17cls | Inference Model / Training Model | 89.0 | 13.50 / 4.69 | 43.32 / 43.32 | 22.6 | A layout area localization model with balanced efficiency and precision, trained with PicoDet-L on a self-built dataset of Chinese and English papers, magazines, and research reports. |
| RT-DETR-H_layout_17cls | Inference Model / Training Model | 98.3 | 115.29 / 104.09 | 995.27 / 995.27 | 470.2 | A high-precision layout area localization model trained with RT-DETR-H on a self-built dataset of Chinese and English papers, magazines, and research reports. |
Table Cell Detection Module (Optional):

| Model | Download Link | mAP (%) | GPU Inference Time (ms) [Normal / High-Performance] | CPU Inference Time (ms) [Normal / High-Performance] | Model Size (M) | Description |
| --- | --- | --- | --- | --- | --- | --- |
| RT-DETR-L_wired_table_cell_det | Inference Model / Pretrained Model | 82.7 | 35.00 / 10.45 | 495.51 / 495.51 | 124 | RT-DETR is the first real-time end-to-end object detection model. Based on RT-DETR-L, the PaddlePaddle Vision Team pre-trained the model on a custom table cell detection dataset, achieving good performance for both wired and wireless tables. |
| RT-DETR-L_wireless_table_cell_det | Inference Model / Pretrained Model | | | | | |
Text Detection Module (Required):

| Model | Download Link | Detection Hmean (%) | GPU Inference Time (ms) [Normal / High-Performance] | CPU Inference Time (ms) [Normal / High-Performance] | Model Size (M) | Description |
| --- | --- | --- | --- | --- | --- | --- |
| PP-OCRv4_server_det | Inference Model / Training Model | 82.56 | 83.34 / 80.91 | 442.58 / 442.58 | 109 | The server-side text detection model of PP-OCRv4, with higher accuracy, suitable for deployment on high-performance servers. |
| PP-OCRv4_mobile_det | Inference Model / Training Model | 77.35 | 8.79 / 3.13 | 51.00 / 28.58 | 4.7 | The mobile text detection model of PP-OCRv4, with higher efficiency, suitable for deployment on edge devices. |
| PP-OCRv3_mobile_det | Inference Model / Training Model | 78.68 | 8.44 / 2.91 | 27.87 / 27.87 | 2.1 | The mobile text detection model of PP-OCRv3, with higher efficiency, suitable for deployment on edge devices. |
| PP-OCRv3_server_det | Inference Model / Training Model | 80.11 | 65.41 / 13.67 | 305.07 / 305.07 | 102.1 | The server-side text detection model of PP-OCRv3, with higher accuracy, suitable for deployment on high-performance servers. |
Text Recognition Module (Required):
* PP-OCRv5 Multi-Scenario Models:

| Model | Download Link | Chinese Avg Accuracy (%) | English Avg Accuracy (%) | Traditional Chinese Avg Accuracy (%) | Japanese Avg Accuracy (%) | GPU Inference Time (ms) [Normal / High-Performance] | CPU Inference Time (ms) [Normal / High-Performance] | Model Size (M) | Description |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PP-OCRv5_server_rec | Inference Model / Pretrained Model | 86.38 | 64.70 | 93.29 | 60.35 | - | - | 205 | PP-OCRv5_server_rec is a new-generation text recognition model. It efficiently and accurately supports four major languages (Simplified Chinese, Traditional Chinese, English, and Japanese) as well as handwriting, vertical text, pinyin, and rare characters, offering robust and efficient support for document understanding. |
| PP-OCRv5_mobile_rec | Inference Model / Pretrained Model | 81.29 | 66.00 | 83.55 | 54.65 | - | - | 136 | PP-OCRv5_mobile_rec is a new-generation text recognition model. It efficiently and accurately supports four major languages (Simplified Chinese, Traditional Chinese, English, and Japanese) as well as handwriting, vertical text, pinyin, and rare characters, offering robust and efficient support for document understanding. |
* Chinese Recognition Models:

| Model | Download Link | Avg Accuracy (%) | GPU Inference Time (ms) [Normal / High-Performance] | CPU Inference Time (ms) [Normal / High-Performance] | Model Size (M) | Description |
| --- | --- | --- | --- | --- | --- | --- |
| PP-OCRv4_server_rec_doc | Inference Model / Pretrained Model | 86.58 | 6.65 / 2.38 | 32.92 / 32.92 | 91 | Based on PP-OCRv4_server_rec, trained on additional Chinese documents and PP-OCR mixed data. It supports over 15,000 characters including Traditional Chinese, Japanese, and special symbols, enhancing both document-specific and general text recognition accuracy. |
| PP-OCRv4_mobile_rec | Inference Model / Pretrained Model | 83.28 | 4.82 / 1.20 | 16.74 / 4.64 | 11 | Lightweight model of PP-OCRv4 with high inference efficiency, suitable for deployment on various edge devices. |
| PP-OCRv4_server_rec | Inference Model / Pretrained Model | 85.19 | 6.58 / 2.43 | 33.17 / 33.17 | 87 | Server-side model of PP-OCRv4 with high recognition accuracy, suitable for deployment on various servers. |
| PP-OCRv3_mobile_rec | Inference Model / Pretrained Model | 75.43 | 5.87 / 1.19 | 9.07 / 4.28 | 11 | Lightweight model of PP-OCRv3 with high inference efficiency, suitable for deployment on various edge devices. |
| Model | Download Link | Avg Accuracy (%) | GPU Inference Time (ms) [Normal / High-Performance] | CPU Inference Time (ms) [Normal / High-Performance] | Model Size (M) | Description |
| --- | --- | --- | --- | --- | --- | --- |
| ch_SVTRv2_rec | Inference Model / Pretrained Model | 68.81 | 8.08 / 2.74 | 50.17 / 42.50 | 73.9 | SVTRv2 is a server-side recognition model developed by the OpenOCR team at Fudan University's FVL Lab. It won first place in the OCR End-to-End Recognition task of the PaddleOCR Model Challenge, improving end-to-end accuracy on Benchmark A by 6% compared to PP-OCRv4. |
| Model | Download Link | Avg Accuracy (%) | GPU Inference Time (ms) [Normal / High-Performance] | CPU Inference Time (ms) [Normal / High-Performance] | Model Size (M) | Description |
| --- | --- | --- | --- | --- | --- | --- |
| ch_RepSVTR_rec | Inference Model / Pretrained Model | 65.07 | 5.93 / 1.62 | 20.73 / 7.32 | 22.1 | RepSVTR is a mobile text recognition model based on SVTRv2. It won first place in the OCR End-to-End Recognition task of the PaddleOCR Model Challenge, improving accuracy on Benchmark B by 2.5% over PP-OCRv4 with comparable inference speed. |
* English Recognition Models:

| Model | Download Link | Avg Accuracy (%) | GPU Inference Time (ms) [Normal / High-Performance] | CPU Inference Time (ms) [Normal / High-Performance] | Model Size (M) | Description |
| --- | --- | --- | --- | --- | --- | --- |
| en_PP-OCRv4_mobile_rec | Inference Model / Pretrained Model | 70.39 | 4.81 / 0.75 | 16.10 / 5.31 | 6.8 | Ultra-lightweight English recognition model trained on PP-OCRv4, supporting English and number recognition. |
| en_PP-OCRv3_mobile_rec | Inference Model / Pretrained Model | 70.69 | 5.44 / 0.75 | 8.65 / 5.57 | 7.8 | Ultra-lightweight English recognition model trained on PP-OCRv3, supporting English and number recognition. |
* Multilingual Recognition Models:

| Model | Model Download Link | Avg Accuracy (%) | GPU Inference Time (ms) [Normal / High-Performance] | CPU Inference Time (ms) [Normal / High-Performance] | Model Size (M) | Description |
| --- | --- | --- | --- | --- | --- | --- |
| korean_PP-OCRv3_mobile_rec | Inference Model / Pretrained Model | 60.21 | 5.40 / 0.97 | 9.11 / 4.05 | 8.6 | An ultra-lightweight Korean text recognition model trained on PP-OCRv3, supporting Korean and digit recognition |
| japan_PP-OCRv3_mobile_rec | Inference Model / Pretrained Model | 45.69 | 5.70 / 1.02 | 8.48 / 4.07 | 8.8 | An ultra-lightweight Japanese text recognition model trained on PP-OCRv3, supporting Japanese and digit recognition |
| chinese_cht_PP-OCRv3_mobile_rec | Inference Model / Pretrained Model | 82.06 | 5.90 / 1.28 | 9.28 / 4.34 | 9.7 | An ultra-lightweight Traditional Chinese text recognition model trained on PP-OCRv3, supporting Traditional Chinese and digit recognition |
| te_PP-OCRv3_mobile_rec | Inference Model / Pretrained Model | 95.88 | 5.42 / 0.82 | 8.10 / 6.91 | 7.8 | An ultra-lightweight Telugu text recognition model trained on PP-OCRv3, supporting Telugu and digit recognition |
| ka_PP-OCRv3_mobile_rec | Inference Model / Pretrained Model | 96.96 | 5.25 / 0.79 | 9.09 / 3.86 | 8.0 | An ultra-lightweight Kannada text recognition model trained on PP-OCRv3, supporting Kannada and digit recognition |
| ta_PP-OCRv3_mobile_rec | Inference Model / Pretrained Model | 76.83 | 5.23 / 0.75 | 10.13 / 4.30 | 8.0 | An ultra-lightweight Tamil text recognition model trained on PP-OCRv3, supporting Tamil and digit recognition |
| latin_PP-OCRv3_mobile_rec | Inference Model / Pretrained Model | 76.93 | 5.20 / 0.79 | 8.83 / 7.15 | 7.8 | An ultra-lightweight Latin text recognition model trained on PP-OCRv3, supporting Latin script and digit recognition |
| arabic_PP-OCRv3_mobile_rec | Inference Model / Pretrained Model | 73.55 | 5.35 / 0.79 | 8.80 / 4.56 | 7.8 | An ultra-lightweight Arabic script recognition model trained on PP-OCRv3, supporting Arabic script and digit recognition |
| cyrillic_PP-OCRv3_mobile_rec | Inference Model / Pretrained Model | 94.28 | 5.23 / 0.76 | 8.89 / 3.88 | 7.9 | An ultra-lightweight Cyrillic script recognition model trained on PP-OCRv3, supporting Cyrillic script and digit recognition |
| devanagari_PP-OCRv3_mobile_rec | Inference Model / Pretrained Model | 96.44 | 5.22 / 0.79 | 8.56 / 4.06 | 7.9 | An ultra-lightweight Devanagari script recognition model trained on PP-OCRv3, supporting Devanagari script and digit recognition |
Text Line Orientation Classification Module (Optional):
| Model | Model Download Link | Top-1 Acc (%) | GPU Inference Time (ms) [Normal / High-Performance] | CPU Inference Time (ms) [Normal / High-Performance] | Model Size (M) | Description |
| --- | --- | --- | --- | --- | --- | --- |
| PP-LCNet_x0_25_textline_ori | Inference Model / Pretrained Model | 95.54 | - | - | 0.32 | A text line classification model based on PP-LCNet_x0_25, containing two categories: 0° and 180° |
Formula Recognition Module (Optional):
| Model | Model Download Link | En-BLEU (%) | Zh-BLEU (%) | GPU Inference Time (ms) [Normal / High-Performance] | CPU Inference Time (ms) [Normal / High-Performance] | Model Size (M) | Description |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UniMERNet | Inference Model / Training Model | 85.91 | 43.50 | 2266.96 / - | - / - | 1.53 G | UniMERNet is a formula recognition model developed by Shanghai AI Lab. It uses Donut Swin as the encoder and MBartDecoder as the decoder. Trained on a dataset of one million samples including simple, complex, scanned, and handwritten formulas, it significantly improves the recognition accuracy of real-world formulas. |
| PP-FormulaNet-S | Inference Model / Training Model | 87.00 | 45.71 | 202.25 / - | - / - | 224 | PP-FormulaNet is an advanced formula recognition model developed by the Baidu PaddlePaddle Vision Team. The PP-FormulaNet-S version uses PP-HGNetV2-B4 as its backbone network. Through parallel masking and model distillation techniques, it significantly improves inference speed while maintaining high recognition accuracy, making it suitable for applications requiring fast inference. The PP-FormulaNet-L version uses Vary_VIT_B as its backbone network and is trained on a large-scale formula dataset, showing significant improvements in recognizing complex formulas compared to PP-FormulaNet-S. |
| PP-FormulaNet-L | Inference Model / Training Model | 90.36 | 45.78 | 1976.52 / - | - / - | 695 | |
| PP-FormulaNet_plus-S | Inference Model / Training Model | 88.71 | 53.32 | 191.69 / - | - / - | 248 | PP-FormulaNet_plus is an enhanced version of PP-FormulaNet developed by the Baidu PaddlePaddle Vision Team. Compared to the original version, it is trained on a more diverse formula dataset, including Chinese dissertations, professional books, textbooks, exam papers, and mathematics journals, which significantly improves recognition capability. PP-FormulaNet_plus-M and PP-FormulaNet_plus-L add support for Chinese formulas and increase the maximum number of predicted formula tokens from 1,024 to 2,560, greatly enhancing recognition of complex formulas, while PP-FormulaNet_plus-S focuses on improving recognition of English formulas. With these improvements, the PP-FormulaNet_plus series performs exceptionally well on complex and diverse formula recognition tasks. |
| PP-FormulaNet_plus-M | Inference Model / Training Model | 91.45 | 89.76 | 1301.56 / - | - / - | 592 | |
| PP-FormulaNet_plus-L | Inference Model / Training Model | 92.22 | 90.64 | 1745.25 / - | - / - | 698 | |
| LaTeX_OCR_rec | Inference Model / Training Model | 74.55 | 39.96 | 1244.61 / - | - / - | 99 | LaTeX-OCR is a formula recognition algorithm based on an autoregressive large model. It uses Hybrid ViT as the backbone network and a transformer as the decoder, significantly improving the accuracy of formula recognition. |
Seal Text Detection Module (Optional):
| Model | Model Download Link | Detection Hmean (%) | GPU Inference Time (ms) [Normal / High-Performance] | CPU Inference Time (ms) [Normal / High-Performance] | Model Size (M) | Description |
| --- | --- | --- | --- | --- | --- | --- |
| PP-OCRv4_server_seal_det | Inference Model / Pretrained Model | 98.21 | 74.75 / 67.72 | 382.55 / 382.55 | 109 | Server-side seal text detection model based on PP-OCRv4, offering higher accuracy and suitable for deployment on high-performance servers |
| PP-OCRv4_mobile_seal_det | Inference Model / Pretrained Model | 96.47 | 7.82 / 3.09 | 48.28 / 23.97 | 4.6 | Mobile-side seal text detection model based on PP-OCRv4, offering higher efficiency and suitable for edge-side deployment |
Chart Parsing Module (Optional):

| Model | Model Download Link | Parameter Size (B) | Model Size (GB) | Model Score | Description |
| --- | --- | --- | --- | --- | --- |
| PP-Chart2Table | Inference Model | 0.58 | 1.4 | 75.98 | PP-Chart2Table is a multimodal model developed by the PaddlePaddle team that focuses on chart parsing and demonstrates outstanding performance in both Chinese and English chart parsing tasks. The team adopted a carefully designed data generation strategy to construct a high-quality multimodal dataset of nearly 700,000 entries covering common chart types (pie charts, bar charts, stacked area charts, and more) and various application scenarios. They also designed a two-stage training method, using large-model distillation to fully leverage massive unlabeled out-of-distribution data. In internal business tests in both Chinese and English scenarios, PP-Chart2Table not only achieved SOTA results among models of the same parameter scale but also reached accuracy comparable to 7B-parameter VLMs in critical scenarios. |
Test Environment Description:
- Performance Test Environment
- Test Dataset:
- Document Image Orientation Classification Module: A self-built dataset using PaddleX, covering multiple scenarios such as ID cards and documents, containing 1000 images.
- Text Image Rectification Model: DocUNet
- Layout Region Detection Model: A self-built layout detection dataset using PaddleOCR, containing 10,000 images of common document types such as Chinese and English papers, magazines, and research reports.
- Table Structure Recognition Model: A self-built English table recognition dataset using PaddleX.
- Text Detection Model: A self-built Chinese dataset using PaddleOCR, covering multiple scenarios such as street scenes, web images, documents, and handwriting, with 500 images for detection.
- Chinese Recognition Model: A self-built Chinese dataset using PaddleOCR, covering multiple scenarios such as street scenes, web images, documents, and handwriting, with 11,000 images for text recognition.
- ch_SVTRv2_rec: Evaluation set A for "OCR End-to-End Recognition Task" in the PaddleOCR Algorithm Model Challenge
- ch_RepSVTR_rec: Evaluation set B for "OCR End-to-End Recognition Task" in the PaddleOCR Algorithm Model Challenge.
- English Recognition Model: A self-built English dataset using PaddleX.
- Multilingual Recognition Model: A self-built multilingual dataset using PaddleX.
- Text Line Orientation Classification Model: A self-built dataset using PaddleX, covering various scenarios such as ID cards and documents, containing 1000 images.
- Seal Text Detection Model: A self-built dataset using PaddleX, containing 500 images of circular seal textures.
- Hardware Configuration:
- GPU: NVIDIA Tesla T4
- CPU: Intel Xeon Gold 6271C @ 2.60GHz
- Other Environments: Ubuntu 20.04 / cuDNN 8.6 / TensorRT 8.5.2.2
- Inference Mode Description

| Mode | GPU Configuration | CPU Configuration | Acceleration Technology Combination |
| --- | --- | --- | --- |
| Normal Mode | FP32 precision / no TRT acceleration | FP32 precision / 8 threads | PaddleInference |
| High-Performance Mode | Optimal combination of pre-selected precision types and acceleration strategies | FP32 precision / 8 threads | Pre-selected optimal backend (Paddle/OpenVINO/TRT, etc.) |
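High-performance mode can also be requested when constructing the pipeline from Python via the `enable_hpi` flag documented in the parameter table in Section 2. A minimal sketch, assuming the optional high-performance inference dependencies are installed:

```python
from paddleocr import PPStructureV3

# Sketch: opt into the high-performance inference path described above.
# Assumes the optional high-performance inference dependencies are installed.
pipeline = PPStructureV3(enable_hpi=True)
```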
2. Quick Start¶
You can quickly experience the effect of the PP-StructureV3 pipeline on your local machine using the command line or Python.

Before using the PP-StructureV3 pipeline locally, please ensure that you have installed the wheel package according to the PaddleOCR Local Installation Guide. If you wish to install dependencies selectively, refer to the relevant instructions in the installation guide; the dependency group corresponding to this pipeline is `ocr`.
When performing GPU inference, the default configuration may use more than 16 GB of VRAM. Please ensure that your GPU has sufficient memory. To reduce VRAM usage, you can modify the configuration file as described below to disable unnecessary features.
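Alternatively, unneeded features can be switched off directly when instantiating the pipeline from Python. A minimal sketch; the `use_*` switches are documented in the parameter table in Section 2.2, and which ones you can safely disable depends on your documents:

```python
from paddleocr import PPStructureV3

# Sketch: reduce VRAM usage by disabling optional features that are not needed.
pipeline = PPStructureV3(
    use_doc_orientation_classify=False,  # skip document orientation classification
    use_doc_unwarping=False,             # skip document image unwarping
    use_seal_recognition=False,          # skip the seal recognition sub-pipeline
    use_formula_recognition=False,       # skip the formula recognition sub-pipeline
    use_chart_recognition=False,         # skip the chart parsing model
)
```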
2.1 Experiencing via Command Line¶
You can quickly experience the PP-StructureV3 pipeline with a single command.
```bash
paddleocr pp_structurev3 -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png

# Use --use_doc_orientation_classify to enable document orientation classification
paddleocr pp_structurev3 -i ./pp_structure_v3_demo.png --use_doc_orientation_classify True

# Use --use_doc_unwarping to enable the document unwarping module
paddleocr pp_structurev3 -i ./pp_structure_v3_demo.png --use_doc_unwarping True

# Use --use_textline_orientation to toggle text line orientation classification
paddleocr pp_structurev3 -i ./pp_structure_v3_demo.png --use_textline_orientation False

# Use --device to specify a GPU for inference
paddleocr pp_structurev3 -i ./pp_structure_v3_demo.png --device gpu
```
Parameter descriptions can be found in Section 2.2 Python Script Integration. Multiple devices can be specified simultaneously for parallel inference; for details, refer to Pipeline Parallel Inference.
After running, the result will be printed to the terminal, as follows:
```
{'res': {'input_path': 'pp_structure_v3_demo.png', 'model_settings': {'use_doc_preprocessor': False, 'use_general_ocr': True, 'use_seal_recognition': True, 'use_table_recognition': True, 'use_formula_recognition': True}, 'layout_det_res': {'input_path': None, 'page_index': None, 'boxes': [{'cls_id': 2, 'label': 'text', 'score': 0.9853514432907104, 'coordinate': [770.9531, 776.6814, 1122.6057, 1058.7322]}, {'cls_id': 1, 'label': 'image', 'score': 0.9848673939704895, 'coordinate': [775.7434, 202.27979, 1502.8113, 686.02136]}, {'cls_id': 2, 'label': 'text', 'score': 0.983731746673584, 'coordinate': [1152.3197, 1113.3275, 1503.3029, 1346.586]}, {'cls_id': 2, 'label': 'text', 'score': 0.9832221865653992, 'coordinate': [1152.5602, 801.431, 1503.8436, 986.3563]}, {'cls_id': 2, 'label': 'text', 'score': 0.9829439520835876, 'coordinate': [9.549545, 849.5713, 359.1173, 1058.7488]}, {'cls_id': 2, 'label': 'text', 'score': 0.9811657667160034, 'coordinate': [389.58298, 1137.2659, 740.66235, 1346.7488]}, {'cls_id': 2, 'label': 'text', 'score': 0.9775941371917725, 'coordinate': [9.1302185, 201.85, 359.0409, 339.05692]}, {'cls_id': 2, 'label': 'text', 'score': 0.9750366806983948, 'coordinate': [389.71454, 752.96924, 740.544, 889.92456]}, {'cls_id': 2, 'label': 'text', 'score': 0.9738152027130127, 'coordinate': [389.94565, 298.55988, 740.5585, 435.5124]}, {'cls_id': 2, 'label': 'text', 'score': 0.9737328290939331, 'coordinate': [771.50256, 1065.4697, 1122.2582, 1178.7324]}, {'cls_id': 2, 'label': 'text', 'score': 0.9728517532348633, 'coordinate': [1152.5154, 993.3312, 1503.2349, 1106.327]}, {'cls_id': 2, 'label': 'text', 'score': 0.9725610017776489, 'coordinate': [9.372787, 1185.823, 359.31738, 1298.7227]}, {'cls_id': 2, 'label': 'text', 'score': 0.9724331498146057, 'coordinate': [389.62848, 610.7389, 740.83234, 746.2377]}, {'cls_id': 2, 'label': 'text', 'score': 0.9720287322998047, 'coordinate': [389.29898, 897.0936, 741.41516, 1034.6616]}, {'cls_id': 2, 'label': 'text', 'score': 0.9713053703308105, 'coordinate': [10.323685, 1065.4663, 359.6786, 1178.8872]}, {'cls_id': 2, 'label': 'text', 'score': 0.9689728021621704, 'coordinate': [9.336395, 537.6609, 359.2901, 652.1881]}, {'cls_id': 2, 'label': 'text', 'score': 0.9684857130050659, 'coordinate': [10.7608185, 345.95068, 358.93616, 434.64087]}, {'cls_id': 2, 'label': 'text', 'score': 0.9681928753852844, 'coordinate': [9.674866, 658.89075, 359.56528, 770.4319]}, {'cls_id': 2, 'label': 'text', 'score': 0.9634978175163269, 'coordinate': [770.9464, 1281.1785, 1122.6522, 1346.7156]}, {'cls_id': 2, 'label': 'text', 'score': 0.96304851770401, 'coordinate': [390.0113, 201.28055, 740.1684, 291.53073]}, {'cls_id': 2, 'label': 'text', 'score': 0.962053120136261, 'coordinate': [391.21393, 1040.952, 740.5046, 1130.32]}, {'cls_id': 2, 'label': 'text', 'score': 0.9565253853797913, 'coordinate': [10.113251, 777.1482, 359.439, 842.437]}, {'cls_id': 2, 'label': 'text', 'score': 0.9497362375259399, 'coordinate': [390.31357, 537.86285, 740.47595, 603.9285]}, {'cls_id': 2, 'label': 'text', 'score': 0.9371236562728882, 'coordinate': [10.2034, 1305.9753, 359.5958, 1346.7295]}, {'cls_id': 0, 'label': 'paragraph_title', 'score': 0.9338151216506958, 'coordinate': [791.6062, 1200.8479, 1103.3257, 1259.9324]}, {'cls_id': 0, 'label': 'paragraph_title', 'score': 0.9326773285865784, 'coordinate': [408.0737, 457.37024, 718.9509, 516.63464]}, {'cls_id': 0, 'label': 'paragraph_title', 'score': 0.9274250864982605, 'coordinate': [29.448685, 456.6762, 340.99194, 515.6999]}, {'cls_id': 2, 
'label': 'text', 'score': 0.8742568492889404, 'coordinate': [1154.7095, 777.3624, 1330.3086, 794.5853]}, {'cls_id': 2, 'label': 'text', 'score': 0.8442489504814148, 'coordinate': [586.49316, 160.15454, 927.468, 179.64203]}, {'cls_id': 11, 'label': 'doc_title', 'score': 0.8332607746124268, 'coordinate': [133.80017, 37.41908, 1380.8601, 124.1429]}, {'cls_id': 6, 'label': 'figure_title', 'score': 0.6770150661468506, 'coordinate': [812.1718, 705.1199, 1484.6973, 747.1692]}]}, 'overall_ocr_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_textline_orientation': False}, 'dt_polys': array([[[ 133, 35],
...,
[ 133, 131]],
...,
[[1154, 1323],
...,
[1152, 1355]]], dtype=int16), 'text_det_params': {'limit_side_len': 960, 'limit_type': 'max', 'thresh': 0.3, 'box_thresh': 0.6, 'unclip_ratio': 2.0}, 'text_type': 'general', 'textline_orientation_angles': array([-1, ..., -1]), 'text_rec_score_thresh': 0.0, 'rec_texts': ['助力双方交往', '搭建友谊桥梁', '本报记者', '沈小晓', '任', '彦', '黄培昭', '身着中国传统民族服装的厄立特里亚青', '厄立特里亚高等教育与研究院合作建立,开', '年依次登台表演中国民族舞、现代舞、扇子舞', '设了中国语言课程和中国文化课程,注册学', '等,曼妙的舞姿赢得现场观众阵阵掌声。这', '生2万余人次。10余年来,厄特孔院已成为', '是日前厄立特里亚高等教育与研究院孔子学', '当地民众了解中国的一扇窗口。', '院(以下简称"厄特孔院")举办"喜迎新年"中国', '黄鸣飞表示,随着来学习中文的人日益', '歌舞比赛的场景。', '增多,阿斯马拉大学教学点已难以满足教学', '中国和厄立特里亚传统友谊深厚。近年', '需要。2024年4月,由中企蜀道集团所属四', '来,在高质量共建"一带一路"框架下,中厄两', '川路桥承建的孔院教学楼项目在阿斯马拉开', '国人文交流不断深化,互利合作的民意基础', '工建设,预计今年上半年峻工,建成后将为厄', '日益深厚。', '特孔院提供全新的办学场地。', '“学好中文,我们的', '“在中国学习的经历', '未来不是梦”', '让我看到更广阔的世界”', '“鲜花曾告诉我你怎样走过,大地知道你', '多年来,厄立特里亚广大赴华留学生和', '心中的每一个角落…"厄立特里亚阿斯马拉', '培训人员积极投身国家建设,成为助力该国', '大学综合楼二层,一阵优美的歌声在走廊里回', '发展的人才和厄中友好的见证者和推动者。', '响。循着熟悉的旋律轻轻推开一间教室的门,', '在厄立特里亚全国妇女联盟工作的约翰', '学生们正跟着老师学唱中文歌曲《同一首歌》。', '娜·特韦尔德·凯莱塔就是其中一位。她曾在', '这是厄特孔院阿斯马拉大学教学点的一', '中华女子学院攻读硕士学位,研究方向是女', '节中文歌曲课。为了让学生们更好地理解歌', '性领导力与社会发展。其间,她实地走访中国', '词大意,老师尤斯拉·穆罕默德萨尔·侯赛因逐', '多个地区,获得了观察中国社会发展的第一', '在厄立特里亚不久前举办的第六届中国风筝文化节上,当地小学生体验风筝制作。', '字翻译和解释歌词。随着伴奏声响起,学生们', '手资料。', '中国驻厄立特里亚大使馆供图', '边唱边随着节拍摇动身体,现场气氛热烈。', '谈起在中国求学的经历,约翰娜记忆犹', '“这是中文歌曲初级班,共有32人。学', '新:"中国的发展在当今世界是独一无二的。', '“不管远近都是客人,请不用客气;相约', '瓦的北红海省博物馆。', '生大部分来自首都阿斯马拉的中小学,年龄', '沿着中国特色社会主义道路坚定前行,中国', '好了在一起我们欢迎你"在一场中厄青', '博物馆二层陈列着一个发掘自阿杜利', '最小的仅有6岁。”尤斯拉告诉记者。', '创造了发展奇迹,这一切都离不开中国共产党', '年联谊活动上,四川路桥中方员工同当地大', '斯古城的中国古代陶制酒器,罐身上写着', '尤斯拉今年23岁,是厄立特里亚一所公立', '的领导。中国的发展经验值得许多国家学习', '学生合唱《北京欢迎你》。厄立特里亚技术学', '“万""和""禅"“山"等汉字。“这件文物证', '学校的艺术老师。她12岁开始在厄特孔院学', '借鉴。”', '院计算机科学与工程专业学生鲁夫塔·谢拉', '明,很早以前我们就通过海上丝绸之路进行', '习中文,在2017年第十届"汉语桥"世界中学生', '正在西南大学学习的厄立特里亚博士生', '是其中一名演唱者,她很早便在孔院学习中', '贸易往来与文化交流。这也是厄立特里亚', '中文比赛中获得厄立特里亚赛区第一名,并和', '穆卢盖塔·泽穆伊对中国怀有深厚感情。8', '文,一直在为去中国留学作准备。"这句歌词', '与中国友好交往历史的有力证明。"北红海', '同伴代表厄立特里亚前往中国参加决赛,获得', '年前,在北京师范大学获得硕士学位后,穆卢', '是我们两国人民友谊的生动写照。无论是投', '省博物馆研究与文献部负责人伊萨亚斯·特', '团体优胜奖。2022年起,尤斯拉开始在厄特孔', '盖塔在社交媒体上写下这样一段话:"这是我', '身于厄立特里亚基础设施建设的中企员工,', '斯法兹吉说。', '院兼职教授中文歌曲,每周末两个课时。“中国', '人生的重要一步,自此我拥有了一双坚固的', '还是在中国留学的厄立特里亚学子,两国人', '厄立特里亚国家博物馆考古学和人类学', '文化博大精深,我希望我的学生们能够通过中', '鞋子,赋予我穿越荆棘的力量。”', '民携手努力,必将推动两国关系不断向前发', '研究员菲尔蒙·特韦尔德十分喜爱中国文', '文歌曲更好地理解中国文化。"她说。', '穆卢盖塔密切关注中国在经济、科技、教', '展。"鲁夫塔说。', '化。他表示:“学习彼此的语言和文化,将帮', '“姐姐,你想去中国吗?""非常想!我想', '育等领域的发展,中国在科研等方面的实力', '厄立特里亚高等教育委员会主任助理萨', '助厄中两国人民更好地理解彼此,助力双方', '去看故宫、爬长城。"尤斯拉的学生中有一对', '与日俱增。在中国学习的经历让我看到更广', '马瑞表示:"每年我们都会组织学生到中国访', '交往,搭建友谊桥梁。"', '能歌善舞的姐妹,姐姐露娅今年15岁,妹妹', '阔的世界,从中受益匪浅。', '问学习,目前有超过5000名厄立特里亚学生', '厄立特里亚国家博物馆馆长塔吉丁·努', '莉娅14岁,两人都已在厄特孔院学习多年,', '23岁的莉迪亚·埃斯蒂法诺斯已在厄特', '在中国留学。学习中国的教育经验,有助于', '里达姆·优素福曾多次访问中国,对中华文明', '中文说得格外流利。', '孔院学习3年,在中国书法、中国画等方面表', '提升厄立特里亚的教育水平。”', '的传承与创新、现代化博物馆的建设与发展', '露娅对记者说:"这些年来,怀着对中文', '现十分优秀,在2024年厄立特里亚赛区的', '“共同向世界展示非', '印象深刻。“中国博物馆不仅有许多保存完好', '和中国文化的热爱,我们姐妹俩始终相互鼓', '“汉语桥"比赛中获得一等奖。莉迪亚说:"学', '的文物,还充分运用先进科技手段进行展示,', '励,一起学习。我们的中文一天比一天好,还', '习中国书法让我的内心变得安宁和纯粹。我', '洲和亚洲的灿烂文明”', '帮助人们更好理解中华文明。"塔吉丁说,"厄', '学会了中文歌和中国舞。我们一定要到中国', '也喜欢中国的服饰,希望未来能去中国学习,', '立特里亚与中国都拥有悠久的文明,始终相', '去。学好中文,我们的未来不是梦!"', '把中国不同民族元素融入服装设计中,创作', '从阿斯马拉出发,沿着蜿蜓曲折的盘山', '互理解、相互尊重。我希望未来与中国同行', '据厄特孔院中方院长黄鸣飞介绍,这所', '出更多精美作品,也把厄特文化分享给更多', '公路一路向东寻找丝路印迹。驱车两个小', '加强合作,共同向世界展示非洲和亚洲的灿', '孔院成立于2013年3月,由贵州财经大学和', '的中国朋友。”', '时,记者来到位于厄立特里亚港口城市马萨', '烂文明。”'], 'rec_scores': array([0.99943757, ..., 0.98181838]), 'rec_polys': array([[[ 133, 35],
...,
[ 133, 131]],
...,
[[1154, 1323],
...,
[1152, 1355]]], dtype=int16), 'rec_boxes': array([[ 133, ..., 131],
...,
[1152, ..., 1359]], dtype=int16)}, 'text_paragraphs_ocr_res': {'rec_polys': array([[[ 133, 35],
...,
[ 133, 131]],
...,
[[1154, 1323],
...,
[1152, 1355]]], dtype=int16), 'rec_texts': ['助力双方交往', '搭建友谊桥梁', '本报记者', '沈小晓', '任', '彦', '黄培昭', '身着中国传统民族服装的厄立特里亚青', '厄立特里亚高等教育与研究院合作建立,开', '年依次登台表演中国民族舞、现代舞、扇子舞', '设了中国语言课程和中国文化课程,注册学', '等,曼妙的舞姿赢得现场观众阵阵掌声。这', '生2万余人次。10余年来,厄特孔院已成为', '是日前厄立特里亚高等教育与研究院孔子学', '当地民众了解中国的一扇窗口。', '院(以下简称"厄特孔院")举办"喜迎新年"中国', '黄鸣飞表示,随着来学习中文的人日益', '歌舞比赛的场景。', '增多,阿斯马拉大学教学点已难以满足教学', '中国和厄立特里亚传统友谊深厚。近年', '需要。2024年4月,由中企蜀道集团所属四', '来,在高质量共建"一带一路"框架下,中厄两', '川路桥承建的孔院教学楼项目在阿斯马拉开', '国人文交流不断深化,互利合作的民意基础', '工建设,预计今年上半年峻工,建成后将为厄', '日益深厚。', '特孔院提供全新的办学场地。', '“学好中文,我们的', '“在中国学习的经历', '未来不是梦”', '让我看到更广阔的世界”', '“鲜花曾告诉我你怎样走过,大地知道你', '多年来,厄立特里亚广大赴华留学生和', '心中的每一个角落…"厄立特里亚阿斯马拉', '培训人员积极投身国家建设,成为助力该国', '大学综合楼二层,一阵优美的歌声在走廊里回', '发展的人才和厄中友好的见证者和推动者。', '响。循着熟悉的旋律轻轻推开一间教室的门,', '在厄立特里亚全国妇女联盟工作的约翰', '学生们正跟着老师学唱中文歌曲《同一首歌》。', '娜·特韦尔德·凯莱塔就是其中一位。她曾在', '这是厄特孔院阿斯马拉大学教学点的一', '中华女子学院攻读硕士学位,研究方向是女', '节中文歌曲课。为了让学生们更好地理解歌', '性领导力与社会发展。其间,她实地走访中国', '词大意,老师尤斯拉·穆罕默德萨尔·侯赛因逐', '多个地区,获得了观察中国社会发展的第一', '在厄立特里亚不久前举办的第六届中国风筝文化节上,当地小学生体验风筝制作。', '字翻译和解释歌词。随着伴奏声响起,学生们', '手资料。', '中国驻厄立特里亚大使馆供图', '边唱边随着节拍摇动身体,现场气氛热烈。', '谈起在中国求学的经历,约翰娜记忆犹', '“这是中文歌曲初级班,共有32人。学', '新:"中国的发展在当今世界是独一无二的。', '“不管远近都是客人,请不用客气;相约', '瓦的北红海省博物馆。', '生大部分来自首都阿斯马拉的中小学,年龄', '沿着中国特色社会主义道路坚定前行,中国', '好了在一起我们欢迎你"在一场中厄青', '博物馆二层陈列着一个发掘自阿杜利', '最小的仅有6岁。”尤斯拉告诉记者。', '创造了发展奇迹,这一切都离不开中国共产党', '年联谊活动上,四川路桥中方员工同当地大', '斯古城的中国古代陶制酒器,罐身上写着', '尤斯拉今年23岁,是厄立特里亚一所公立', '的领导。中国的发展经验值得许多国家学习', '学生合唱《北京欢迎你》。厄立特里亚技术学', '“万""和""禅"“山"等汉字。“这件文物证', '学校的艺术老师。她12岁开始在厄特孔院学', '借鉴。”', '院计算机科学与工程专业学生鲁夫塔·谢拉', '明,很早以前我们就通过海上丝绸之路进行', '习中文,在2017年第十届"汉语桥"世界中学生', '正在西南大学学习的厄立特里亚博士生', '是其中一名演唱者,她很早便在孔院学习中', '贸易往来与文化交流。这也是厄立特里亚', '中文比赛中获得厄立特里亚赛区第一名,并和', '穆卢盖塔·泽穆伊对中国怀有深厚感情。8', '文,一直在为去中国留学作准备。"这句歌词', '与中国友好交往历史的有力证明。"北红海', '同伴代表厄立特里亚前往中国参加决赛,获得', '年前,在北京师范大学获得硕士学位后,穆卢', '是我们两国人民友谊的生动写照。无论是投', '省博物馆研究与文献部负责人伊萨亚斯·特', '团体优胜奖。2022年起,尤斯拉开始在厄特孔', '盖塔在社交媒体上写下这样一段话:"这是我', '身于厄立特里亚基础设施建设的中企员工,', '斯法兹吉说。', '院兼职教授中文歌曲,每周末两个课时。“中国', '人生的重要一步,自此我拥有了一双坚固的', '还是在中国留学的厄立特里亚学子,两国人', '厄立特里亚国家博物馆考古学和人类学', '文化博大精深,我希望我的学生们能够通过中', '鞋子,赋予我穿越荆棘的力量。”', '民携手努力,必将推动两国关系不断向前发', '研究员菲尔蒙·特韦尔德十分喜爱中国文', '文歌曲更好地理解中国文化。"她说。', '穆卢盖塔密切关注中国在经济、科技、教', '展。"鲁夫塔说。', '化。他表示:“学习彼此的语言和文化,将帮', '“姐姐,你想去中国吗?""非常想!我想', '育等领域的发展,中国在科研等方面的实力', '厄立特里亚高等教育委员会主任助理萨', '助厄中两国人民更好地理解彼此,助力双方', '去看故宫、爬长城。"尤斯拉的学生中有一对', '与日俱增。在中国学习的经历让我看到更广', '马瑞表示:"每年我们都会组织学生到中国访', '交往,搭建友谊桥梁。"', '能歌善舞的姐妹,姐姐露娅今年15岁,妹妹', '阔的世界,从中受益匪浅。', '问学习,目前有超过5000名厄立特里亚学生', '厄立特里亚国家博物馆馆长塔吉丁·努', '莉娅14岁,两人都已在厄特孔院学习多年,', '23岁的莉迪亚·埃斯蒂法诺斯已在厄特', '在中国留学。学习中国的教育经验,有助于', '里达姆·优素福曾多次访问中国,对中华文明', '中文说得格外流利。', '孔院学习3年,在中国书法、中国画等方面表', '提升厄立特里亚的教育水平。”', '的传承与创新、现代化博物馆的建设与发展', '露娅对记者说:"这些年来,怀着对中文', '现十分优秀,在2024年厄立特里亚赛区的', '“共同向世界展示非', '印象深刻。“中国博物馆不仅有许多保存完好', '和中国文化的热爱,我们姐妹俩始终相互鼓', '“汉语桥"比赛中获得一等奖。莉迪亚说:"学', '的文物,还充分运用先进科技手段进行展示,', '励,一起学习。我们的中文一天比一天好,还', '习中国书法让我的内心变得安宁和纯粹。我', '洲和亚洲的灿烂文明”', '帮助人们更好理解中华文明。"塔吉丁说,"厄', '学会了中文歌和中国舞。我们一定要到中国', '也喜欢中国的服饰,希望未来能去中国学习,', '立特里亚与中国都拥有悠久的文明,始终相', '去。学好中文,我们的未来不是梦!"', '把中国不同民族元素融入服装设计中,创作', '从阿斯马拉出发,沿着蜿蜓曲折的盘山', '互理解、相互尊重。我希望未来与中国同行', '据厄特孔院中方院长黄鸣飞介绍,这所', '出更多精美作品,也把厄特文化分享给更多', '公路一路向东寻找丝路印迹。驱车两个小', '加强合作,共同向世界展示非洲和亚洲的灿', '孔院成立于2013年3月,由贵州财经大学和', '的中国朋友。”', '时,记者来到位于厄立特里亚港口城市马萨', '烂文明。”'], 'rec_scores': array([0.99943757, ..., 0.98181838]), 'rec_boxes': array([[ 133, ..., 131],
...,
[1152, ..., 1359]], dtype=int16)}}}
```

For an explanation of the result parameters, see Section 2.2 Python Script Integration.

Note: Because the pipeline's default models are relatively large, inference may be slow. You can consult the model lists in Section 1 and switch to models with faster inference.
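For instance, lighter models can be selected by name; a minimal sketch, with model names taken from the Section 1 benchmark tables:

```python
from paddleocr import PPStructureV3

# Sketch: trade some accuracy for speed by choosing lighter models
# from the Section 1 benchmark tables.
pipeline = PPStructureV3(
    layout_detection_model_name="PP-DocLayout-S",
    text_detection_model_name="PP-OCRv4_mobile_det",
    text_recognition_model_name="PP-OCRv4_mobile_rec",
)
```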
The command line supports additional parameters. The table below describes each one:

| Parameter | Description | Type | Default |
| --- | --- | --- | --- |
| `input` | Data to be predicted. Required. Supports multiple input types. | `Python Var\|str\|list` | |
| `save_path` | Path to save inference results. If set to `None`, results will not be saved locally. | `str` | `None` |
| `layout_detection_model_name` | Name of the layout detection model. If set to `None`, the default model will be used. | `str` | `None` |
| `layout_detection_model_dir` | Directory path of the layout detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `layout_threshold` | Score threshold for the layout model. | `float\|dict` | `None` |
| `layout_nms` | Whether to apply NMS post-processing for the layout detection model. | `bool` | `None` |
| `layout_unclip_ratio` | Unclip ratio for detected boxes in the layout detection model. | `float\|Tuple[float,float]\|dict` | `None` |
| `layout_merge_bboxes_mode` | Merge mode for overlapping boxes in layout detection. | `str\|dict` | `None` |
| `chart_recognition_model_name` | Name of the chart recognition model. If set to `None`, the default model will be used. | `str` | `None` |
| `chart_recognition_model_dir` | Directory path of the chart recognition model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `chart_recognition_batch_size` | Batch size for the chart recognition model. If set to `None`, the default batch size is `1`. | `int` | `None` |
| `region_detection_model_name` | Name of the region detection model. If set to `None`, the default model will be used. | `str` | `None` |
| `region_detection_model_dir` | Directory path of the region detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `doc_orientation_classify_model_name` | Name of the document orientation classification model. If set to `None`, the default model will be used. | `str` | `None` |
| `doc_orientation_classify_model_dir` | Directory path of the document orientation classification model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `doc_unwarping_model_name` | Name of the document unwarping model. If set to `None`, the default model will be used. | `str` | `None` |
| `doc_unwarping_model_dir` | Directory path of the document unwarping model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `text_detection_model_name` | Name of the text detection model. If set to `None`, the default model will be used. | `str` | `None` |
| `text_detection_model_dir` | Directory path of the text detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `text_det_limit_side_len` | Maximum side length limit for text detection. | `int` | `None` |
| `text_det_limit_type` | Limit type for the image side length in text detection. | `str` | `None` |
| `text_det_thresh` | Pixel threshold for detection. Pixels with scores above this value in the probability map are considered text. | `float` | `None` |
| `text_det_box_thresh` | Box threshold. A bounding box is considered text if the average score of the pixels inside is greater than this value. | `float` | `None` |
| `text_det_unclip_ratio` | Expansion ratio for text detection. The higher the value, the larger the expansion area. | `float` | `None` |
| `textline_orientation_model_name` | Name of the text line orientation model. If set to `None`, the default model will be used. | `str` | `None` |
| `textline_orientation_model_dir` | Directory of the text line orientation model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `textline_orientation_batch_size` | Batch size for the text line orientation model. If set to `None`, the default is `1`. | `int` | `None` |
| `text_recognition_model_name` | Name of the text recognition model. If set to `None`, the default model will be used. | `str` | `None` |
| `text_recognition_model_dir` | Directory of the text recognition model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `text_recognition_batch_size` | Batch size for text recognition. If set to `None`, the default is `1`. | `int` | `None` |
| `text_rec_score_thresh` | Score threshold for text recognition. Only results above this value will be kept. | `float` | `None` |
| `table_classification_model_name` | Name of the table classification model. If set to `None`, the default model will be used. | `str` | `None` |
| `table_classification_model_dir` | Directory of the table classification model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `wired_table_structure_recognition_model_name` | Name of the wired table structure recognition model. If set to `None`, the default model will be used. | `str` | `None` |
| `wired_table_structure_recognition_model_dir` | Directory of the wired table structure recognition model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `wireless_table_structure_recognition_model_name` | Name of the wireless table structure recognition model. If set to `None`, the default model will be used. | `str` | `None` |
| `wireless_table_structure_recognition_model_dir` | Directory of the wireless table structure recognition model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `wired_table_cells_detection_model_name` | Name of the wired table cell detection model. If set to `None`, the default model will be used. | `str` | `None` |
| `wired_table_cells_detection_model_dir` | Directory of the wired table cell detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `wireless_table_cells_detection_model_name` | Name of the wireless table cell detection model. If set to `None`, the default model will be used. | `str` | `None` |
| `wireless_table_cells_detection_model_dir` | Directory of the wireless table cell detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `seal_text_detection_model_name` | Name of the seal text detection model. If set to `None`, the default model will be used. | `str` | `None` |
| `seal_text_detection_model_dir` | Directory of the seal text detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `seal_det_limit_side_len` | Image side length limit for seal text detection. | `int` | `None` |
| `seal_det_limit_type` | Limit type for the image side in seal text detection. | `str` | `None` |
| `seal_det_thresh` | Pixel threshold. Pixels with scores above this value in the probability map are considered text. | `float` | `None` |
| `seal_det_box_thresh` | Box threshold. Boxes with average pixel scores above this value are considered text regions. | `float` | `None` |
| `seal_det_unclip_ratio` | Expansion ratio for seal text detection. A higher value means a larger expansion area. | `float` | `None` |
| `seal_text_recognition_model_name` | Name of the seal text recognition model. If set to `None`, the default model will be used. | `str` | `None` |
| `seal_text_recognition_model_dir` | Directory of the seal text recognition model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `seal_text_recognition_batch_size` | Batch size for seal text recognition. If set to `None`, the default is `1`. | `int` | `None` |
| `seal_rec_score_thresh` | Recognition score threshold. Text results above this value will be kept. | `float` | `None` |
| `formula_recognition_model_name` | Name of the formula recognition model. If set to `None`, the default model will be used. | `str` | `None` |
| `formula_recognition_model_dir` | Directory of the formula recognition model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `formula_recognition_batch_size` | Batch size of the formula recognition model. If set to `None`, the default is `1`. | `int` | `None` |
| `use_doc_orientation_classify` | Whether to enable document orientation classification. If set to `None`, the default is `True`. | `bool` | `None` |
| `use_doc_unwarping` | Whether to enable document unwarping. If set to `None`, the default is `True`. | `bool` | `None` |
| `use_seal_recognition` | Whether to enable the seal recognition subpipeline. If set to `None`, the default is `True`. | `bool` | `None` |
| `use_table_recognition` | Whether to enable the table recognition subpipeline. If set to `None`, the default is `True`. | `bool` | `None` |
| `use_formula_recognition` | Whether to enable the formula recognition subpipeline. If set to `None`, the default is `True`. | `bool` | `None` |
| `use_chart_recognition` | Whether to enable the chart recognition model. If set to `None`, the default is `True`. | `bool` | `None` |
| `use_region_detection` | Whether to enable the region detection submodule for document images. If set to `None`, the default is `True`. | `bool` | `None` |
| `device` | Device for inference. You can specify a device ID. | `str` | `None` |
| `enable_hpi` | Whether to enable high-performance inference. | `bool` | `False` |
| `use_tensorrt` | Whether to use TensorRT for inference acceleration. | `bool` | `False` |
| `min_subgraph_size` | Minimum subgraph size for optimizing subgraph execution. | `int` | `3` |
| `precision` | Computation precision, e.g., `fp32`, `fp16`. | `str` | `fp32` |
| `enable_mkldnn` | Whether to enable MKL-DNN. If set to `None`, enabled by default. | `bool` | `None` |
| `cpu_threads` | Number of threads to use when inferring on CPU. | `int` | `8` |
| `paddlex_config` | Path to the PaddleX pipeline configuration file. | `str` | `None` |
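The options in the table above can also be supplied as keyword arguments when constructing the pipeline from Python (see Section 2.2). A brief sketch with illustrative values, not recommended settings:

```python
from paddleocr import PPStructureV3

# Sketch: the CLI options above correspond to constructor keyword arguments.
pipeline = PPStructureV3(
    text_det_limit_side_len=960,
    text_rec_score_thresh=0.5,
    use_table_recognition=True,
    device="gpu",
)
output = pipeline.predict("./pp_structure_v3_demo.png")
```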
The inference result will be printed in the terminal. The default output of the PP-StructureV3 pipeline is as follows:
👉Click to expand
{'res': {'input_path': '/root/.paddlex/predict_input/pp_structure_v3_demo.png', 'page_index': None, 'model_settings': {'use_doc_preprocessor': True, 'use_seal_recognition': True, 'use_table_recognition': True, 'use_formula_recognition': True}, 'doc_preprocessor_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_orientation_classify': True, 'use_doc_unwarping': True}, 'angle': 0}, 'layout_det_res': {'input_path': None, 'page_index': None, 'boxes': [{'cls_id': 2, 'label': 'text', 'score': 0.9848763942718506, 'coordinate': [743.2788696289062, 777.3158569335938, 1115.24755859375, 1067.84228515625]}, {'cls_id': 2, 'label': 'text', 'score': 0.9827454686164856, 'coordinate': [1137.95556640625, 1127.66943359375, 1524, 1367.6356201171875]}, {'cls_id': 1, 'label': 'image', 'score': 0.9813530445098877, 'coordinate': [755.2349243164062, 184.64149475097656, 1523.7294921875, 684.6146392822266]}, {'cls_id': 2, 'label': 'text', 'score': 0.980336606502533, 'coordinate': [350.7603759765625, 1148.5648193359375, 706.8020629882812, 1367.00341796875]}, {'cls_id': 2, 'label': 'text', 'score': 0.9798877239227295, 'coordinate': [1147.3890380859375, 802.6549072265625, 1523.9051513671875, 994.9046630859375]}, {'cls_id': 2, 'label': 'text', 'score': 0.9724758863449097, 'coordinate': [741.2205810546875, 1074.2657470703125, 1110.120849609375, 1191.2010498046875]}, {'cls_id': 2, 'label': 'text', 'score': 0.9724437594413757, 'coordinate': [355.6563415527344, 899.6616821289062, 710.9073486328125, 1042.1270751953125]}, {'cls_id': 2, 'label': 'text', 'score': 0.9723313450813293, 'coordinate': [0, 181.92404174804688, 334.43384313583374, 330.294677734375]}, {'cls_id': 2, 'label': 'text', 'score': 0.9720360636711121, 'coordinate': [356.7376403808594, 753.35302734375, 714.37841796875, 892.6129760742188]}, {'cls_id': 2, 'label': 'text', 'score': 0.9711183905601501, 'coordinate': [1144.5242919921875, 1001.2548217773438, 1524, 1120.6578369140625]}, {'cls_id': 2, 'label': 'text', 'score': 0.9707457423210144, 'coordinate': [0, 849.873291015625, 325.0664693713188, 1067.2911376953125]}, {'cls_id': 2, 'label': 'text', 'score': 0.9700680375099182, 'coordinate': [363.04437255859375, 289.2635498046875, 719.1571655273438, 427.5818786621094]}, {'cls_id': 2, 'label': 'text', 'score': 0.9693533182144165, 'coordinate': [359.4466857910156, 606.0006103515625, 717.9885864257812, 746.55126953125]}, {'cls_id': 2, 'label': 'text', 'score': 0.9682930111885071, 'coordinate': [0.050221771001815796, 1073.1942138671875, 323.85799154639244, 1191.3121337890625]}, {'cls_id': 2, 'label': 'text', 'score': 0.9649553894996643, 'coordinate': [0.7939082384109497, 1198.5465087890625, 321.2581721544266, 1317.218017578125]}, {'cls_id': 2, 'label': 'text', 'score': 0.9644040465354919, 'coordinate': [0, 337.225830078125, 332.2462143301964, 428.298583984375]}, {'cls_id': 2, 'label': 'text', 'score': 0.9637495279312134, 'coordinate': [365.5925598144531, 188.2151336669922, 718.556640625, 283.7483215332031]}, {'cls_id': 2, 'label': 'text', 'score': 0.9603620767593384, 'coordinate': [355.30633544921875, 1048.5457763671875, 708.771484375, 1141.828369140625]}, {'cls_id': 2, 'label': 'text', 'score': 0.9508902430534363, 'coordinate': [361.0450744628906, 530.7780151367188, 719.6325073242188, 599.1027221679688]}, {'cls_id': 2, 'label': 'text', 'score': 0.9459834694862366, 'coordinate': [0.035085976123809814, 532.7417602539062, 330.5401824116707, 772.7175903320312]}, {'cls_id': 0, 'label': 'paragraph_title', 'score': 0.9400503635406494, 'coordinate': 
[760.1524658203125, 1214.560791015625, 1085.24853515625, 1274.7890625]}, {'cls_id': 2, 'label': 'text', 'score': 0.9341079592704773, 'coordinate': [1.025873064994812, 777.8804931640625, 326.99016749858856, 844.8532104492188]}, {'cls_id': 0, 'label': 'paragraph_title', 'score': 0.9259933233261108, 'coordinate': [0.11050379276275635, 450.3547058105469, 311.77746546268463, 510.5243835449219]}, {'cls_id': 0, 'label': 'paragraph_title', 'score': 0.9208691716194153, 'coordinate': [380.79510498046875, 447.859130859375, 698.1744384765625, 509.0489807128906]}, {'cls_id': 2, 'label': 'text', 'score': 0.8683002591133118, 'coordinate': [1149.1656494140625, 778.3809814453125, 1339.960205078125, 796.5060424804688]}, {'cls_id': 2, 'label': 'text', 'score': 0.8455104231834412, 'coordinate': [561.3448486328125, 140.87547302246094, 915.4432983398438, 162.76724243164062]}, {'cls_id': 11, 'label': 'doc_title', 'score': 0.735536515712738, 'coordinate': [76.71978759765625, 0, 1400.4561157226562, 98.32131713628769]}, {'cls_id': 6, 'label': 'figure_title', 'score': 0.7187536954879761, 'coordinate': [790.4249267578125, 704.4551391601562, 1509.9013671875, 747.6876831054688]}, {'cls_id': 2, 'label': 'text', 'score': 0.6218013167381287, 'coordinate': [737.427001953125, 1296.2047119140625, 1104.2994384765625, 1368]}]}, 'overall_ocr_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_textline_orientation': True}, 'dt_polys': array([[[ 77, 0],
...,
[ 76, 98]],
...,
[[1142, 1350],
...,
[1142, 1367]]], dtype=int16), 'text_det_params': {'limit_side_len': 736, 'limit_type': 'min', 'thresh': 0.3, 'box_thresh': 0.6, 'unclip_ratio': 1.5}, 'text_type': 'general', 'textline_orientation_angles': array([0, ..., 0]), 'text_rec_score_thresh': 0.0, 'rec_texts': ['助力双方交往', '搭建友谊桥梁', '本报记者沈小晓任彦', '黄培照', '身着中国传统民族服装的厄立特里亚青', '厄立特里亚高等教育与研究院合作建立,开', '次登台表演中国民族舞、现代舞、扇子舞', '设了中国语言课程和中国文化课程,注册学', '曼妙的舞姿赢得现场观众阵阵掌声。这', '生2万余人次。10余年来,厄特孔院已成为', '日前厄立特里亚高等教育与研究院孔子学', '当地民众了解中国的一扇窗口。', '以下简称"厄特孔院")举办“喜迎新年"中国', '黄鸣飞表示,随着来学习中文的人日益', '舞比赛的场景。', '增多,阿斯马拉大学教学点已难以满足教学', '中国和厄立特里亚传统友谊深厚。近年', '需要。2024年4月,由中企蜀道集团所属四', '在高质量共建"一带一路"框架下,中厄两', '川路桥承建的孔院教学楼项目在阿斯马拉开', '人文交流不断深化,互利合作的民意基础', '工建设,预计今年上半年竣工,建成后将为厄', '益深厚。', '特孔院提供全新的办学场地。', '学好中文,我们的', '□', '在中国学习的经历', '未来不是梦”', '让我看到更广阔的世界”', '“鲜花曾告诉我你怎样走过,大地知道你', '多年来,厄立特里亚广大赴华留学生和', '中的每一个角落"厄立特里亚阿斯马拉', '培训人员积极投身国家建设,成为助力该国', '综合楼二层,一阵优美的歌声在走廊里回', '发展的人才和厄中友好的见证者和推动者。', '循着熟悉的旋律轻轻推开一间教室的门,', '在厄立特里亚全国妇女联盟工作的约翰', '们正跟着老师学唱中文歌曲《同一首歌》。', '娜·特韦尔德·凯莱塔就是其中一位。她曾在', '这是厄特孔院阿斯马拉大学教学点的一', '中华女子学院攻读硕士学位,研究方向是女', '中文歌曲课。为了让学生们更好地理解歌', '性领导力与社会发展。其间,她实地走访中国', '大意,老师尤斯拉·穆罕默德萨尔·侯赛因逐', '多个地区,获得了观察中国社会发展的第一', '在厄立特里亚不久前举办的第六届中国风筝文化节上,当地小学生体验风筝制作。', '译和解释歌词。随着伴奏声响起,学生们', '手资料。', '中国驻厄立特里亚大使馆供图', '昌边随着节拍摇动身体,现场气氛热烈。', '谈起在中国求学的经历,约翰娜记忆犹', '“这是中文歌曲初级班,共有32人。学', '新:“中国的发展在当今世界是独一无二的。', '“不管远近都是客人,请不用客气;相约', '瓦的北红海省博物馆。', '大部分来自首都阿斯马拉的中小学,年龄', '沿着中国特色社会主义道路坚定前行,中国', '好了在一起,我们欢迎你…"在一场中厄青', '博物馆二层陈列着一个发掘自阿杜禾', '小的仅有6岁。"尤斯拉告诉记者。', '创造了发展奇迹,这一切都离不开中国共产党', '年联谊活动上,四川路桥中方员工同当地大', '斯古城的中国古代陶制酒器,罐身上写', '尤斯拉今年23岁,是厄立特里亚一所公立', '的领导。中国的发展经验值得许多国家学习', '学生合唱《北京欢迎你》。厄立特里亚技术学', '“万”“和”“禅”“山"等汉字。“这件文物证', '交的艺术老师。她12岁开始在厄特孔院学', '借鉴。”', '院计算机科学与工程专业学生鲁夫塔·谢拉', '明,很早以前我们就通过海上丝绸之路进行', '中文,在2017年第十届“汉语桥"世界中学生', '正在西南大学学习的厄立特里亚博士生', '是其中一名演唱者,她很早便在孔院学习中', '贸易往来与文化交流。这也是厄立特里亚', '文比赛中获得厄立特里亚赛区第一名,并和', '穆卢盖塔·泽穆伊对中国怀有深厚感情。8', '文,一直在为去中国留学作准备。“这句歌词', '与中国友好交往历史的有力证明。”北红海', '半代表厄立特里亚前往中国参加决赛,获得', '年前,在北京师范大学获得硕士学位后,穆卢', '是我们两国人民友谊的生动写照。无论是投', '省博物馆研究与文献部负责人伊萨亚斯·特', '本优胜奖。2022年起,尤斯拉开始在厄特孔', '盖塔在社交媒体上写下这样一段话:“这是我', '身于厄立特里亚基础设施建设的中企员工,', '斯法兹吉说。', '兼职教授中文歌曲,每周末两个课时。“中国', '人生的重要一步,自此我拥有了一双坚固的', '还是在中国留学的厄立特里亚学子,两国人', '厄立特里亚国家博物馆考古学和人类学', '化博大精深,我希望我的学生们能够通过中', '鞋子,赋予我穿越荆棘的力量。”', '民携手努力,必将推动两国关系不断向前发', '研究员菲尔蒙·特韦尔德十分喜爱中国文', '软曲更好地理解中国文化。"她说。', '穆卢盖塔密切关注中国在经济、科技、教', '展。"鲁夫塔说。', '化。他表示:“学习彼此的语言和文化,将帮', '“姐姐,你想去中国吗?"“非常想!我想', '育等领域的发展,“中国在科研等方面的实力', '厄立特里亚高等教育委员会主任助理萨', '助厄中两国人民更好地理解彼此,助力双方', '看故宫、爬长城。"尤斯拉的学生中有一对', '与日俱增。在中国学习的经历让我看到更广', '马瑞表示:“每年我们都会组织学生到中国访', '交往,搭建友谊桥梁。”', '软善舞的姐妹,姐姐露娅今年15岁,妹妹', '阔的世界,从中受益匪浅。”', '问学习,目前有超过5000名厄立特里亚学生', '厄立特里亚国家博物馆馆长塔吉丁·', '亚14岁,两人都已在厄特孔院学习多年,', '23岁的莉迪亚·埃斯蒂法诺斯已在厄特', '在中国留学。学习中国的教育经验,有助于', '里达姆·优素福曾多次访问中国,对中华文明', '文说得格外流利。', '孔院学习3年,在中国书法、中国画等方面表', '提升厄立特里亚的教育水平。”', '的传承与创新、现代化博物馆的建设与发用', '露娅对记者说:“这些年来,怀着对中文', '现十分优秀,在2024年厄立特里亚赛区的', '印象深刻。“中国博物馆不仅有许多保存完好', '“共同向世界展示非', '中国文化的热爱,我们姐妹俩始终相互鼓', '“汉语桥”比赛中获得一等奖。莉迪亚说:“学', '的文物,还充分运用先进科技手段进行展示', '一起学习。我们的中文一天比一天好,还', '习中国书法让我的内心变得安宁和纯粹。我', '洲和亚洲的灿烂文明”', '帮助人们更好理解中华文明。"塔吉丁说,“', '了中文歌和中国舞。我们一定要到中国', '也喜欢中国的服饰,希望未来能去中国学习,', '立特里亚与中国都拥有悠久的文明,始终木', '学好中文,我们的未来不是梦!”', '把中国不同民族元素融入服装设计中,创作', '从阿斯马拉出发,沿着蜿蜒曲折的盘山', '互理解、相互尊重。我希望未来与中国同行', '据厄特孔院中方院长黄鸣飞介绍,这所', '出更多精美作品,也把厄特文化分享给更多', '公路一路向东寻找丝路印迹。驱车两个小', '加强合作,共同向世界展示非洲和亚洲的灿', '中贵州财经大学和', '的中国朋友。”', '时,记者来到位于厄立特里亚港口城市马萨', '烂文明。”'], 'rec_scores': array([0.99875408, ..., 0.98324996]), 'rec_polys': array([[[ 77, 0],
...,
[ 76, 98]],
...,
[[1142, 1350],
...,
[1142, 1367]]], dtype=int16), 'rec_boxes': array([[ 76, ..., 103],
...,
[1142, ..., 1367]], dtype=int16)}}}
For explanation of the result parameters, refer to 2.2 Python Script Integration.
Note: The default models in this pipeline are large, so inference may be slow. You can refer to the model list in Section 1 and replace them with faster models.
2.2 Python Script Integration¶
The command-line method is intended for quick testing and visualization. In real projects, you will usually integrate the pipeline through code. Pipeline inference takes only a few lines:
from paddleocr import PPStructureV3

pipeline = PPStructureV3()
# pipeline = PPStructureV3(use_doc_orientation_classify=True)  # Enable/disable the document orientation classification model
# pipeline = PPStructureV3(use_doc_unwarping=True)  # Enable/disable the document unwarping module
# pipeline = PPStructureV3(use_textline_orientation=True)  # Enable/disable the textline orientation classification model
# pipeline = PPStructureV3(device="gpu")  # Use device to run model inference on GPU
output = pipeline.predict("./pp_structure_v3_demo.png")
for res in output:
    res.print()  # Print the structured prediction output
    res.save_to_json(save_path="output")  # Save the current image's structured result in JSON format
    res.save_to_markdown(save_path="output")  # Save the current image's result in Markdown format
For PDF files, each page is processed individually and generates a separate Markdown file. To convert an entire PDF into a single Markdown file, use the following approach:
from pathlib import Path
from paddleocr import PPStructureV3

input_file = "./your_pdf_file.pdf"
output_path = Path("./output")

pipeline = PPStructureV3()
output = pipeline.predict(input_file)  # Run inference on the PDF; each page yields one result

markdown_list = []
markdown_images = []

for res in output:
    md_info = res.markdown
    markdown_list.append(md_info)
    markdown_images.append(md_info.get("markdown_images", {}))

# Merge the per-page Markdown results into a single document
markdown_texts = pipeline.concatenate_markdown_pages(markdown_list)

mkd_file_path = output_path / f"{Path(input_file).stem}.md"
mkd_file_path.parent.mkdir(parents=True, exist_ok=True)

with open(mkd_file_path, "w", encoding="utf-8") as f:
    f.write(markdown_texts)

# Save any images referenced by the Markdown alongside it
for item in markdown_images:
    if item:
        for path, image in item.items():
            file_path = output_path / path
            file_path.parent.mkdir(parents=True, exist_ok=True)
            image.save(file_path)
Note:

- The default text recognition model used by PP-StructureV3 is a Chinese-English recognition model, so its accuracy on purely English text is limited. For English-only scenarios, set the `text_recognition_model_name` parameter to an English model such as `en_PP-OCRv4_mobile_rec` for better recognition performance. For other languages, refer to the model list above and select the appropriate language recognition model.
- In the example code, the parameters `use_doc_orientation_classify`, `use_doc_unwarping`, and `use_textline_orientation` are all set to `False` by default, meaning document orientation classification, document image unwarping, and textline orientation classification are disabled. You can manually set them to `True` if needed.
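For example, a minimal sketch of switching to the English recognition model mentioned above:

```python
from paddleocr import PPStructureV3

# Swap in an English-only recognition model for purely English documents
pipeline = PPStructureV3(text_recognition_model_name="en_PP-OCRv4_mobile_rec")
output = pipeline.predict("./pp_structure_v3_demo.png")
```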
The Python scripts above perform the following steps:

(1) Instantiate `PPStructureV3` to create the pipeline object. The parameter descriptions are as follows:
| Parameter | Description | Type | Default |
|---|---|---|---|
| `layout_detection_model_name` | Name of the layout detection model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `layout_detection_model_dir` | Directory path of the layout detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `layout_threshold` | Score threshold for the layout model. | `float\|dict` | `None` |
| `layout_nms` | Whether to use NMS post-processing for the layout detection model. | `bool` | `None` |
| `layout_unclip_ratio` | Expansion ratio for the bounding boxes from the layout detection model. | `float\|Tuple[float,float]\|dict` | `None` |
| `layout_merge_bboxes_mode` | Filtering method for overlapping boxes in layout detection. | `str\|dict` | `None` |
| `chart_recognition_model_name` | Name of the chart recognition model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `chart_recognition_model_dir` | Directory path of the chart recognition model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `chart_recognition_batch_size` | Batch size for the chart recognition model. If set to `None`, the default is `1`. | `int` | `None` |
| `region_detection_model_name` | Name of the region detection model for sub-modules in document layout. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `region_detection_model_dir` | Directory path of the region detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `doc_orientation_classify_model_name` | Name of the document orientation classification model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `doc_orientation_classify_model_dir` | Directory path of the document orientation classification model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `doc_unwarping_model_name` | Name of the document unwarping model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `doc_unwarping_model_dir` | Directory path of the document unwarping model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `text_detection_model_name` | Name of the text detection model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `text_detection_model_dir` | Directory path of the text detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `text_det_limit_side_len` | Maximum side length limit for text detection. | `int` | `None` |
| `text_det_limit_type` | Limit type for the text detection image side length. | `str` | `None` |
| `text_det_thresh` | Pixel threshold for detection. Pixels in the output probability map with scores above this value are considered text pixels. | `float` | `None` |
| `text_det_box_thresh` | Bounding box threshold. If the average score of all pixels inside the box exceeds this threshold, it is considered a text region. | `float` | `None` |
| `text_det_unclip_ratio` | Expansion ratio for text detection. The larger the value, the more the text region is expanded. | `float` | `None` |
| `textline_orientation_model_name` | Name of the textline orientation model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `textline_orientation_model_dir` | Directory path of the textline orientation model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `textline_orientation_batch_size` | Batch size for the textline orientation model. If set to `None`, the default batch size is `1`. | `int` | `None` |
| `text_recognition_model_name` | Name of the text recognition model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `text_recognition_model_dir` | Directory path of the text recognition model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `text_recognition_batch_size` | Batch size for the text recognition model. If set to `None`, the default batch size is `1`. | `int` | `None` |
| `text_rec_score_thresh` | Score threshold for text recognition. Only results with scores above this threshold are retained. | `float` | `None` |
| `table_classification_model_name` | Name of the table classification model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `table_classification_model_dir` | Directory path of the table classification model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `wired_table_structure_recognition_model_name` | Name of the wired table structure recognition model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `wired_table_structure_recognition_model_dir` | Directory path of the wired table structure recognition model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `wireless_table_structure_recognition_model_name` | Name of the wireless table structure recognition model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `wireless_table_structure_recognition_model_dir` | Directory path of the wireless table structure recognition model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `wired_table_cells_detection_model_name` | Name of the wired table cell detection model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `wired_table_cells_detection_model_dir` | Directory path of the wired table cell detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `wireless_table_cells_detection_model_name` | Name of the wireless table cell detection model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `wireless_table_cells_detection_model_dir` | Directory path of the wireless table cell detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `seal_text_detection_model_name` | Name of the seal text detection model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `seal_text_detection_model_dir` | Directory path of the seal text detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `seal_det_limit_side_len` | Image side length limit for seal text detection. | `int` | `None` |
| `seal_det_limit_type` | Limit type for the seal text detection image side length. | `str` | `None` |
| `seal_det_thresh` | Pixel threshold for detection. Pixels with scores greater than this value in the probability map are considered text pixels. | `float` | `None` |
| `seal_det_box_thresh` | Bounding box threshold. If the average score of all pixels inside a detection box exceeds this threshold, it is considered a text region. | `float` | `None` |
| `seal_det_unclip_ratio` | Expansion ratio for seal text detection. The larger the value, the larger the expanded area. | `float` | `None` |
| `seal_text_recognition_model_name` | Name of the seal text recognition model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `seal_text_recognition_model_dir` | Directory path of the seal text recognition model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `seal_text_recognition_batch_size` | Batch size for the seal text recognition model. If set to `None`, the default value is `1`. | `int` | `None` |
| `seal_rec_score_thresh` | Score threshold for seal text recognition. Only text results with scores above this threshold are retained. | `float` | `None` |
| `formula_recognition_model_name` | Name of the formula recognition model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `formula_recognition_model_dir` | Directory path of the formula recognition model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `formula_recognition_batch_size` | Batch size for the formula recognition model. If set to `None`, the default value is `1`. | `int` | `None` |
| `use_doc_orientation_classify` | Whether to enable the document orientation classification module. If set to `None`, the default value is `True`. | `bool` | `None` |
| `use_doc_unwarping` | Whether to enable the document image unwarping module. If set to `None`, the default value is `True`. | `bool` | `None` |
| `use_chart_recognition` | Whether to enable the chart recognition model. If set to `None`, the default value is `True`. | `bool` | `None` |
| `use_region_detection` | Whether to enable the region detection model for document layout. If set to `None`, the default value is `True`. | `bool` | `None` |
| `device` | Device used for inference. Supports specifying a device ID. | `str` | `None` |
| `enable_hpi` | Whether to enable high-performance inference. | `bool` | `False` |
| `use_tensorrt` | Whether to use TensorRT for accelerated inference. | `bool` | `False` |
| `min_subgraph_size` | Minimum subgraph size, used to optimize model subgraph computation. | `int` | `3` |
| `precision` | Computation precision, e.g., `fp32`, `fp16`. | `str` | `fp32` |
| `enable_mkldnn` | Whether to enable MKL-DNN acceleration. If set to `None`, MKL-DNN is enabled by default. | `bool` | `None` |
| `cpu_threads` | Number of threads used for inference on CPU. | `int` | `8` |
| `paddlex_config` | Path to the PaddleX pipeline configuration file. | `str` | `None` |
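For instance, a sketch combining a few of these initialization parameters (the values are illustrative, not recommendations):

```python
from paddleocr import PPStructureV3

pipeline = PPStructureV3(
    device="gpu:0",                # run inference on the first GPU
    layout_threshold=0.5,          # score threshold for layout detection
    text_recognition_batch_size=8, # batch text lines for recognition
    use_chart_recognition=False,   # skip chart parsing when charts are not needed
)
```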
(2) Call the `predict()` method of the PP-StructureV3 pipeline object for inference. This method returns a list of results. The pipeline also provides a `predict_iter()` method; both accept the same parameters and return the same type of results, but `predict_iter()` returns a generator, so prediction results can be processed and retrieved incrementally, which is useful for handling large datasets or saving memory (see the sketch after the table below). Choose whichever method fits your needs. The parameters of the `predict()` method are:
| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Input data to be predicted. Required. Supports multiple types. | `Python Var\|str\|list` | |
| `device` | Same as the parameter used during initialization. | `str` | `None` |
| `use_doc_orientation_classify` | Whether to use document orientation classification during inference. | `bool` | `None` |
| `use_doc_unwarping` | Whether to use document image unwarping during inference. | `bool` | `None` |
| `use_textline_orientation` | Whether to use textline orientation classification during inference. | `bool` | `None` |
| `use_seal_recognition` | Whether to use the seal recognition sub-pipeline during inference. | `bool` | `None` |
| `use_table_recognition` | Whether to use the table recognition sub-pipeline during inference. | `bool` | `None` |
| `use_formula_recognition` | Whether to use the formula recognition sub-pipeline during inference. | `bool` | `None` |
| `layout_threshold` | Same as the parameter used during initialization. | `float\|dict` | `None` |
| `layout_nms` | Same as the parameter used during initialization. | `bool` | `None` |
| `layout_unclip_ratio` | Same as the parameter used during initialization. | `float\|Tuple[float,float]\|dict` | `None` |
| `layout_merge_bboxes_mode` | Same as the parameter used during initialization. | `str\|dict` | `None` |
| `text_det_limit_side_len` | Same as the parameter used during initialization. | `int` | `None` |
| `text_det_limit_type` | Same as the parameter used during initialization. | `str` | `None` |
| `text_det_thresh` | Same as the parameter used during initialization. | `float` | `None` |
| `text_det_box_thresh` | Same as the parameter used during initialization. | `float` | `None` |
| `text_det_unclip_ratio` | Same as the parameter used during initialization. | `float` | `None` |
| `text_rec_score_thresh` | Same as the parameter used during initialization. | `float` | `None` |
| `seal_det_limit_side_len` | Same as the parameter used during initialization. | `int` | `None` |
| `seal_det_limit_type` | Same as the parameter used during initialization. | `str` | `None` |
| `seal_det_thresh` | Same as the parameter used during initialization. | `float` | `None` |
| `seal_det_box_thresh` | Same as the parameter used during initialization. | `float` | `None` |
| `seal_det_unclip_ratio` | Same as the parameter used during initialization. | `float` | `None` |
| `seal_rec_score_thresh` | Same as the parameter used during initialization. | `float` | `None` |
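As referenced above, a minimal sketch of incremental processing with `predict_iter()`:

```python
from paddleocr import PPStructureV3

pipeline = PPStructureV3()

# predict_iter() yields one result at a time instead of building the full list,
# keeping memory bounded for multi-page PDFs or long batches of images.
for res in pipeline.predict_iter("./pp_structure_v3_demo.png"):
    res.save_to_markdown(save_path="output")
```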
(3) Process the prediction results. Each prediction result corresponds to a `Result` object, which supports printing, saving as an image, or saving as a `json` file:
| Method | Description | Parameter | Type | Parameter Description | Default |
|---|---|---|---|---|---|
| `print()` | Print result to terminal | `format_json` | `bool` | Whether to format the output as indented JSON | `True` |
| | | `indent` | `int` | Indentation level to beautify the JSON output. Only effective when `format_json=True` | `4` |
| | | `ensure_ascii` | `bool` | Whether to escape non-ASCII characters to Unicode. When `True`, all non-ASCII characters are escaped; when `False`, original characters are retained. Only effective when `format_json=True` | `False` |
| `save_to_json()` | Save result as a JSON file | `save_path` | `str` | Path to save the file. If a directory, the filename is based on the input type | `None` |
| | | `indent` | `int` | Indentation level for beautified JSON output. Only effective when `format_json=True` | `4` |
| | | `ensure_ascii` | `bool` | Whether to escape non-ASCII characters to Unicode. Only effective when `format_json=True` | `False` |
| `save_to_img()` | Save intermediate visualization results as PNG image files | `save_path` | `str` | Path to save the file; supports a directory or file path | `None` |
| `save_to_markdown()` | Save each page of an image or PDF file as a Markdown file | `save_path` | `str` | Path to save the file; supports a directory or file path | `None` |
| `save_to_html()` | Save tables in the file as HTML | `save_path` | `str` | Path to save the file; supports a directory or file path | `None` |
| `save_to_xlsx()` | Save tables in the file as XLSX | `save_path` | `str` | Path to save the file; supports a directory or file path | `None` |
| `concatenate_markdown_pages()` | Concatenate multiple Markdown pages into a single document | `markdown_list` | `list` | List of Markdown data for each page | Returns the merged Markdown text and image list |
| Attribute | Description |
|---|---|
| `json` | Get the prediction result in `json` format |
| `img` | Get visualized image results as a `dict` |
| `markdown` | Get Markdown results as a `dict` |
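A short sketch tying these methods and attributes together (output paths are illustrative):

```python
from paddleocr import PPStructureV3

pipeline = PPStructureV3()
for res in pipeline.predict("./pp_structure_v3_demo.png"):
    res.print(format_json=True, indent=4)  # formatted JSON in the terminal
    res.save_to_json(save_path="output")   # structured result as a JSON file
    res.save_to_xlsx(save_path="output")   # recognized tables as XLSX files
    data = res.json                        # prediction result in json format
    md = res.markdown                      # Markdown text and images as a dict
```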
3. Development Integration / Deployment¶
If the pipeline meets your requirements for inference speed and accuracy, you can proceed with development integration or deployment.
If you want to use the pipeline directly in your Python project, refer to the example code in 2.2 Python Script Integration.
In addition, PaddleOCR provides two other deployment options described in detail below:
🚀 High-Performance Inference: In production environments, many applications have strict performance requirements (especially response speed) to ensure system efficiency and smooth user experience. PaddleOCR offers a high-performance inference option that deeply optimizes model inference and pre/post-processing for significant end-to-end acceleration. For detailed high-performance inference workflow, refer to High Performance Inference.
☁️ Service Deployment: Service-based deployment is common in production. It encapsulates the inference logic as a service, allowing clients to access it via network requests to obtain results. For detailed instructions on service deployment, refer to Service Deployment.
Below are the API reference and multi-language service invocation examples for basic service deployment:
API Reference

Main operations provided by the service:

- HTTP method: POST
- Request and response bodies are both JSON objects.
- When the request succeeds, the response status code is `200`, and the response body contains:

| Name | Type | Description |
|---|---|---|
| `logId` | `string` | UUID of the request |
| `errorCode` | `integer` | Error code, fixed to `0` |
| `errorMsg` | `string` | Error message, fixed to `"Success"` |
| `result` | `object` | Operation result |
- When the request fails, the response body contains:

| Name | Type | Description |
|---|---|---|
| `logId` | `string` | UUID of the request |
| `errorCode` | `integer` | Error code, same as the HTTP status code |
| `errorMsg` | `string` | Error message |
Main operation provided: `infer`

Perform layout parsing.

`POST /layout-parsing`
- Request body parameters:
| Name | Type | Description | Required |
|---|---|---|---|
| `file` | `string` | URL of an image or PDF file accessible to the server, or the Base64-encoded content of such a file. By default, only the first 10 pages of a PDF are processed; to remove this limit, add the corresponding setting to the pipeline configuration file. | Yes |
| `fileType` | `integer \| null` | File type. `0` for PDF, `1` for image. If omitted, the type is inferred from the URL. | No |
| `useDocOrientationClassify` | `boolean \| null` | Refer to the `use_doc_orientation_classify` parameter in the pipeline's `predict` method. | No |
| `useDocUnwarping` | `boolean \| null` | Refer to the `use_doc_unwarping` parameter in the pipeline's `predict` method. | No |
| `useTextlineOrientation` | `boolean \| null` | Refer to the `use_textline_orientation` parameter in the pipeline's `predict` method. | No |
| `useSealRecognition` | `boolean \| null` | Refer to the `use_seal_recognition` parameter in the pipeline's `predict` method. | No |
| `useTableRecognition` | `boolean \| null` | Refer to the `use_table_recognition` parameter in the pipeline's `predict` method. | No |
| `useFormulaRecognition` | `boolean \| null` | Refer to the `use_formula_recognition` parameter in the pipeline's `predict` method. | No |
| `layoutThreshold` | `number \| null` | Refer to the `layout_threshold` parameter in the pipeline's `predict` method. | No |
| `layoutNms` | `boolean \| null` | Refer to the `layout_nms` parameter in the pipeline's `predict` method. | No |
| `layoutUnclipRatio` | `number \| array \| object \| null` | Refer to the `layout_unclip_ratio` parameter in the pipeline's `predict` method. | No |
| `layoutMergeBboxesMode` | `string \| object \| null` | Refer to the `layout_merge_bboxes_mode` parameter in the pipeline's `predict` method. | No |
| `textDetLimitSideLen` | `integer \| null` | Refer to the `text_det_limit_side_len` parameter in the pipeline's `predict` method. | No |
| `textDetLimitType` | `string \| null` | Refer to the `text_det_limit_type` parameter in the pipeline's `predict` method. | No |
| `textDetThresh` | `number \| null` | Refer to the `text_det_thresh` parameter in the pipeline's `predict` method. | No |
| `textDetBoxThresh` | `number \| null` | Refer to the `text_det_box_thresh` parameter in the pipeline's `predict` method. | No |
| `textDetUnclipRatio` | `number \| null` | Refer to the `text_det_unclip_ratio` parameter in the pipeline's `predict` method. | No |
| `textRecScoreThresh` | `number \| null` | Refer to the `text_rec_score_thresh` parameter in the pipeline's `predict` method. | No |
| `sealDetLimitSideLen` | `integer \| null` | Refer to the `seal_det_limit_side_len` parameter in the pipeline's `predict` method. | No |
| `sealDetLimitType` | `string \| null` | Refer to the `seal_det_limit_type` parameter in the pipeline's `predict` method. | No |
| `sealDetThresh` | `number \| null` | Refer to the `seal_det_thresh` parameter in the pipeline's `predict` method. | No |
| `sealDetBoxThresh` | `number \| null` | Refer to the `seal_det_box_thresh` parameter in the pipeline's `predict` method. | No |
| `sealDetUnclipRatio` | `number \| null` | Refer to the `seal_det_unclip_ratio` parameter in the pipeline's `predict` method. | No |
| `sealRecScoreThresh` | `number \| null` | Refer to the `seal_rec_score_thresh` parameter in the pipeline's `predict` method. | No |
| `useTableCellsOcrResults` | `boolean` | Refer to the `use_table_cells_ocr_results` parameter in the pipeline's `predict` method. | No |
| `useE2eWiredTableRecModel` | `boolean` | Refer to the `use_e2e_wired_table_rec_model` parameter in the pipeline's `predict` method. | No |
| `useE2eWirelessTableRecModel` | `boolean` | Refer to the `use_e2e_wireless_table_rec_model` parameter in the pipeline's `predict` method. | No |
- When the request succeeds, the `result` field of the response contains the following attributes:

| Name | Type | Description |
|---|---|---|
| `layoutParsingResults` | `array` | Layout parsing results. The array length is 1 for image input, or equals the number of processed pages for PDF input; each element corresponds to one processed page. |
| `dataInfo` | `object` | Information about the input data. |
Each element in `layoutParsingResults` is an `object` with the following attributes:

| Name | Type | Description |
|---|---|---|
| `prunedResult` | `object` | A simplified version of the `res` field from the JSON output of the pipeline's `predict` method, with `input_path` and `page_index` removed. |
| `markdown` | `object` | Markdown result. |
| `outputImages` | `object \| null` | Refer to the pipeline's `img` attribute. Images are JPEG, encoded in Base64. |
| `inputImage` | `string \| null` | Input image. JPEG, encoded in Base64. |
The `markdown` object has the following attributes:

| Name | Type | Description |
|---|---|---|
| `text` | `string` | Markdown text. |
| `images` | `object` | Key-value pairs of image relative paths and Base64-encoded image content. |
| `isStart` | `boolean` | Whether the first element on the current page is the start of a paragraph. |
| `isEnd` | `boolean` | Whether the last element on the current page is the end of a paragraph. |
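When stitching multi-page PDF results on the client side, `isEnd` and `isStart` indicate whether a paragraph continues across a page break. A minimal sketch of one way to use them; the joining heuristic here is an assumption for illustration, not part of the API:

```python
def merge_markdown_pages(pages):
    """Merge per-page markdown objects returned by the service.

    If a paragraph is cut by a page break (previous page's isEnd is False
    and the next page's isStart is False), join the pages directly;
    otherwise separate them with a blank line.
    """
    merged = ""
    prev_ended = True
    for page in pages:
        if merged:
            merged += "\n\n" if (prev_ended or page["isStart"]) else ""
        merged += page["text"]
        prev_ended = page["isEnd"]
    return merged
```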
Multi-language Service Call Examples
Python
import base64
import requests
import pathlib
API_URL = "http://localhost:8080/layout-parsing" # Service URL
image_path = "./demo.jpg"
# Encode the local image to Base64
with open(image_path, "rb") as file:
image_bytes = file.read()
image_data = base64.b64encode(image_bytes).decode("ascii")
payload = {
"file": image_data, # Base64-encoded file content or file URL
"fileType": 1, # File type, 1 indicates image file
}
# Call the API
response = requests.post(API_URL, json=payload)
# Handle the response data
assert response.status_code == 200
result = response.json()["result"]
for i, res in enumerate(result["layoutParsingResults"]):
    print(res["prunedResult"])
    md_dir = pathlib.Path(f"markdown_{i}")
    md_dir.mkdir(exist_ok=True)
    (md_dir / "doc.md").write_text(res["markdown"]["text"], encoding="utf-8")
    for img_path, img in res["markdown"]["images"].items():
        img_path = md_dir / img_path
        img_path.parent.mkdir(parents=True, exist_ok=True)
        img_path.write_bytes(base64.b64decode(img))
    print(f"Markdown document saved at {md_dir / 'doc.md'}")
    for img_name, img in res["outputImages"].items():
        img_path = f"{img_name}_{i}.jpg"
        with open(img_path, "wb") as f:
            f.write(base64.b64decode(img))
        print(f"Output image saved at {img_path}")
4. Secondary Development¶
If the default model weights provided by the PP-StructureV3 pipeline do not meet your accuracy or speed requirements in your scenario, you can try fine-tuning the existing model using your own domain-specific or application-specific data to improve the performance of the PP-StructureV3 pipeline for your use case.
4.1 Model Fine-tuning¶
Since the PP-StructureV3 pipeline contains multiple modules, unsatisfactory results may originate from any one of them. Analyze the cases with poor extraction results, visualize the images to identify which module is responsible, and then refer to the fine-tuning tutorials linked in the table below.
Scenario | Fine-tuning Module | Fine-tuning Reference Link |
---|---|---|
Inaccurate layout detection, such as missing seals or tables | Layout Detection Module | Link |
Inaccurate table structure recognition | Table Structure Recognition Module | Link |
Inaccurate formula recognition | Formula Recognition Module | Link |
Missing seal text detection | Seal Text Detection Module | Link |
Missing text detection | Text Detection Module | Link |
Incorrect text recognition results | Text Recognition Module | Link |
Incorrect correction of vertical or rotated text lines | Text Line Orientation Classification Module | Link |
Incorrect correction of full image orientation | Document Image Orientation Classification Module | Link |
Inaccurate image distortion correction | Text Image Correction Module | Fine-tuning not supported yet |
4.2 Model Deployment¶
Once you have completed fine-tuning with your private dataset, you will obtain the local model weights. You can then use these fine-tuned weights by customizing the pipeline configuration file.
- Export the pipeline configuration file
You can call the `export_paddlex_config_to_yaml` method of the PPStructureV3 object in PaddleOCR to export the current pipeline configuration as a YAML file:
from paddleocr import PPStructureV3
pipeline = PPStructureV3()
pipeline.export_paddlex_config_to_yaml("PP-StructureV3.yaml")
- Modify the configuration file
After obtaining the default pipeline configuration file, replace the corresponding path in the configuration with the local path of your fine-tuned model weights. For example:
......
SubModules:
  LayoutDetection:
    module_name: layout_detection
    model_name: PP-DocLayout_plus-L
    model_dir: null # Replace with the path to the fine-tuned layout detection model weights
......
SubPipelines:
  GeneralOCR:
    pipeline_name: OCR
    text_type: general
    use_doc_preprocessor: False
    use_textline_orientation: False
    SubModules:
      TextDetection:
        module_name: text_detection
        model_name: PP-OCRv5_server_det
        model_dir: null # Replace with the path to the fine-tuned text detection model weights
        limit_side_len: 960
        limit_type: max
        max_side_limit: 4000
        thresh: 0.3
        box_thresh: 0.6
        unclip_ratio: 1.5
      TextRecognition:
        module_name: text_recognition
        model_name: PP-OCRv5_server_rec
        model_dir: null # Replace with the path to the fine-tuned text recognition model weights
        batch_size: 1
        score_thresh: 0
......
The pipeline configuration file not only includes parameters supported by the PaddleOCR CLI and Python API but also allows for more advanced configurations. For more details, refer to the corresponding pipeline usage tutorial in the PaddleX Pipeline Usage Overview, and adjust the configurations as needed based on your requirements.
- Load the pipeline configuration file via CLI
After modifying the configuration file, specify the path of the updated pipeline configuration via the `--paddlex_config` parameter on the command line. PaddleOCR will load its content as the pipeline configuration. Example:
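A sketch of the command, assuming the `pp_structurev3` subcommand and the demo image used earlier; treat the exact invocation as illustrative:

```bash
paddleocr pp_structurev3 -i ./pp_structure_v3_demo.png --paddlex_config PP-StructureV3.yaml
```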
- Load the pipeline configuration file via Python API

When initializing the pipeline object, pass the path of the PaddleX pipeline configuration file, or a configuration dictionary, through the `paddlex_config` parameter. PaddleOCR will load its content as the pipeline configuration. Example:
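A minimal sketch, reusing the configuration file exported and edited above:

```python
from paddleocr import PPStructureV3

# Load the customized PaddleX pipeline configuration
pipeline = PPStructureV3(paddlex_config="PP-StructureV3.yaml")
output = pipeline.predict("./pp_structure_v3_demo.png")
```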