Introduction to PP-ChatOCRv4

PP-ChatOCRv4 is PaddlePaddle's intelligent analysis solution for documents and images. It combines LLM, MLLM, and OCR technologies to handle difficult information-extraction problems such as layout analysis, rare characters, multi-page PDFs, tables, and seal recognition. Integrated with ERNIE Bot, it draws on large-scale data and knowledge to achieve high accuracy across a wide range of applications. The pipeline offers flexible service deployment on a variety of hardware. It also supports custom development: you can train and fine-tune models on your own datasets and integrate the trained models seamlessly into the pipeline.

Key Metrics

Solution	Avg Recall
GPT-4o	63.47%
PP-ChatOCRv3	70.08%
Qwen2.5-VL-72B	80.26%
PP-ChatOCRv4	85.55%

Demo

FAQ

Does it support other multimodal models?

Yes. You only need to set the model in the pipeline configuration.
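
As a rough illustration of changing the multimodal model through configuration, the sketch below swaps the multimodal entry in a configuration held as a Python dictionary. The key names (`SubModules`, `MLLM_Chat`, `model_name`, `base_url`, `api_type`, `api_key`), the helper `with_multimodal_model`, and the example model name and URLs are all assumptions made for illustration. Check them against the configuration file shipped with your installed version.

```python
import copy


def with_multimodal_model(config, model_name, base_url, api_key, api_type="openai"):
    """Return a copy of `config` whose multimodal chat entry points at another model.

    All key names used here are assumptions for illustration, not a confirmed schema.
    """
    updated = copy.deepcopy(config)  # leave the caller's configuration untouched
    mllm = updated.setdefault("SubModules", {}).setdefault("MLLM_Chat", {})
    mllm.update(
        model_name=model_name,
        base_url=base_url,
        api_type=api_type,
        api_key=api_key,
    )
    return updated


# Hypothetical starting configuration.
base_config = {
    "SubModules": {
        "MLLM_Chat": {
            "model_name": "placeholder-mllm",
            "base_url": "http://localhost:8080/v1",
            "api_type": "openai",
            "api_key": "YOUR_API_KEY",
        }
    }
}

new_config = with_multimodal_model(
    base_config,
    model_name="my-other-mllm",           # hypothetical model name
    base_url="http://localhost:9000/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)
```

Returning a modified copy keeps the original configuration available, which makes it easier to compare results between two multimodal models.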

How can I reduce latency and improve throughput?

Use the high-performance inference plugin and deploy multiple instances.
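
To make the multi-instance idea concrete, here is a minimal client-side sketch that spreads requests across several instances in round-robin order. It is not part of the pipeline: `dispatch_round_robin` and `make_fake_instance` are hypothetical helpers, and each "instance" is a plain Python callable standing in for a real deployed service.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def dispatch_round_robin(requests, instances, max_workers=None):
    """Send each request to the next instance in turn; return results in request order."""
    if not instances:
        raise ValueError("at least one instance is required")
    workers = max_workers or len(instances)
    # Pair request i with instance i % len(instances).
    assignment = zip(requests, cycle(instances))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(instance, request) for request, instance in assignment]
        return [future.result() for future in futures]


def make_fake_instance(name):
    """Stand-in for a real service instance: it only tags the request it handled."""
    def handle(request):
        return f"{name}:{request}"
    return handle


instances = [make_fake_instance("inst0"), make_fake_instance("inst1")]
results = dispatch_round_robin(["a.pdf", "b.pdf", "c.pdf"], instances)
```

In a real deployment a load balancer in front of the instances would usually play this role. The sketch only shows why adding instances raises throughput: requests that would otherwise queue behind one another are handled in parallel.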

How can I further improve accuracy?

First, check whether the extracted visual information is correct. If it is not, visualize the visual prediction results to determine which model performs poorly, then fine-tune that model with more data. If the visual information is correct but the final extracted result is still wrong, adjust the prompt based on an analysis of the question and the answer.
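
The troubleshooting order above can be sketched as a small decision helper. The function `next_tuning_step`, its arguments, and the model labels in the example are hypothetical. The error counts are assumed to come from manually inspecting the visualized predictions.

```python
def next_tuning_step(visual_errors, answer_correct):
    """Suggest the next action for improving accuracy.

    visual_errors: dict mapping a model label to the number of wrong visual
        predictions observed when inspecting the visualized results.
    answer_correct: whether the final extracted answer was correct.
    """
    if answer_correct:
        return "no change needed"
    # Find the model with the most observed visual errors, if any were recorded.
    worst = max(visual_errors, key=visual_errors.get, default=None)
    if worst is not None and visual_errors[worst] > 0:
        return f"fine-tune {worst} with more data"
    # Visual information looks correct, so the problem is in the prompting stage.
    return "adjust the prompt"


# Hypothetical inspection results for one failing document.
step = next_tuning_step(
    {"layout_detection": 0, "text_recognition": 4, "seal_recognition": 1},
    answer_correct=False,
)
```

Checking the visual stage first matters because prompt changes cannot recover information that the visual models never extracted correctly.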