Chart Parsing Model Module Usage Tutorial¶
I. Overview¶
Multimodal chart parsing is a cutting-edge technology in the OCR field, focusing on automatically converting various types of visual charts (such as bar charts, line charts, pie charts, etc.) into underlying data tables and formatting the output. Traditional methods rely on complex orchestration of models like chart key point detection, which involves many prior assumptions and lacks robustness. The models in this module utilize the latest VLM technology, driven by data, learning robust features from massive real-world data. Its application scenarios cover financial analysis, academic research, business reports, etc. — for instance, quickly extracting growth trend data from financial statements, experimental comparison values from scientific papers, or user distribution statistics from market research, assisting users in shifting from "viewing charts" to "using data."
II. Supported Model List¶
Model | Model Download Link | Model parameter size(B) | Model Storage Size (GB) | Model Score | Description |
---|---|---|---|---|---|
PP-Chart2Table | Inference Model | 0.58 | 1.4 | 75.98 | PP-Chart2Table is a self-developed multimodal model by the PaddlePaddle team, focusing on chart parsing, demonstrating outstanding performance in both Chinese and English chart parsing tasks. The team adopted a carefully designed data generation strategy, constructing a high-quality multimodal dataset of nearly 700,000 entries covering common chart types like pie charts, bar charts, stacked area charts, and various application scenarios. They also designed a two-stage training method, utilizing large model distillation to fully leverage massive unlabeled OOD data. In internal business tests in both Chinese and English scenarios, PP-Chart2Table not only achieved the SOTA level among models of the same parameter scale but also reached accuracy comparable to 7B parameter scale VLM models in critical scenarios. |
Note: The above model scores are the results of internal evaluation set model testing, with a total of 1801 data points, including various chart types such as bar charts, line charts, and pie charts for testing samples under various scenarios such as financial reports, laws and regulations, contracts, etc. There are currently no plans to make them public.
III. Quick Integration¶
❗ Before quick integration, please install the PaddleX wheel package. For details, please refer to PaddleX Local Installation Tutorial
After completing the installation of the whl package, inference of the document-like visual language model module can be completed with just a few lines of code. You can freely switch models under this module, and you can also integrate model inference from the open document-like visual language model module into your project. Before running the following code, please download the sample image locally.
from paddlex import create_model
model = create_model('PP-Chart2Table')
results = model.predict(
input={"image": "chart_parsing_02.png"},
batch_size=1
)
for res in results:
res.print()
res.save_to_json(f"./output/res.json")
After running, the result is:
{'res': {'image': 'chart_parsing_02.png', 'result': '年份 | 单家五星级旅游饭店年平均营收 (百万元) | 单家五星级旅游饭店年平均利润 (百万元)\n2018 | 104.22 | 9.87\n2019 | 99.11 | 7.47\n2020 | 57.87 | -3.87\n2021 | 68.99 | -2.9\n2022 | 56.29 | -9.48\n2023 | 87.99 | 5.96'}}
image
: Indicates the path of the input image to be predicted
- result
: The result information predicted by the model
The visualized printed prediction result is as follows:
年份 | 单家五星级旅游饭店年平均营收 (百万元) | 单家五星级旅游饭店年平均利润 (百万元)
2018 | 104.22 | 9.87
2019 | 99.11 | 7.47
2020 | 57.87 | -3.87
2021 | 68.99 | -2.9
2022 | 56.29 | -9.48
2023 | 87.99 | 5.96
Related methods, parameters, and descriptions are as follows:
create_model
instantiates the document-like visual language model (takingPP-Chart2Table
as an example here), with specific explanations as follows:
Parameter | Description | Type | Options | Default |
---|---|---|---|---|
model_name |
Model name | str |
None | None |
model_dir |
Model storage path | str |
None | None |
device |
Model inference device | str |
Support specifying specific GPU card number, such as "gpu:0", other hardware specific card numbers, such as "npu:0", CPU as "cpu". | gpu:0 |
use_hpip |
Whether to enable high-performance inference plugins. Currently not supported. | bool |
None | False |
hpi_config |
High-performance inference configuration. Currently not supported. | dict | None |
None | None |
-
Among them,
model_name
must be specified. After specifyingmodel_name
, the default PaddleX built-in model parameters are used. On this basis, ifmodel_dir
is specified, the user-defined model is used. -
Call the
predict()
method of the document-like visual language model for inference prediction. Thepredict()
method parameters includeinput
,batch_size
, with specific explanations as follows:
Parameter | Description | Type | Options | Default |
---|---|---|---|---|
input |
Data to be predicted | dict |
Dict , as multimodal models have different input requirements, it needs to be determined based on the specific model. Specifically:
{'image': image_path} |
None |
batch_size |
Batch size | int |
Integer | 1 |
- Process the prediction results. The prediction result for each sample is the corresponding Result object, which supports operations like printing and saving as a
json
file:
Method | Description | Parameter | Type | Description | Default |
---|---|---|---|---|---|
print() |
Print results to terminal | format_json |
bool |
Whether to format the output content using JSON indentation |
True |
indent |
int |
Specify the indentation level to beautify the output JSON data for better readability, only effective when format_json is True |
4 | ||
ensure_ascii |
bool |
Control whether to escape non-ASCII characters to Unicode . When set to True , all non-ASCII characters will be escaped; False will keep the original characters, only effective when format_json is True |
False |
||
save_to_json() |
Save the result as a json formatted file | save_path |
str |
Path to save the file. When it's a directory, the saved file name matches the input file type name | None |
indent |
int |
Specify the indentation level to beautify the output JSON data for better readability, only effective when format_json is True |
4 | ||
ensure_ascii |
bool |
Control whether to escape non-ASCII characters to Unicode . When set to True , all non-ASCII characters will be escaped; False will keep the original characters, only effective when format_json is True |
False |
- Additionally, it is also possible to obtain prediction results through attributes, as follows:
Attribute | Description |
---|---|
json |
Get the prediction result in json format |
For more information on using the API for single model inference in PaddleX, you can refer to PaddleX Single Model Python Script Usage Instructions.
IV. Secondary Development¶
The current module temporarily does not support fine-tuning training, only inference integration. Support for fine-tuning training in this module is planned for the future.