
Chart Parsing Model Module Usage Tutorial

I. Overview

Multimodal chart parsing is an advanced technique in the OCR field that automatically converts visual charts (bar charts, line charts, pie charts, and so on) into their underlying data tables and formats the output. Traditional methods rely on complex orchestration of several models, such as chart key point detection; they build in many prior assumptions and lack robustness. The models in this module instead use recent vision-language model (VLM) technology and are data-driven, learning robust features from large amounts of real-world data. Application scenarios include financial analysis, academic research, and business reports: for instance, quickly extracting growth trend data from financial statements, experimental comparison values from scientific papers, or user distribution statistics from market research, helping users move from "viewing charts" to "using data."

II. Supported Model List

| Model | Model Download Link | Model Parameter Size (B) | Model Storage Size (GB) | Model Score | Description |
| --- | --- | --- | --- | --- | --- |
| PP-Chart2Table | Inference Model | 0.58 | 1.4 | 75.98 | PP-Chart2Table is a multimodal model developed in-house by the PaddlePaddle team. It focuses on chart parsing and performs strongly on both Chinese and English chart parsing tasks. The team used a carefully designed data generation strategy to build a high-quality multimodal dataset of nearly 700,000 entries, covering common chart types such as pie charts, bar charts, and stacked area charts, as well as a variety of application scenarios. They also designed a two-stage training method that uses large-model distillation to make full use of massive unlabeled out-of-distribution (OOD) data. In internal business tests on Chinese and English scenarios, PP-Chart2Table achieved state-of-the-art (SOTA) results among models of the same parameter scale and reached accuracy comparable to 7B-parameter VLMs in critical scenarios. |

Note: The model score above comes from an internal evaluation set of 1,801 samples. The set covers chart types such as bar charts, line charts, and pie charts, drawn from scenarios such as financial reports, laws and regulations, and contracts. There are currently no plans to make the evaluation set public.

III. Quick Integration

❗ Before quick integration, please install the PaddleX wheel package. For details, refer to the PaddleX Local Installation Tutorial.

After installing the wheel package, you can run inference with the document-type vision-language model module in just a few lines of code. You can switch freely between the models under this module, and you can also integrate the module's model inference into your own project. Before running the following code, please download the sample image to your local machine.

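from paddlex import create_model

# Load the built-in PP-Chart2Table model by name.
model = create_model("PP-Chart2Table")

# Run inference on the downloaded sample image.
results = model.predict(
    input={"image": "chart_parsing_02.png"},
    batch_size=1,
)

for res in results:
    res.print()  # print the result to the terminal
    res.save_to_json("./output/res.json")  # save the result as a JSON file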

After running, the result is:

{'res': {'image': 'chart_parsing_02.png', 'result': '年份 | 单家五星级旅游饭店年平均营收 (百万元) | 单家五星级旅游饭店年平均利润 (百万元)\n2018 | 104.22 | 9.87\n2019 | 99.11 | 7.47\n2020 | 57.87 | -3.87\n2021 | 68.99 | -2.9\n2022 | 56.29 | -9.48\n2023 | 87.99 | 5.96'}}

The fields in the result have the following meanings:
  • image: the path of the input image to be predicted
  • result: the result information predicted by the model

Printed as a table, the prediction result looks as follows:

年份 | 单家五星级旅游饭店年平均营收 (百万元) | 单家五星级旅游饭店年平均利润 (百万元)
2018 | 104.22 | 9.87
2019 | 99.11 | 7.47
2020 | 57.87 | -3.87
2021 | 68.99 | -2.9
2022 | 56.29 | -9.48
2023 | 87.99 | 5.96
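
Because the result field is plain text with one table row per line and cells separated by " | ", it can be turned into structured rows with standard-library Python. The following is a minimal sketch that assumes every output follows the format of the sample above (this is inferred from the sample, not a documented guarantee). parse_chart_table is a hypothetical helper and is not part of PaddleX.

```python
def parse_chart_table(result_text):
    """Split a pipe-delimited table string into a header list and row dicts."""
    lines = [line for line in result_text.splitlines() if line.strip()]
    if not lines:
        return [], []
    header = [cell.strip() for cell in lines[0].split("|")]
    rows = []
    for line in lines[1:]:
        cells = [cell.strip() for cell in line.split("|")]
        rows.append(dict(zip(header, cells)))
    return header, rows


# First rows of the sample output above, copied verbatim.
sample = (
    "年份 | 单家五星级旅游饭店年平均营收 (百万元) | 单家五星级旅游饭店年平均利润 (百万元)\n"
    "2018 | 104.22 | 9.87\n"
    "2019 | 99.11 | 7.47"
)
header, rows = parse_chart_table(sample)

# Cells are kept as strings; convert a column to numbers only where needed.
revenue_column = header[1]
revenues = [float(row[revenue_column]) for row in rows]
print(header)
print(revenues)
```

Keeping the cells as strings by default avoids failures on charts whose cells are not numeric, such as category labels or percentages.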

Related methods, parameters, and descriptions are as follows:

  • create_model instantiates a document-type vision-language model (PP-Chart2Table is used as the example here). Its parameters are as follows:

| Parameter | Description | Type | Options | Default |
| --- | --- | --- | --- | --- |
| model_name | Model name | str | None | None |
| model_dir | Model storage path | str | None | None |
| device | Model inference device | str | Supports a specific GPU card number such as "gpu:0", a specific card number of other hardware such as "npu:0", or the CPU as "cpu" | gpu:0 |
| use_hpip | Whether to enable the high-performance inference plugin. Currently not supported. | bool | None | False |
| hpi_config | High-performance inference configuration. Currently not supported. | dict or None | None | None |

  • model_name must be specified. When only model_name is given, the default PaddleX built-in model parameters are used; if model_dir is also specified, the user-defined model is used instead.
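
The interplay of model_name, model_dir, and device can be sketched as follows. This is an illustration under stated assumptions: build_model_kwargs and load_chart_model are hypothetical helpers, the local path is a placeholder, and the keyword names are taken from the parameter table above. The actual create_model call is left commented out because it requires PaddleX and the model weights.

```python
def build_model_kwargs(model_name, model_dir=None, device="gpu:0"):
    """Collect create_model keyword arguments, omitting model_dir when unset."""
    if not model_name:
        raise ValueError("model_name must be specified")
    kwargs = {"model_name": model_name, "device": device}
    if model_dir is not None:
        kwargs["model_dir"] = model_dir
    return kwargs


def load_chart_model(**kwargs):
    # Imported lazily so this sketch can be read and run without PaddleX installed.
    from paddlex import create_model
    return create_model(**kwargs)


# Built-in weights, running on the CPU.
builtin_kwargs = build_model_kwargs("PP-Chart2Table", device="cpu")

# User-defined weights from a local directory (hypothetical path), default device.
custom_kwargs = build_model_kwargs(
    "PP-Chart2Table",
    model_dir="./my_chart2table_model",
)
print(builtin_kwargs)
print(custom_kwargs)
# model = load_chart_model(**custom_kwargs)  # requires PaddleX and the weights
```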

  • Call the predict() method of the document-type vision-language model to run inference. Its parameters are input and batch_size, explained below:

| Parameter | Description | Type | Options | Default |
| --- | --- | --- | --- | --- |
| input | Data to be predicted | dict | A dict whose form depends on the specific model, because multimodal models have different input requirements. For PP-Chart2Table, the form is {'image': image_path} | None |
| batch_size | Batch size | int | Integer | 1 |

  • Process the prediction results. The prediction result for each sample is a corresponding Result object, which supports operations such as printing and saving as a JSON file:

| Method | Description | Parameter | Type | Parameter Description | Default |
| --- | --- | --- | --- | --- | --- |
| print() | Print results to the terminal | format_json | bool | Whether to format the output content using JSON indentation | True |
| | | indent | int | Indentation level used to make the output JSON more readable; only effective when format_json is True | 4 |
| | | ensure_ascii | bool | Whether to escape non-ASCII characters to Unicode. When True, all non-ASCII characters are escaped; when False, the original characters are kept. Only effective when format_json is True | False |
| save_to_json() | Save the result as a JSON file | save_path | str | Path to save the file. When it is a directory, the saved file name follows the naming of the input file | None |
| | | indent | int | Indentation level used to make the output JSON more readable; only effective when format_json is True | 4 |
| | | ensure_ascii | bool | Whether to escape non-ASCII characters to Unicode. When True, all non-ASCII characters are escaped; when False, the original characters are kept. Only effective when format_json is True | False |

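
The indent and ensure_ascii options share their names with the parameters of Python's standard json.dumps. Assuming they behave the same way (the table above does not say how they are implemented), their effect can be previewed with the standard library alone. The record below is a truncated stand-in for the sample result, not real Result output.

```python
import json

# Stand-in for the sample result, truncated to the first header cell.
record = {"res": {"image": "chart_parsing_02.png", "result": "年份"}}

escaped = json.dumps(record, indent=4, ensure_ascii=True)
readable = json.dumps(record, indent=4, ensure_ascii=False)
compact = json.dumps(record, ensure_ascii=False)

print(escaped)   # non-ASCII characters appear as \uXXXX escapes
print(readable)  # original characters are kept
print(compact)   # no indent: everything on one line
```

For chart output that contains Chinese headers, keeping ensure_ascii at False produces files that are easier to inspect by eye, while both variants load back to the same data.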
  • The prediction results can also be obtained through attributes, as follows:

| Attribute | Description |
| --- | --- |
| json | Get the prediction result in JSON format |
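
As one possible use of the json attribute, the sketch below exports the parsed table to CSV text. It assumes that res.json returns a dict with the same structure as the printed result shown earlier ({'res': {'image': ..., 'result': ...}}), which this page does not state explicitly. result_to_csv is a hypothetical helper, and sample_json is a stand-in built from the first rows of the sample output.

```python
import csv
import io


def result_to_csv(result_json):
    """Convert the pipe-delimited table in a result dict to CSV text."""
    table_text = result_json["res"]["result"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for line in table_text.splitlines():
        if line.strip():
            writer.writerow([cell.strip() for cell in line.split("|")])
    return buffer.getvalue()


# Stand-in for `res.json`; with PaddleX installed, pass the real attribute instead.
sample_json = {
    "res": {
        "image": "chart_parsing_02.png",
        "result": (
            "年份 | 单家五星级旅游饭店年平均营收 (百万元) | 单家五星级旅游饭店年平均利润 (百万元)\n"
            "2018 | 104.22 | 9.87\n"
            "2019 | 99.11 | 7.47"
        ),
    }
}
csv_text = result_to_csv(sample_json)
print(csv_text)
```

Using csv.writer rather than a plain string join means that cells containing commas or quotes are escaped correctly.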

For more information on using the API for single-model inference in PaddleX, refer to the PaddleX Single Model Python Script Usage Instructions.

IV. Secondary Development

This module does not currently support fine-tuning; only inference integration is available. Fine-tuning support is planned for the future.
