CPPD¶
1. Introduction¶
Paper:
Context Perception Parallel Decoder for Scene Text Recognition
Yongkun Du, Zhineng Chen, Caiyan Jia, Xiaoting Yin, Chenxia Li, Yuning Du, and Yu-Gang Jiang
Deep-learning scene text recognition models typically follow an encoder-decoder structure, and their decoders fall into two types: (1) CTC and (2) attention-based. Most current state-of-the-art (SOTA) models use an attention-based decoder, which can be further divided into autoregressive (AR) and parallel decoding (PD) types. In general, AR decoders achieve higher recognition accuracy than PD decoders, while PD decoders are faster than AR decoders. CPPD, with carefully designed character ordering (CO) and character counting (CC) modules, balances the accuracy of AR with the speed of PD.
The accuracy (%) and model files of CPPD on public scene text recognition datasets are as follows:
- English dataset from PARSeq.
Model | IC13 857 | SVT | IIIT5k 3000 | IC15 1811 | SVTP | CUTE80 | Avg | Download |
---|---|---|---|---|---|---|---|---|
CPPD Tiny | 97.1 | 94.4 | 96.6 | 86.6 | 88.5 | 90.3 | 92.25 | en |
CPPD Base | 98.2 | 95.5 | 97.6 | 87.9 | 90.0 | 92.7 | 93.80 | en |
CPPD Base 48*160 | 97.5 | 95.5 | 97.7 | 87.7 | 92.4 | 93.7 | 94.10 | en |
- Trained on the synthetic datasets (MJ+ST) and tested on the Union14M-L benchmark from U14M.
Model | Curve | Multi-Oriented | Artistic | Contextless | Salient | Multi-word | General | Avg | Download |
---|---|---|---|---|---|---|---|---|---|
CPPD Tiny | 52.4 | 12.3 | 48.2 | 54.4 | 61.5 | 53.4 | 61.4 | 49.10 | Same as the table above. |
CPPD Base | 65.5 | 18.6 | 56.0 | 61.9 | 71.0 | 57.5 | 65.8 | 56.63 | Same as the table above. |
CPPD Base 48*160 | 71.9 | 22.1 | 60.5 | 67.9 | 78.3 | 63.9 | 67.1 | 61.69 | Same as the table above. |
- Trained on the Union14M-L training dataset.
Model | IC13 857 | SVT | IIIT5k 3000 | IC15 1811 | SVTP | CUTE80 | Avg | Download |
---|---|---|---|---|---|---|---|---|
CPPD Base 32*128 | 98.7 | 98.5 | 99.4 | 91.7 | 96.7 | 99.7 | 97.44 | en |
Model | Curve | Multi-Oriented | Artistic | Contextless | Salient | Multi-word | General | Avg | Download |
---|---|---|---|---|---|---|---|---|---|
CPPD Base 32*128 | 87.5 | 70.7 | 78.2 | 82.9 | 85.5 | 85.4 | 84.3 | 82.08 | Same as the table above. |
- Chinese dataset from Chinese Benchmark.
Model | Scene | Web | Document | Handwriting | Avg | Download |
---|---|---|---|---|---|---|
CPPD Base | 74.4 | 76.1 | 98.6 | 55.3 | 76.10 | ch |
CPPD Base + STN | 78.4 | 79.3 | 98.9 | 57.6 | 78.55 | ch |
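The Avg column appears to be the unweighted mean of the per-benchmark accuracies. That reading is an assumption, since the page does not say how the average is computed. A minimal sketch that recomputes it for the two rows of the Chinese benchmark table, where it reproduces the listed values:

```python
# Per-benchmark accuracies (%) copied from the Chinese benchmark table
# in the order Scene, Web, Document, Handwriting.
rows = {
    "CPPD Base": [74.4, 76.1, 98.6, 55.3],
    "CPPD Base + STN": [78.4, 79.3, 98.9, 57.6],
}


def average_accuracy(scores):
    """Unweighted mean of the scores, rounded to two decimals (assumed definition of Avg)."""
    return round(sum(scores) / len(scores), 2)


averages = {name: average_accuracy(scores) for name, scores in rows.items()}
for name, avg in averages.items():
    print(f"{name}: {avg:.2f}")
```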
2. Environment¶
Please refer to "Environment Preparation" to configure the PaddleOCR environment, and refer to "Project Clone" to clone the project code.
Dataset Preparation¶
- English dataset download
- Union14M-Benchmark download
- Chinese dataset download
3. Model Training / Evaluation / Prediction¶
Please refer to Text Recognition Tutorial. PaddleOCR modularizes the code, and training different recognition models only requires changing the configuration file.
Training¶
Once data preparation is complete, training can be started. The training command is as follows:
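As a minimal sketch, the helper below assembles a single-GPU and a multi-GPU command line for PaddleOCR's tools/train.py entry point. The config path is an assumption (point it at the CPPD configuration file you actually use), and build_train_command is a hypothetical helper written for this illustration, not part of PaddleOCR.

```python
import shlex

# Assumed config path for the CPPD base English model; substitute your own.
CONFIG = "configs/rec/rec_svtrnet_cppd_base_en.yml"


def build_train_command(config, gpus=None):
    """Return the argv list for tools/train.py, optionally launched on several GPUs."""
    cmd = ["python3"]
    if gpus:
        # Multi-GPU runs go through paddle.distributed.launch with a --gpus list.
        gpu_list = ",".join(str(g) for g in gpus)
        cmd += ["-m", "paddle.distributed.launch", "--gpus", gpu_list]
    cmd += ["tools/train.py", "-c", config]
    return cmd


single_gpu = shlex.join(build_train_command(CONFIG))
multi_gpu = shlex.join(build_train_command(CONFIG, gpus=[0, 1, 2, 3]))
print(single_gpu)
print(multi_gpu)
```

Run the printed command from the root of the cloned PaddleOCR repository so that the relative paths resolve.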
Evaluation¶
You can download the model and configuration files provided for CPPD (download link). Taking CPPD-B as an example, use the following command to evaluate:
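A sketch of how such an evaluation call could be assembled, assuming the tools/eval.py entry point and the `-o key=value` override syntax used by PaddleOCR's tools. Both the config path and the checkpoint prefix are assumptions standing in for the files you downloaded, and build_eval_command is a hypothetical helper.

```python
import shlex

CONFIG = "configs/rec/rec_svtrnet_cppd_base_en.yml"      # assumed config path
CHECKPOINT = "./rec_svtr_cppd_base_en_train/best_model"  # assumed checkpoint prefix


def build_eval_command(config, overrides):
    """Return the argv list for tools/eval.py with optional `-o key=value` overrides."""
    cmd = ["python3", "tools/eval.py", "-c", config]
    if overrides:
        # All overrides follow a single -o flag, one key=value token each.
        cmd.append("-o")
        cmd += [f"{key}={value}" for key, value in overrides.items()]
    return cmd


eval_cmd = shlex.join(build_eval_command(CONFIG, {"Global.pretrained_model": CHECKPOINT}))
print(eval_cmd)
```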
Prediction¶
4. Inference and Deployment¶
4.1 Python Inference¶
First, convert the model saved during CPPD text recognition training into an inference model (model download link). You can use the following command to convert it:
Note: If you trained the model on your own dataset and modified the dictionary file, make sure that character_dict_path in the configuration file points to the modified dictionary file.
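A minimal sketch of the conversion call, assuming PaddleOCR's tools/export_model.py entry point. The config path, checkpoint prefix, and output directory are placeholders chosen for illustration, not values taken from this page.

```python
import shlex

# Every path below is an assumption; replace each with your own files.
export_cmd = [
    "python3", "tools/export_model.py",
    "-c", "configs/rec/rec_svtrnet_cppd_base_en.yml",
    "-o",
    # Trained checkpoint to convert.
    "Global.pretrained_model=./rec_svtr_cppd_base_en_train/best_model",
    # Directory that receives the exported inference model.
    "Global.save_inference_dir=./rec_svtr_cppd_base_en_infer",
]

export_line = shlex.join(export_cmd)
print(export_line)
```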
After the conversion is successful, there are three files in the directory:
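The file list is not reproduced here. As a sketch, the check below assumes the three files are the ones a Paddle static-graph export conventionally writes (inference.pdiparams, inference.pdiparams.info, inference.pdmodel). Those names are an assumption and may differ between Paddle releases. The directory name is also hypothetical.

```python
from pathlib import Path

# Assumed file names of an exported inference model; adjust if your release differs.
EXPECTED_FILES = (
    "inference.pdiparams",
    "inference.pdiparams.info",
    "inference.pdmodel",
)


def missing_inference_files(export_dir):
    """Return the expected file names that are absent from export_dir."""
    root = Path(export_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Hypothetical output directory; use the one given to the export step.
missing = missing_inference_files("./rec_svtr_cppd_base_en_infer")
print("export looks complete" if not missing else f"missing: {missing}")
```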
4.2 C++ Inference¶
Not supported
4.3 Serving¶
Not supported
4.4 More¶
Not supported
Citation¶
@article{Du2023CPPD,
title = {Context Perception Parallel Decoder for Scene Text Recognition},
author = {Du, Yongkun and Chen, Zhineng and Jia, Caiyan and Yin, Xiaoting and Li, Chenxia and Du, Yuning and Jiang, Yu-Gang},
journal = {arXiv preprint arXiv:2307.12270},
year = {2023},
url = {https://arxiv.org/abs/2307.12270}
}