Algorithms
This tutorial lists the OCR algorithms supported by PaddleOCR, together with each algorithm's models and metrics on public English datasets. It is intended as an introduction to the algorithms and as a basis for comparing their performance. For models on other datasets, including Chinese, please refer to the PP-OCRv3 models list.
Developers are welcome to contribute more algorithms! Please refer to the guideline for adding a new algorithm.
1. Two-stage OCR Algorithms
1.1 Text Detection Algorithms
Supported text detection algorithms (Click the link to get the tutorial):
On the ICDAR2015 dataset, the text detection results are as follows:
Model | Backbone | Precision | Recall | Hmean | Download link |
---|---|---|---|---|---|
EAST | ResNet50_vd | 88.71% | 81.36% | 84.88% | trained model |
EAST | MobileNetV3 | 78.20% | 79.10% | 78.65% | trained model |
DB | ResNet50_vd | 86.41% | 78.72% | 82.38% | trained model |
DB | MobileNetV3 | 77.29% | 73.08% | 75.12% | trained model |
SAST | ResNet50_vd | 91.39% | 83.77% | 87.42% | trained model |
PSE | ResNet50_vd | 85.81% | 79.53% | 82.55% | trained model |
PSE | MobileNetV3 | 82.20% | 70.48% | 75.89% | trained model |
DB++ | ResNet50 | 90.89% | 82.66% | 86.58% | pretrained model/trained model |
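The Hmean column in these detection tables is the harmonic mean of precision and recall (the F1 score). As a minimal sketch, the helper below recomputes it from the two percentages, which can be used to sanity-check a row. The function name `hmean` is illustrative and not part of PaddleOCR. Because the published precision and recall are already rounded, the last digit may differ slightly for some rows.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same unit, e.g. percent)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing was detected
    return 2 * precision * recall / (precision + recall)


# EAST with ResNet50_vd on ICDAR2015: precision 88.71%, recall 81.36%
print(round(hmean(88.71, 81.36), 2))  # 84.88
```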
On the Total-Text dataset, the text detection results are as follows:
Model | Backbone | Precision | Recall | Hmean | Download link |
---|---|---|---|---|---|
SAST | ResNet50_vd | 89.63% | 78.44% | 83.66% | trained model |
CT | ResNet18_vd | 88.68% | 81.70% | 85.05% | trained model |
On the CTW1500 dataset, the text detection results are as follows:
Model | Backbone | Precision | Recall | Hmean | Download link |
---|---|---|---|---|---|
FCE | ResNet50_dcn | 88.39% | 82.18% | 85.27% | trained model |
DRRG | ResNet50_vd | 89.92% | 80.91% | 85.18% | trained model |
Note: Additional data, such as ICDAR2013, ICDAR2017, COCO-Text, and ArT, was used when training SAST. The English public datasets, organized in the format used by PaddleOCR, can be downloaded from:
- Baidu Drive (download code: 2bpi).
- Google Drive
1.2 Text Recognition Algorithms
Supported text recognition algorithms (Click the link to get the tutorial):
- CRNN
- Rosetta
- STAR-Net
- RARE
- SRN
- NRTR
- SAR
- SEED
- SVTR
- ViTSTR
- ABINet
- VisionLAN
- SPIN
- RobustScanner
- RFL
- ParseQ
- CPPD
- SATRN
Following DTRB, the text recognition algorithms above were trained on MJSynth and SynthText and evaluated on IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE. The results are as follows:
Model | Backbone | Avg Accuracy | Module combination | Download link |
---|---|---|---|---|
Rosetta | Resnet34_vd | 79.11% | rec_r34_vd_none_none_ctc | trained model |
Rosetta | MobileNetV3 | 75.80% | rec_mv3_none_none_ctc | trained model |
CRNN | Resnet34_vd | 81.04% | rec_r34_vd_none_bilstm_ctc | trained model |
CRNN | MobileNetV3 | 77.95% | rec_mv3_none_bilstm_ctc | trained model |
StarNet | Resnet34_vd | 82.85% | rec_r34_vd_tps_bilstm_ctc | trained model |
StarNet | MobileNetV3 | 79.28% | rec_mv3_tps_bilstm_ctc | trained model |
RARE | Resnet34_vd | 83.98% | rec_r34_vd_tps_bilstm_att | trained model |
RARE | MobileNetV3 | 81.76% | rec_mv3_tps_bilstm_att | trained model |
SRN | Resnet50_vd_fpn | 86.31% | rec_r50fpn_vd_none_srn | trained model |
NRTR | NRTR_MTB | 84.21% | rec_mtb_nrtr | trained model |
SAR | Resnet31 | 87.20% | rec_r31_sar | trained model |
SEED | Aster_Resnet | 85.35% | rec_resnet_stn_bilstm_att | trained model |
SVTR | SVTR-Tiny | 89.25% | rec_svtr_tiny_none_ctc_en | trained model |
ViTSTR | ViTSTR | 79.82% | rec_vitstr_none_ce | trained model |
ABINet | Resnet45 | 90.75% | rec_r45_abinet | trained model |
VisionLAN | Resnet45 | 90.30% | rec_r45_visionlan | trained model |
SPIN | ResNet32 | 90.00% | rec_r32_gaspin_bilstm_att | trained model |
RobustScanner | ResNet31 | 87.77% | rec_r31_robustscanner | trained model |
RFL | ResNetRFL | 88.63% | rec_resnet_rfl_att | trained model |
ParseQ | VIT | 91.24% | rec_vit_parseq_synth | trained model |
CPPD | SVTR-Base | 93.8% | rec_svtrnet_cppd_base_en | trained model |
SATRN | ShallowCNN | 88.05% | rec_satrn | trained model |
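For the Rosetta, CRNN, StarNet, and RARE rows, the module combination name appears to follow the four-stage DTRB layout: backbone (feature extraction), transformation, sequence modeling, and prediction. The sketch below illustrates that reading under this assumption only. `parse_module_combination` is a hypothetical helper, not a PaddleOCR function, and it is not meant for names such as rec_mtb_nrtr, rec_r31_sar, or rec_svtr_tiny_none_ctc_en, which use a different pattern.

```python
from typing import NamedTuple


class ModuleCombination(NamedTuple):
    backbone: str        # feature extractor, e.g. "r34_vd" or "mv3"
    transformation: str  # e.g. "tps" or "none"
    sequence: str        # e.g. "bilstm" or "none"
    prediction: str      # e.g. "ctc" or "att"


def parse_module_combination(name: str) -> ModuleCombination:
    """Split names assumed to be shaped like rec_<backbone>_<trans>_<seq>_<pred>."""
    parts = name.split("_")
    if len(parts) < 5 or parts[0] != "rec":
        raise ValueError(f"unexpected module combination: {name!r}")
    # The backbone may itself contain underscores (e.g. "r34_vd"),
    # so the last three tokens are taken as the fixed stages.
    *backbone, trans, seq, pred = parts[1:]
    return ModuleCombination("_".join(backbone), trans, seq, pred)


print(parse_module_combination("rec_r34_vd_tps_bilstm_ctc"))
```

Read this way, the CRNN entry rec_mv3_none_bilstm_ctc is a MobileNetV3 backbone with no transformation, a BiLSTM sequence model, and a CTC prediction head.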
1.3 Text Super-Resolution Algorithms
Supported text super-resolution algorithms (Click the link to get the tutorial):
On the TextZoom public dataset, the results are as follows:
Model | Backbone | PSNR_Avg | SSIM_Avg | Config | Download link |
---|---|---|---|---|---|
Text Gestalt | tsrn | 19.28 | 0.6560 | configs/sr/sr_tsrn_transformer_strock.yml | trained model |
Text Telescope | tbsrn | 21.56 | 0.7411 | configs/sr/sr_telescope.yml | trained model |
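PSNR_Avg in the table is an average peak signal-to-noise ratio, in dB, between super-resolved images and their high-resolution references. As a rough sketch of the standard definition, assuming 8-bit pixel values passed as flat sequences (the function name and input format are illustrative, and PaddleOCR's own evaluation code may preprocess images differently):

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10 * math.log10(max_value ** 2 / mse)
```

Higher values indicate a restored image that is closer to the reference.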
1.4 Formula Recognition Algorithm
Supported formula recognition algorithms (Click the link to get the tutorial):
On the CROHME handwritten formula dataset, the results are as follows:
Model | Backbone | Config | ExpRate | Download link |
---|---|---|---|---|
CAN | DenseNet | rec_d28_can.yml | 51.72% | trained model |
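ExpRate (expression recognition rate) is commonly defined as the share of formulas whose predicted token sequence matches the ground truth exactly. A minimal sketch under that assumption follows. The function name and the whitespace tokenization are illustrative and not taken from PaddleOCR.

```python
from typing import Sequence


def exp_rate(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of formulas predicted exactly right, comparing whitespace-split tokens."""
    if len(predictions) != len(references) or len(references) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(
        pred.split() == ref.split()  # exact token-level match
        for pred, ref in zip(predictions, references)
    )
    return correct / len(references)


preds = ["x ^ 2 + 1", "\\frac { a } { b }", "a + b"]
refs = ["x ^ 2 + 1", "\\frac { a } { c }", "a + b"]
print(exp_rate(preds, refs))  # 2 of 3 formulas match exactly
```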
2. End-to-end OCR Algorithms
Supported end-to-end algorithms (Click the link to get the tutorial):
3. Table Recognition Algorithms
Supported table recognition algorithms (Click the link to get the tutorial):
On the PubTabNet dataset, the results are as follows:
Model | Backbone | Config | Acc | Download link |
---|---|---|---|---|
TableMaster | TableResNetExtra | configs/table/table_master.yml | 77.47% | trained model / inference model |
4. Key Information Extraction Algorithms
Supported KIE algorithms (Click the link to get the tutorial):
On the WildReceipt dataset, the results are as follows:
Model | Backbone | Config | Hmean | Download link |
---|---|---|---|---|
SDMGR | VGG6 | configs/kie/sdmgr/kie_unet_sdmgr.yml | 86.70% | trained model |
On the XFUND_zh dataset, the results are as follows:
Model | Backbone | Task | Config | Hmean | Download link |
---|---|---|---|---|---|
VI-LayoutXLM | VI-LayoutXLM-base | SER | ser_vi_layoutxlm_xfund_zh_udml.yml | 93.19% | trained model |
LayoutXLM | LayoutXLM-base | SER | ser_layoutxlm_xfund_zh.yml | 90.38% | trained model |
LayoutLM | LayoutLM-base | SER | ser_layoutlm_xfund_zh.yml | 77.31% | trained model |
LayoutLMv2 | LayoutLMv2-base | SER | ser_layoutlmv2_xfund_zh.yml | 85.44% | trained model |
VI-LayoutXLM | VI-LayoutXLM-base | RE | re_vi_layoutxlm_xfund_zh_udml.yml | 83.92% | trained model |
LayoutXLM | LayoutXLM-base | RE | re_layoutxlm_xfund_zh.yml | 74.83% | trained model |
LayoutLMv2 | LayoutLMv2-base | RE | re_layoutlmv2_xfund_zh.yml | 67.77% | trained model |