DB && DB++

1. Introduction

Papers:

Real-time Scene Text Detection with Differentiable Binarization
Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang
AAAI, 2020

Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion
Liao, Minghui and Zou, Zhisheng and Wan, Zhaoyi and Yao, Cong and Bai, Xiang
TPAMI, 2022

On the ICDAR2015 dataset, the text detection results are as follows:

Model	Backbone	Configuration	Precision	Recall	Hmean	Download
DB	ResNet50_vd	configs/det/det_r50_vd_db.yml	86.41%	78.72%	82.38%	trained model
DB	MobileNetV3	configs/det/det_mv3_db.yml	77.29%	73.08%	75.12%	trained model
DB++	ResNet50	configs/det/det_r50_db++_ic15.yml	90.89%	82.66%	86.58%	pretrained model/trained model

On the TD_TR dataset, the text detection results are as follows:

Model	Backbone	Configuration	Precision	Recall	Hmean	Download
DB++	ResNet50	configs/det/det_r50_db++_td_tr.yml	92.92%	86.48%	89.58%	pretrained model/trained model
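
The Hmean column in the tables above is the harmonic mean of precision and recall (the F1 score commonly reported for ICDAR-style detection benchmarks). As a quick sanity check, the sketch below recomputes it from the Precision and Recall columns. The function name `hmean` and the 0.02 tolerance are illustrative choices, not part of PaddleOCR; the small gaps arise because the published columns are rounded to two decimals.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (model, setup, precision %, recall %, reported Hmean %), copied from the tables above.
ROWS = [
    ("DB", "ResNet50_vd, ICDAR2015", 86.41, 78.72, 82.38),
    ("DB", "MobileNetV3, ICDAR2015", 77.29, 73.08, 75.12),
    ("DB++", "ResNet50, ICDAR2015", 90.89, 82.66, 86.58),
    ("DB++", "ResNet50, TD_TR", 92.92, 86.48, 89.58),
]

for model, setup, p, r, reported in ROWS:
    computed = hmean(p, r)
    # Allow a small gap because the table values are rounded.
    assert abs(computed - reported) < 0.02, (model, setup, computed)
    print(f"{model:5s} {setup:24s} computed={computed:.2f} reported={reported:.2f}")
```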

2. Environment

Please prepare your environment by following the "prepare the environment" and "clone the repo" guides.

3. Model Training / Evaluation / Prediction

Please refer to the text detection training tutorial. PaddleOCR has modularized its code structure, so you only need to swap the configuration file to train a different detection model.
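
As an illustration of that modularity, the sketch below builds one training command per configuration file listed in the result tables. It assumes the repository's `tools/train.py` entry point takes the config through `-c`, in the same way as `tools/export_model.py` in section 4.1; check the training tutorial for the exact options. The helper name `build_train_command` is hypothetical, and the commands are only printed, not executed.

```python
import shlex
from typing import List

# Configuration files named in the result tables of this document.
CONFIGS = [
    "configs/det/det_r50_vd_db.yml",
    "configs/det/det_mv3_db.yml",
    "configs/det/det_r50_db++_ic15.yml",
    "configs/det/det_r50_db++_td_tr.yml",
]


def build_train_command(config_path: str) -> List[str]:
    """Return the argv list for training with one config (hypothetical helper)."""
    # Assumption: tools/train.py accepts the config file via -c.
    return ["python3", "tools/train.py", "-c", config_path]


for cfg in CONFIGS:
    # shlex.join quotes arguments where needed so the line can be pasted into a shell.
    print(shlex.join(build_train_command(cfg)))
```

Only the config path changes between the four commands, which is the point of the modular layout.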

4. Inference and Deployment

4.1 Python Inference

First, convert the model saved during DB text detection training into an inference model. Taking the model with the ResNet50_vd backbone trained on the ICDAR2015 English dataset as an example (model download link), you can convert it with the following command:

python3 tools/export_model.py -c configs/det/det_r50_vd_db.yml -o Global.pretrained_model=./det_r50_vd_db_v2.0_train/best_accuracy Global.save_inference_dir=./inference/det_db

To run inference with the DB text detection model, execute the following command:

python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_db/"

The visualized text detection results are saved to the ./inference_results folder by default, and the name of the result file is prefixed with det_res. Examples of results are as follows:

Note: Since the ICDAR2015 dataset has only 1,000 training images, mainly of English scenes, the above model performs very poorly on Chinese text images.
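
To inspect the visualizations after running predict_det.py, a small script can collect the files that carry the det_res prefix. This is a minimal sketch: the helper name `list_detection_results` is hypothetical, and it relies only on the default ./inference_results folder and det_res prefix mentioned above, not on any particular image format.

```python
from pathlib import Path
from typing import List


def list_detection_results(result_dir: str = "./inference_results",
                           prefix: str = "det_res") -> List[Path]:
    """Return files in result_dir whose names start with prefix, sorted by name."""
    folder = Path(result_dir)
    if not folder.is_dir():
        # Nothing has been written yet, or a different output folder was used.
        return []
    return sorted(p for p in folder.iterdir()
                  if p.is_file() and p.name.startswith(prefix))


if __name__ == "__main__":
    for path in list_detection_results():
        print(path)
```

An empty list usually means inference has not been run yet or the output folder was changed from its default.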

4.2 C++ Inference

With the inference model prepared, refer to the cpp infer tutorial for C++ inference.

4.3 Serving

With the inference model prepared, refer to the pdserving tutorial for service deployment with Paddle Serving.

4.4 More

More deployment schemes supported for DB:

Paddle2ONNX: with the inference model prepared, please refer to the paddle2onnx tutorial.

5. FAQ

Citation

@inproceedings{liao2020real,
  title={Real-time scene text detection with differentiable binarization},
  author={Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={34},
  number={07},
  pages={11474--11481},
  year={2020}
}

@article{liao2022real,
  title={Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion},
  author={Liao, Minghui and Zou, Zhisheng and Wan, Zhaoyi and Yao, Cong and Bai, Xiang},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2022},
  publisher={IEEE}
}