Multi-language model¶
Recent Update
- 2022.5.8: Updated the multi-language detection and recognition models to the PP-OCRv3 version; average recognition accuracy has increased by more than 5%.
- 2021.4.9: Added support for detection and recognition of 80 languages.
- 2021.4.9: Added support for lightweight, high-precision English model detection and recognition.
PaddleOCR aims to create a rich, leading, and practical OCR tool library. It provides not only Chinese and English models for general scenarios, but also models trained specifically for English scenarios and multilingual models covering 80 languages.
The English model supports detection and recognition of uppercase and lowercase letters and common punctuation, and its recognition of space characters has been optimized.
The multilingual models cover Latin, Arabic, Traditional Chinese, Korean, Japanese, etc.
This document will briefly introduce how to use the multilingual model.
1 Installation¶
1.1 Paddle installation¶
1.2 PaddleOCR package installation¶
Build and install locally
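After installation, a quick check can confirm that both packages are visible to the Python interpreter. This is a minimal sketch, assuming the packages are importable under the module names `paddle` and `paddleocr`; the helper name `missing_packages` is hypothetical and not part of PaddleOCR.

```python
import importlib.util


def missing_packages(modules=("paddle", "paddleocr")):
    """Return the module names from `modules` that cannot be found."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("Paddle and PaddleOCR are importable.")
```

The check only looks the modules up; it does not import them, so it stays fast even though importing Paddle itself can be slow.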
2 Quick use¶
2.1 Command line operation¶
View help information
- Whole image prediction (detection + recognition)
PaddleOCR currently supports 80 languages; the language is selected with the --lang parameter. The supported languages and their abbreviations are listed in the table in Section 5.
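As a sketch, the command for whole image prediction can be assembled in Python before it is run. The `--image_dir` and `--lang` flags follow the PaddleOCR 2.x whl command line, and newer releases may use a different layout. The helper `build_ocr_command` and the image path are hypothetical.

```python
import shlex


def build_ocr_command(image_path, lang="en"):
    """Build the argument list for a whole-image `paddleocr` call."""
    return ["paddleocr", "--image_dir", image_path, "--lang", lang]


# Hypothetical image path; "japan" is the abbreviation for Japanese.
cmd = build_ocr_command("path/to/image.jpg", lang="japan")
print(shlex.join(cmd))
# To execute once paddleocr is installed: subprocess.run(cmd, check=True)
```

Building the command as a list avoids shell quoting problems when image paths contain spaces.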
The result is a list. Each item contains a text box, the recognized text, and the recognition confidence.
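A minimal sketch of how such a result can be unpacked, assuming each item has the layout `[box, (text, confidence)]` with `box` given as four `[x, y]` corner points. The sample values are invented for illustration and are not real model output. Some PaddleOCR versions wrap the per-image list in one extra outer list, in which case the inner list should be passed in.

```python
def unpack_items(items):
    """Split OCR items into parallel lists of boxes, texts, and scores."""
    boxes = [item[0] for item in items]
    texts = [item[1][0] for item in items]
    scores = [item[1][1] for item in items]
    return boxes, texts, scores


# Invented sample in the assumed layout (not real model output).
sample = [
    [[[10.0, 5.0], [120.0, 5.0], [120.0, 30.0], [10.0, 30.0]], ("hello", 0.98)],
    [[[10.0, 40.0], [90.0, 40.0], [90.0, 65.0], [10.0, 65.0]], ("world", 0.91)],
]

boxes, texts, scores = unpack_items(sample)
for text, score in zip(texts, scores):
    print(f"{text}\t{score:.2f}")
```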
- Recognition
The result is a 2-tuple containing the recognized text and the recognition confidence.
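One common use of the confidence value is to discard uncertain recognitions. The sketch below assumes the `(text, confidence)` layout described above. The helper name `accept`, the threshold of 0.5, and the sample tuples are hypothetical.

```python
def accept(rec_result, min_confidence=0.5):
    """Return the text if its confidence meets the threshold, else None."""
    text, confidence = rec_result
    return text if confidence >= min_confidence else None


# Invented samples in the assumed (text, confidence) layout.
print(accept(("PAIN", 0.99)))
print(accept(("P4lN", 0.21)))
```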
- Detection
The result is a list. Each item represents the coordinates of a text box.
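Detection boxes are often converted to axis-aligned rectangles before cropping. This is a minimal sketch, assuming each box is a sequence of `[x, y]` corner points; the helper `bounding_rect` and the sample coordinates are hypothetical.

```python
def bounding_rect(box):
    """Return (x_min, y_min, x_max, y_max) enclosing all corner points."""
    xs = [point[0] for point in box]
    ys = [point[1] for point in box]
    return min(xs), min(ys), max(xs), max(ys)


# Invented, slightly rotated quadrilateral (not real model output).
sample_box = [[10, 5], [120, 8], [118, 32], [9, 30]]
print(bounding_rect(sample_box))
```

Because a detected box may be rotated, the enclosing rectangle can be slightly larger than the text region itself.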
2.2 Run with Python script¶
PPOCR can also be called from Python scripts, which makes it easy to integrate with your own code:
- Whole image prediction (detection + recognition)
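A minimal sketch of whole image prediction from Python, assuming the PaddleOCR 2.x whl interface (`PaddleOCR(lang=...)` and its `ocr` method). The wrapper `recognize`, its `engine` parameter, and the image path are hypothetical. The import is deferred so that the wrapper can also be exercised with a stand-in engine when PaddleOCR is not installed.

```python
def recognize(image_path, lang="korean", engine=None):
    """Run detection + recognition on one image and return the raw result.

    `engine` is any object with an `ocr(path)` method. If it is omitted,
    a PaddleOCR instance for `lang` is created (requires paddleocr).
    """
    if engine is None:
        from paddleocr import PaddleOCR  # deferred: only needed for real runs
        engine = PaddleOCR(lang=lang)
    return engine.ocr(image_path)


# Usage (requires paddleocr and a local image; the path is hypothetical):
#     for line in recognize("path/to/korean_image.jpg", lang="korean"):
#         print(line)
```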
Visualization of results:
PPOCR also supports direction classification. For more detailed usage, please refer to: whl package instructions.
3 Custom training¶
PPOCR supports custom training and fine-tuning on your own data. For the recognition model, you can start from the French configuration file and modify the training data path, dictionary, and other parameters.
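Preparing a custom dictionary can be sketched as follows. The sketch assumes that the recognition label file has one `image_path<TAB>text` entry per line and that the dictionary is a plain text file with one character per line. It also assumes the space character is configured separately and therefore left out of the dictionary. The function name and sample labels are hypothetical; check the layout against the Text Recognition documentation.

```python
def build_char_dict(label_lines):
    """Collect the sorted set of characters used in the label texts."""
    chars = set()
    for line in label_lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip malformed lines without a label
        _, text = line.split("\t", 1)
        chars.update(text)
    chars.discard(" ")  # assumption: space is handled by a separate setting
    return sorted(chars)


# Invented label lines in the assumed "path<TAB>text" layout.
labels = ["train/word_001.jpg\tbonjour", "train/word_002.jpg\tcafé"]
dictionary = build_char_dict(labels)
print("\n".join(dictionary))  # one character per line, ready to be saved
```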
For the specific data preparation and training process, please refer to Text Detection and Text Recognition. For more functions, such as inference deployment and data annotation, you can read the complete Document Tutorial.
4 Inference and Deployment¶
In addition to installing the whl package for quick prediction, PPOCR also provides a variety of inference deployment methods. If necessary, you can read the related documents:
5 Supported languages and abbreviations¶
Language | Abbreviation | Language | Abbreviation
---|---|---|---
Chinese & English | ch | Arabic | ar
English | en | Hindi | hi
French | fr | Uyghur | ug
German | german | Persian | fa
Japanese | japan | Urdu | ur
Korean | korean | Serbian (Latin) | rs_latin
Chinese Traditional | chinese_cht | Occitan | oc
Italian | it | Marathi | mr
Spanish | es | Nepali | ne
Portuguese | pt | Serbian (Cyrillic) | rs_cyrillic
Russian | ru | Bulgarian | bg
Ukrainian | uk | Estonian | et
Belarusian | be | Irish | ga
Telugu | te | Croatian | hr
Sanskrit | sa | Hungarian | hu
Tamil | ta | Indonesian | id
Afrikaans | af | Icelandic | is
Azerbaijani | az | Kurdish | ku
Bosnian | bs | Lithuanian | lt
Czech | cs | Latvian | lv
Welsh | cy | Maori | mi
Danish | da | Malay | ms
Maltese | mt | Adyghe | ady
Dutch | nl | Kabardian | kbd
Norwegian | no | Avar | ava
Polish | pl | Dargwa | dar
Romanian | ro | Ingush | inh
Slovak | sk | Lak | lbe
Slovenian | sl | Lezghian | lez
Albanian | sq | Tabassaran | tab
Swedish | sv | Bihari | bh
Swahili | sw | Maithili | mai
Tagalog | tl | Angika | ang
Turkish | tr | Bhojpuri | bho
Uzbek | uz | Magahi | mah
Vietnamese | vi | Nagpur | sck
Mongolian | mn | Newari | new
Abaza | abq | Goan Konkani | gom
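
Since several abbreviations are not standard two-letter codes (for example `german`, `japan`, `korean`), a small lookup can catch mistakes before a model is loaded. The sketch below copies a few rows from the table; the names `LANG_CODES` and `lang_code` are hypothetical, and the mapping is deliberately incomplete.

```python
# A few rows copied from the table above; extend as needed.
LANG_CODES = {
    "Chinese & English": "ch",
    "English": "en",
    "French": "fr",
    "German": "german",
    "Japanese": "japan",
    "Korean": "korean",
    "Chinese Traditional": "chinese_cht",
    "Arabic": "ar",
    "Russian": "ru",
}


def lang_code(language):
    """Return the abbreviation for `language`, or raise ValueError."""
    try:
        return LANG_CODES[language]
    except KeyError:
        raise ValueError(f"Unknown language: {language!r}") from None


print(lang_code("Japanese"))
```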