Text Line Orientation Classification Module Tutorial
1. Overview
The text line orientation classification module identifies the orientation of text lines and corrects them through post-processing. During processes like document scanning or ID photo capture, users may rotate the shooting device for better clarity, resulting in text lines with varying orientations. Standard OCR workflows often struggle with such data. By employing image classification technology, this module pre-determines text line orientation and adjusts it, thereby enhancing OCR accuracy.
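The correction step can be pictured with a short sketch. This is an illustration under stated assumptions, not PaddleOCR's own post-processing: it assumes the classifier has already produced a class ID (0 for 0°, 1 for 180°) and that the text line crop is a NumPy array. The helper name `correct_orientation` is hypothetical.

```python
import numpy as np


def correct_orientation(image: np.ndarray, class_id: int) -> np.ndarray:
    """Return the image upright, given a predicted class ID (0 = 0°, 1 = 180°)."""
    if class_id == 1:
        # Two 90° rotations in the image plane equal one 180° rotation.
        return np.rot90(image, k=2, axes=(0, 1)).copy()
    return image


# A tiny 2x3 array stands in for a cropped text line image.
line = np.array([[1, 2, 3],
                 [4, 5, 6]], dtype=np.uint8)
upright = correct_orientation(line, class_id=1)
```

Rotating only when the predicted class is 1 leaves upright lines untouched, so the downstream OCR step always receives text in the same orientation.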
2. Supported Models
Model | Download Links | Top-1 Acc (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Size (M) | Description |
---|---|---|---|---|---|---|
PP-LCNet_x0_25_textline_ori | Inference Model/Training Model | 95.54 | - | - | 0.32 | A text line classification model based on PP-LCNet_x0_25, with two classes: 0° and 180°. |
Testing Environment:
- Performance Testing Environment
  - Test Dataset: PaddleX's proprietary dataset, covering scenarios like IDs and documents, with 1,000 images.
  - Hardware:
    - GPU: NVIDIA Tesla T4
    - CPU: Intel Xeon Gold 6271C @ 2.60GHz
    - Other: Ubuntu 20.04 / cuDNN 8.6 / TensorRT 8.5.2.2
- Inference Mode Description
Mode | GPU Configuration | CPU Configuration | Acceleration Techniques |
---|---|---|---|
Standard Mode | FP32 precision / No TRT acceleration | FP32 precision / 8 threads | PaddleInference |
High-Performance Mode | Optimal combination of precision and acceleration strategies | FP32 precision / 8 threads | Optimal backend selection (Paddle/OpenVINO/TRT, etc.) |
3. Quick Start
❗ Before starting, ensure you have installed the PaddleOCR wheel package. Refer to the Installation Guide for details.
Run the following command for a quick demo:
paddleocr text_line_orientation_classification -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/textline_rot180_demo.jpg
Alternatively, integrate the module into your project. Download the sample image locally before running the code below.
from paddleocr import TextLineOrientationClassification

# Load the two-class (0° / 180°) text line orientation model.
model = TextLineOrientationClassification(model_name="PP-LCNet_x0_25_textline_ori")
output = model.predict("textline_rot180_demo.jpg", batch_size=1)
for res in output:
    res.print(json_format=False)  # print the prediction to the console
    res.save_to_img("./output/demo.png")  # save the visualized result
    res.save_to_json("./output/res.json")  # save the prediction as JSON
The output will be:
{'res': {'input_path': 'textline_rot180_demo.jpg', 'page_index': None, 'class_ids': array([1], dtype=int32), 'scores': array([1.], dtype=float32), 'label_names': ['180_degree']}}
Key output fields:
- input_path: Path of the input image.
- page_index: For PDF inputs, indicates the page number; otherwise, None.
- class_ids: Predicted class IDs (0 for 0°, 1 for 180°).
- scores: Confidence scores.
- label_names: Predicted class labels.
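As a sketch of how these fields might be consumed, the snippet below unpacks a dictionary with the same shape as the printed output above. The values are copied from that output; the helper name `top_prediction` is hypothetical, and whether a live result object exposes exactly this nested layout (for instance through its json attribute) is an assumption to check against your installed version.

```python
from typing import Tuple

import numpy as np

# Same shape as the printed result above; values copied from that output.
result = {
    "res": {
        "input_path": "textline_rot180_demo.jpg",
        "page_index": None,
        "class_ids": np.array([1], dtype=np.int32),
        "scores": np.array([1.0], dtype=np.float32),
        "label_names": ["180_degree"],
    }
}


def top_prediction(result: dict) -> Tuple[int, str, float]:
    """Return (class_id, label, score) for the first listed class."""
    res = result["res"]
    return int(res["class_ids"][0]), res["label_names"][0], float(res["scores"][0])


class_id, label, score = top_prediction(result)
# Class 1 carries the 180° label in the output above, so such a line needs rotating.
needs_rotation = class_id == 1
```

Converting the NumPy scalars to plain `int` and `float` keeps the values easy to log or serialize later.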
Visualization:
Method and Parameter Details
TextLineOrientationClassification initialization (using PP-LCNet_x0_25_textline_ori as an example):
Parameter | Description | Type | Options | Default |
---|---|---|---|---|
model_name | Model name | str | N/A | None |
model_dir | Custom model path | str | N/A | None |
device | Inference device | str | E.g., "gpu:0", "npu:0", "cpu" | gpu:0 |
use_hpip | Enable high-performance inference | bool | N/A | False |
hpi_config | HPI configuration | dict or None | N/A | None |
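The parameters in this table can be combined as in the following sketch. The keyword names come from the table; the wrapper function `build_classifier` is hypothetical, and choosing "cpu" as its default is only an assumption for machines without a GPU. The import is placed inside the function so the definition itself does not require PaddleOCR to be installed.

```python
def build_classifier(device: str = "cpu", use_hpip: bool = False):
    """Create the classifier with explicit device and high-performance settings."""
    # Imported lazily so this module can be loaded without PaddleOCR installed.
    from paddleocr import TextLineOrientationClassification

    return TextLineOrientationClassification(
        model_name="PP-LCNet_x0_25_textline_ori",
        device=device,      # e.g. "gpu:0", "npu:0", or "cpu"
        use_hpip=use_hpip,  # True requests high-performance inference
    )
```

Calling `build_classifier(device="gpu:0")` would select the first GPU, matching the table's default device.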
predict() method:
- input: Supports various input types (numpy array, file path, URL, directory, or list).
- batch_size: Batch size (default: 1).
Result Handling:
Each prediction result is a Result object with methods like print(), save_to_img(), and save_to_json().
Method | Description | Parameters | Type | Details | Default |
---|---|---|---|---|---|
print() | Print results | format_json, indent, ensure_ascii | bool, int, bool | Control JSON formatting and ASCII escaping | True, 4, False |
save_to_json() | Save results as JSON | save_path, indent, ensure_ascii | str, int, bool | Same as print() | N/A, 4, False |
save_to_img() | Save visualized results | save_path | str | Output path | N/A |
Attributes:
- json: Get results in JSON format.
- img: Get visualized images as a dictionary.
4. Custom Development
PaddleOCR does not natively support training text line orientation classification models. To train one, refer to PaddleX's Custom Development Guide. Trained models can then be used for inference through PaddleOCR's API.