
Text Line Orientation Classification Module Tutorial

1. Overview

The text line orientation classification module identifies the orientation of each text line so that it can be corrected in post-processing. During document scanning or license/certificate photography, the capture device may be rotated to obtain a clearer image, which produces text lines in various orientations. Standard OCR pipelines do not handle such data well. Using image classification, the orientation of each text line can be determined in advance and adjusted, which improves the accuracy of OCR processing.
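
As a rough illustration of the correction step, the sketch below rotates a text line image by 180 degrees when the predicted label is 180_degree (the label string that appears in the sample output in this tutorial). It assumes the image is a plain nested list of pixel rows; the helper name correct_orientation is hypothetical and is not part of PaddleOCR.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values; a stand-in for a real image array


def correct_orientation(image: Image, label_name: str) -> Image:
    """Return an upright copy of the image, given the predicted label."""
    if label_name == "180_degree":
        # A 180-degree rotation reverses the row order and each row's pixels.
        return [row[::-1] for row in image[::-1]]
    # Any other label is treated as already upright in this sketch.
    return [row[:] for row in image]


upside_down = [[1, 2, 3],
               [4, 5, 6]]
print(correct_orientation(upside_down, "180_degree"))  # [[6, 5, 4], [3, 2, 1]]
```

With a real image array the same effect would come from the imaging library's own 180-degree rotation; the nested-list version only shows the idea.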

2. Supported Model List

Model | Model Download Link | Top-1 Accuracy (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | Description
PP-LCNet_x0_25_textline_ori | Inference Model/Training Model | 98.85 | 2.16 / 0.41 | 2.37 / 0.73 | 0.96 | Text line classification model based on PP-LCNet_x0_25, with two classes: 0 degrees and 180 degrees
PP-LCNet_x1_0_textline_ori | Inference Model/Training Model | 99.42 | - / - | 2.98 / 2.98 | 6.5 | Text line classification model based on PP-LCNet_x1_0, with two classes: 0 degrees and 180 degrees

Note: The text line orientation classification model was upgraded on May 26, 2025, and PP-LCNet_x1_0_textline_ori has been added. If you need to use the pre-upgrade model weights, please click the download link.

Test Environment Description:

  • Performance Test Environment
    • Test Dataset: PaddleX self-built dataset covering multiple scenarios such as documents and certificates, containing 1000 images.
    • Hardware Configuration:
      • GPU: NVIDIA Tesla T4
      • CPU: Intel Xeon Gold 6271C @ 2.60GHz
      • Other Environments: Ubuntu 20.04 / CUDA 11.8 / cuDNN 8.9 / TensorRT 8.6.1.6
  • Inference Mode Description
Mode | GPU Configuration | CPU Configuration | Acceleration Technology Combination
Normal Mode | FP32 Precision / No TRT Acceleration | FP32 Precision / 8 Threads | PaddleInference
High-Performance Mode | Optimal combination of pre-selected precision types and acceleration strategies | FP32 Precision / 8 Threads | Pre-selected optimal backend (Paddle/OpenVINO/TRT, etc.)

3. Quick Integration

❗ Before starting, please install the wheel package of PaddleOCR. For detailed instructions, refer to the Installation Guide.

You can quickly experience the functionality with a single command:

paddleocr textline_orientation_classification -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/textline_rot180_demo.jpg

Note: By default, the official models are downloaded from HuggingFace. If you cannot access HuggingFace, set the environment variable PADDLE_PDX_MODEL_SOURCE="BOS" to switch the model source to BOS. More model sources will be supported in the future.
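
The variable can also be set from Python for the current process. The sketch below assumes the variable is read when the model source is first resolved, so it is set before anything from paddleocr is imported; that ordering is a precaution, not something this tutorial states.

```python
import os

# Switch the model source to BOS for this process only.
# Assumption: the variable needs to be in place before PaddleOCR resolves
# the model download, so set it before importing paddleocr.
os.environ["PADDLE_PDX_MODEL_SOURCE"] = "BOS"

print(os.environ["PADDLE_PDX_MODEL_SOURCE"])  # BOS
```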

You can also integrate the text line orientation classification model into your project. Run the following code after downloading the example image to your local machine.

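from paddleocr import TextLineOrientationClassification

# Load the lightweight text line orientation classifier.
model = TextLineOrientationClassification(model_name="PP-LCNet_x0_25_textline_ori")
# predict() returns a list of results, one per input image.
output = model.predict("textline_rot180_demo.jpg", batch_size=1)
for res in output:
    res.print(json_format=False)           # print the result to the terminal
    res.save_to_img("./output/demo.png")   # save the visualization image
    res.save_to_json("./output/res.json")  # save the result as a JSON file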

After running, the result obtained is:

{'res': {'input_path': 'textline_rot180_demo.jpg', 'page_index': None, 'class_ids': array([1], dtype=int32), 'scores': array([0.99864], dtype=float32), 'label_names': ['180_degree']}}

The fields in the result have the following meanings:

  • input_path: Path of the input image.
  • page_index: If the input is a PDF file, the current page number of the PDF; otherwise, None.
  • class_ids: Class ID of the prediction result.
  • scores: Confidence score of the prediction result.
  • label_names: Class name of the prediction result.

The visualization image is the file written by the save_to_img() call in the example above (./output/demo.png).
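
To make the field meanings concrete, the sketch below reads the predicted label and score from a result shaped like the sample output above. It uses plain Python lists in place of the NumPy arrays so that it runs without extra packages. The helper name needs_rotation and the 0.9 threshold are illustrative assumptions, not PaddleOCR defaults.

```python
sample = {
    "res": {
        "input_path": "textline_rot180_demo.jpg",
        "page_index": None,
        "class_ids": [1],        # array([1], dtype=int32) in the real output
        "scores": [0.99864],     # array([0.99864], dtype=float32) in the real output
        "label_names": ["180_degree"],
    }
}


def needs_rotation(result: dict, min_score: float = 0.9) -> bool:
    """True when the top prediction is 180_degree with enough confidence."""
    res = result["res"]
    label = res["label_names"][0]
    score = float(res["scores"][0])
    return label == "180_degree" and score >= min_score


print(needs_rotation(sample))  # True
```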

The explanations for the methods, parameters, etc., are as follows:

  • TextLineOrientationClassification instantiates a text line orientation classification model (PP-LCNet_x0_25_textline_ori is used here as an example). Its parameters are as follows:
Parameter | Description | Type | Default
model_name | Model name. If set to None, PP-LCNet_x0_25_textline_ori will be used. | str or None | None
model_dir | Model storage path. | str or None | None
device | Device for inference, for example "cpu", "gpu", "npu", "gpu:0", or "gpu:0,1". If multiple devices are specified, parallel inference will be performed. By default, GPU 0 is used if available; otherwise, the CPU is used. | str or None | None
enable_hpi | Whether to enable high-performance inference. | bool | False
use_tensorrt | Whether to use the Paddle Inference TensorRT subgraph engine. If the model does not support acceleration through TensorRT, setting this flag will not enable acceleration. For Paddle with CUDA 11.8, the compatible TensorRT version is 8.x (x>=6), and TensorRT 8.6.1.6 is recommended. For Paddle with CUDA 12.6, the compatible TensorRT version is 10.x (x>=5), and TensorRT 10.5.0.18 is recommended. | bool | False
precision | Computation precision when using the TensorRT subgraph engine in Paddle Inference. Options: "fp32", "fp16". | str | "fp32"
enable_mkldnn | Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set. | bool | True
mkldnn_cache_capacity | MKL-DNN cache capacity. | int | 10
cpu_threads | Number of threads to use for inference on CPUs. | int | 10
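
A short sketch of how these parameters might be combined follows. The helper build_model_kwargs is hypothetical; it only assembles keyword arguments named in the table above, and passing precision only when TensorRT is enabled is a choice made for this sketch. The model itself is created inside create_model, so the snippet runs without PaddleOCR installed until that function is called.

```python
def build_model_kwargs(device: str = "cpu", precision: str = "fp32",
                       use_tensorrt: bool = False, cpu_threads: int = 10) -> dict:
    """Assemble constructor arguments from the parameters listed in the table."""
    if precision not in ("fp32", "fp16"):
        raise ValueError("precision must be 'fp32' or 'fp16'")
    kwargs = {
        "model_name": "PP-LCNet_x0_25_textline_ori",
        "device": device,
        "use_tensorrt": use_tensorrt,
        "cpu_threads": cpu_threads,
    }
    if use_tensorrt:
        # precision describes the TensorRT subgraph engine, so this sketch
        # only passes it when TensorRT is requested.
        kwargs["precision"] = precision
    return kwargs


def create_model(**kwargs):
    """Instantiate the classifier; requires the paddleocr package."""
    from paddleocr import TextLineOrientationClassification
    return TextLineOrientationClassification(**kwargs)


print(build_model_kwargs(device="gpu:0", use_tensorrt=True, precision="fp16"))
```

A typical call would be create_model(**build_model_kwargs(device="cpu")).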
  • Use the predict() method of the text line orientation classification model to perform inference. This method returns a list of results. The module also provides the predict_iter() method. Both methods accept the same parameters and return results in the same format; the difference is that predict_iter() returns a generator that produces prediction results one at a time, which suits large datasets or memory-constrained scenarios. Choose either method based on your needs. The predict() method accepts the parameters input and batch_size, described below:
Parameter | Description | Type | Default
input | Input data to be predicted. Required. Supports the input types listed below. | Python Var, str, or list | -
  • Python Var: e.g., numpy.ndarray representing image data
  • str: a local image or PDF file path, e.g., /root/data/img.jpg; the URL of an image or PDF file; or a local directory containing images for prediction, e.g., /root/data/ (Note: directories containing PDF files are not supported; PDFs must be specified by exact file path)
  • list: Elements must be of the above types, e.g., [numpy.ndarray, numpy.ndarray], ["/root/data/img1.jpg", "/root/data/img2.jpg"], ["/root/data1", "/root/data2"]
batch_size | Batch size, positive integer. | int | 1
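
Because a directory input must not contain PDF files, one option is to build the input list explicitly. The sketch below is a hypothetical helper (collect_inputs is not part of PaddleOCR) that lists image files in a directory and appends PDFs as exact file paths; the set of image extensions is an assumption. classify_lazily then passes that list to predict_iter() and yields results one at a time.

```python
from pathlib import Path
from typing import List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed; adjust as needed


def collect_inputs(directory: str) -> List[str]:
    """Return image paths first, then PDF paths, each as an exact file path."""
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    images = [str(p) for p in files if p.suffix.lower() in IMAGE_SUFFIXES]
    pdfs = [str(p) for p in files if p.suffix.lower() == ".pdf"]
    return images + pdfs


def classify_lazily(model, inputs: List[str], batch_size: int = 1):
    """Yield results one at a time via predict_iter() to limit memory use."""
    for res in model.predict_iter(inputs, batch_size=batch_size):
        yield res
```

Here model is expected to be a TextLineOrientationClassification instance; any object with a compatible predict_iter() method works for experimentation.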
  • The prediction result for each sample is of type dict and supports printing, saving as an image, and saving as a JSON file:
Method | Method Description | Parameter | Parameter Type | Parameter Description | Default Value
print() | Print the results to the terminal | format_json | bool | Whether to format the output content using JSON indentation | True
 | | indent | int | Specify the indentation level to beautify the output JSON data, making it more readable; only effective when format_json is True | 4
 | | ensure_ascii | bool | Control whether to escape non-ASCII characters to Unicode. If set to True, all non-ASCII characters will be escaped; False retains the original characters; only effective when format_json is True | False
save_to_json() | Save the results as a JSON file | save_path | str | The path to save the file. If it is a directory, the saved file name will be consistent with the input file name | None
 | | indent | int | Specify the indentation level to beautify the output JSON data, making it more readable; only effective when format_json is True | 4
 | | ensure_ascii | bool | Control whether to escape non-ASCII characters to Unicode. If set to True, all non-ASCII characters will be escaped; False retains the original characters; only effective when format_json is True | False
save_to_img() | Save the results as an image file | save_path | str | The path to save the file. If it is a directory, the saved file name will be consistent with the input file name | None
  • Additionally, the visualization image and the prediction results can be obtained through attributes:
Attribute | Attribute Description
json | Get the prediction result in json format
img | Get the visualization image in dict format
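
As a sketch of how the json attribute might be used, the helper below formats the top prediction as a one-line summary. It assumes json returns a dict with the same res layout as the printed output shown earlier, which this tutorial does not state explicitly; summarize and the stand-in result object are hypothetical.

```python
from types import SimpleNamespace


def summarize(result) -> str:
    """Format the top prediction from a result object's json attribute."""
    data = result.json["res"]  # assumed layout, mirroring the printed output
    label = data["label_names"][0]
    score = float(data["scores"][0])
    return f"{data['input_path']}: {label} ({score:.4f})"


# Stand-in for a real prediction result, used only to exercise the helper.
fake = SimpleNamespace(json={"res": {
    "input_path": "textline_rot180_demo.jpg",
    "label_names": ["180_degree"],
    "scores": [0.99864],
}})
print(summarize(fake))  # textline_rot180_demo.jpg: 180_degree (0.9986)
```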

4. Custom Development

Since PaddleOCR does not natively support training for text line orientation classification, refer to PaddleX's Custom Development Guide for training. Trained models can seamlessly integrate into PaddleOCR's API for inference.
