General Image Recognition Pipeline Usage Tutorial

1. Introduction to the General Image Recognition Pipeline

The General Image Recognition Pipeline aims to solve the problem of open-domain object localization and recognition. Currently, PaddleX's General Image Recognition Pipeline supports PP-ShiTuV2.

PP-ShiTuV2 is a practical general image recognition system mainly composed of three modules: mainbody detection module, image feature module, and vector retrieval module. The system integrates and improves various strategies in multiple aspects, including backbone network, loss function, data augmentation, learning rate scheduling, regularization, pre-trained model, and model pruning and quantization. It optimizes each module and ultimately achieves better performance in multiple application scenarios.

The General Image Recognition Pipeline includes the mainbody detection module and the image feature module, with several models to choose from. Select a model based on the benchmark data below: if precision matters most, choose a model with higher precision; if inference speed matters most, choose a model with a lower inference time; if storage is the constraint, choose a model with a smaller size.

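Mainbody Detection Module:

<table>
<thead>
<tr><th>Model</th><th>mAP(0.5:0.95)</th><th>mAP(0.5)</th><th>GPU Inference Time (ms)</th><th>CPU Inference Time (ms)</th><th>Model Size (M)</th><th>Description</th></tr>
</thead>
<tbody>
<tr><td>PP-ShiTuV2_det</td><td>41.5</td><td>62.0</td><td>33.7</td><td>537.0</td><td>27.54</td><td>A mainbody detection model based on PicoDet_LCNet_x2_5, which can detect multiple common objects simultaneously.</td></tr>
</tbody>
</table>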

Note: The above accuracy metrics are based on the private mainbody detection dataset.

Image Feature Module:

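<table>
<thead>
<tr><th>Model</th><th>Recall@1 (%)</th><th>GPU Inference Time (ms)</th><th>CPU Inference Time (ms)</th><th>Model Size</th><th>Description</th></tr>
</thead>
<tbody>
<tr><td>PP-ShiTuV2_rec</td><td>84.2</td><td>5.23428</td><td>19.6005</td><td>16.3 M</td><td rowspan="3">PP-ShiTuV2 is a general image feature system consisting of three modules: mainbody detection, feature extraction, and vector retrieval. These models are part of the feature extraction module, and different models can be selected based on system requirements.</td></tr>
<tr><td>PP-ShiTuV2_rec_CLIP_vit_base</td><td>88.69</td><td>13.1957</td><td>285.493</td><td>306.6 M</td></tr>
<tr><td>PP-ShiTuV2_rec_CLIP_vit_large</td><td>91.03</td><td>51.1284</td><td>1131.28</td><td>1.05 G</td></tr>
</tbody>
</table>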

Note: The above accuracy metrics are based on AliProducts Recall@1. All GPU inference times are based on NVIDIA Tesla T4 machines with FP32 precision. CPU inference speeds are based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.
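
The selection advice above can be made concrete with a small filter over the image feature benchmark figures. The sketch below is only an illustration: `FeatureModel` and `pick_model` are hypothetical names that are not part of PaddleX, the timings are rounded from the table, and the 1.05 G size is converted assuming 1 G = 1024 M.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FeatureModel:
    name: str
    recall_at_1: float  # Recall@1 (%) on AliProducts
    gpu_ms: float       # GPU inference time (ms)
    cpu_ms: float       # CPU inference time (ms)
    size_mb: float      # model storage size (M)


# Figures copied (rounded) from the image feature benchmark table.
MODELS = [
    FeatureModel("PP-ShiTuV2_rec", 84.2, 5.23, 19.60, 16.3),
    FeatureModel("PP-ShiTuV2_rec_CLIP_vit_base", 88.69, 13.20, 285.49, 306.6),
    FeatureModel("PP-ShiTuV2_rec_CLIP_vit_large", 91.03, 51.13, 1131.28, 1.05 * 1024),
]


def pick_model(max_gpu_ms: Optional[float] = None,
               max_size_mb: Optional[float] = None) -> Optional[FeatureModel]:
    """Return the highest-Recall@1 model within the given limits, or None."""
    candidates = [
        m for m in MODELS
        if (max_gpu_ms is None or m.gpu_ms <= max_gpu_ms)
        and (max_size_mb is None or m.size_mb <= max_size_mb)
    ]
    return max(candidates, key=lambda m: m.recall_at_1, default=None)
```

For example, `pick_model(max_gpu_ms=20)` prefers precision but only among models that stay under a 20 ms GPU budget.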

2. Quick Start

PaddleX provides pre-trained model pipelines that you can try out quickly. This pipeline can be run locally using Python.

2.1 Online Experience

Not supported yet.

2.2 Local Experience

❗ Before using the General Image Recognition Pipeline locally, please ensure you have installed the PaddleX wheel package according to the PaddleX Installation Tutorial.

2.2.1 Command Line Experience

The pipeline does not support command line experience at this time.

By default, the built-in General Image Recognition Pipeline configuration file is used. If you want to change it, run the following command to obtain a copy of the configuration file:

```bash
paddlex --get_pipeline_config PP-ShiTuV2
```

After execution, the General Image Recognition Pipeline configuration file is saved in the current directory. To save it elsewhere, add the save path to the command (here the custom save location is ./my_path):

```bash
paddlex --get_pipeline_config PP-ShiTuV2 --save_path ./my_path
```

2.2.2 Python Script Integration

This pipeline requires a feature vector library to be built before prediction. You can download the officially provided drink recognition test dataset drink_dataset_v2.0 to build it; to use a private dataset, refer to Section 2.3 Data Organization for Building the Feature Library. With the data in place, a few lines of code are enough to build the feature vector library and run prediction with the General Image Recognition Pipeline:

```python
from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="PP-ShiTuV2")

# Build the feature vector library from the dataset and save it to index_dir.
pipeline.build_index(data_root="drink_dataset_v2.0/", index_dir="index_dir")

# Run recognition on the test images using the library built above.
output = pipeline.predict("./drink_dataset_v2.0/test_images/", index_dir="index_dir")
for res in output:
    res.print()
    res.save_to_img("./output/")
```

In the above Python script, the following steps are executed:

(1) Call the `create_pipeline` function to create a general image recognition pipeline object. The specific parameter descriptions are as follows:

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Parameter Description</th>
<th>Parameter Type</th>
<th>Default Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>pipeline</code></td>
<td>The name of the pipeline or the path to the pipeline configuration file. If it is the name of the pipeline, it must be a pipeline supported by PaddleX.</td>
<td><code>str</code></td>
<td>None</td>
</tr>
<tr>
<td><code>index_dir</code></td>
<td>The directory where the retrieval database files used for pipeline inference are located. If this parameter is not passed, <code>index_dir</code> needs to be specified in <code>predict()</code>.</td>
<td><code>str</code></td>
<td>None</td>
</tr>
<tr>
<td><code>device</code></td>
<td>The inference device for the pipeline model. Supports: "gpu", "cpu".</td>
<td><code>str</code></td>
<td><code>gpu</code></td>
</tr>
<tr>
<td><code>use_hpip</code></td>
<td>Whether to enable high-performance inference, which is only available when the pipeline supports it.</td>
<td><code>bool</code></td>
<td><code>False</code></td>
</tr>
</tbody>
</table>

(2) Call the `build_index` function of the general image recognition pipeline object to build the feature vector library. The specific parameters are described as follows:

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Parameter Description</th>
<th>Parameter Type</th>
<th>Default Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>data_root</code></td>
<td>The root directory of the dataset. For the required data organization, see <a href="#2.3-Data-Organization-for-Building-the-Feature-Library">Section 2.3 Data Organization for Building the Feature Library</a>.</td>
<td><code>str</code></td>
<td>None</td>
</tr>
<tr>
<td><code>index_dir</code></td>
<td>The save path for the feature library. After successfully calling the <code>build_index</code> function, two files will be generated in this path: <code>"id_map.pkl"</code> saves the mapping relationship between image IDs and image feature labels; <code>"vector.index"</code> stores the feature vectors of each image.</td>
<td><code>str</code></td>
<td>None</td>
</tr>
</tbody>
</table>
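
Since `build_index` is described as producing two files in `index_dir`, a quick check that both exist can catch a failed or partial build before prediction. The helper below is a minimal sketch; `missing_index_files` is a hypothetical name, and only the two file names come from the table above.

```python
from pathlib import Path

# File names taken from the `index_dir` description above.
EXPECTED_INDEX_FILES = ("id_map.pkl", "vector.index")


def missing_index_files(index_dir: str) -> list:
    """Return the expected index files that are absent from `index_dir`."""
    root = Path(index_dir)
    return [name for name in EXPECTED_INDEX_FILES if not (root / name).is_file()]
```

An empty return value means both files are present.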

(3) Call the `predict` function of the general image recognition pipeline object for inference prediction: The `predict` function parameter is `input`, which is used to input the data to be predicted, supporting multiple input methods. Specific examples are as follows:

<table>
<thead>
<tr>
<th>Parameter Type</th>
<th>Parameter Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Python Var</td>
<td>Supports directly passing in Python variables, such as <code>numpy.ndarray</code> representing image data.</td>
</tr>
<tr>
<td>str</td>
<td>Supports passing in the file path of the data to be predicted, such as the local path of an image file: <code>/root/data/img.jpg</code>.</td>
</tr>
<tr>
<td>str</td>
<td>Supports passing in the URL of the data file to be predicted, such as the network URL of an image file: <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/yuanqisenlin.jpeg">Example</a>.</td>
</tr>
<tr>
<td>str</td>
<td>Supports passing in a local directory that contains the data files to be predicted, such as the local path: <code>/root/data/</code>.</td>
</tr>
<tr>
<td>dict</td>
<td>Supports passing in a dictionary type, where the key needs to correspond to the specific task, such as "img" for image classification tasks. The value of the dictionary supports the above types of data, for example: <code>{"img": "/root/data1"}</code>.</td>
</tr>
<tr>
<td>list</td>
<td>Supports passing in a list whose elements are any of the above types of data, such as <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>, <code>["/root/data1", "/root/data2"]</code>, <code>[{"img": "/root/data1"}, {"img": "/root/data2/img.jpg"}]</code>.</td>
</tr>
</tbody>
</table>
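
When only some files in a directory should be predicted, the list form of `input` is convenient. The sketch below gathers image paths into such a list; `collect_image_paths` is a hypothetical helper, and the set of file extensions is an assumption to adjust to your data.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to the formats you actually use.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def collect_image_paths(directory: str) -> list:
    """Return sorted paths (as str) of image files directly inside `directory`."""
    return sorted(
        str(p) for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

The returned list can then be passed as the `input` argument of `predict`, in the same way as the list of file paths shown in the table.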

Additionally, the `predict` method supports the `index_dir` parameter for setting the retrieval database:

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Parameter Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>index_dir</code></td>
<td>The directory where the retrieval database files used for pipeline inference are located. If this parameter is not passed, the default retrieval database specified through the <code>index_dir</code> parameter in <code>create_pipeline()</code> will be used.</td>
</tr>
</tbody>
</table>

(4) Obtain the prediction results by calling the `predict` method: The `predict` method is a `generator`, so prediction results need to be obtained by iteration. The `predict` method predicts data in batches.

(5) Process the prediction results: The prediction result for each sample is of `dict` type and supports printing or saving to a file. The supported save formats depend on the specific pipeline, such as:

<table>
<thead>
<tr>
<th>Method</th>
<th>Description</th>
<th>Method Parameters</th>
</tr>
</thead>
<tbody>
<tr>
<td>print</td>
<td>Print the results to the terminal</td>
<td><code>- format_json</code>: bool type, whether to use json indentation formatting for the output content, default is True;<br><code>- indent</code>: int type, json formatting setting, only effective when format_json is True, default is 4;<br><code>- ensure_ascii</code>: bool type, json formatting setting, only effective when format_json is True, default is False;</td>
</tr>
<tr>
<td>save_to_json</td>
<td>Save the results as a json-formatted file</td>
<td><code>- save_path</code>: str type, the save file path. When it is a directory, the saved file naming is consistent with the input file type naming;<br><code>- indent</code>: int type, json formatting setting, default is 4;<br><code>- ensure_ascii</code>: bool type, json formatting setting, default is False;</td>
</tr>
<tr>
<td>save_to_img</td>
<td>Save the results as an image-formatted file</td>
<td><code>- save_path</code>: str type, the save file path. When it is a directory, the saved file naming is consistent with the input file type naming;</td>
</tr>
</tbody>
</table>

If you have a configuration file, you can customize the configurations for the general image recognition pipeline by modifying the `pipeline` parameter value in the `create_pipeline` method to the path of the pipeline configuration file.

For example, if your configuration file is saved at `./my_path/PP-ShiTuV2.yaml`, you only need to execute:

```python
from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="./my_path/PP-ShiTuV2.yaml", index_dir="index_dir")

output = pipeline.predict("./drink_dataset_v2.0/test_images/")
for res in output:
    res.print()
    res.save_to_img("./output/")
```

2.2.3 Add or Remove Features from the Feature Library

If you want to add more images to the feature library, you can call the append_index function; to remove image features, you can call the remove_index function.

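```python
from paddlex import create_pipeline

pipeline = create_pipeline("PP-ShiTuV2")
# Build the library with an index type that supports both appending and removal.
pipeline.build_index(data_root="drink_dataset_v2.0/", index_dir="index_dir", index_type="IVF")
# Add the features of the images under data_root to the existing library.
pipeline.append_index(data_root="drink_dataset_v2.0/", index_dir="index_dir", index_type="IVF")
# Remove the features of the images under data_root from the library.
pipeline.remove_index(data_root="drink_dataset_v2.0/", index_dir="index_dir", index_type="IVF")
```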

The parameter descriptions for the above methods are as follows:

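<table>
<thead>
<tr><th>Parameter</th><th>Description</th><th>Type</th><th>Default Value</th></tr>
</thead>
<tbody>
<tr><td><code>data_root</code></td><td>The root directory of the dataset to be added. The data organization should be the same as when building the feature library; refer to Section 2.3 Data Organization for Building the Feature Library.</td><td><code>str</code></td><td>None</td></tr>
<tr><td><code>index_dir</code></td><td>The storage directory for the feature library. In <code>append_index</code> and <code>remove_index</code>, it is also the path of the feature library to be modified (or deleted).</td><td><code>str</code></td><td>None</td></tr>
<tr><td><code>index_type</code></td><td>Supports <code>HNSW32</code>, <code>IVF</code>, <code>Flat</code>. <code>HNSW32</code> has faster retrieval speed and higher accuracy but does not support the <code>remove_index()</code> operation; <code>IVF</code> has faster retrieval speed but relatively lower accuracy, and supports <code>append_index()</code> and <code>remove_index()</code>; <code>Flat</code> has lower retrieval speed but higher accuracy, and supports <code>append_index()</code> and <code>remove_index()</code>.</td><td><code>str</code></td><td><code>HNSW32</code></td></tr>
<tr><td><code>metric_type</code></td><td>Supports <code>IP</code> (Inner Product) and <code>L2</code> (Euclidean Distance).</td><td><code>str</code></td><td><code>IP</code></td></tr>
</tbody>
</table>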

Note: On Windows, building an index or running prediction with HNSW32 may cause compatibility errors.
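
The trade-offs between the three index types can be summarized as a small decision helper. This is a sketch of one possible reading of the descriptions above, not a PaddleX function: `choose_index_type` and its parameters are hypothetical names, and avoiding HNSW32 on Windows is a cautious interpretation of the compatibility note.

```python
def choose_index_type(need_removal: bool, prefer_speed: bool,
                      on_windows: bool = False) -> str:
    """Pick an index_type string from the trade-offs described above."""
    if need_removal or on_windows:
        # IVF and Flat both support append_index() and remove_index();
        # IVF is described as faster, Flat as more accurate.
        return "IVF" if prefer_speed else "Flat"
    # HNSW32 is described as fast and accurate but lacks remove_index().
    return "HNSW32"
```

The returned string can be passed as `index_type` to `build_index`.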

2.3 Data Organization for Building the Feature Library

The PaddleX general image recognition pipeline requires a pre-built feature library for feature retrieval. If you want to build a feature vector library with private data, you need to organize the data as follows:

```
data_root             # Root directory of the dataset, the directory name can be changed
├── images            # Directory for saving images, the directory name can be changed
│     ...
└── gallery.txt       # Annotation file for the feature library dataset, the file name cannot be changed. Each line gives the path of the image to be retrieved and the image label, separated by a space, for example: "0/0.jpg label"
```
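
If your images are stored one label per subdirectory, `gallery.txt` can be generated rather than written by hand. The sketch below assumes a hypothetical layout `images/<label>/<file>` and labels without spaces; `write_gallery_file` is not a PaddleX function. The text does not state whether image paths are relative to the dataset root or to the image directory, so this is exposed as the `relative_to_root` switch and should be checked against a known-good `gallery.txt`.

```python
from pathlib import Path


def write_gallery_file(data_root: str, images_dirname: str = "images",
                       relative_to_root: bool = True) -> int:
    """Write data_root/gallery.txt with one "<image path> <label>" line per image.

    Assumes images are stored as <images_dirname>/<label>/<file>.
    Returns the number of lines written.
    """
    root = Path(data_root)
    images_dir = root / images_dirname
    base = root if relative_to_root else images_dir
    lines = []
    for path in sorted(images_dir.glob("*/*")):
        if path.is_file():
            # The parent directory name is used as the label.
            lines.append(f"{path.relative_to(base).as_posix()} {path.parent.name}")
    (root / "gallery.txt").write_text(
        "".join(line + "\n" for line in lines), encoding="utf-8"
    )
    return len(lines)
```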

3. Development Integration/Deployment

If the pipeline meets your requirements for inference speed and accuracy, you can proceed directly with development integration/deployment.

If you need to apply the pipeline directly in your Python project, refer to the example code in 2.2.2 Python Script Integration.

Additionally, PaddleX provides three other deployment methods, detailed as follows:

🚀 High-Performance Inference: In actual production environments, many applications have stringent standards for the performance metrics of deployment strategies (especially response speed) to ensure efficient system operation and smooth user experience. To this end, PaddleX provides high-performance inference plugins aimed at deeply optimizing model inference and pre/post-processing for significant end-to-end speedups. For detailed high-performance inference procedures, refer to the PaddleX High-Performance Inference Guide.

☁️ Service-Oriented Deployment: Service-oriented deployment is a common deployment form in actual production environments. By encapsulating inference functions as services, clients can access these services through network requests to obtain inference results. PaddleX supports users in achieving low-cost service-oriented deployment of pipelines. For detailed service-oriented deployment procedures, refer to the PaddleX Service-Oriented Deployment Guide.

Below are the API references and multi-language service invocation examples:

API Reference

For the main operations provided by the service:

  • The HTTP request method is POST.
  • The request body and the response body are both JSON data (JSON objects).
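  • When the request is processed successfully, the response status code is 200, and the response body properties are as follows:

<table>
<thead>
<tr><th>Name</th><th>Type</th><th>Meaning</th></tr>
</thead>
<tbody>
<tr><td><code>errorCode</code></td><td><code>integer</code></td><td>Error code. Fixed to <code>0</code>.</td></tr>
<tr><td><code>errorMsg</code></td><td><code>string</code></td><td>Error description. Fixed to <code>"Success"</code>.</td></tr>
</tbody>
</table>

The response body may also have a result property, which is an object type that stores operation result information.

  • When the request is not processed successfully, the properties of the response body are as follows:

<table>
<thead>
<tr><th>Name</th><th>Type</th><th>Meaning</th></tr>
</thead>
<tbody>
<tr><td><code>errorCode</code></td><td><code>integer</code></td><td>Error code. Same as the response status code.</td></tr>
<tr><td><code>errorMsg</code></td><td><code>string</code></td><td>Error description.</td></tr>
</tbody>
</table>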
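
A client can apply the response conventions above in one place. The following is a minimal sketch that takes an already-decoded JSON body; `ServiceError` and `unwrap_response` are hypothetical names, not part of any PaddleX client library.

```python
from typing import Any, Dict, Optional


class ServiceError(RuntimeError):
    """Raised when the response body reports a non-zero errorCode."""


def unwrap_response(body: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the `result` object of a successful response body, else raise."""
    code = body.get("errorCode")
    if code != 0:
        raise ServiceError(f"errorCode={code}: {body.get('errorMsg')}")
    # `result` is optional according to the description above.
    return body.get("result")
```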

The main operations provided by the service are as follows:

  • buildIndex

Build feature vector index.

POST /shitu-index-build

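  • The properties of the request body are as follows:

<table>
<thead>
<tr><th>Name</th><th>Type</th><th>Meaning</th><th>Required</th></tr>
</thead>
<tbody>
<tr><td><code>imageLabelPairs</code></td><td><code>array</code></td><td>Image-label pairs for building the index.</td><td>Yes</td></tr>
</tbody>
</table>

Each element in imageLabelPairs is an object with the following properties:

<table>
<thead>
<tr><th>Name</th><th>Type</th><th>Meaning</th></tr>
</thead>
<tbody>
<tr><td><code>image</code></td><td><code>string</code></td><td>The URL of an image file accessible by the service, or the Base64 encoding result of the image file content.</td></tr>
<tr><td><code>label</code></td><td><code>string</code></td><td>Label.</td></tr>
</tbody>
</table>

  • When the request is processed successfully, the result of the response body has the following properties:

<table>
<thead>
<tr><th>Name</th><th>Type</th><th>Meaning</th></tr>
</thead>
<tbody>
<tr><td><code>indexKey</code></td><td><code>string</code></td><td>The key corresponding to the index, used to identify the established index. Can be used as input for other operations.</td></tr>
<tr><td><code>idMap</code></td><td><code>object</code></td><td>Mapping from vector ID to label.</td></tr>
</tbody>
</table>
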
  • addImagesToIndex

Add images (corresponding feature vectors) to the index.

POST /shitu-index-add

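  • The properties of the request body are as follows:

<table>
<thead>
<tr><th>Name</th><th>Type</th><th>Meaning</th><th>Required</th></tr>
</thead>
<tbody>
<tr><td><code>imageLabelPairs</code></td><td><code>array</code></td><td>Image-label pairs for building the index.</td><td>Yes</td></tr>
<tr><td><code>indexKey</code></td><td><code>string</code></td><td>The key corresponding to the index. Provided by the buildIndex operation.</td><td>Yes</td></tr>
</tbody>
</table>

Each element in imageLabelPairs is an object with the following properties:

<table>
<thead>
<tr><th>Name</th><th>Type</th><th>Meaning</th></tr>
</thead>
<tbody>
<tr><td><code>image</code></td><td><code>string</code></td><td>The URL of an image file accessible by the service, or the Base64 encoding result of the image file content.</td></tr>
<tr><td><code>label</code></td><td><code>string</code></td><td>Label.</td></tr>
</tbody>
</table>

  • When the request is processed successfully, the result of the response body has the following properties:

<table>
<thead>
<tr><th>Name</th><th>Type</th><th>Meaning</th></tr>
</thead>
<tbody>
<tr><td><code>idMap</code></td><td><code>object</code></td><td>Mapping from vector ID to label.</td></tr>
</tbody>
</table>
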
  • removeImagesFromIndex

Remove images (corresponding feature vectors) from the index.

POST /shitu-index-remove

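  • The properties of the request body are as follows:

<table>
<thead>
<tr><th>Name</th><th>Type</th><th>Meaning</th><th>Required</th></tr>
</thead>
<tbody>
<tr><td><code>ids</code></td><td><code>array</code></td><td>IDs of the vectors to be removed from the index.</td><td>Yes</td></tr>
<tr><td><code>indexKey</code></td><td><code>string</code></td><td>The key corresponding to the index. Provided by the buildIndex operation.</td><td>Yes</td></tr>
</tbody>
</table>

  • When the request is processed successfully, the result of the response body has the following properties:

<table>
<thead>
<tr><th>Name</th><th>Type</th><th>Meaning</th></tr>
</thead>
<tbody>
<tr><td><code>idMap</code></td><td><code>object</code></td><td>Mapping from vector ID to label.</td></tr>
</tbody>
</table>
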
  • infer

Perform image recognition.

POST /shitu-infer

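  • The properties of the request body are as follows:

<table>
<thead>
<tr><th>Name</th><th>Type</th><th>Meaning</th><th>Required</th></tr>
</thead>
<tbody>
<tr><td><code>image</code></td><td><code>string</code></td><td>The URL of an image file accessible by the service, or the Base64 encoding result of the image file content.</td><td>Yes</td></tr>
<tr><td><code>indexKey</code></td><td><code>string</code></td><td>The key corresponding to the index. Provided by the buildIndex operation.</td><td>No</td></tr>
</tbody>
</table>

  • When the request is processed successfully, the result of the response body has the following properties:

<table>
<thead>
<tr><th>Name</th><th>Type</th><th>Meaning</th></tr>
</thead>
<tbody>
<tr><td><code>detectedObjects</code></td><td><code>array</code></td><td>Information of the detected targets.</td></tr>
<tr><td><code>image</code></td><td><code>string</code></td><td>Recognition result image. The image is in JPEG format, encoded with Base64.</td></tr>
</tbody>
</table>

Each element in detectedObjects is an object with the following properties:

<table>
<thead>
<tr><th>Name</th><th>Type</th><th>Meaning</th></tr>
</thead>
<tbody>
<tr><td><code>bbox</code></td><td><code>array</code></td><td>Target location. The elements in the array are the x-coordinate of the upper-left corner, the y-coordinate of the upper-left corner, the x-coordinate of the lower-right corner, and the y-coordinate of the lower-right corner, respectively.</td></tr>
<tr><td><code>recResults</code></td><td><code>array</code></td><td>Recognition results.</td></tr>
<tr><td><code>score</code></td><td><code>number</code></td><td>Detection score.</td></tr>
</tbody>
</table>

Each element in recResults is an object with the following properties:

<table>
<thead>
<tr><th>Name</th><th>Type</th><th>Meaning</th></tr>
</thead>
<tbody>
<tr><td><code>label</code></td><td><code>string</code></td><td>Label.</td></tr>
<tr><td><code>score</code></td><td><code>number</code></td><td>Recognition score.</td></tr>
</tbody>
</table>
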
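
The nested structure of the infer result can be flattened into one label per detected object. The sketch below works on an already-decoded `detectedObjects` array; `best_labels` is a hypothetical helper, and taking the highest-scoring entry of `recResults` is one reasonable choice rather than behavior prescribed by the service.

```python
from typing import Any, Dict, List, Optional, Tuple


def best_labels(
    detected_objects: List[Dict[str, Any]],
) -> List[Tuple[List[float], Optional[str]]]:
    """Pair each bbox with its highest-scoring label (None if no results)."""
    summary = []
    for obj in detected_objects:
        rec_results = obj.get("recResults") or []
        top = max(rec_results, key=lambda r: r["score"], default=None)
        summary.append((obj["bbox"], top["label"] if top else None))
    return summary
```
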
Multi-Language Service Invocation Examples
Python
```python
import base64
import pprint
import sys

import requests

API_BASE_URL = "http://0.0.0.0:8080"

base_image_label_pairs = [
    {"image": "./demo0.jpg", "label": "rabbit"},
    {"image": "./demo1.jpg", "label": "rabbit"},
    {"image": "./demo2.jpg", "label": "dog"},
]
image_label_pairs_to_add = [
    {"image": "./demo3.jpg", "label": "dog"},
]
ids_to_remove = [1]
infer_image_path = "./demo4.jpg"
output_image_path = "./out.jpg"


def encode_image(path):
    """Read a local image file and return its content as a Base64 string."""
    with open(path, "rb") as file:
        return base64.b64encode(file.read()).decode("ascii")


def call_service(endpoint, payload):
    """POST a JSON payload to the service; return `result` or exit on failure."""
    resp = requests.post(f"{API_BASE_URL}/{endpoint}", json=payload)
    if resp.status_code != 200:
        print(f"Request to {endpoint} failed with status code {resp.status_code}.")
        pprint.pp(resp.json())
        sys.exit(1)
    return resp.json()["result"]


# Replace local file paths with Base64-encoded image content.
for pair in base_image_label_pairs + image_label_pairs_to_add:
    pair["image"] = encode_image(pair["image"])

# Build the index from the base image-label pairs.
result_index_build = call_service(
    "shitu-index-build", {"imageLabelPairs": base_image_label_pairs}
)
index_key = result_index_build["indexKey"]
print(f"Number of images indexed: {len(result_index_build['idMap'])}")

# Add more images to the index.
result_index_add = call_service(
    "shitu-index-add",
    {"imageLabelPairs": image_label_pairs_to_add, "indexKey": index_key},
)
print(f"Number of images indexed: {len(result_index_add['idMap'])}")

# Remove vectors from the index by ID.
result_index_remove = call_service(
    "shitu-index-remove", {"ids": ids_to_remove, "indexKey": index_key}
)
print(f"Number of images indexed: {len(result_index_remove['idMap'])}")

# Run recognition on a new image against the index.
result_infer = call_service(
    "shitu-infer", {"image": encode_image(infer_image_path), "indexKey": index_key}
)

with open(output_image_path, "wb") as file:
    file.write(base64.b64decode(result_infer["image"]))
print(f"Output image saved at {output_image_path}")
print("\nDetected objects:")
pprint.pp(result_infer["detectedObjects"])
```


📱 Edge Deployment: Edge deployment is a method that places computing and data processing functions on user devices themselves, allowing devices to process data directly without relying on remote servers. PaddleX supports deploying models on edge devices such as Android. For detailed edge deployment procedures, refer to the PaddleX Edge Deployment Guide.

You can choose the appropriate deployment method for your model pipeline based on your needs and proceed with subsequent AI application integration.

4. Custom Development

If the default model weights provided by the General Image Recognition Pipeline do not meet your expectations in terms of precision or speed, you can fine-tune the existing models using your own data from specific domains or application scenarios to improve the recognition performance of the pipeline in your context.

4.1 Model Fine-Tuning

Since the General Image Recognition Pipeline consists of two modules (the mainbody detection module and the image feature module), the suboptimal performance of the pipeline may stem from either module.

You can analyze the images with poor recognition results. If the analysis shows that many mainbody objects are not detected, the mainbody detection model is likely deficient: refer to the Custom Development section in the Object Detection Module Development Tutorial and fine-tune the mainbody detection model with your private dataset. If the detected mainbody objects are matched incorrectly, the image feature model needs further improvement: refer to the Custom Development section in the Image Feature Module Development Tutorial and fine-tune the image feature model.
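
The diagnosis above can be turned into a simple tally once the poorly recognized images have been reviewed by hand. This is only a rough sketch: the failure labels `"missed_detection"` and `"wrong_match"` and the helper `module_to_fine_tune` are hypothetical, and the majority rule is a simplification of the judgment described in the text.

```python
from collections import Counter
from typing import Iterable


def module_to_fine_tune(failure_kinds: Iterable[str]) -> str:
    """Suggest which module to fine-tune from manually labeled failure cases."""
    counts = Counter(failure_kinds)
    missed = counts["missed_detection"]   # mainbody object not detected
    mismatched = counts["wrong_match"]    # detected, but matched to the wrong label
    if missed == 0 and mismatched == 0:
        return "none"
    return "mainbody detection" if missed >= mismatched else "image feature"
```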

4.2 Model Application

After you complete the fine-tuning training with your private dataset, you will obtain local model files.

To use the fine-tuned models, modify the pipeline configuration file so that the model entries point to the local paths of your fine-tuned models:

```yaml
Pipeline:
  device: "gpu:0"
  det_model: "./PP-ShiTuV2_det_infer/"        # Can be modified to the local path of the fine-tuned mainbody detection model
  rec_model: "./PP-ShiTuV2_rec_infer/"        # Can be modified to the local path of the fine-tuned image feature model
  det_batch_size: 1
  rec_batch_size: 1
```

Then load the modified pipeline configuration file using the Python script method described in 2.2.2 Python Script Integration.

5. Multi-Hardware Support

PaddleX supports various mainstream hardware devices such as NVIDIA GPUs, Kunlun XPU, Ascend NPU, and Cambricon MLU. Switching between them only requires changing the device parameter.

For example, when running the General Image Recognition Pipeline using Python and changing the running device from an NVIDIA GPU to an Ascend NPU, you only need to modify the device in the script to npu:

```python
from paddlex import create_pipeline

pipeline = create_pipeline(
    pipeline="PP-ShiTuV2",
    device="npu:0",  # gpu:0 --> npu:0
)
```

If you want to use the General Image Recognition Pipeline on more types of hardware, please refer to the PaddleX Multi-Device Usage Guide.
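
When the target hardware is chosen at run time, it can help to build and validate the device string in one place. The sketch below is illustrative only: `device_string` is a hypothetical helper, and the `"xpu"` and `"mlu"` identifiers are assumptions inferred from the hardware names above, so verify them against the Multi-Device Usage Guide.

```python
# "gpu", "cpu" and "npu" appear in this tutorial; "xpu" and "mlu" are assumed.
SUPPORTED_DEVICE_KINDS = ("cpu", "gpu", "npu", "xpu", "mlu")


def device_string(kind: str, index: int = 0) -> str:
    """Return a device string such as "npu:0" for use as the `device` argument."""
    kind = kind.lower()
    if kind not in SUPPORTED_DEVICE_KINDS:
        raise ValueError(f"Unsupported device kind: {kind!r}")
    # The CPU is addressed without a card index.
    return kind if kind == "cpu" else f"{kind}:{index}"
```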
