General Image Recognition Pipeline Usage Tutorial¶
1. Introduction to the General Image Recognition Pipeline¶
The General Image Recognition Pipeline aims to solve the problem of open-domain object localization and recognition. Currently, PaddleX's General Image Recognition Pipeline supports PP-ShiTuV2.
PP-ShiTuV2 is a practical general image recognition system mainly composed of three modules: mainbody detection module, image feature module, and vector retrieval module. The system integrates and improves various strategies in multiple aspects, including backbone network, loss function, data augmentation, learning rate scheduling, regularization, pre-trained model, and model pruning and quantization. It optimizes each module and ultimately achieves better performance in multiple application scenarios.
The General Image Recognition Pipeline includes the mainbody detection module and the image feature module, with several models to choose. You can select the model to use based on the benchmark data below. If you prioritize model precision, choose a model with higher precision. If you prioritize inference speed, choose a model with faster inference. If you prioritize model storage size, choose a model with a smaller storage size.
Object Detection Module:
Model | mAP(0.5:0.95) | mAP(0.5) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Size (M) | Description |
---|---|---|---|---|---|---|
PP-ShiTuV2_det | 41.5 | 62.0 | 33.7 | 537.0 | 27.54 | An mainbody detection model based on PicoDet_LCNet_x2_5, which may detect multiple common objects simultaneously. |
Note: The above accuracy metrics are based on the private mainbody detection dataset.
Image Feature Module:
Model | Recall@1 (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Size (M) | Description |
---|---|---|---|---|---|
PP-ShiTuV2_rec | 84.2 | 5.23428 | 19.6005 | 16.3 M | PP-ShiTuV2 is a general image feature system consisting of three modules: mainbody detection, feature extraction, and vector retrieval. These models are part of the feature extraction module, and different models can be selected based on system requirements. |
PP-ShiTuV2_rec_CLIP_vit_base | 88.69 | 13.1957 | 285.493 | 306.6 M | |
PP-ShiTuV2_rec_CLIP_vit_large | 91.03 | 51.1284 | 1131.28 | 1.05 G |
Note: The above accuracy metrics are based on AliProducts Recall@1. All GPU inference times are based on NVIDIA Tesla T4 machines with FP32 precision. CPU inference speeds are based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.
2. Quick Start¶
The pre-trained model pipelines provided by PaddleX can be quickly experienced. You can use Python to experience locally.
2.1 Online Experience¶
Not supported yet.
2.2 Local Experience¶
❗ Before using the General Image Recognition Pipeline locally, please ensure you have installed the PaddleX wheel package according to the PaddleX Installation Tutorial.
2.2.1 Command Line Experience¶
The pipeline does not support command line experience at this time.
By default, the built-in General Image Recognition Pipeline configuration file is used. If you want to change it, you can run the following command to obtain:
👉Click to Expand
paddlex --get_pipeline_config PP-ShiTuV2
After execution, the General Image Recognition Pipeline configuration file will be saved in the current directory. If you want to customize the save location, you can run the following command (assuming the custom save location is ./my_path
):
paddlex --get_pipeline_config PP-ShiTuV2 --save_path ./my_path
2.2.2 Python Script Integration¶
- In the example of using this pipeline, a feature vector library needs to be built beforehand. You can download the officially provided drink recognition test dataset drink_dataset_v2.0 to build the feature vector library. If you want to use a private dataset, you can refer to Section 2.3 Data Organization for Building the Feature Library. After that, you can quickly build the feature vector library and predict using the General Image Recognition Pipeline with just a few lines of code.
from paddlex import create_pipeline
pipeline = create_pipeline(pipeline="PP-ShiTuV2")
pipeline.build_index(data_root="drink_dataset_v2.0/", index_dir="index_dir")
output = pipeline.predict("./drink_dataset_v2.0/test_images/", index_dir="index_dir")
for res in output:
res.print()
res.save_to_img("./output/")
````
In the above Python script, the following steps are executed:
(1) Call the `create_pipeline` function to create a general image recognition pipeline object. The specific parameter descriptions are as follows:
<table>
<thead>
<tr>
<th>Parameter</th>
<th>Parameter Description</th>
<th>Parameter Type</th>
<th>Default Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>pipeline</code></td>
<td>The name of the pipeline or the path to the pipeline configuration file. If it is the name of the pipeline, it must be a pipeline supported by PaddleX.</td>
<td><code>str</code></td>
<td>None</td>
</tr>
<tr>
<td><code>index_dir</code></td>
<td>The directory where the retrieval database files used for pipeline inference are located. If this parameter is not passed, <code>index_dir</code> needs to be specified in <code>predict()</code>.</td>
<td><code>str</code></td>
<td>None</td>
</tr>
<tr>
<td><code>device</code></td>
<td>The inference device for the pipeline model. Supports: "gpu", "cpu".</td>
<td><code>str</code></td>
<td><code>gpu</code></td>
</tr>
<tr>
<td><code>use_hpip</code></td>
<td>Whether to enable high-performance inference, which is only available when the pipeline supports it.</td>
<td><code>bool</code></td>
<td><code>False</code></td>
</tr>
</tbody>
</table>
(2) Call the `build_index` function of the general image recognition pipeline object to build the feature vector library. The specific parameters are described as follows:
<table>
<thead>
<tr>
<th>Parameter</th>
<th>Parameter Description</th>
<th>Parameter Type</th>
<th>Default Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>data_root</code></td>
<td>The root directory of the dataset. The data organization method refers to <a href="#2.3-Data-Organization-for-Building-the-Feature-Library">Section 2.3 Data Organization for Building the Feature Library</a></td>
<td><code>str</code></td>
<td>None</td>
</tr>
<tr>
<td><code>index_dir</code></td>
<td>The save path for the feature library. After successfully calling the <code>build_index</code> function, two files will be generated in this path: <code>"id_map.pkl"</code> saves the mapping relationship between image IDs and image feature labels; <code>"vector.index"</code> stores the feature vectors of each image.</td>
<td><code>str</code></td>
<td>None</td>
</tr>
</tbody>
</table>
(3) Call the `predict` function of the general image recognition pipeline object for inference prediction: The `predict` function parameter is `input`, which is used to input the data to be predicted, supporting multiple input methods. Specific examples are as follows:
<table>
<thead>
<tr>
<th>Parameter Type</th>
<th>Parameter Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Python Var</td>
<td>Supports directly passing in Python variables, such as <code>numpy.ndarray</code> representing image data.</td>
</tr>
<tr>
<td>str</td>
<td>Supports passing in the file path of the data to be predicted, such as the local path of an image file: <code>/root/data/img.jpg</code>.</td>
</tr>
<tr>
<td>str</td>
<td>Supports passing in the URL of the data file to be predicted, such as the network URL of an image file: <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/yuanqisenlin.jpeg">Example</a>.</td>
</tr>
<tr>
<td>str</td>
<td>Supports passing in a local directory that contains the data files to be predicted, such as the local path: <code>/root/data/</code>.</td>
</tr>
<tr>
<td>dict</td>
<td>Supports passing in a dictionary type, where the key needs to correspond to the specific task, such as "img" for image classification tasks. The value of the dictionary supports the above types of data, for example: <code>{"img": "/root/data1"}</code>.</td>
</tr>
<tr>
<td>list</td>
<td>Supports passing in a list, where the elements of the list need to be the above types of data, such as <code>[numpy.ndarray, numpy.ndarray], ["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>, <code>["/root/data1", "/root/data2"]</code>, <code>[{"img": "/root/data1"}, {"img": "/root/data2/img.jpg"}]</code>.</td>
</tr>
</tbody>
</table>
Additionally, the `predict` method supports the `index_dir` parameter for setting the retrieval database:
<table>
<thead>
<tr>
<th>Parameter Type</th>
<th>Parameter Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>index_dir</code></td>
<td>The directory where the retrieval database files used for pipeline inference are located. If this parameter is not passed, the default retrieval database specified through the <code>index_dir</code> parameter in <code>create_pipeline()</code> will be used.</td>
</tr>
</tbody>
</table>
(4) Obtain the prediction results by calling the `predict` method: The `predict` method is a `generator`, so prediction results need to be obtained by iteration. The `predict` method predicts data in batches.
(5) Process the prediction results: The prediction result for each sample is of `dict` type and supports printing or saving to a file. The supported save types are related to the specific pipeline, such as:
<table>
<thead>
<tr>
<th>Method</th>
<th>Description</th>
<th>Method Parameters</th>
</tr>
</thead>
<tbody>
<tr>
<td>print</td>
<td>Print the results to the terminal</td>
<td><code>- format_json</code>: bool type, whether to use json indentation formatting for the output content, default is True;<br><code>- indent</code>: int type, json formatting setting, only effective when format_json is True, default is 4;<br><code>- ensure_ascii</code>: bool type, json formatting setting, only effective when format_json is True, default is False;</td>
</tr>
<tr>
<td>save_to_json</td>
<td>Save the results as a json-formatted file</td>
<td><code>- save_path</code>: str type, the save file path. When it is a directory, the saved file naming is consistent with the input file type naming;<br><code>- indent</code>: int type, json formatting setting, default is 4;<br><code>- ensure_ascii</code>: bool type, json formatting setting, default is False;</td>
</tr>
<tr>
<td>save_to_img</td>
<td>Save the results as an image-formatted file</td>
<td><code>- save_path</code>: str type, the save file path. When it is a directory, the saved file naming is consistent with the input file type naming;</td>
</tr>
</tbody>
</table>
If you have a configuration file, you can customize the configurations for the general image recognition pipeline by modifying the `pipeline` parameter value in the `create_pipeline` method to the path of the pipeline configuration file.
For example, if your configuration file is saved at `./my_path/PP-ShiTuV2.yaml`, you only need to execute:
```python
from paddlex import create_pipeline
pipeline = create_pipeline(pipeline="./my_path/PP-ShiTuV2.yaml", index_dir="index_dir")
output = pipeline.predict("./drink_dataset_v2.0/test_images/")
for res in output:
res.print()
res.save_to_img("./output/")
2.2.3 Add or Remove Features from the Feature Library¶
If you want to add more images to the feature library, you can call the append_index
function; to remove image features, you can call the remove_index
function.
from paddlex import create_pipeline
pipeline = create_pipeline("PP-ShiTuV2")
pipeline.build_index(data_root="drink_dataset_v2.0/", index_dir="index_dir", index_type="IVF")
pipeline.append_index(data_root="drink_dataset_v2.0/", index_dir="index_dir", index_type="IVF")
pipeline.remove_index(data_root="drink_dataset_v2.0/", index_dir="index_dir", index_type="IVF")
The parameter descriptions for the above methods are as follows:
Parameter | Description | Type | Default Value |
---|---|---|---|
data_root |
The root directory of the dataset to be added. The data organization should be the same as when building the feature library, refer to Section 2.3 Data Organization for Building the Feature Library | str |
None |
index_dir |
The storage directory for the feature library. In append_index and remove_index , it is also the path of the feature library to be modified (or deleted). |
str |
None |
index_type |
Supports HNSW32 , IVF , Flat . Among them, HNSW32 has faster retrieval speed and higher accuracy but does not support the remove_index() operation; IVF has faster retrieval speed but relatively lower accuracy, and supports append_index() and remove_index() operations; Flat has lower retrieval speed but higher accuracy, and supports append_index() and remove_index() operations. |
str |
HNSW32 |
metric_type |
Supports: IP , Inner Product; L2 , Euclidean Distance. |
str |
IP |
Notice: There may be some compatibility errors when HNSW32
is used to build or predict on Windows.
2.3 Data Organization for Building the Feature Library¶
The PaddleX general image recognition pipeline requires a pre-built feature library for feature retrieval. If you want to build a feature vector library with private data, you need to organize the data as follows:
data_root # Root directory of the dataset, the directory name can be changed
├── images # Directory for saving images, the directory name can be changed
│ │ ...
└── gallery.txt # Annotation file for the feature library dataset, the file name cannot be changed. Each line gives the path of the image to be retrieved and the image label, separated by a space, for example: “0/0.jpg label”
3. Development Integration/Deployment¶
If the pipeline meets your requirements for inference speed and accuracy, you can proceed directly with development integration/deployment.
If you need to apply the pipeline directly in your Python project, refer to the example code in 2.2.2 Python Script Integration.
Additionally, PaddleX provides three other deployment methods, detailed as follows:
🚀 High-Performance Inference: In actual production environments, many applications have stringent standards for the performance metrics of deployment strategies (especially response speed) to ensure efficient system operation and smooth user experience. To this end, PaddleX provides high-performance inference plugins aimed at deeply optimizing model inference and pre/post-processing for significant end-to-end speedups. For detailed high-performance inference procedures, refer to the PaddleX High-Performance Inference Guide.
☁️ Service-Oriented Deployment: Service-oriented deployment is a common deployment form in actual production environments. By encapsulating inference functions as services, clients can access these services through network requests to obtain inference results. PaddleX supports users in achieving low-cost service-oriented deployment of pipelines. For detailed service-oriented deployment procedures, refer to the PaddleX Service-Oriented Deployment Guide.
Below are the API references and multi-language service invocation examples:
API Reference
For main operations provided by the service:
- The HTTP request method is POST.
- The request body and the response body are both JSON data (JSON objects).
- When the request is processed successfully, the response status code is
200
, and the response body properties are as follows:
Name | Type | Meaning |
---|---|---|
errorCode |
integer |
Error code. Fixed to 0 . |
errorMsg |
string |
Error description. Fixed to "Success" . |
The response body may also have a result
property, which is an object
type that stores operation result information.
- When the request is not processed successfully, the properties of the response body are as follows:
Name | Type | Meaning |
---|---|---|
errorCode |
integer |
Error code. Same as the response status code. |
errorMsg |
string |
Error description. |
The main operations provided by the service are as follows:
buildIndex
Build feature vector index.
POST /shitu-index-build
- The properties of the request body are as follows:
Name | Type | Meaning | Required |
---|---|---|---|
imageLabelPairs |
array |
Image-label pairs for building the index. | Yes |
Each element in imageLabelPairs
is an object
with the following properties:
Name | Type | Meaning |
---|---|---|
image |
string |
The URL of an image file accessible by the service, or the Base64 encoding result of the image file content. |
label |
string |
Label. |
- When the request is processed successfully, the
result
of the response body has the following properties:
Name | Type | Meaning |
---|---|---|
indexKey |
string |
The key corresponding to the index, used to identify the established index. Can be used as input for other operations. |
idMap |
object |
Mapping from vector ID to label. |
addImagesToIndex
Add images (corresponding feature vectors) to the index.
POST /shitu-index-add
- The properties of the request body are as follows:
Name | Type | Meaning | Required |
---|---|---|---|
imageLabelPairs |
array |
Image-label pairs for building the index. | Yes |
indexKey |
string |
The key corresponding to the index. Provided by the buildIndex operation. |
Yes |
Each element in imageLabelPairs
is an object
with the following properties:
Name | Type | Meaning |
---|---|---|
image |
string |
The URL of an image file accessible by the service, or the Base64 encoding result of the image file content. |
label |
string |
Label. |
- When the request is processed successfully, the
result
of the response body has the following properties:
Name | Type | Meaning |
---|---|---|
idMap |
object |
Mapping from vector ID to label. |
removeImagesFromIndex
Remove images (corresponding feature vectors) from the index.
POST /shitu-index-remove
- The properties of the request body are as follows:
Name | Type | Meaning | Required |
---|---|---|---|
ids |
array |
IDs of the vectors to be removed from the index. | Yes |
indexKey |
string |
The key corresponding to the index. Provided by the buildIndex operation. |
Yes |
- When the request is processed successfully, the
result
of the response body has the following properties:
Name | Type | Meaning |
---|---|---|
idMap |
object |
Mapping from vector ID to label. |
infer
Perform image recognition.
POST /shitu-infer
- The properties of the request body are as follows:
Name | Type | Meaning | Required |
---|---|---|---|
image |
string |
The URL of an image file accessible by the service, or the Base64 encoding result of the image file content. | Yes |
indexKey |
string |
The key corresponding to the index. Provided by the buildIndex operation. |
No |
- When the request is processed successfully, the
result
of the response body has the following properties:
Name | Type | Meaning |
---|---|---|
detectedObjects |
array |
Information of the detected targets. |
image |
string |
Recognition result image. The image is in JPEG format, encoded with Base64. |
Each element in detectedObjects
is an object
with the following properties:
Name | Type | Meaning |
---|---|---|
bbox |
array |
Target location. The elements in the array are the x-coordinate of the upper-left corner, the y-coordinate of the upper-left corner, the x-coordinate of the lower-right corner, and the y-coordinate of the lower-right corner, respectively. |
recResults |
array |
Recognition results. |
score |
number |
Detection score. |
Each element in recResults
is an object
with the following properties:
Name | Type | Meaning |
---|---|---|
label |
string |
Label. |
score |
number |
Recognition score. |
Multi-Language Service Invocation Examples
Python
import base64
import pprint
import sys
import requests
API_BASE_URL = "http://0.0.0.0:8080"
base_image_label_pairs = [
{"image": "./demo0.jpg", "label": "兔子"},
{"image": "./demo1.jpg", "label": "兔子"},
{"image": "./demo2.jpg", "label": "小狗"},
]
image_label_pairs_to_add = [
{"image": "./demo3.jpg", "label": "小狗"},
]
ids_to_remove = [1]
infer_image_path = "./demo4.jpg"
output_image_path = "./out.jpg"
for pair in base_image_label_pairs:
with open(pair["image"], "rb") as file:
image_bytes = file.read()
image_data = base64.b64encode(image_bytes).decode("ascii")
pair["image"] = image_data
payload = {"imageLabelPairs": base_image_label_pairs}
resp_index_build = requests.post(f"{API_BASE_URL}/shitu-index-build", json=payload)
if resp_index_build.status_code != 200:
print(f"Request to shitu-index-build failed with status code {resp_index_build}.")
pprint.pp(resp_index_build.json())
sys.exit(1)
result_index_build = resp_index_build.json()["result"]
print(f"Number of images indexed: {len(result_index_build['idMap'])}")
for pair in image_label_pairs_to_add:
with open(pair["image"], "rb") as file:
image_bytes = file.read()
image_data = base64.b64encode(image_bytes).decode("ascii")
pair["image"] = image_data
payload = {"imageLabelPairs": image_label_pairs_to_add, "indexKey": result_index_build["indexKey"]}
resp_index_add = requests.post(f"{API_BASE_URL}/shitu-index-add", json=payload)
if resp_index_add.status_code != 200:
print(f"Request to shitu-index-add failed with status code {resp_index_add}.")
pprint.pp(resp_index_add.json())
sys.exit(1)
result_index_add = resp_index_add.json()["result"]
print(f"Number of images indexed: {len(result_index_add['idMap'])}")
payload = {"ids": ids_to_remove, "indexKey": result_index_build["indexKey"]}
resp_index_remove = requests.post(f"{API_BASE_URL}/shitu-index-remove", json=payload)
if resp_index_remove.status_code != 200:
print(f"Request to shitu-index-remove failed with status code {resp_index_remove}.")
pprint.pp(resp_index_remove.json())
sys.exit(1)
result_index_remove = resp_index_remove.json()["result"]
print(f"Number of images indexed: {len(result_index_remove['idMap'])}")
with open(infer_image_path, "rb") as file:
image_bytes = file.read()
image_data = base64.b64encode(image_bytes).decode("ascii")
payload = {"image": image_data, "indexKey": result_index_build["indexKey"]}
resp_infer = requests.post(f"{API_BASE_URL}/shitu-infer", json=payload)
if resp_infer.status_code != 200:
print(f"Request to shitu-infer failed with status code {resp_infer}.")
pprint.pp(resp_infer.json())
sys.exit(1)
result_infer = resp_infer.json()["result"]
with open(output_image_path, "wb") as file:
file.write(base64.b64decode(result_infer["image"]))
print(f"Output image saved at {output_image_path}")
print("\nDetected objects:")
pprint.pp(result_infer["detectedObjects"])
📱 Edge Deployment: Edge deployment is a method that places computing and data processing functions on user devices themselves, allowing devices to process data directly without relying on remote servers. PaddleX supports deploying models on edge devices such as Android. For detailed edge deployment procedures, refer to the PaddleX Edge Deployment Guide. You can choose the appropriate deployment method for your model pipeline based on your needs and proceed with subsequent AI application integration.
4. Custom Development¶
If the default model weights provided by the General Image Recognition Pipeline do not meet your expectations in terms of precision or speed. You can further fine-tune the existing models using your own data from specific domains or application scenarios to enhance the recognition performance of the pipeline in your context.
4.1 Model Fine-Tuning¶
Since the General Image Recognition Pipeline consists of two modules (the mainbody detection module and the image feature module), the suboptimal performance of the pipeline may stem from either module.
You can analyze images with poor recognition results. After analysising, if you find that many mainbody objects are not detected, it may indicate deficiencies in the mainbody detection model. You need to refer to the Custom Development section in the Object Detection Module Development Tutorial and use your private dataset to fine-tune the mainbody detection model. If there are mismatches in the detected mainbody objects, it suggests that the image feature model requires further improvement. You should refer to the Custom Development section in the Image Feature Module Development Tutorial and fine-tune the image feature model.
4.2 Model Application¶
After you complete the fine-tuning training with your private dataset, you will obtain local model files.
To use the fine-tuned model, you only need to modify the pipeline configuration file by replacing with the paths to your fine-tuned model:
Pipeline:
device: "gpu:0"
det_model: "./PP-ShiTuV2_det_infer/" # Can be modified to the local path of the fine-tuned mainbody detection model
rec_model: "./PP-ShiTuV2_rec_infer/" # Can be modified to the local path of the fine-tuned image feature model
det_batch_size: 1
rec_batch_size: 1
device: gpu
5. Multi-Hardware Support¶
PaddleX supports various mainstream hardware devices such as NVIDIA GPUs, Kunlun XPU, Ascend NPU, and Cambricon MLU. Simply by modifying the --device
parameter, seamless switching between different hardware can be achieved.
For example, when running the General Image Recognition Pipeline using Python and changing the running device from an NVIDIA GPU to an Ascend NPU, you only need to modify the device
in the script to npu
:
from paddlex import create_pipeline
pipeline = create_pipeline(
pipeline="PP-ShiTuV2",
device="npu:0" # gpu:0 --> npu:0
)
If you want to use the General Image Recognition Pipeline on more types of hardware, please refer to the PaddleX Multi-Device Usage Guide.