Model Inference Benchmark¶

Table of Contents¶

1. Instructions
2. Usage Examples
2.1 Command Line Method
2.2 Python Script Method
3. Explanation of Results

1. Instructions¶

The benchmark feature collects the average execution time per iteration for each operation in the end-to-end model inference process as well as the average execution time per instance, and provides summary information. The time measurements are in milliseconds.

To enable the benchmark feature, you must set the following environment variables:

PADDLE_PDX_INFER_BENCHMARK: When set to True, the benchmark feature is enabled (default is False).
PADDLE_PDX_INFER_BENCHMARK_WARMUP: The number of warm-up iterations before testing (default is 0).
PADDLE_PDX_INFER_BENCHMARK_ITERS: The number of iterations for testing (default is 0).
PADDLE_PDX_INFER_BENCHMARK_OUTPUT_DIR: The directory where the metrics are saved (e.g., ./benchmark). The default is None, meaning the benchmark metrics will not be saved.
PADDLE_PDX_INFER_BENCHMARK_USE_CACHE_FOR_READ: When set to True, the caching mechanism is applied to the operation of reading input data to avoid repetitive I/O overhead, and the time consumed by data read and cache is not recorded in the core time (default is False).
PADDLE_PDX_INFER_BENCHMARK_USE_NEW_INFER_API: When set to True,the new inference API is enabled, providing more detailed information for inference operations on benchmarks (default is False).

Note:

At least one of PADDLE_PDX_INFER_BENCHMARK_WARMUP or PADDLE_PDX_INFER_BENCHMARK_ITERS must be set to a value greater than zero; otherwise, the benchmark feature cannot be used.
The benchmark feature does not currently apply to model pipelines.

2. Usage Examples¶

You can use the benchmark feature by either the command line method or the Python script method.

2.1 Command Line Method¶

Note:

For a description of the input parameters, please refer to the PaddleX Common Model Configuration File Parameter Explanation.
If batch_size is greater than 1, the input data will be duplicated batch_size times to match the size of batch_size.

Execute the command:

PADDLE_PDX_INFER_BENCHMARK=True \
PADDLE_PDX_INFER_BENCHMARK_WARMUP=5 \
PADDLE_PDX_INFER_BENCHMARK_ITERS=10 \
PADDLE_PDX_INFER_BENCHMARK_OUTPUT_DIR=./benchmark \
python main.py \
    -c ./paddlex/configs/modules/object_detection/PicoDet-XS.yaml \
    -o Global.mode=predict \
    -o Predict.model_dir=None \
    -o Predict.batch_size=2 \
    -o Predict.input=./test.png

2.2 Python Script Method¶

Note:

For a description of the input parameters, please refer to the PaddleX Single Model Python Usage Instructions.
If batch_size is greater than 1, the input data will be duplicated batch_size times to match the size of batch_size.

Create the script test_infer.py:

from paddlex import create_model

model = create_model(model_name="PicoDet-XS", model_dir=None)
output = list(model.predict(input="./test.png", batch_size=2))

Run the script:

PADDLE_PDX_INFER_BENCHMARK=True \
PADDLE_PDX_INFER_BENCHMARK_WARMUP=5 \
PADDLE_PDX_INFER_BENCHMARK_ITERS=10 \
PADDLE_PDX_INFER_BENCHMARK_OUTPUT_DIR=./benchmark \
python test_infer.py

3. Explanation of Results¶

After enabling the benchmark feature, the benchmark results will be automatically printed. The details are as follows:

Field Name	Field Description
Iters	Number of iterations, i.e., the number of times inference is executed in a loop.
Batch Size	Batch size, i.e., the number of instances processed in each iteration.
Instances	Total number of instances, calculated as `Iters` multiplied by `Batch Size`.
Operation	Name of the operation, such as `Resize`, `Normalize`, etc.
Type	Type of time consumption, including: preprocessing time (`Preprocessing`) model inference time (`Inference`) postprocessing time (`Postprocessing`) core time (`Core`, i.e., Preprocessing + Inference + Postprocessing) other time (`Other`, e.g., the time taken to run the code that orchestrates the operations, and the extra overhead introduced by the benchmark feature) end-to-end time (`End-to-End`, i.e., Core + Other)
Avg Time Per Iter (ms)	Average execution time per iteration, in milliseconds.
Avg Time Per Instance (ms)	Average execution time per instance, in milliseconds.

Below is an example of the benchmark results obtained by running the example program in Section 2:

                                               Warmup Data
+-------+------------+-----------+----------------+------------------------+----------------------------+
| Iters | Batch Size | Instances |      Type      | Avg Time Per Iter (ms) | Avg Time Per Instance (ms) |
+-------+------------+-----------+----------------+------------------------+----------------------------+
|   5   |     2      |     10    | Preprocessing  |      97.89338876       |        48.94669438         |
|   5   |     2      |     10    |   Inference    |      66.70711380       |        33.35355690         |
|   5   |     2      |     10    | Postprocessing |       0.20138482       |         0.10069241         |
|   5   |     2      |     10    |      Core      |      164.80188738      |        82.40094369         |
|   5   |     2      |     10    |     Other      |       3.41097047       |         1.70548523         |
|   5   |     2      |     10    |   End-to-End   |      168.21285784      |        84.10642892         |
+-------+------------+-----------+----------------+------------------------+----------------------------+
                                           Operation Info
+--------------------+----------------------------------------------------------------------+
|     Operation      |                         Source Code Location                         |
+--------------------+----------------------------------------------------------------------+
|     ReadImage      | /PaddleX/paddlex/inference/models/object_detection/processors.py:34  |
|       Resize       | /PaddleX/paddlex/inference/models/object_detection/processors.py:99  |
|     Normalize      | /PaddleX/paddlex/inference/models/object_detection/processors.py:145 |
|     ToCHWImage     | /PaddleX/paddlex/inference/models/object_detection/processors.py:158 |
|      ToBatch       | /PaddleX/paddlex/inference/models/object_detection/processors.py:216 |
| PaddleCopyToDevice |     /PaddleX/paddlex/inference/models/common/static_infer.py:214     |
|  PaddleModelInfer  |     /PaddleX/paddlex/inference/models/common/static_infer.py:234     |
|  PaddleCopyToHost  |     /PaddleX/paddlex/inference/models/common/static_infer.py:223     |
|   DetPostProcess   | /PaddleX/paddlex/inference/models/object_detection/processors.py:773 |
+--------------------+----------------------------------------------------------------------+
                                                 Detail Data
+-------+------------+-----------+--------------------+------------------------+----------------------------+
| Iters | Batch Size | Instances |     Operation      | Avg Time Per Iter (ms) | Avg Time Per Instance (ms) |
+-------+------------+-----------+--------------------+------------------------+----------------------------+
|   10  |     2      |     20    |     ReadImage      |      76.22221033       |        38.11110517         |
|   10  |     2      |     20    |       Resize       |      12.02824502       |         6.01412251         |
|   10  |     2      |     20    |     Normalize      |       6.14072606       |         3.07036303         |
|   10  |     2      |     20    |     ToCHWImage     |       0.00533939       |         0.00266969         |
|   10  |     2      |     20    |      ToBatch       |       0.93134162       |         0.46567081         |
|   10  |     2      |     20    | PaddleCopyToDevice |       0.92240779       |         0.46120390         |
|   10  |     2      |     20    |  PaddleModelInfer  |       9.66330138       |         4.83165069         |
|   10  |     2      |     20    |  PaddleCopyToHost  |       0.06802108       |         0.03401054         |
|   10  |     2      |     20    |   DetPostProcess   |       0.18665448       |         0.09332724         |
+-------+------------+-----------+--------------------+------------------------+----------------------------+
                                               Summary Data
+-------+------------+-----------+----------------+------------------------+----------------------------+
| Iters | Batch Size | Instances |      Type      | Avg Time Per Iter (ms) | Avg Time Per Instance (ms) |
+-------+------------+-----------+----------------+------------------------+----------------------------+
|   10  |     2      |     20    | Preprocessing  |      95.32786242       |        47.66393121         |
|   10  |     2      |     20    |   Inference    |      10.65373025       |         5.32686512         |
|   10  |     2      |     20    | Postprocessing |       0.18665448       |         0.09332724         |
|   10  |     2      |     20    |      Core      |      106.16824715      |        53.08412358         |
|   10  |     2      |     20    |     Other      |       2.74794563       |         1.37397281         |
|   10  |     2      |     20    |   End-to-End   |      108.91619278      |        54.45809639         |
+-------+------------+-----------+----------------+------------------------+----------------------------+

Additionally, since PADDLE_PDX_INFER_BENCHMARK_OUTPUT_DIR=./benchmark is set, the above results will be saved locally in ./benchmark/detail.csv and ./benchmark/summary.csv.

The contents of detail.csv are as follows:

Iters,Batch Size,Instances,Operation,Avg Time Per Iter (ms),Avg Time Per Instance (ms)
10,2,20,ReadImage,76.22221033,38.11110517
10,2,20,Resize,12.02824502,6.01412251
10,2,20,Normalize,6.14072606,3.07036303
10,2,20,ToCHWImage,0.00533939,0.00266969
10,2,20,ToBatch,0.93134162,0.46567081
10,2,20,PaddleCopyToDevice,0.92240779,0.46120390
10,2,20,PaddleModelInfer,9.66330138,4.83165069
10,2,20,PaddleCopyToHost,0.06802108,0.03401054
10,2,20,DetPostProcess,0.18665448,0.09332724

The contents of summary.csv are as follows:

Iters,Batch Size,Instances,Type,Avg Time Per Iter (ms),Avg Time Per Instance (ms)
10,2,20,Preprocessing,95.32786242,47.66393121
10,2,20,Inference,10.65373025,5.32686512
10,2,20,Postprocessing,0.18665448,0.09332724
10,2,20,Core,106.16824715,53.08412358
10,2,20,Other,2.74794563,1.37397281
10,2,20,End-to-End,108.91619278,54.45809639