Model Inference Benchmark
Table of Contents
- 1. Instructions
- 2. Usage Examples
- 2.1 Command Line Method
- 2.2 Python Script Method
- 3. Explanation of Results
1. Instructions
The benchmark feature measures, for each operation in the end-to-end model inference process, the average execution time per iteration and the average execution time per instance, and it reports summary information. All times are in milliseconds.
To enable the benchmark feature, you must set the following environment variables:
- PADDLE_PDX_INFER_BENCHMARK: When set to True, the benchmark feature is enabled (default is False).
- PADDLE_PDX_INFER_BENCHMARK_WARMUP: The number of warm-up iterations before testing (default is 0).
- PADDLE_PDX_INFER_BENCHMARK_ITERS: The number of iterations for testing (default is 0).
- PADDLE_PDX_INFER_BENCHMARK_OUTPUT_DIR: The directory where the metrics are saved (e.g., ./benchmark). The default is None, meaning the benchmark metrics will not be saved.
- PADDLE_PDX_INFER_BENCHMARK_USE_CACHE_FOR_READ: When set to True, the caching mechanism is applied to the operation of reading input data to avoid repetitive I/O overhead, and the time consumed by reading and caching data is not recorded in the core time (default is False).
- PADDLE_PDX_INFER_BENCHMARK_USE_NEW_INFER_API: When set to True, the new inference API is enabled, providing more detailed information about inference operations in the benchmark (default is False).
Note:
- At least one of PADDLE_PDX_INFER_BENCHMARK_WARMUP or PADDLE_PDX_INFER_BENCHMARK_ITERS must be set to a value greater than zero; otherwise, the benchmark feature cannot be used.
- The benchmark feature does not currently apply to model pipelines.
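The constraint in the note above can be checked before a run is launched. The following is a minimal sketch and not part of PaddleX: check_benchmark_env is a hypothetical helper, and treating only the literal string True as enabling the feature is an assumption based on the variable descriptions above.

```python
import os


def check_benchmark_env(env=None):
    """Return a list of problems with the benchmark environment variables.

    Hypothetical helper for illustration; PaddleX does not provide it.
    """
    env = os.environ if env is None else env
    problems = []

    # Assumption: only the literal string "True" enables the feature.
    if env.get("PADDLE_PDX_INFER_BENCHMARK", "False") != "True":
        problems.append("PADDLE_PDX_INFER_BENCHMARK is not set to True")

    # Both counters default to 0, as documented above.
    counts = {}
    for name in (
        "PADDLE_PDX_INFER_BENCHMARK_WARMUP",
        "PADDLE_PDX_INFER_BENCHMARK_ITERS",
    ):
        raw = env.get(name, "0")
        try:
            counts[name] = int(raw)
        except ValueError:
            counts[name] = 0
            problems.append(f"{name} is not an integer: {raw!r}")

    # At least one of WARMUP or ITERS must be greater than zero.
    if all(value <= 0 for value in counts.values()):
        problems.append("set WARMUP or ITERS to a value greater than zero")

    return problems
```

Calling check_benchmark_env() with no argument inspects the current process environment; an empty list means the settings satisfy the note.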
2. Usage Examples
You can use the benchmark feature either from the command line or from a Python script.
2.1 Command Line Method
Note:
- For a description of the input parameters, please refer to the PaddleX Common Model Configuration File Parameter Explanation.
- If batch_size is greater than 1, the input data will be duplicated batch_size times to match the size of batch_size.
Execute the command:
PADDLE_PDX_INFER_BENCHMARK=True \
PADDLE_PDX_INFER_BENCHMARK_WARMUP=5 \
PADDLE_PDX_INFER_BENCHMARK_ITERS=10 \
PADDLE_PDX_INFER_BENCHMARK_OUTPUT_DIR=./benchmark \
python main.py \
-c ./paddlex/configs/modules/object_detection/PicoDet-XS.yaml \
-o Global.mode=predict \
-o Predict.model_dir=None \
-o Predict.batch_size=2 \
-o Predict.input=./test.png
2.2 Python Script Method
Note:
- For a description of the input parameters, please refer to the PaddleX Single Model Python Usage Instructions.
- If batch_size is greater than 1, the input data will be duplicated batch_size times to match the size of batch_size.
Create the script test_infer.py:
from paddlex import create_model
model = create_model(model_name="PicoDet-XS", model_dir=None)
output = list(model.predict(input="./test.png", batch_size=2))
Run the script:
PADDLE_PDX_INFER_BENCHMARK=True \
PADDLE_PDX_INFER_BENCHMARK_WARMUP=5 \
PADDLE_PDX_INFER_BENCHMARK_ITERS=10 \
PADDLE_PDX_INFER_BENCHMARK_OUTPUT_DIR=./benchmark \
python test_infer.py
3. Explanation of Results
After enabling the benchmark feature, the benchmark results will be automatically printed. The details are as follows:
Field Name | Field Description |
---|---|
Iters | Number of iterations, i.e., the number of times inference is executed in a loop. |
Batch Size | Batch size, i.e., the number of instances processed in each iteration. |
Instances | Total number of instances, calculated as Iters multiplied by Batch Size. |
Operation | Name of the operation, such as Resize, Normalize, etc. |
Type | Type of time consumption, including: Preprocessing, Inference, Postprocessing, Core, Other, and End-to-End. |
Avg Time Per Iter (ms) | Average execution time per iteration, in milliseconds. |
Avg Time Per Instance (ms) | Average execution time per instance, in milliseconds. |
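The relationship between these fields can be made concrete with a little arithmetic. The sketch below computes Instances as Iters multiplied by Batch Size, as described in the table. Dividing Avg Time Per Iter by Batch Size to obtain Avg Time Per Instance is an assumption: it agrees with the example output in this section but is not stated explicitly. The function name per_instance_metrics is hypothetical.

```python
def per_instance_metrics(iters, batch_size, avg_time_per_iter_ms):
    """Derive Instances and Avg Time Per Instance (ms) from the other fields."""
    # Instances = Iters x Batch Size (from the field table).
    instances = iters * batch_size
    # Assumption: each iteration processes batch_size instances, so the
    # per-instance time is the per-iteration time divided by batch_size.
    avg_time_per_instance_ms = avg_time_per_iter_ms / batch_size
    return instances, avg_time_per_instance_ms


# Values taken from the Resize row of the example Detail Data
# (Iters=10, Batch Size=2, Avg Time Per Iter=12.02824502 ms).
instances, per_instance = per_instance_metrics(10, 2, 12.02824502)
print(instances, round(per_instance, 8))
```

The printed values can be compared against the Instances and Avg Time Per Instance (ms) columns of the same row.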
Below is an example of the benchmark results obtained by running the example program in Section 2:
Warmup Data
+-------+------------+-----------+----------------+------------------------+----------------------------+
| Iters | Batch Size | Instances | Type | Avg Time Per Iter (ms) | Avg Time Per Instance (ms) |
+-------+------------+-----------+----------------+------------------------+----------------------------+
| 5 | 2 | 10 | Preprocessing | 97.89338876 | 48.94669438 |
| 5 | 2 | 10 | Inference | 66.70711380 | 33.35355690 |
| 5 | 2 | 10 | Postprocessing | 0.20138482 | 0.10069241 |
| 5 | 2 | 10 | Core | 164.80188738 | 82.40094369 |
| 5 | 2 | 10 | Other | 3.41097047 | 1.70548523 |
| 5 | 2 | 10 | End-to-End | 168.21285784 | 84.10642892 |
+-------+------------+-----------+----------------+------------------------+----------------------------+
Operation Info
+--------------------+----------------------------------------------------------------------+
| Operation | Source Code Location |
+--------------------+----------------------------------------------------------------------+
| ReadImage | /PaddleX/paddlex/inference/models/object_detection/processors.py:34 |
| Resize | /PaddleX/paddlex/inference/models/object_detection/processors.py:99 |
| Normalize | /PaddleX/paddlex/inference/models/object_detection/processors.py:145 |
| ToCHWImage | /PaddleX/paddlex/inference/models/object_detection/processors.py:158 |
| ToBatch | /PaddleX/paddlex/inference/models/object_detection/processors.py:216 |
| PaddleCopyToDevice | /PaddleX/paddlex/inference/models/common/static_infer.py:214 |
| PaddleModelInfer | /PaddleX/paddlex/inference/models/common/static_infer.py:234 |
| PaddleCopyToHost | /PaddleX/paddlex/inference/models/common/static_infer.py:223 |
| DetPostProcess | /PaddleX/paddlex/inference/models/object_detection/processors.py:773 |
+--------------------+----------------------------------------------------------------------+
Detail Data
+-------+------------+-----------+--------------------+------------------------+----------------------------+
| Iters | Batch Size | Instances | Operation | Avg Time Per Iter (ms) | Avg Time Per Instance (ms) |
+-------+------------+-----------+--------------------+------------------------+----------------------------+
| 10 | 2 | 20 | ReadImage | 76.22221033 | 38.11110517 |
| 10 | 2 | 20 | Resize | 12.02824502 | 6.01412251 |
| 10 | 2 | 20 | Normalize | 6.14072606 | 3.07036303 |
| 10 | 2 | 20 | ToCHWImage | 0.00533939 | 0.00266969 |
| 10 | 2 | 20 | ToBatch | 0.93134162 | 0.46567081 |
| 10 | 2 | 20 | PaddleCopyToDevice | 0.92240779 | 0.46120390 |
| 10 | 2 | 20 | PaddleModelInfer | 9.66330138 | 4.83165069 |
| 10 | 2 | 20 | PaddleCopyToHost | 0.06802108 | 0.03401054 |
| 10 | 2 | 20 | DetPostProcess | 0.18665448 | 0.09332724 |
+-------+------------+-----------+--------------------+------------------------+----------------------------+
Summary Data
+-------+------------+-----------+----------------+------------------------+----------------------------+
| Iters | Batch Size | Instances | Type | Avg Time Per Iter (ms) | Avg Time Per Instance (ms) |
+-------+------------+-----------+----------------+------------------------+----------------------------+
| 10 | 2 | 20 | Preprocessing | 95.32786242 | 47.66393121 |
| 10 | 2 | 20 | Inference | 10.65373025 | 5.32686512 |
| 10 | 2 | 20 | Postprocessing | 0.18665448 | 0.09332724 |
| 10 | 2 | 20 | Core | 106.16824715 | 53.08412358 |
| 10 | 2 | 20 | Other | 2.74794563 | 1.37397281 |
| 10 | 2 | 20 | End-to-End | 108.91619278 | 54.45809639 |
+-------+------------+-----------+----------------+------------------------+----------------------------+
Additionally, since PADDLE_PDX_INFER_BENCHMARK_OUTPUT_DIR=./benchmark is set, the above results will be saved locally in ./benchmark/detail.csv and ./benchmark/summary.csv.
The contents of detail.csv are as follows:
Iters,Batch Size,Instances,Operation,Avg Time Per Iter (ms),Avg Time Per Instance (ms)
10,2,20,ReadImage,76.22221033,38.11110517
10,2,20,Resize,12.02824502,6.01412251
10,2,20,Normalize,6.14072606,3.07036303
10,2,20,ToCHWImage,0.00533939,0.00266969
10,2,20,ToBatch,0.93134162,0.46567081
10,2,20,PaddleCopyToDevice,0.92240779,0.46120390
10,2,20,PaddleModelInfer,9.66330138,4.83165069
10,2,20,PaddleCopyToHost,0.06802108,0.03401054
10,2,20,DetPostProcess,0.18665448,0.09332724
The contents of summary.csv are as follows:
Iters,Batch Size,Instances,Type,Avg Time Per Iter (ms),Avg Time Per Instance (ms)
10,2,20,Preprocessing,95.32786242,47.66393121
10,2,20,Inference,10.65373025,5.32686512
10,2,20,Postprocessing,0.18665448,0.09332724
10,2,20,Core,106.16824715,53.08412358
10,2,20,Other,2.74794563,1.37397281
10,2,20,End-to-End,108.91619278,54.45809639
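The saved files can be cross-checked with a short script. The sketch below parses the two CSV listings above with Python's standard csv module and checks that the summary rows are sums of the detail rows. The mapping from operations to types (OPERATION_TYPE) is an assumption inferred from the example numbers, as is the reading of Core as Preprocessing + Inference + Postprocessing and of End-to-End as Core + Other; none of this is stated by the documentation itself. In practice the text would be read from ./benchmark/detail.csv and ./benchmark/summary.csv rather than embedded as strings.

```python
import csv
import io

DETAIL_CSV = """Iters,Batch Size,Instances,Operation,Avg Time Per Iter (ms),Avg Time Per Instance (ms)
10,2,20,ReadImage,76.22221033,38.11110517
10,2,20,Resize,12.02824502,6.01412251
10,2,20,Normalize,6.14072606,3.07036303
10,2,20,ToCHWImage,0.00533939,0.00266969
10,2,20,ToBatch,0.93134162,0.46567081
10,2,20,PaddleCopyToDevice,0.92240779,0.46120390
10,2,20,PaddleModelInfer,9.66330138,4.83165069
10,2,20,PaddleCopyToHost,0.06802108,0.03401054
10,2,20,DetPostProcess,0.18665448,0.09332724
"""

SUMMARY_CSV = """Iters,Batch Size,Instances,Type,Avg Time Per Iter (ms),Avg Time Per Instance (ms)
10,2,20,Preprocessing,95.32786242,47.66393121
10,2,20,Inference,10.65373025,5.32686512
10,2,20,Postprocessing,0.18665448,0.09332724
10,2,20,Core,106.16824715,53.08412358
10,2,20,Other,2.74794563,1.37397281
10,2,20,End-to-End,108.91619278,54.45809639
"""

# Assumed grouping of operations into types (inferred from the numbers).
OPERATION_TYPE = {
    "ReadImage": "Preprocessing",
    "Resize": "Preprocessing",
    "Normalize": "Preprocessing",
    "ToCHWImage": "Preprocessing",
    "ToBatch": "Preprocessing",
    "PaddleCopyToDevice": "Inference",
    "PaddleModelInfer": "Inference",
    "PaddleCopyToHost": "Inference",
    "DetPostProcess": "Postprocessing",
}

TIME_COLUMN = "Avg Time Per Iter (ms)"


def load(text, key):
    """Map each row's key column to its average time per iteration."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key]: float(row[TIME_COLUMN]) for row in reader}


def close(a, b, tol=1e-6):
    """Compare two millisecond values up to rounding in the CSV."""
    return abs(a - b) < tol


detail = load(DETAIL_CSV, "Operation")
summary = load(SUMMARY_CSV, "Type")

# Sum the per-operation times within each assumed type.
totals = {}
for operation, ms in detail.items():
    kind = OPERATION_TYPE[operation]
    totals[kind] = totals.get(kind, 0.0) + ms

checks = {kind: close(totals[kind], summary[kind]) for kind in totals}
checks["Core"] = close(sum(totals.values()), summary["Core"])
checks["End-to-End"] = close(
    summary["Core"] + summary["Other"], summary["End-to-End"]
)
print(checks)
```

A False entry in checks would indicate that the assumed grouping does not hold for a given model, since other models may expose different operations.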