NVIDIA CUDA GPU Installation

The following installation methods are available when your environment meets these requirements:

  • GPU Driver >= 535
  • CUDA >= 12.3
  • CUDNN >= 9.5
  • Python >= 3.10
  • Linux X86_64
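These minimums can be sanity-checked programmatically. A minimal sketch (the `MINIMUMS` table and `meets` helper are illustrative, not part of FastDeploy; the driver and CUDA strings would be read manually from `nvidia-smi` and `nvcc --version`):

```python
import sys

# Minimum versions from the requirements list above
MINIMUMS = {
    "driver": (535,),
    "cuda": (12, 3),
    "cudnn": (9, 5),
    "python": (3, 10),
}

def meets(found: str, minimum: tuple) -> bool:
    """Compare a dotted version string against a minimum version tuple."""
    parts = tuple(int(p) for p in found.split("."))
    return parts >= minimum

# Fill these in from `nvidia-smi` (driver, CUDA) and your cuDNN install
print(meets("550.54", MINIMUMS["driver"]))
print(meets("12.6", MINIMUMS["cuda"]))
print(meets(f"{sys.version_info[0]}.{sys.version_info[1]}", MINIMUMS["python"]))
```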

1. Pre-built Docker Installation

Notice: The pre-built image supports SM 80/86/89/90 architecture GPUs (e.g. A800/H800/L20/L40/4090) and requires Python 3.10.

# CUDA 12.6
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.5.0
# CUDA 12.9
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.9:2.5.0
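Once pulled, the image can be started with GPU access. The flags below (`--gpus all`, `--shm-size`) are typical choices for GPU inference containers, not mandated by this guide:

```shell
IMAGE=ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.5.0
# Interactive shell in the container with all GPUs visible;
# adjust --shm-size to your workload.
RUN_CMD="docker run --gpus all --shm-size=8g -it --rm $IMAGE /bin/bash"
echo "$RUN_CMD"
```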

2. Pre-built Pip Installation

First install paddlepaddle-gpu. For detailed instructions, refer to the PaddlePaddle Installation guide.

# Install stable release
# CUDA 12.6
python -m pip install paddlepaddle-gpu==3.3.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
# CUDA 12.9
python -m pip install paddlepaddle-gpu==3.3.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu129/

# Install latest Nightly build
# CUDA 12.6
python -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu126/
# CUDA 12.9
python -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu129/

Then install fastdeploy. Do not install it from PyPI; use one of the following commands instead (supports SM 80/86/89/90 GPU architectures).

Note: The stable FastDeploy release pairs with the stable PaddlePaddle release, and the nightly FastDeploy build pairs with the nightly PaddlePaddle build. The --extra-index-url is only used to download fastdeploy-gpu's dependencies; fastdeploy-gpu itself must be installed from the Paddle source specified by -i.

# Install stable release FastDeploy
# CUDA 12.6
python -m pip install fastdeploy-gpu==2.5.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
# CUDA 12.9
python -m pip install fastdeploy-gpu==2.5.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu129/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# Install Nightly Build FastDeploy
# CUDA 12.6
python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu126/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
# CUDA 12.9
python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu129/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
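The four install variants above differ only in channel (stable/nightly) and CUDA tag. A hypothetical helper that assembles the pip command (the function name and structure are illustrative; the URLs are the ones from this guide):

```python
BASE = "https://www.paddlepaddle.org.cn/packages"
MIRROR = "https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple"

def fastdeploy_pip_cmd(channel, cuda, version=None):
    """Build the pip command for fastdeploy-gpu.

    channel: "stable" or "nightly"; cuda: "cu126" or "cu129";
    version: pin for stable releases, None for nightly builds.
    """
    assert channel in ("stable", "nightly") and cuda in ("cu126", "cu129")
    pkg = f"fastdeploy-gpu=={version}" if version else "fastdeploy-gpu"
    return (f"python -m pip install {pkg} "
            f"-i {BASE}/{channel}/{cuda}/ --extra-index-url {MIRROR}")

print(fastdeploy_pip_cmd("stable", "cu126", "2.5.0"))
```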

3. Build from Source Using Docker

Note: dockerfiles/Dockerfile.gpu currently supports CUDA 12.6 only, targets SM 80/86/89/90 architectures, and requires Python 3.10. To target other architectures, modify the `bash build.sh 1 python false [80,90]` line in the Dockerfile. Specifying no more than 2 architectures is recommended.

git clone https://github.com/PaddlePaddle/FastDeploy
cd FastDeploy

docker build -f dockerfiles/Dockerfile.gpu -t fastdeploy:gpu .

4. Build Wheel from Source

First install paddlepaddle-gpu. For detailed instructions, refer to the PaddlePaddle Installation guide.

python -m pip install paddlepaddle-gpu==3.3.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
git clone https://github.com/PaddlePaddle/FastDeploy
cd FastDeploy

# Argument 1: Whether to build wheel package (1 for yes, 0 for compile only)
# Argument 2: Python interpreter path
# Argument 3: Whether to compile CPU inference operators
# Argument 4: Target GPU architectures
bash build.sh 1 python false [80,90]

The built packages will be in the FastDeploy/dist directory.
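The four positional arguments map mechanically onto the command line. A small hypothetical helper that renders them (the bracketed architecture list is passed literally, as in the command above; `build_cmd` is not part of FastDeploy):

```python
def build_cmd(build_wheel=True, python_bin="python", with_cpu_ops=False, archs=(80, 90)):
    """Render the build.sh invocation from its four positional arguments."""
    return "bash build.sh {} {} {} [{}]".format(
        1 if build_wheel else 0,          # 1 = build wheel, 0 = compile only
        python_bin,                       # Python interpreter path
        "true" if with_cpu_ops else "false",  # compile CPU inference operators?
        ",".join(str(a) for a in archs),  # target GPU architectures
    )

print(build_cmd())  # matches the command shown above
```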

5. Precompiled Operator Wheel Packages

FastDeploy provides precompiled GPU operator wheel packages for quick setup without building the entire source tree. This method currently supports SM 80/90 architectures (e.g., A100/H100), CUDA 12.6, and Python 3.10 only.

By default, build.sh compiles all custom operators from source. To use the precompiled package, enable it with the FD_USE_PRECOMPILED parameter. If the precompiled package cannot be downloaded or does not match the current environment, the build automatically falls back to compiling from source (Section 4).
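The fallback rule can be expressed as a simple predicate. A sketch, assuming only the supported-environment constraints stated above (SM 80/90, CUDA 12.6, Python 3.10); the function itself is illustrative, not FastDeploy code:

```python
SUPPORTED_ARCHS = {80, 90}

def can_use_precompiled(archs, cuda, python):
    """True if the precompiled operator wheels apply; otherwise build.sh
    falls back to compiling the operators from source (Section 4)."""
    return (set(archs) <= SUPPORTED_ARCHS
            and cuda == "12.6"
            and python == "3.10")

print(can_use_precompiled([90], "12.6", "3.10"))   # precompiled path
print(can_use_precompiled([86], "12.6", "3.10"))   # falls back to source build
```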

First, install paddlepaddle-gpu. For detailed instructions, please refer to the PaddlePaddle Installation Guide.

python -m pip install paddlepaddle-gpu==3.3.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/

Then, clone the FastDeploy repository and build using the precompiled operator wheels:

git clone https://github.com/PaddlePaddle/FastDeploy
cd FastDeploy

# Argument 1: Whether to build wheel package (1 for yes)
# Argument 2: Python interpreter path
# Argument 3: Whether to compile CPU inference operators (false for GPU only)
# Argument 4: Target GPU architectures (currently supports 80/90)
# Argument 5: Whether to use precompiled operators (1 for enable)
# Argument 6 (optional): Commit ID for the precompiled operators (defaults to the current commit ID)

# Use precompiled operators for accelerated build
bash build.sh 1 python false [90] 1

# Use precompiled wheel from a specific commit
bash build.sh 1 python false [90] 1 d693d4be1448d414097882386fdc24c8bec2a63a

The downloaded wheel packages will be stored in the FastDeploy/pre_wheel directory. After the build completes, the operator binaries can be found in FastDeploy/fastdeploy/model_executor/ops/gpu.

Notes:

  • This mode prioritizes downloading precompiled GPU operator wheels to reduce build time.
  • Currently supports GPU, SM 80/90, CUDA 12.6, and Python 3.10 only.
  • For custom architectures or modified operator logic, please use source compilation (Section 4).
  • You can check whether the precompiled wheel for a specific commit has been successfully built on the FastDeploy CI Build Status Page.

Environment Verification

After installation, verify the environment with this Python code:

import paddle
# Check that the paddle.jit unified marker is importable
from paddle.jit.marker import unified
# Verify GPU availability
paddle.utils.run_check()

If the above code executes successfully, the environment is ready.