
Multi-Node Deployment

Overview

Multi-node deployment addresses scenarios where a single machine's GPU memory is insufficient to hold a large model, by enabling tensor parallelism across multiple machines.

Environment Preparation

Network Requirements

  1. All nodes must be within the same local network
  2. Ensure bidirectional connectivity between all nodes (test using ping and nc -zv; a quick check is sketched after this list)
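
For example, a quick connectivity check from one node to its peers might look like the following; the peer address is illustrative, and 8182 matches the engine worker queue port used in the startup example later on this page:

```shell
# Illustrative peer address; repeat for every other node in the group.
PEER=192.168.1.102

# Basic reachability check.
ping -c 3 "$PEER"

# Verify that a TCP port used by the deployment is reachable
# (8182 is the --engine-worker-queue-port value used in the startup example below).
nc -zv "$PEER" 8182
```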

Software Requirements

  1. Install the same version of FastDeploy on all nodes
  2. [Recommended] Install and configure MPI (OpenMPI or MPICH); an installation sketch follows below
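
For example, on apt-based hosts (an assumption; adjust for your distribution and cluster policies), OpenMPI can be installed from the system packages:

```shell
# Debian/Ubuntu example; install the OpenMPI runtime and launcher on every node.
sudo apt-get update
sudo apt-get install -y openmpi-bin libopenmpi-dev

# Confirm the launcher is available and the versions match across nodes.
mpirun --version
```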

Tensor Parallel Deployment

We recommend using mpirun for one-command startup, so that each node does not have to be started manually; a sketch of such an invocation follows the online startup example below.

Usage Instructions

  1. Execute the same command on all machines
  2. The IP order in the ips parameter determines the node startup sequence
  3. The first IP will be designated as the master node
  4. Ensure all nodes can resolve each other's hostnames

Online inference startup example:

```shell
python -m fastdeploy.entrypoints.openai.api_server \
    --model baidu/ERNIE-4.5-300B-A47B-Paddle \
    --port 8180 \
    --metrics-port 8181 \
    --engine-worker-queue-port 8182 \
    --max-model-len 32768 \
    --max-num-seqs 32 \
    --tensor-parallel-size 16 \
    --ips 192.168.1.101,192.168.1.102
```
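
The exact mpirun integration depends on the cluster environment; a minimal sketch, assuming OpenMPI with passwordless SSH between the two hosts and one launch per host, might look like this (it simply runs the same command shown above on both machines):

```shell
# Assumptions: OpenMPI, passwordless SSH between nodes, one process per host.
# Each process starts the FastDeploy server with identical arguments,
# so the command only needs to be issued once from a single machine.
mpirun -np 2 --host 192.168.1.101,192.168.1.102 \
    python -m fastdeploy.entrypoints.openai.api_server \
        --model baidu/ERNIE-4.5-300B-A47B-Paddle \
        --port 8180 \
        --metrics-port 8181 \
        --engine-worker-queue-port 8182 \
        --max-model-len 32768 \
        --max-num-seqs 32 \
        --tensor-parallel-size 16 \
        --ips 192.168.1.101,192.168.1.102
```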

Offline startup example:
```python
from fastdeploy.engine.sampling_params import SamplingParams
from fastdeploy.entrypoints.llm import LLM

model_name_or_path = "baidu/ERNIE-4.5-300B-A47B-Paddle"

sampling_params = SamplingParams(temperature=0.1, max_tokens=30)
llm = LLM(model=model_name_or_path, tensor_parallel_size=16, ips="192.168.1.101,192.168.1.102")

# Only the master node (the first IP in ips) accepts requests and returns output;
# the other nodes participate in the distributed computation.
if llm._check_master():
    output = llm.generate(prompts="Who are you?", use_tqdm=True, sampling_params=sampling_params)
    print(output)
```

Notes:

  • Only the master node can receive completion requests
  • Always send requests to the master node (the first IP in the ips list); see the request sketch below
  • The master node distributes workloads across all nodes
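
Since only the first IP accepts traffic, clients should target that address and the --port value from the startup command. A hedged request sketch, assuming the server exposes the usual OpenAI-compatible chat completions route (add any further fields, such as a model name, that your configuration requires):

```shell
# Target the master node (first IP in --ips) on the --port used at startup (8180 above).
# Assumption: api_server exposes the OpenAI-compatible /v1/chat/completions route.
curl -s http://192.168.1.101:8180/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "messages": [{"role": "user", "content": "Who are you?"}]
        }'
```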

Parameter Description

ips Parameter

  • Type: string
  • Format: Comma-separated IPv4 addresses
  • Description: Specifies the IP addresses of all nodes in the deployment group
  • Required: Only for multi-node deployments
  • Example: "192.168.1.101,192.168.1.102,192.168.1.103"

tensor_parallel_size Parameter

  • Type: integer
  • Description: Total number of GPUs used for tensor parallelism across all nodes
  • Required: Yes
  • Example: For 2 nodes with 8 GPUs each, set to 16
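
To confirm the per-node GPU count that feeds into this total, the GPUs visible on each machine can be listed (assuming NVIDIA GPUs with the standard nvidia-smi tool):

```shell
# Count the GPUs visible on this node; sum across all nodes to get tensor_parallel_size.
nvidia-smi --list-gpus | wc -l
```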