Multi-Node Deployment
Overview
Multi-node deployment addresses scenarios where a single machine's GPU memory is insufficient to support deployment of large models by enabling tensor parallelism across multiple machines.
Environment Preparation
Network Requirements
- All nodes must be within the same local network
- Ensure bidirectional connectivity between all nodes (test using
ping
andnc -zv
)
Software Requirements
- Install the same version of FastDeploy on all nodes
- [Recommended] Install and configure MPI (OpenMPI or MPICH)
Tensor Parallel Deployment
Recommended Launch Method
We recommend using mpirun for one-command startup without manually starting each node.
Usage Instructions
- Execute the same command on all machines
- The IP order in the
ips
parameter determines the node startup sequence - The first IP will be designated as the master node
-
Ensure all nodes can resolve each other's hostnames
-
Online inference startup example:
shell python -m fastdeploy.entrypoints.openai.api_server \ --model baidu/ERNIE-4.5-300B-A47B-Paddle \ --port 8180 \ --metrics-port 8181 \ --engine-worker-queue-port 8182 \ --max-model-len 32768 \ --max-num-seqs 32 \ --tensor-parallel-size 16 \ --ips 192.168.1.101,192.168.1.102
-
Offline startup example:
```python
from fastdeploy.engine.sampling_params import SamplingParams
from fastdeploy.entrypoints.llm import LLMmodel_name_or_path = "baidu/ERNIE-4.5-300B-A47B-Paddle"
sampling_params = SamplingParams(temperature=0.1, max_tokens=30)
llm = LLM(model=model_name_or_path, tensor_parallel_size=16, ips="192.168.1.101,192.168.1.102")
if llm._check_master():
output = llm.generate(prompts="Who are you?", use_tqdm=True, sampling_params=sampling_params)
print(output)
``` -
Notes:
- Only the master node can receive completion requests
- Always send requests to the master node (the first IP in the ips list)
- The master node will distribute workloads across all nodes
Parameter Description
ips
Parameter
- Type:
string
- Format: Comma-separated IPv4 addresses
- Description: Specifies the IP addresses of all nodes in the deployment group
- Required: Only for multi-node deployments
- Example:
"192.168.1.101,192.168.1.102,192.168.1.103"
tensor_parallel_size
Parameter
- Type:
integer
- Description: Total number of GPUs across all nodes
- Required: Yes
- Example: For 2 nodes with 8 GPUs each, set to 16