
Multi-Node Deployment

Overview

Multi-node deployment addresses scenarios where a single machine's GPU memory is insufficient to hold a large model, by enabling tensor parallelism across multiple machines.

Environment Preparation

Network Requirements

  1. All nodes must be within the same local network
  2. Ensure bidirectional connectivity between all nodes (test using ping and nc -zv; a quick check is sketched after this list)
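
For example, a quick connectivity check from one node to its peers might look like the following; the peer address is illustrative, and 8182 matches the engine worker queue port used in the startup example later on this page:

```shell
# Illustrative peer address; repeat for every other node in the group.
PEER=192.168.1.102

# Basic reachability check.
ping -c 3 "$PEER"

# Verify that a TCP port used by the deployment is reachable
# (8182 is the --engine-worker-queue-port value used in the startup example below).
nc -zv "$PEER" 8182
```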

Software Requirements

  1. Install the same version of FastDeploy on all nodes
  2. [Recommended] Install and configure MPI (OpenMPI or MPICH); an installation sketch follows below
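
For example, on apt-based hosts (an assumption; adjust for your distribution and cluster policies), OpenMPI can be installed from the system packages:

```shell
# Debian/Ubuntu example; install the OpenMPI runtime and launcher on every node.
sudo apt-get update
sudo apt-get install -y openmpi-bin libopenmpi-dev

# Confirm the launcher is available and the versions match across nodes.
mpirun --version
```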

Tensor Parallel Deployment

We recommend using mpirun for one-command startup, so that each node does not have to be started manually; a sketch of such an invocation follows the online startup example below.

Usage Instructions

  1. Execute the same command on all machines
  2. The IP order in the ips parameter determines the node startup sequence
  3. The first IP will be designated as the master node
  4. Ensure all nodes can resolve each other's hostnames

Online inference startup example:

```shell
python -m fastdeploy.entrypoints.openai.api_server \
    --model baidu/ERNIE-4.5-300B-A47B-Paddle \
    --port 8180 \
    --metrics-port 8181 \
    --engine-worker-queue-port 8182 \
    --max-model-len 32768 \
    --max-num-seqs 32 \
    --tensor-parallel-size 16 \
    --ips 192.168.1.101,192.168.1.102
```
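
The exact mpirun integration depends on the cluster environment; a minimal sketch, assuming OpenMPI with passwordless SSH between the two hosts and one launch per host, might look like this (it simply runs the same command shown above on both machines):

```shell
# Assumptions: OpenMPI, passwordless SSH between nodes, one process per host.
# Each process starts the FastDeploy server with identical arguments,
# so the command only needs to be issued once from a single machine.
mpirun -np 2 --host 192.168.1.101,192.168.1.102 \
    python -m fastdeploy.entrypoints.openai.api_server \
        --model baidu/ERNIE-4.5-300B-A47B-Paddle \
        --port 8180 \
        --metrics-port 8181 \
        --engine-worker-queue-port 8182 \
        --max-model-len 32768 \
        --max-num-seqs 32 \
        --tensor-parallel-size 16 \
        --ips 192.168.1.101,192.168.1.102
```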

Offline startup example:
```python
from fastdeploy.engine.sampling_params import SamplingParams
from fastdeploy.entrypoints.llm import LLM

model_name_or_path = "baidu/ERNIE-4.5-300B-A47B-Paddle"

sampling_params = SamplingParams(temperature=0.1, max_tokens=30)
llm = LLM(model=model_name_or_path, tensor_parallel_size=16, ips="192.168.1.101,192.168.1.102")

# Only the master node (the first IP in ips) accepts requests and returns output;
# the other nodes participate in the distributed computation.
if llm._check_master():
    output = llm.generate(prompts="Who are you?", use_tqdm=True, sampling_params=sampling_params)
    print(output)
```

Notes:

  • Only the master node can receive completion requests
  • Always send requests to the master node (the first IP in the ips list); see the request sketch below
  • The master node distributes workloads across all nodes
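
Since only the first IP accepts traffic, clients should target that address and the --port value from the startup command. A hedged request sketch, assuming the server exposes the usual OpenAI-compatible chat completions route (add any further fields, such as a model name, that your configuration requires):

```shell
# Target the master node (first IP in --ips) on the --port used at startup (8180 above).
# Assumption: api_server exposes the OpenAI-compatible /v1/chat/completions route.
curl -s http://192.168.1.101:8180/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "messages": [{"role": "user", "content": "Who are you?"}]
        }'
```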

Parameter Description

ips Parameter

  • Type: string
  • Format: Comma-separated IPv4 addresses
  • Description: Specifies the IP addresses of all nodes in the deployment group
  • Required: Only for multi-node deployments
  • Example: "192.168.1.101,192.168.1.102,192.168.1.103"

tensor_parallel_size Parameter

  • Type: integer
  • Description: Total number of GPUs used for tensor parallelism across all nodes
  • Required: Yes
  • Example: For 2 nodes with 8 GPUs each, set to 16
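
To confirm the per-node GPU count that feeds into this total, the GPUs visible on each machine can be listed (assuming NVIDIA GPUs with the standard nvidia-smi tool):

```shell
# Count the GPUs visible on this node; sum across all nodes to get tensor_parallel_size.
nvidia-smi --list-gpus | wc -l
```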