Supported Models

FastDeploy currently supports the models listed below, which can be downloaded automatically during deployment. Set the model parameter to a model name from the table to download the model weights automatically (all downloads support resuming). The following three download sources are supported:

When using automatic download, the default download source is AIStudio. You can change it by setting the FD_MODEL_SOURCE environment variable to "AISTUDIO", "MODELSCOPE", or "HUGGINGFACE". The default download path is ~/ (i.e., the user's home directory); change it by setting the FD_MODEL_CACHE environment variable, e.g.:

export FD_MODEL_SOURCE=AISTUDIO # "AISTUDIO", "MODELSCOPE" or "HUGGINGFACE"
export FD_MODEL_CACHE=/ssd1/download_models
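The resolution described above can be sketched as follows. This is an illustration of the documented defaults only, not FastDeploy's actual implementation; the helper name is hypothetical:

```python
import os

def resolve_model_download(env=None):
    """Illustrative only: resolve the download source and cache directory
    using the defaults described above (AISTUDIO source, home directory).
    FastDeploy's internal logic may differ."""
    env = os.environ if env is None else env
    source = env.get("FD_MODEL_SOURCE", "AISTUDIO")
    if source not in {"AISTUDIO", "MODELSCOPE", "HUGGINGFACE"}:
        raise ValueError(f"unsupported FD_MODEL_SOURCE: {source}")
    cache_dir = env.get("FD_MODEL_CACHE", os.path.expanduser("~"))
    return source, cache_dir

# Defaults when neither variable is set: source "AISTUDIO", cache in $HOME
print(resolve_model_download({}))
```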

Example launch command using baidu/ERNIE-4.5-0.3B-PT:

python -m fastdeploy.entrypoints.openai.api_server \
       --model baidu/ERNIE-4.5-0.3B-PT \
       --port 8180 \
       --metrics-port 8181 \
       --engine-worker-queue-port 8182 \
       --max-model-len 32768 \
       --max-num-seqs 32
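Once the server is up, it serves an OpenAI-compatible API, so any OpenAI-style client can query it. A minimal sketch of a chat-completions request using only the standard library (the port matches the --port flag above; the endpoint path and payload shape follow the OpenAI API convention, and `build_chat_request` is a hypothetical helper, not part of FastDeploy):

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str,
                       base_url: str = "http://localhost:8180"):
    """Build an OpenAI-style chat-completions request for the server above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("baidu/ERNIE-4.5-0.3B-PT", "Hello!")
# Sending the request requires the server launched above to be running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```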

Large Language Models

These models accept text input.

| Models | DataType | Example HF Models |
|:--|:--|:--|
| ⭐ ERNIE | BF16/WINT4/WINT8/W4A8C8/WINT2/FP8 | baidu/ERNIE-4.5-VL-424B-A47B-Paddle; baidu/ERNIE-4.5-300B-A47B-Paddle (quick start, best practice); baidu/ERNIE-4.5-300B-A47B-2Bits-Paddle; baidu/ERNIE-4.5-300B-A47B-W4A8C8-TP4-Paddle; baidu/ERNIE-4.5-300B-A47B-FP8-Paddle; baidu/ERNIE-4.5-300B-A47B-Base-Paddle; baidu/ERNIE-4.5-21B-A3B-Paddle; baidu/ERNIE-4.5-21B-A3B-Base-Paddle; baidu/ERNIE-4.5-21B-A3B-Thinking; baidu/ERNIE-4.5-VL-28B-A3B-Thinking; baidu/ERNIE-4.5-0.3B-Paddle (quick start, best practice); baidu/ERNIE-4.5-0.3B-Base-Paddle, etc. |
| ⭐ QWEN3-MOE | BF16/WINT4/WINT8/FP8 | Qwen/Qwen3-235B-A22B; Qwen/Qwen3-30B-A3B, etc. |
| ⭐ QWEN3 | BF16/WINT8/FP8 | Qwen/Qwen3-32B; Qwen/Qwen3-14B; Qwen/Qwen3-8B; Qwen/Qwen3-4B; Qwen/Qwen3-1.7B; Qwen/Qwen3-0.6B, etc. |
| ⭐ QWEN2.5 | BF16/WINT8/FP8 | Qwen/Qwen2.5-72B; Qwen/Qwen2.5-32B; Qwen/Qwen2.5-14B; Qwen/Qwen2.5-7B; Qwen/Qwen2.5-3B; Qwen/Qwen2.5-1.5B; Qwen/Qwen2.5-0.5B, etc. |
| ⭐ QWEN2 | BF16/WINT8/FP8 | Qwen/Qwen2-72B; Qwen/Qwen2-7B; Qwen/Qwen2-1.5B; Qwen/Qwen2-0.5B; Qwen/QwQ-32B, etc. |
| ⭐ DEEPSEEK | BF16/WINT4 | unsloth/DeepSeek-V3.1-BF16; unsloth/DeepSeek-V3-0324-BF16; unsloth/DeepSeek-R1-BF16, etc. |
| ⭐ GPT-OSS | BF16/WINT8 | unsloth/gpt-oss-20b-BF16, etc. |
| ⭐ GLM-4.5/4.6 | BF16/wfp8afp8 | zai-org/GLM-4.5-Air; zai-org/GLM-4.6 (best practice), etc. |

Multimodal Language Models

These models accept multi-modal inputs (e.g., images and text).

| Models | DataType | Example HF Models |
|:--|:--|:--|
| ERNIE-VL | BF16/WINT4/WINT8 | baidu/ERNIE-4.5-VL-424B-A47B-Paddle (quick start, best practice); baidu/ERNIE-4.5-VL-28B-A3B-Paddle (quick start, best practice); baidu/ERNIE-4.5-VL-28B-A3B-Thinking (quick start, best practice) |
| PaddleOCR-VL | BF16/WINT4/WINT8 | PaddlePaddle/PaddleOCR-VL (best practice) |
| QWEN-VL | BF16/WINT4/FP8 | Qwen/Qwen2.5-VL-72B-Instruct; Qwen/Qwen2.5-VL-32B-Instruct; Qwen/Qwen2.5-VL-7B-Instruct; Qwen/Qwen2.5-VL-3B-Instruct |

More models are being supported. You can submit requests for new model support via GitHub Issues.