Supported Models
FastDeploy currently supports the following models, which can be downloaded automatically during deployment. Set the model
parameter to a model name from the table below to download the model weights automatically (all downloads support resuming). The following three download sources are supported:
- Search for corresponding Paddle-version ERNIE models on AIStudio/PaddlePaddle, e.g., ERNIE-4.5-0.3B-Paddle
- Download Paddle-version ERNIE models from HuggingFace/baidu/models, e.g., baidu/ERNIE-4.5-0.3B-Paddle
- Search for corresponding Paddle-version ERNIE models on ModelScope/PaddlePaddle, e.g., ERNIE-4.5-0.3B-Paddle
When using automatic download, the default download source is AIStudio. Users can change the download source by setting the FD_MODEL_SOURCE
environment variable to "AISTUDIO", "MODELSCOPE", or "HUGGINGFACE". The default download path is ~/
(i.e., the user's home directory). Users can change the download path by setting the FD_MODEL_CACHE
environment variable, e.g.:
export FD_MODEL_SOURCE=AISTUDIO # "AISTUDIO", "MODELSCOPE" or "HUGGINGFACE"
export FD_MODEL_CACHE=/ssd1/download_models
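A mistyped source name is easy to miss until a download fails, so it can help to validate both settings before starting a deployment. The sketch below is one possible way to do that. The function name `set_fd_download_env` is hypothetical and not part of FastDeploy. The three accepted values and the home-directory default come from the text above, while creating the cache directory in advance is an assumption of this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FastDeploy): validate and export the
# download settings described above.
#   $1: download source, defaults to AISTUDIO (the documented default)
#   $2: cache directory, defaults to the user's home directory
set_fd_download_env() {
  local source="${1:-AISTUDIO}"
  local cache_dir="${2:-$HOME}"

  # Accept only the three documented source names.
  case "$source" in
    AISTUDIO|MODELSCOPE|HUGGINGFACE) ;;
    *)
      echo "unsupported FD_MODEL_SOURCE: $source" >&2
      return 1
      ;;
  esac

  # Assumption of this sketch: create the cache directory up front so that
  # a permissions problem surfaces before any download starts.
  mkdir -p "$cache_dir" || return 1

  export FD_MODEL_SOURCE="$source"
  export FD_MODEL_CACHE="$cache_dir"
}
```

For example, `set_fd_download_env MODELSCOPE /ssd1/download_models` would export both variables for the current shell, and an unsupported source name returns a non-zero status without changing either variable.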
Model Name | Context Length | Quantization | Minimum Deployment Resources | Notes |
---|---|---|---|---|
baidu/ERNIE-4.5-VL-424B-A47B-Paddle | 32K/128K | WINT4 | 4*80G GPU VRAM/1T RAM | Chunked Prefill required for 128K |
baidu/ERNIE-4.5-VL-424B-A47B-Paddle | 32K/128K | WINT8 | 8*80G GPU VRAM/1T RAM | Chunked Prefill required for 128K |
baidu/ERNIE-4.5-300B-A47B-Paddle | 32K/128K | WINT4 | 4*64G GPU VRAM/600G RAM | Chunked Prefill required for 128K |
baidu/ERNIE-4.5-300B-A47B-Paddle | 32K/128K | WINT8 | 8*64G GPU VRAM/600G RAM | Chunked Prefill required for 128K |
baidu/ERNIE-4.5-300B-A47B-2Bits-Paddle | 32K/128K | WINT2 | 1*141G GPU VRAM/600G RAM | Chunked Prefill required for 128K |
baidu/ERNIE-4.5-300B-A47B-W4A8C8-TP4-Paddle | 32K/128K | W4A8C8 | 4*64G GPU VRAM/160G RAM | Fixed 4-GPU setup, Chunked Prefill recommended |
baidu/ERNIE-4.5-300B-A47B-FP8-Paddle | 32K/128K | FP8 | 8*64G GPU VRAM/600G RAM | Chunked Prefill recommended, only supports PD Disaggregated Deployment with EP parallelism |
baidu/ERNIE-4.5-300B-A47B-Base-Paddle | 32K/128K | WINT4 | 4*64G GPU VRAM/600G RAM | Chunked Prefill recommended |
baidu/ERNIE-4.5-300B-A47B-Base-Paddle | 32K/128K | WINT8 | 8*64G GPU VRAM/600G RAM | Chunked Prefill recommended |
baidu/ERNIE-4.5-VL-28B-A3B-Paddle | 32K | WINT4 | 1*24G GPU VRAM/128G RAM | Chunked Prefill required |
baidu/ERNIE-4.5-VL-28B-A3B-Paddle | 128K | WINT4 | 1*48G GPU VRAM/128G RAM | Chunked Prefill required |
baidu/ERNIE-4.5-VL-28B-A3B-Paddle | 32K/128K | WINT8 | 1*48G GPU VRAM/128G RAM | Chunked Prefill required |
baidu/ERNIE-4.5-21B-A3B-Paddle | 32K/128K | WINT4 | 1*24G GPU VRAM/128G RAM | Chunked Prefill required for 128K |
baidu/ERNIE-4.5-21B-A3B-Paddle | 32K/128K | WINT8 | 1*48G GPU VRAM/128G RAM | Chunked Prefill required for 128K |
baidu/ERNIE-4.5-21B-A3B-Base-Paddle | 32K/128K | WINT4 | 1*24G GPU VRAM/128G RAM | Chunked Prefill required for 128K |
baidu/ERNIE-4.5-21B-A3B-Base-Paddle | 32K/128K | WINT8 | 1*48G GPU VRAM/128G RAM | Chunked Prefill required for 128K |
baidu/ERNIE-4.5-0.3B-Paddle | 32K/128K | BF16 | 1*6G/12G GPU VRAM/2G RAM | |
baidu/ERNIE-4.5-0.3B-Base-Paddle | 32K/128K | BF16 | 1*6G/12G GPU VRAM/2G RAM | |
Support for more models is in progress. You can request support for new models via GitHub Issues.