FastDeploy CLI User Guide

Introduction

FastDeploy CLI is a command-line tool provided by the FastDeploy inference framework for running, deploying, and testing AI model inference tasks. It lets developers load models, call APIs, deploy services, run performance benchmarks, and collect environment information directly from the command line.

With FastDeploy CLI, you can:

  • 🚀 Run and validate model inference: Generate chat responses or text completions directly in the command line (chat, complete).
  • 🧩 Deploy models as services: Start an OpenAI-compatible API service with a single command (serve).
  • 📊 Run performance and evaluation tests: Conduct latency, throughput, and task benchmarks (bench).
  • ⚙️ Collect environment information: Output system, framework, GPU, and FastDeploy version information (collect-env).
  • 📁 Run batch inference tasks: Process batch input and output from files or URLs (run-batch).
  • 🔡 Manage model tokenizers: Encode/decode text and tokens, or export vocabulary (tokenizer).
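
For a first look at the tool, the session below exercises a few of these commands. It is a minimal sketch: the subcommand names come from the list above, but the --model and --prompt flags and the placeholder model path are assumptions rather than confirmed options, so check each subcommand's --help output for the actual flags.

# Print system, GPU, and FastDeploy environment information
fastdeploy collect-env

# Start an interactive chat session (--model is an assumed flag;
# replace /path/to/your/model with a real model name or path)
fastdeploy chat --model /path/to/your/model

# Run a one-off text completion (--prompt is likewise an assumed flag)
fastdeploy complete --model /path/to/your/model --prompt "Once upon a time"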

View Help Information

fastdeploy --help
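
Each subcommand is expected to expose its own help page as well, following the usual argparse-style convention; this is where the flags left unspecified in this guide are listed:

fastdeploy chat --help
fastdeploy serve --help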

Available Commands

fastdeploy {chat, complete, serve, bench, collect-env, run-batch, tokenizer}

| Command Name | Description | Detailed Documentation |
| --- | --- | --- |
| chat | Run interactive chat generation tasks in the command line to verify chat model inference results | View chat command details |
| complete | Perform text completion tasks and test various language model outputs | View complete command details |
| serve | Launch a local inference service compatible with the OpenAI API protocol | View serve command details |
| bench | Evaluate model performance (latency, throughput) and accuracy | View bench command details |
| collect-env | Collect and print system, GPU, dependency, and FastDeploy environment information | View collect-env command details |
| run-batch | Run batch inference tasks with file or URL input/output | View run-batch command details |
| tokenizer | Encode/decode text and tokens, and export vocabulary | View tokenizer command details |
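
As an end-to-end sketch, the commands below start the OpenAI-compatible service and query it over HTTP. The --model and --port flags, the port number, and the model path are assumptions; the /v1/chat/completions route is the standard endpoint of the OpenAI protocol that serve is documented to implement.

# Start a local OpenAI-compatible API server (--model and --port are
# assumed flags; see fastdeploy serve --help for the real options)
fastdeploy serve --model /path/to/your/model --port 8000

# From another shell, call the standard OpenAI-compatible chat endpoint
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "/path/to/your/model",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'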