Scheduler

FastDeploy currently supports two types of schedulers: the Local Scheduler and the Global Scheduler. The Global Scheduler is designed for large-scale clusters and performs secondary load balancing across nodes based on real-time workload metrics. A PD-Separated scheduling strategy builds on top of the Global Scheduler.

Scheduling Strategies

Local Scheduler

The Local Scheduler works much like a memory manager: it applies eviction policies based on the task queue length and the configured TTL.
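
The eviction behavior described above can be sketched as follows. This is a minimal illustration, not FastDeploy's implementation: the class `LocalQueueSketch` and its methods are hypothetical, and rejecting new tasks when the queue is full (rather than dropping older ones) is an assumption. Only the `max_size` and `ttl` semantics mirror `scheduler_max_size` and `scheduler_ttl`.

```python
import time
from collections import OrderedDict


class LocalQueueSketch:
    """Toy task queue that evicts by TTL and enforces a maximum length."""

    def __init__(self, max_size=-1, ttl=900, clock=time.monotonic):
        self.max_size = max_size  # -1 means unbounded (like scheduler_max_size)
        self.ttl = ttl            # seconds a task may wait (like scheduler_ttl)
        self.clock = clock        # injectable clock, handy for testing
        self._tasks = OrderedDict()  # task_id -> (enqueue_time, payload)

    def _evict_expired(self):
        """Drop every task that has waited longer than the TTL."""
        now = self.clock()
        expired = [tid for tid, (ts, _) in self._tasks.items() if now - ts > self.ttl]
        for tid in expired:
            del self._tasks[tid]
        return expired

    def put(self, task_id, payload):
        """Enqueue a task; return False if the queue is full (assumed policy)."""
        self._evict_expired()
        if self.max_size > 0 and len(self._tasks) >= self.max_size:
            return False
        self._tasks[task_id] = (self.clock(), payload)
        return True

    def get(self):
        """Return the oldest live task as (task_id, payload), or None."""
        self._evict_expired()
        if not self._tasks:
            return None
        task_id, (_, payload) = self._tasks.popitem(last=False)
        return task_id, payload
```

Expired tasks are purged lazily on every `put` and `get`, so no background thread is needed in this sketch.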

Global Scheduler

The Global Scheduler is implemented using Redis. Each node actively steals tasks from others when its GPU is idle, then pushes the execution results back to the originating node.
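
To make the stealing flow concrete, here is a hedged, in-memory sketch. A plain dictionary stands in for Redis, and `NodeSketch`, its methods, and the rule "steal only from a peer whose queue length is at least `min_load_score`" are all assumptions made for illustration; the real scheduler's data layout and scoring may differ.

```python
from collections import deque


class NodeSketch:
    """Toy node that serves its own queue first and steals when idle."""

    def __init__(self, name, cluster, min_load_score=3):
        self.name = name
        self.cluster = cluster  # shared dict standing in for Redis
        self.min_load_score = min_load_score
        cluster["queues"][name] = deque()
        cluster["results"][name] = {}

    def submit(self, task_id, prompt):
        # Each task remembers the node that accepted it (its origin).
        self.cluster["queues"][self.name].append((task_id, self.name, prompt))

    def next_task(self):
        own = self.cluster["queues"][self.name]
        if own:
            return own.popleft()
        # Idle: look for the busiest peer and steal only if it is loaded enough.
        peers = [(len(q), n) for n, q in self.cluster["queues"].items() if n != self.name]
        if not peers:
            return None
        load, victim = max(peers)
        if load < self.min_load_score:
            return None
        return self.cluster["queues"][victim].pop()  # take from the tail

    def run_once(self):
        task = self.next_task()
        if task is None:
            return None
        task_id, origin, prompt = task
        # Fake "inference", then push the result back to the originating node.
        self.cluster["results"][origin][task_id] = f"{self.name}:{prompt.upper()}"
        return task_id
```

Stealing from the tail of the victim's queue leaves the oldest tasks with the node that accepted them; that choice is a design assumption of this sketch, not a documented behavior.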

PD-Separated Scheduler

Building upon the Global Scheduler, FastDeploy introduces the PD-Separated Scheduling Strategy, specifically optimized for large language model inference scenarios. It decouples the inference pipeline into two distinct phases:
- Prefill Phase: Builds the KV cache. This phase is compute-intensive and uses a lot of memory, but it completes with low latency.
- Decode Phase: Performs autoregressive decoding, which is sequential and time-consuming but requires less memory.

By separating roles (prefill nodes handle request processing while decode nodes manage generation), this strategy enables finer-grained resource allocation, improving throughput and GPU utilization.
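
The role split can be illustrated with a toy pipeline. Everything here is hypothetical: `Node`, `pick`, `prefill`, `decode`, and `serve` are names invented for this sketch, the "KV cache" is just a list of integers, and the next-token rule is a dummy. The point is only to show prefill and decode being routed to different, role-specific nodes.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str      # "prefill" or "decode"
    load: int = 0  # number of in-flight requests


def pick(nodes, role):
    """Choose the least-loaded node with the given role."""
    candidates = [n for n in nodes if n.role == role]
    if not candidates:
        raise RuntimeError(f"no {role} node available")
    return min(candidates, key=lambda n: n.load)


def prefill(prompt_tokens):
    """Process the whole prompt at once and return a stand-in KV cache."""
    return list(prompt_tokens)


def decode(kv_cache, max_new_tokens):
    """Generate tokens one at a time, extending the cache after each step."""
    out = []
    for _ in range(max_new_tokens):
        nxt = (sum(kv_cache) + len(kv_cache)) % 100  # dummy next-token rule
        out.append(nxt)
        kv_cache.append(nxt)
    return out


def serve(nodes, prompt_tokens, max_new_tokens):
    """Run prefill on a prefill node, then hand the cache to a decode node."""
    p = pick(nodes, "prefill")
    d = pick(nodes, "decode")
    p.load += 1
    kv_cache = prefill(prompt_tokens)
    p.load -= 1
    d.load += 1
    tokens = decode(kv_cache, max_new_tokens)
    d.load -= 1
    return p.name, d.name, tokens
```

In a real deployment the KV cache would be transferred between machines; this sketch simply passes a Python list to keep the control flow visible.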

Configuration Parameters

| Parameter Name | Type | Required | Default | Scope | Description |
|---|---|---|---|---|---|
| scheduler_name | str | No | local | local, global, splitwise | Scheduler type: local, global, or splitwise |
| scheduler_max_size | int | No | -1 | local | Maximum task queue length |
| scheduler_ttl | int | No | 900 | local, global, splitwise | Maximum task time-to-live (seconds) |
| scheduler_host | str | No | 127.0.0.1 | global, splitwise | Redis server host |
| scheduler_port | int | No | 6379 | global, splitwise | Redis server port |
| scheduler_db | int | No | 0 | global, splitwise | Redis database index |
| scheduler_password | str | No | "" | global, splitwise | Redis access password |
| scheduler_topic | str | No | default | global, splitwise | Nodes under the same topic participate in task scheduling |
| scheduler_min_load_score | float | No | 3 | global | Minimum load threshold for task stealing (idle nodes steal from busy ones) |
| scheduler_load_shards_num | int | No | 1 | global | Number of shards for cluster load tracking |
| scheduler_sync_period | int | No | 5 | splitwise | Node load synchronization interval (seconds) |
| scheduler_expire_period | int | No | 3000 | splitwise | Node heartbeat expiration time (seconds) |
| scheduler_release_load_expire_period | int | No | 600 | splitwise | Request expiration time for load release (seconds) |
| scheduler_reader_parallel | int | No | 4 | splitwise | Number of output reader threads |
| scheduler_writer_parallel | int | No | 4 | splitwise | Number of writer threads |
| scheduler_reader_batch_size | int | No | 200 | splitwise | Batch size for fetching results from Redis |
| scheduler_writer_batch_size | int | No | 200 | splitwise | Batch size for writing results to Redis |
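
Because each parameter applies only to certain scheduler types, it can help to check a configuration against the Scope column before launching. The helper below is a hypothetical sketch, not part of FastDeploy: `PARAMS` transcribes the defaults and scopes listed above, and `effective_config` is an invented name. Whether FastDeploy itself ignores or rejects out-of-scope parameters is not stated here, so this sketch only reports them.

```python
ALL = {"local", "global", "splitwise"}
REDIS = {"global", "splitwise"}

# parameter name -> (default, scopes), transcribed from the parameter list
PARAMS = {
    "scheduler_name": ("local", ALL),
    "scheduler_max_size": (-1, {"local"}),
    "scheduler_ttl": (900, ALL),
    "scheduler_host": ("127.0.0.1", REDIS),
    "scheduler_port": (6379, REDIS),
    "scheduler_db": (0, REDIS),
    "scheduler_password": ("", REDIS),
    "scheduler_topic": ("default", REDIS),
    "scheduler_min_load_score": (3.0, {"global"}),
    "scheduler_load_shards_num": (1, {"global"}),
    "scheduler_sync_period": (5, {"splitwise"}),
    "scheduler_expire_period": (3000, {"splitwise"}),
    "scheduler_release_load_expire_period": (600, {"splitwise"}),
    "scheduler_reader_parallel": (4, {"splitwise"}),
    "scheduler_writer_parallel": (4, {"splitwise"}),
    "scheduler_reader_batch_size": (200, {"splitwise"}),
    "scheduler_writer_batch_size": (200, {"splitwise"}),
}


def effective_config(overrides):
    """Merge overrides with defaults; return (config, out_of_scope_names)."""
    name = overrides.get("scheduler_name", "local")
    if name not in ALL:
        raise ValueError(f"unknown scheduler_name: {name!r}")
    unknown = sorted(set(overrides) - set(PARAMS))
    if unknown:
        raise KeyError(f"unknown parameters: {unknown}")
    # Parameters the caller set that do not apply to the chosen scheduler.
    ignored = sorted(k for k in overrides if name not in PARAMS[k][1])
    config = {
        key: overrides.get(key, default)
        for key, (default, scopes) in PARAMS.items()
        if name in scopes
    }
    return config, ignored
```

For example, passing `scheduler_max_size` together with `scheduler_name="global"` returns that parameter in the out-of-scope list, since the queue length limit is scoped to the local scheduler.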