# Router Troubleshooting Guide

This document is based on the Golang Router implementation and summarizes common log messages, response outputs, and troubleshooting methods encountered during Router usage, helping users quickly locate and resolve issues.

For basic Router usage, please refer to Load-Balancing Scheduling Router.

## Common Log Analysis

Note: `{}` represents variables that will be replaced with actual values in logs.

### Error-Level Logs

| Log Message | Meaning | Impact | What to Check |
| --- | --- | --- | --- |
| `Removed unhealthy prefill instance: {url}` | Prefill instance failed its health check and has been removed | This Prefill instance no longer participates in scheduling | Health status |
| `Removed unhealthy decode instance: {url}` | Decode instance failed its health check and has been removed | This Decode instance no longer participates in scheduling | Health status |
| `Removed unhealthy mixed instance: {url}` | Mixed instance failed its health check and has been removed | This Mixed instance no longer participates in scheduling | Health status |
| `Failed to register instance: {error}` | Instance registration failed | Router cannot register the instance | Health status, registration parameters |
| `Failed to read YAML file {path}: {error}` | Failed to read the registration config file at startup | Instances in the config file cannot be registered | File path, file permissions |
| `Failed to unmarshal YAML file {path}: {error}` | The registration config file has an invalid format | Instances in the config file cannot be registered | YAML format |
| `Failed to register instance from index {index}: {error}` | The instance at index `{index}` in the config file failed to register | That instance was not registered | Health status, registration parameters |
| `failed to send request to {url} with error: {error}` | A health check request failed to send | The instance may be marked as unhealthy | Network connectivity, proxy settings |
| `scanner error: {error}` | Error while reading a backend streaming response | The current request may fail | Backend instance status |
| `[prefill] scanner error: {error}, message={message}` | Error while reading a Prefill backend streaming response | The current Prefill request may fail | Backend instance status |
| `[prefill] copy error: {error}, message={message}` | Error while copying Prefill response data | The current Prefill request may fail | Backend instance status |
| `Panic recovered: {error}` | A panic occurred during request processing and was recovered | The current request fails, but the service keeps running | Backend instance status, request content |
| `empty baseURL provided` | The health check received an empty base URL | The health check cannot be performed | Registration parameters |
| `failed to create request: {error}` | Failed to create a health check request | The instance may be marked as unhealthy | Network environment |
| `failed to read response body: {error}` | Failed to read a health check response body | The instance may be marked as unhealthy | Backend instance status |

### Warn-Level Logs

| Log Message | Meaning | Impact | What to Check |
| --- | --- | --- | --- |
| `Server {url} is not healthy` | The instance at this URL failed its health check | Router cannot register the instance, or removes it from the registered list | Health status |
| `Instance {url} role is unknown` | The instance role is not recognized | The instance is not added to the scheduling list | Registration parameters |
| `cache-aware prefill: tokenizer failed, fallback to char tokens: {error}` | The Tokenizer service call failed; automatically falling back to character-based tokenization | The cache_aware strategy stays active, using character-based tokenization for cache matching instead of the Tokenizer; normal request processing is not affected | Tokenizer service status |
| `cache-aware prefill: tokenize failed, fallback to process_tokens: {error}` | Tokenization failed completely (e.g., empty input); falling back to the process_tokens strategy | Prefill scheduling temporarily does not use the cache_aware strategy; normal request processing is not affected | Request content, Tokenizer service status |
| `cache-aware prefill: final strategy: process_tokens, reason: tokenize failed: {error}. ts_ms={ts}` | Tokenization failed (new log format); falling back to the process_tokens strategy | Prefill scheduling temporarily does not use the cache_aware strategy; normal request processing is not affected | Request content, Tokenizer service status |

### Info-Level Logs

| Log Message | Meaning | Description |
| --- | --- | --- |
| `Starting server on {host:port}` | The Router service is starting | Normal startup log |
| `Server {url} is healthy` | The instance passed its health check | Normal operation log |
| `Successfully registered instance from index {index}` | The instance from the config file registered successfully | Normal startup log |
| `No instances found in config file {path}` | No instances were found in the registration config file | Check whether `register.yaml` is empty |
| `Request completed successfully.` | Request processing completed | Normal operation log |
| `Request failed, retrying...` | The request failed and is being retried | Router retries up to 3 times |
| `select worker (prefill): {url}, tokens: {tokens}` | The Prefill scheduler selected a worker, showing its current token processing count | Normal operation log |
| `select worker ({type}): {url}, count: {count}` | The Decode/Mixed scheduler selected a worker, showing its current request concurrency | Normal operation log |
| `release worker: {url}, count: {count}` | The request ended; the worker counter was released | Normal operation log |
| `release prefill tokens: {url}, tokens: {tokens}` | The Prefill request ended; the token load was released | Normal operation log |
| `cleanup unhealthy worker counter: {url}` | Cleaned up the counter of an unhealthy worker | Normal operation log |
| `removed counters for {count} unhealthy workers: {urls}` | Batch cleanup of counters for unhealthy workers | Normal operation log |
| `[stats] total_running={n}, workers: [{loads}], cache_hit_rate={rate}% (hits={hits}/total={total})` | Periodic stats: total running requests, per-worker loads, cache hit rate | Normal operation log, useful for monitoring and tuning |
| `Parsing completed; starting worker selection.` | Request parsing completed; worker selection is starting | Normal operation log |
| `Request completed with an error.` | Request processing completed with an error | Check backend instance status |
| `[SelectWorkerPair] decode selection failed, releasing prefill counter url={url}` | Decode selection failed in PD disaggregated mode; the Prefill counter is released | Error handling log |
| `[prefill] first chunk received, release counter url={url}` | The Prefill streaming response received its first chunk; the counter was released | Normal operation log |
| `[prefill] non-stream prefill response done, release counter url={url}` | The Prefill non-streaming response completed; the counter was released | Normal operation log |
| `[prefill] backendResp is nil or backendResp.Body is nil, url={url}` | The Prefill backend response is nil | May indicate a backend connection issue |
| `[prefill] release in defer (fallback) url={url}, isStream={bool}` | Fallback resource release when a Prefill request exits abnormally | Error handling log |
| `[prefill] release in CommonCompletions defer (error path) url={url}` | Prefill resource release on the error path | Error handling log |
| `cache-aware prefill: final strategy: process_tokens, reason: strategy not initialized` | The cache_aware strategy is not initialized; falling back to process_tokens | Check the cache_aware configuration |
| `cache-aware prefill: final strategy: process_tokens, reason: load imbalanced, loads={loads}. ts_ms={ts}` | Load is imbalanced across instances; falling back to the process_tokens strategy | Normal operation log, automatic load-balancing switch |
| `cache-aware prefill: final strategy: cache_aware_scoring, selected={url}, loads={loads}, hitRatios={ratios}. ts_ms={ts}` | The cache_aware scoring strategy selected a worker | Normal operation log, showing loads and hit ratios |
| `[{method}] {path} {proto} {status} {latency} {clientIP}` | HTTP request access log | Normal operation log, records basic info for each request |
| `before SelectWorker prefill. ts_ms={ts}` | Starting Prefill worker selection in PD disaggregated mode | Normal operation log, for performance tracing |
| `before SelectWorker decode, after prefill. ts_ms={ts}` | Starting Decode worker selection after Prefill selection | Normal operation log, for performance tracing |
| `after SelectWorker decode, before return. ts_ms={ts}` | Decode worker selection completed | Normal operation log, for performance tracing |
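The `select worker` / `release worker` lines above reflect a per-worker in-flight request counter: the request_num policy picks the least-loaded worker and releases the count when the request ends. A minimal Python sketch of that bookkeeping (the actual Router is written in Go, and all names here are illustrative):

```python
import threading

class WorkerCounter:
    """Illustrative sketch of the per-worker counter behind the
    'select worker' / 'release worker' log lines (names are made up;
    the real Router is written in Go)."""

    def __init__(self, urls):
        self.counts = {url: 0 for url in urls}
        self.lock = threading.Lock()

    def select(self):
        # request_num policy: pick the worker with the fewest in-flight requests.
        with self.lock:
            url = min(self.counts, key=self.counts.get)
            self.counts[url] += 1
            print(f"select worker (mixed): {url}, count: {self.counts[url]}")
            return url

    def release(self, url):
        # Mirrors 'release worker: {url}, count: {count}' when a request ends.
        with self.lock:
            self.counts[url] = max(0, self.counts[url] - 1)
            print(f"release worker: {url}, count: {self.counts[url]}")

counter = WorkerCounter(["http://10.0.0.1:8000", "http://10.0.0.2:8000"])
first = counter.select()   # least-loaded worker
second = counter.select()  # the other worker, since `first` now has count 1
counter.release(first)     # counter drops back to 0
```

A counter that leaks (no release on an error path) would skew scheduling, which is why the error-path release logs above exist.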

### Debug-Level Logs

Debug-level logs are output only when the log level is set to debug, and are typically used for development debugging.

| Log Message | Meaning | Description |
| --- | --- | --- |
| `Healthy instances: prefill={urls}, decode={urls}, mixed={urls}` | Lists healthy instances for each role | Useful for verifying instance discovery |
| `cache-aware prefill: hashes={n} workers={n} load={loads} hit={hits}` | Hash count, worker count, and load info for the cache_aware strategy | Useful for debugging cache hits |
| `cache-aware prefill: tokenizer tokens={tokens}` | Tokenizer tokenization result | Useful for debugging tokenization results |
| `cache-aware score: worker={url} hit={hit} loadRatio={ratio} score={score}` | Scoring details for the cache_aware strategy | Useful for debugging scheduling decisions |
| `radix match: hashes={n} matched_len={n} node_children={n}` | Radix tree match details | Useful for debugging cache matching |
| `radix record: worker={url} hashes={n} node_depth={n}` | Radix tree record details | Useful for debugging cache recording |
| `radix eviction: removed={n} nodeCount={n}` | Radix tree eviction details | Useful for debugging cache eviction |
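The `radix match` / `radix record` lines track prefix-cache state per worker. A simplified sketch of the underlying idea: hash the token sequence in fixed-size chunks so that a match count is a prefix-match length. The chunk size and flat-set storage below are simplifications for illustration, not the Router's actual radix tree implementation:

```python
import hashlib

CHUNK = 4  # illustrative chunk size, not the Router's actual block size

def chunk_hashes(tokens):
    """Hash the token sequence in fixed-size chunks. Each hash covers the
    whole prefix up to that chunk, so equal hashes imply equal prefixes."""
    hashes, prefix = [], b""
    for i in range(0, len(tokens) - len(tokens) % CHUNK, CHUNK):
        prefix += ",".join(map(str, tokens[i:i + CHUNK])).encode() + b"|"
        hashes.append(hashlib.sha256(prefix).hexdigest())
    return hashes

def matched_len(cache, hashes):
    """Longest prefix of `hashes` already recorded, mirroring
    'radix match: hashes={n} matched_len={n} ...'."""
    n = 0
    for h in hashes:
        if h not in cache:
            break
        n += 1
    return n

cache = set()
a = chunk_hashes(list(range(12)))                    # 3 chunks
cache.update(a)                                      # 'radix record' for a worker
b = chunk_hashes(list(range(8)) + [99, 98, 97, 96])  # first 2 chunks shared with `a`
print("matched_len:", matched_len(cache, b))         # 2
```

The per-worker hit ratio in the `cache-aware score` line is then `matched_len / len(hashes)` for that worker's recorded cache.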

## Common Response Output Analysis

### Inference Request Errors (`/v1/chat/completions`, `/v1/completions`)

| Output | HTTP Status | Meaning | What to Check |
| --- | --- | --- | --- |
| `{"error": "No available prefill/decode workers"}` | 503 | All Prefill or Decode instances are unavailable; no healthy instances are registered | Health status |
| `{"error": "Failed to select worker pair"}` | 502 | Failed to select a worker pair in PD disaggregated mode | Health status, scheduling strategy |
| `{"error": "Failed to select worker"}` | 502 | Failed to select a worker in centralized mode | Health status, scheduling strategy |
| `{"error": "Failed to connect to backend service"}` | 502 | Failed to connect to the backend inference instance (after 3 retries) | Backend instance status, network connectivity |
| `{"error": "Failed to build disaggregate_info"}` | 500 | Failed to build PD disaggregation communication info | Registration parameters (`connector_port`, `device_ids`, etc.) |
| `{"error": "Invalid request body"}` | 400 | Failed to read the request body | Request format |
| `{"error": "Invalid JSON format"}` | 400 | Failed to parse the request body as JSON | Request format |

### Registration Request Errors (`/register`)

| Output | HTTP Status | Meaning | What to Check |
| --- | --- | --- | --- |
| `{"code": 503, "msg": "{url} service is not healthy"}` | 503 | The instance failed its health check and cannot be registered | Health status |
| `{"code": 400, "msg": "Invalid request body"}` | 400 | Failed to read the registration request body | Request format |
| `{"code": 400, "msg": "Invalid InstanceInfo JSON format: {error}"}` | 400 | The registration request has an invalid JSON format | Request format |
| `{"code": 400, "msg": "splitwise mode only supports PREFILL/DECODE instances"}` | 400 | MIXED instances are not allowed in PD disaggregated mode | Deployment mode, instance role |
| `{"code": 400, "msg": "only MIXED instances are allowed"}` | 400 | Only MIXED instances are allowed in centralized mode | Deployment mode, instance role |
| `{"code": 400, "msg": "invalid InstanceInfo format: {error}"}` | 400 | Instance registration info failed validation | Registration parameters |
| `{"code": 200, "msg": "Register success"}` | 200 | Registration succeeded | — |

### Common Registration Parameter Validation Errors

| Error Message | Meaning | Solution |
| --- | --- | --- |
| `role is required` | The `role` field is missing | Add the `role` field with value `prefill` / `decode` / `mixed` |
| `invalid role: {role}` | The `role` value is invalid | Use a valid role value: `prefill` / `decode` / `mixed` |
| `host_ip is required` | The `host_ip` field is missing | Add the `host_ip` field |
| `invalid host_ip: {ip}` | `host_ip` is not a valid IP address | Provide a valid IP address |
| `port is required` | The `port` field is missing | Add the `port` field |
| `invalid port: {port}` | `port` is not a valid port number | Provide a port number in the range 1-65535 |
| `invalid protocol: {protocol}` | The transfer protocol is invalid | Use a valid protocol value: `ipc` / `rdma` |
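The rules in this table are simple enough to mirror locally before sending a registration request. The sketch below is an illustrative Python re-implementation producing the same error strings; the real validation lives in the Go Router:

```python
import ipaddress

VALID_ROLES = {"prefill", "decode", "mixed"}
VALID_PROTOCOLS = {"ipc", "rdma"}

def validate_instance(info):
    """Illustrative re-implementation of the validation rules in the
    table above (the real checks live in the Go Router)."""
    errors = []
    role = info.get("role")
    if not role:
        errors.append("role is required")
    elif role not in VALID_ROLES:
        errors.append(f"invalid role: {role}")

    host_ip = info.get("host_ip")
    if not host_ip:
        errors.append("host_ip is required")
    else:
        try:
            ipaddress.ip_address(host_ip)
        except ValueError:
            errors.append(f"invalid host_ip: {host_ip}")

    port = info.get("port")
    if port is None:
        errors.append("port is required")
    elif not isinstance(port, int) or not 1 <= port <= 65535:
        errors.append(f"invalid port: {port}")

    protocol = info.get("transfer_protocol")
    if protocol is not None and protocol not in VALID_PROTOCOLS:
        errors.append(f"invalid protocol: {protocol}")
    return errors

print(validate_instance({"role": "prefill", "host_ip": "10.0.0.1", "port": 8000}))  # []
print(validate_instance({"role": "worker", "host_ip": "bad", "port": 99999}))
```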

## Troubleshooting Guide

### Health Status

Instance health checking is fundamental to Router operation. When instances fail to register or are removed, follow these steps:

1. Check instance registration status

View the currently registered instances and their count:

```bash
# View registered instance list
curl -X GET http://{router_url}/registered

# View registered instance count
curl -X GET http://{router_url}/registered_number
```

Verify that all expected instances are registered. If the count does not match, some instances may have failed to register or been removed by health checks.

2. Check instance health and network connectivity

Directly access the inference instance's health endpoint from the Router's host:

```bash
curl -X GET http://{server_url}/health
```

- An HTTP 200 response indicates the instance is healthy and the network is reachable.
- If the endpoint is unreachable or returns a non-200 status code, investigate further:
  - Whether the instance has started and is listening on the correct port
  - Whether a proxy is interfering with the connection (try disabling it: `unset http_proxy && unset https_proxy`)
  - Whether firewall rules are blocking the connection

Common solutions:

- Disable the network proxy: `unset http_proxy && unset https_proxy`
- gunicorn version compatibility: if the registered instance count is incomplete, it may be caused by a version incompatibility between gunicorn and FastDeploy; downgrading to `gunicorn==25.0.3` can resolve the issue

### Scheduling Strategy

When encountering `Failed to select worker` or `Failed to select worker pair` errors:

1. Verify registered instance count

```bash
curl -X GET http://{router_url}/registered_number
```

If the returned count is 0, there are no available healthy instances. Please refer to Health Status for troubleshooting.

2. Check scheduling strategy configuration

Verify that the scheduling strategy in `config.yaml` matches your deployment mode. The default scheduling strategies are:

| Deployment Mode | Config Field | Default Strategy |
| --- | --- | --- |
| Centralized mode | `policy` | `request_num` |
| PD disaggregated mode (Prefill) | `prefill-policy` | `process_tokens` |
| PD disaggregated mode (Decode) | `decode-policy` | `request_num` |

If no strategy is specified in the config file, the Router will use the defaults listed above. To use advanced strategies such as cache_aware or fd_metrics_score, specify them explicitly in the config file. For detailed descriptions of each strategy, see Scheduling Strategies.
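As an illustration, a `config.yaml` fragment for PD disaggregated mode that opts into the cache_aware Prefill strategy might look like this. Only the field names come from the table above; treat the fragment as a sketch, not a complete configuration:

```yaml
# Illustrative config.yaml fragment for PD disaggregated mode.
prefill-policy: cache_aware   # overrides the default (process_tokens)
decode-policy: request_num    # same as the default, shown for completeness
```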

3. Check fd_metrics_score strategy dependencies

When using the fd_metrics_score strategy, the Router fetches running/waiting request counts in real time from the /metrics endpoint of inference instances. When the /metrics endpoint is unavailable (e.g., metrics_port is not configured or the metrics service is down), the Router automatically falls back to the internal request counter for scheduling. This does not affect normal request processing, but scheduling accuracy may be reduced.

To ensure optimal scheduling with fd_metrics_score, verify that the inference instance's metrics endpoint is responding correctly:

```bash
curl -X GET http://{server_url}/metrics
```
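The fallback behavior described above can be sketched as follows. The metric names and the shape of the internal counter are assumptions for illustration, not the Router's exact code:

```python
def parse_metric(metrics_text, name):
    """Pull a single value out of Prometheus-style /metrics text."""
    for line in metrics_text.splitlines():
        if line.startswith(name):
            return float(line.rsplit(" ", 1)[1])
    return None  # metric missing -> treat the endpoint as unavailable

def instance_load(metrics_text, internal_count):
    """fd_metrics_score-style load: prefer /metrics data, fall back to
    the internal request counter when the metrics are unavailable."""
    running = parse_metric(metrics_text, "num_requests_running")  # assumed metric name
    waiting = parse_metric(metrics_text, "num_requests_waiting")  # assumed metric name
    if running is None or waiting is None:
        return internal_count  # documented fallback path
    return running + waiting

healthy = "num_requests_running 3\nnum_requests_waiting 1\n"
print(instance_load(healthy, internal_count=7))  # 4.0, from /metrics
print(instance_load("", internal_count=7))       # 7, fallback to the counter
```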

### Registration Parameters

When registration fails with parameter validation errors:

1. Verify that the deployment mode and instance role match

- PD disaggregated mode (`--splitwise`): only `prefill` and `decode` roles can be registered
- Centralized mode (default): only the `mixed` role can be registered

2. Check required parameters

Registration requests must include the following fields:

- `role`: Instance role (`prefill` / `decode` / `mixed`)
- `host_ip`: Instance IP address
- `port`: Instance port number

3. Check optional parameters for PD disaggregated mode

In PD disaggregated mode, the following parameters should be fully configured to ensure proper KV Cache transfer:

- `connector_port`: PD communication port
- `transfer_protocol`: Transfer protocol (`ipc` / `rdma`)
- `device_ids`: GPU device IDs
- `rdma_ports`: RDMA ports (required when using the `rdma` protocol)
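Putting the required and PD-specific fields together, a registration request body might look like the following. All values, and the exact types of `device_ids` / `rdma_ports`, are illustrative assumptions; only the field names come from this guide:

```json
{
  "role": "prefill",
  "host_ip": "10.0.0.1",
  "port": 8000,
  "connector_port": 9000,
  "transfer_protocol": "rdma",
  "device_ids": [0, 1],
  "rdma_ports": [10001, 10002]
}
```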

### Startup Failures

1. Configuration file loading failure

If `Failed to load config` appears in the startup logs, check:

- Whether the file path specified by `--config_path` is correct
- Whether the configuration file is valid YAML
- Whether the configuration parameter values are valid

2. Port already in use

If `Failed to start server` appears in the startup logs, check:

- Whether the port specified by `--port` is already occupied by another process
- Use `lsof -i:{port}` or `netstat -tlnp | grep {port}` to check port usage

### Tokenizer Service (cache_aware Strategy)

When using the cache_aware scheduling strategy, the Router calls a Tokenizer service to tokenize requests for cache hit ratio computation. When the Tokenizer service is unavailable, the Router has a two-level degradation mechanism:

1. Fallback to character-based tokenization (common case): the log shows `tokenizer failed, fallback to char tokens`. The cache_aware strategy remains active, using character-based tokenization for cache matching instead of the Tokenizer. Cache hit accuracy may decrease, but normal request processing is not affected.
2. Fallback to the process_tokens strategy (extreme case): when tokenization fails completely (e.g., empty request content), the log shows `tokenize failed, fallback to process_tokens`. The cache_aware strategy temporarily becomes inactive and scheduling falls back to token processing volume. Normal request processing is not affected.
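The two-level degradation above can be sketched as below. `tokenize` stands in for the remote Tokenizer call, and the function names are illustrative, not the Router's actual code:

```python
def char_tokens(text):
    """Level 1 fallback: character-based pseudo-tokens."""
    return list(text)

def choose_tokens(text, tokenize):
    """Two-level degradation sketch: try the Tokenizer service, fall back
    to char tokens, and return None when nothing tokenizes (level 2:
    the caller then falls back to the process_tokens strategy)."""
    try:
        tokens = tokenize(text)
    except Exception as err:
        print(f"tokenizer failed, fallback to char tokens: {err}")
        tokens = char_tokens(text)
    if not tokens:
        print("tokenize failed, fallback to process_tokens")
        return None
    return tokens

def broken_tokenizer(text):
    # Stand-in for an unreachable Tokenizer service.
    raise ConnectionError("tokenizer service unreachable")

print(choose_tokens("hello", broken_tokenizer))  # ['h', 'e', 'l', 'l', 'o']
print(choose_tokens("", broken_tokenizer))       # None
```

In both branches the request itself still succeeds; only the cache-matching quality degrades.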

To restore full cache_aware functionality:

1. Check if the Tokenizer service is running

```bash
curl -X POST http://{tokenizer_url}/tokenize \
  -H "Content-Type: application/json" \
  -d '{"text": "hello"}'
```

2. Check related configuration

- Verify that `tokenizer-url` in `config.yaml` is set correctly
- If the Tokenizer service responds slowly, consider increasing `tokenizer-timeout-secs` (default: 2 seconds)