Router Troubleshooting Guide
This document is based on the Golang Router implementation and summarizes common log messages, response outputs, and troubleshooting methods encountered during Router usage, helping users quickly locate and resolve issues.
For basic Router usage, please refer to Load-Balancing Scheduling Router.
Common Log Analysis
Note: `{}` represents a variable that will be replaced with an actual value in the logs.
Error-Level Logs
| Log Message | Meaning | Impact | What to Check |
|---|---|---|---|
| Removed unhealthy prefill instance: {url} | Prefill instance failed health check and has been removed | This Prefill instance will no longer participate in scheduling | Health status |
| Removed unhealthy decode instance: {url} | Decode instance failed health check and has been removed | This Decode instance will no longer participate in scheduling | Health status |
| Removed unhealthy mixed instance: {url} | Mixed instance failed health check and has been removed | This Mixed instance will no longer participate in scheduling | Health status |
| Failed to register instance: {error} | Instance registration failed | Router cannot register the instance | Health status, registration parameters |
| Failed to read YAML file {path}: {error} | Failed to read the registration config file at startup | Instances in the config file cannot be registered | File path, file permissions |
| Failed to unmarshal YAML file {path}: {error} | Registration config file has an invalid format | Instances in the config file cannot be registered | YAML format |
| Failed to register instance from index {index}: {error} | Instance at index {index} in the config file failed to register | That instance was not registered | Health status, registration parameters |
| failed to send request to {url} with error: {error} | Health check request failed to send | The instance may be marked as unhealthy | Network connectivity, proxy settings |
| scanner error: {error} | Error occurred while reading the backend streaming response | The current request may fail | Backend instance status |
| [prefill] scanner error: {error}, message={message} | Error occurred while reading the Prefill backend streaming response | The current Prefill request may fail | Backend instance status |
| [prefill] copy error: {error}, message={message} | Error occurred while copying Prefill response data | The current Prefill request may fail | Backend instance status |
| Panic recovered: {error} | A panic occurred during request processing and was recovered | The current request fails, but the service continues running | Backend instance status, request content |
| empty baseURL provided | Health check received an empty base URL | Health check cannot be performed | Registration parameters |
| failed to create request: {error} | Failed to create the health check request | The instance may be marked as unhealthy | Network environment |
| failed to read response body: {error} | Failed to read the health check response body | The instance may be marked as unhealthy | Backend instance status |
Warn-Level Logs
| Log Message | Meaning | Impact | What to Check |
|---|---|---|---|
| Server {url} is not healthy | The instance at this URL failed health check | Router cannot register the instance, or will remove it from the registered list | Health status |
| Instance {url} role is unknown | Instance role cannot be recognized | The instance will not be added to the scheduling list | Registration parameters |
| cache-aware prefill: tokenizer failed, fallback to char tokens: {error} | Tokenizer service call failed; automatically falling back to character-based tokenization | cache_aware strategy remains active, using character-based tokenization for cache matching instead of the Tokenizer; normal request processing is not affected | Tokenizer service status |
| cache-aware prefill: tokenize failed, fallback to process_tokens: {error} | Tokenization failed completely (e.g., empty input); falling back to the process_tokens strategy | Prefill scheduling temporarily does not use the cache_aware strategy; normal request processing is not affected | Request content, Tokenizer service status |
| cache-aware prefill: final strategy: process_tokens, reason: tokenize failed: {error}. ts_ms={ts} | Tokenization failed (new log format); falling back to the process_tokens strategy | Prefill scheduling temporarily does not use the cache_aware strategy; normal request processing is not affected | Request content, Tokenizer service status |
Info-Level Logs
| Log Message | Meaning | Description |
|---|---|---|
| Starting server on {host:port} | Router service is starting | Normal startup log |
| Server {url} is healthy | Instance passed health check | Normal operation log |
| Successfully registered instance from index {index} | Instance from the config file registered successfully | Normal startup log |
| No instances found in config file {path} | No instances found in the registration config file | Check whether register.yaml is empty |
| Request completed successfully. | Request processing completed | Normal operation log |
| Request failed, retrying... | Request failed; a retry is being attempted | Router will retry up to 3 times |
| select worker (prefill): {url}, tokens: {tokens} | Prefill scheduler selected a worker, showing the current token processing count | Normal operation log |
| select worker ({type}): {url}, count: {count} | Decode/Mixed scheduler selected a worker, showing the current request concurrency | Normal operation log |
| release worker: {url}, count: {count} | Request ended; worker counter released | Normal operation log |
| release prefill tokens: {url}, tokens: {tokens} | Prefill request ended; token load released | Normal operation log |
| cleanup unhealthy worker counter: {url} | Cleaned up the counter for an unhealthy worker | Normal operation log |
| removed counters for {count} unhealthy workers: {urls} | Batch cleanup of counters for unhealthy workers | Normal operation log |
| [stats] total_running={n}, workers: [{loads}], cache_hit_rate={rate}% (hits={hits}/total={total}) | Periodic stats: total running requests, worker loads, cache hit rate | Normal operation log, useful for monitoring and tuning |
| Parsing completed; starting worker selection. | Request parsing completed; worker selection is starting | Normal operation log |
| Request completed with an error. | Request processing completed with an error | Check backend instance status |
| [SelectWorkerPair] decode selection failed, releasing prefill counter url={url} | Decode selection failed in PD disaggregated mode; the Prefill counter is released | Error handling log |
| [prefill] first chunk received, release counter url={url} | Prefill streaming response received its first chunk; counter released | Normal operation log |
| [prefill] non-stream prefill response done, release counter url={url} | Prefill non-streaming response completed; counter released | Normal operation log |
| [prefill] backendResp is nil or backendResp.Body is nil, url={url} | Prefill backend response is nil | May indicate a backend connection issue |
| [prefill] release in defer (fallback) url={url}, isStream={bool} | Fallback resource release when a Prefill request exits abnormally | Error handling log |
| [prefill] release in CommonCompletions defer (error path) url={url} | Prefill resource release on the error path | Error handling log |
| cache-aware prefill: final strategy: process_tokens, reason: strategy not initialized | cache_aware strategy not initialized; falling back to process_tokens | Check the cache_aware configuration |
| cache-aware prefill: final strategy: process_tokens, reason: load imbalanced, loads={loads}. ts_ms={ts} | Load is imbalanced across instances; falling back to the process_tokens strategy | Normal operation log, automatic load-balancing switch |
| cache-aware prefill: final strategy: cache_aware_scoring, selected={url}, loads={loads}, hitRatios={ratios}. ts_ms={ts} | cache_aware scoring strategy selected a worker | Normal operation log, showing loads and hit ratios |
| [{method}] {path} {proto} {status} {latency} {clientIP} | HTTP request access log | Normal operation log, records basic info for each request |
| before SelectWorker prefill. ts_ms={ts} | Starting Prefill worker selection in PD disaggregated mode | Normal operation log, for performance tracing |
| before SelectWorker decode, after prefill. ts_ms={ts} | Starting Decode worker selection after Prefill selection | Normal operation log, for performance tracing |
| after SelectWorker decode, before return. ts_ms={ts} | Decode worker selection completed | Normal operation log, for performance tracing |
Debug-Level Logs
Debug-level logs are only output when the log level is set to debug; they are typically used for development debugging.
| Log Message | Meaning | Description |
|---|---|---|
| Healthy instances: prefill={urls}, decode={urls}, mixed={urls} | Lists healthy instances for each role | Useful for verifying instance discovery |
| cache-aware prefill: hashes={n} workers={n} load={loads} hit={hits} | Hash count, worker count, and load info for the cache_aware strategy | Useful for debugging cache hits |
| cache-aware prefill: tokenizer tokens={tokens} | Tokenizer tokenization result | Useful for debugging tokenization results |
| cache-aware score: worker={url} hit={hit} loadRatio={ratio} score={score} | Scoring details for the cache_aware strategy | Useful for debugging scheduling decisions |
| radix match: hashes={n} matched_len={n} node_children={n} | Radix tree match details | Useful for debugging cache matching |
| radix record: worker={url} hashes={n} node_depth={n} | Radix tree record details | Useful for debugging cache recording |
| radix eviction: removed={n} nodeCount={n} | Radix tree eviction details | Useful for debugging cache eviction |
Common Response Output Analysis
Inference Request Errors (/v1/chat/completions, /v1/completions)
| Output | HTTP Status | Meaning | What to Check |
|---|---|---|---|
{"error": "No available prefill/decode workers"} |
503 | All Prefill or Decode instances are unavailable, no registered healthy instances | Health status |
{"error": "Failed to select worker pair"} |
502 | Failed to select a worker pair in PD disaggregated mode | Health status, scheduling strategy |
{"error": "Failed to select worker"} |
502 | Failed to select a worker in centralized mode | Health status, scheduling strategy |
{"error": "Failed to connect to backend service"} |
502 | Failed to connect to backend inference instance (after 3 retries) | Backend instance status, network connectivity |
{"error": "Failed to build disaggregate_info"} |
500 | Failed to build PD disaggregation communication info | Registration parameters (connector_port, device_ids, etc.) |
{"error": "Invalid request body"} |
400 | Failed to read request body | Request format |
{"error": "Invalid JSON format"} |
400 | Failed to parse request body JSON | Request format |
Registration Request Errors (/register)
| Output | HTTP Status | Meaning | What to Check |
|---|---|---|---|
{"code": 503, "msg": "{url} service is not healthy"} |
503 | Instance failed health check, cannot be registered | Health status |
{"code": 400, "msg": "Invalid request body"} |
400 | Failed to read registration request body | Request format |
{"code": 400, "msg": "Invalid InstanceInfo JSON format: {error}"} |
400 | Registration request has invalid JSON format | Request format |
{"code": 400, "msg": "splitwise mode only supports PREFILL/DECODE instances"} |
400 | MIXED instances are not allowed in PD disaggregated mode | Deployment mode, instance role |
{"code": 400, "msg": "only MIXED instances are allowed"} |
400 | Only MIXED instances are allowed in centralized mode | Deployment mode, instance role |
{"code": 400, "msg": "invalid InstanceInfo format: {error}"} |
400 | Instance registration info validation failed | Registration parameters |
{"code": 200, "msg": "Register success"} |
200 | Registration successful | — |
Common Registration Parameter Validation Errors
| Error Message | Meaning | Solution |
|---|---|---|
| role is required | Missing role field | Add the role field with value: prefill / decode / mixed |
| invalid role: {role} | Invalid role value | Use a valid role value: prefill / decode / mixed |
| host_ip is required | Missing host_ip field | Add the host_ip field |
| invalid host_ip: {ip} | host_ip is not a valid IP address | Provide a valid IP address |
| port is required | Missing port field | Add the port field |
| invalid port: {port} | port is not a valid port number | Provide a port number in the range 1-65535 |
| invalid protocol: {protocol} | Invalid transfer protocol | Use a valid protocol value: ipc / rdma |
Troubleshooting Guide
Health Status
Instance health checking is fundamental to Router operation. When instances fail to register or are removed, follow these steps:
1. Check instance registration status
View the currently registered instances and their count:
```bash
# View registered instance list
curl -X GET http://{router_url}/registered

# View registered instance count
curl -X GET http://{router_url}/registered_number
```
Verify that all expected instances are registered. If the count does not match, some instances may have failed to register or been removed by health checks.
2. Check instance health and network connectivity
Directly access the inference instance's health endpoint from the Router's host:
```bash
curl -X GET http://{server_url}/health
```
- An HTTP 200 response indicates the instance is healthy and the network is reachable.
- If the endpoint is unreachable or returns a non-200 status code, investigate further:
  - Whether the instance has started and is listening on the correct port
  - Whether a proxy is interfering with the connection (try disabling it: `unset http_proxy && unset https_proxy`)
  - Whether firewall rules are blocking the connection
Common solutions:
- Disable the network proxy: `unset http_proxy && unset https_proxy`
- gunicorn version compatibility: if the registered instance count is incomplete, it may be caused by an incompatibility between the gunicorn and FastDeploy versions; downgrading to `gunicorn==25.0.3` can resolve this issue
Scheduling Strategy
When encountering `Failed to select worker` or `Failed to select worker pair` errors:
1. Verify registered instance count
```bash
curl -X GET http://{router_url}/registered_number
```
If the returned count is 0, there are no available healthy instances. Please refer to Health Status for troubleshooting.
2. Check scheduling strategy configuration
Verify that the scheduling strategy in config.yaml matches your deployment mode. The default scheduling strategies are:
| Deployment Mode | Config Field | Default Strategy |
|---|---|---|
| Centralized mode | policy | request_num |
| PD disaggregated mode (Prefill) | prefill-policy | process_tokens |
| PD disaggregated mode (Decode) | decode-policy | request_num |
If no strategy is specified in the config file, the Router will use the defaults listed above. To use advanced strategies such as cache_aware or fd_metrics_score, specify them explicitly in the config file. For detailed descriptions of each strategy, see Scheduling Strategies.
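For example, a config.yaml that explicitly selects strategies for PD disaggregated mode could look like the fragment below. The field names come from the table above; the chosen values are illustrative, and other settings in your config file are omitted:

```yaml
# Explicit scheduling strategies (PD disaggregated mode)
prefill-policy: cache_aware   # default would be process_tokens
decode-policy: request_num    # same as the default
```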
3. Check fd_metrics_score strategy dependencies
When using the fd_metrics_score strategy, the Router fetches running/waiting request counts in real time from the /metrics endpoint of inference instances. When the /metrics endpoint is unavailable (e.g., metrics_port is not configured or the metrics service is down), the Router automatically falls back to the internal request counter for scheduling. This does not affect normal request processing, but scheduling accuracy may be reduced.
To ensure optimal scheduling with fd_metrics_score, verify that the inference instance's metrics endpoint is responding correctly:
```bash
curl -X GET http://{server_url}/metrics
```
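The fallback behavior described above can be sketched as follows. This is an illustrative Python sketch of the pattern (try live metrics, fall back to the internal counter), not the Router's Go implementation; `get_load` and its parameters are hypothetical names:

```python
def get_load(worker_url, fetch_metrics, internal_counter):
    """Prefer live /metrics data; fall back to the internal request counter.

    fetch_metrics(worker_url) is assumed to return the running + waiting
    request count, or raise when the /metrics endpoint is unavailable
    (e.g., metrics_port not configured or the metrics service is down).
    """
    try:
        return fetch_metrics(worker_url)
    except Exception:
        # /metrics unavailable: scheduling still works, but uses the
        # Router's own counter, so accuracy may be reduced
        return internal_counter.get(worker_url, 0)
```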
Registration Parameters
When registration fails with parameter validation errors:
1. Verify deployment mode and instance role match
- PD disaggregated mode (--splitwise): Only prefill and decode roles can be registered
- Centralized mode (default): Only mixed role can be registered
2. Check required parameters
Registration requests must include the following fields:
- role: Instance role (prefill / decode / mixed)
- host_ip: Instance IP address
- port: Instance port number
3. Check optional parameters for PD disaggregated mode
In PD disaggregated mode, the following parameters should be fully configured to ensure proper KV Cache transfer:
- connector_port: PD communication port
- transfer_protocol: Transfer protocol (ipc / rdma)
- device_ids: GPU device IDs
- rdma_ports: RDMA ports (required when using the rdma protocol)
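Putting the required and PD-specific fields together, a registration payload sent to /register might look like the fragment below. The field names are the ones listed above; the values (and the exact value formats, e.g., lists for device_ids and rdma_ports) are illustrative assumptions and may differ in your deployment:

```json
{
  "role": "prefill",
  "host_ip": "10.0.0.1",
  "port": 8000,
  "connector_port": 8100,
  "transfer_protocol": "rdma",
  "device_ids": [0, 1],
  "rdma_ports": [20000, 20001]
}
```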
Startup Failures
1. Configuration file loading failure
If `Failed to load config` appears in startup logs, check:
- Whether the file path specified by `--config_path` is correct
- Whether the configuration file is valid YAML
- Whether configuration parameter values are valid
2. Port already in use
If `Failed to start server` appears in startup logs, check:
- Whether the port specified by `--port` is already occupied by another process
- Use `lsof -i:{port}` or `netstat -tlnp | grep {port}` to check port usage
Tokenizer Service (cache_aware Strategy)
When using the cache_aware scheduling strategy, the Router calls a Tokenizer service to tokenize requests for cache hit ratio computation. When the Tokenizer service is unavailable, the Router has a two-level degradation mechanism:
- Fallback to character-based tokenization (common case): the log shows `tokenizer failed, fallback to char tokens`. The cache_aware strategy remains active, using character-based tokenization for cache matching instead of the Tokenizer. Cache hit accuracy may decrease, but normal request processing is not affected.
- Fallback to the process_tokens strategy (extreme case): when tokenization fails completely (e.g., empty request content), the log shows `tokenize failed, fallback to process_tokens`. The cache_aware strategy temporarily becomes inactive, and scheduling falls back to token processing volume. Normal request processing is not affected.
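The two-level degradation can be sketched as follows. This is an illustrative Python sketch of the logic described above, not the Router's Go code; `tokenize_with_fallback` and `tokenizer_call` are hypothetical names:

```python
def tokenize_with_fallback(text, tokenizer_call):
    """Two-level degradation for the cache_aware strategy (illustrative).

    Returns (tokens, strategy): strategy is "cache_aware" when usable
    tokens were produced, "process_tokens" when tokenization failed
    completely. tokenizer_call(text) is assumed to return a token list
    or raise when the Tokenizer service is unavailable.
    """
    try:
        tokens = tokenizer_call(text)
    except Exception:
        # Level 1: Tokenizer service failed -> character-based tokens;
        # cache_aware stays active with reduced hit accuracy
        tokens = list(text)
    if not tokens:
        # Level 2: nothing to match on (e.g., empty input) ->
        # fall back to the process_tokens strategy
        return [], "process_tokens"
    return tokens, "cache_aware"
```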
To restore full cache_aware functionality:
1. Check if the Tokenizer service is running
```bash
curl -X POST http://{tokenizer_url}/tokenize \
  -H "Content-Type: application/json" \
  -d '{"text": "hello"}'
```
2. Check related configuration
- Verify that `tokenizer-url` in `config.yaml` is set correctly
- If the Tokenizer service responds slowly, consider increasing `tokenizer-timeout-secs` (default: 2 seconds)
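For reference, the relevant config.yaml fragment might look like this. The field names are those mentioned above; the address and timeout values are illustrative:

```yaml
tokenizer-url: http://10.0.0.5:8080   # address of your Tokenizer service
tokenizer-timeout-secs: 5             # default is 2; raise if the service is slow
```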