# Router Troubleshooting Guide

This document is based on the Golang Router implementation and summarizes common log messages, response outputs, and troubleshooting methods encountered during Router usage, helping users quickly locate and resolve issues.

For basic Router usage, please refer to Load-Balancing Scheduling Router.

## Common Log Analysis

Note: `{}` represents variables that will be replaced with actual values in logs.

### Error-Level Logs

| Log Message | Meaning | Impact | What to Check |
| --- | --- | --- | --- |
| `Removed unhealthy prefill instance: {url}` | Prefill instance failed its health check and has been removed | This Prefill instance no longer participates in scheduling | Health status |
| `Removed unhealthy decode instance: {url}` | Decode instance failed its health check and has been removed | This Decode instance no longer participates in scheduling | Health status |
| `Removed unhealthy mixed instance: {url}` | Mixed instance failed its health check and has been removed | This Mixed instance no longer participates in scheduling | Health status |
| `Failed to register instance: {error}` | Instance registration failed | Router cannot register the instance | Health status, registration parameters |
| `Failed to read YAML file {path}: {error}` | Failed to read the registration config file at startup | Instances in the config file cannot be registered | File path, file permissions |
| `Failed to unmarshal YAML file {path}: {error}` | The registration config file has an invalid format | Instances in the config file cannot be registered | YAML format |
| `Failed to register instance from index {index}: {error}` | The instance at index `{index}` in the config file failed to register | That instance was not registered | Health status, registration parameters |
| `failed to send request to {url} with error: {error}` | A health check request failed to send | The instance may be marked as unhealthy | Network connectivity, proxy settings |
| `scanner error: {error}` | Error while reading a backend streaming response | The current request may fail | Backend instance status |
| `[prefill] scanner error: {error}, message={message}` | Error while reading a Prefill backend streaming response | The current Prefill request may fail | Backend instance status |
| `[prefill] copy error: {error}, message={message}` | Error while copying Prefill response data | The current Prefill request may fail | Backend instance status |
| `Panic recovered: {error}` | A panic occurred during request processing and was recovered | The current request fails, but the service keeps running | Backend instance status, request content |
| `empty baseURL provided` | The health check received an empty base URL | The health check cannot be performed | Registration parameters |
| `failed to create request: {error}` | Failed to create a health check request | The instance may be marked as unhealthy | Network environment |
| `failed to read response body: {error}` | Failed to read a health check response body | The instance may be marked as unhealthy | Backend instance status |

### Warn-Level Logs

| Log Message | Meaning | Impact | What to Check |
| --- | --- | --- | --- |
| `Server {url} is not healthy` | The instance at this URL failed its health check | Router cannot register the instance, or removes it from the registered list | Health status |
| `Instance {url} role is unknown` | The instance role is not recognized | The instance is not added to the scheduling list | Registration parameters |
| `cache-aware prefill: tokenizer failed, fallback to char tokens: {error}` | The Tokenizer service call failed; automatically falling back to character-based tokenization | The cache_aware strategy stays active, using character-based tokenization for cache matching instead of the Tokenizer; normal request processing is not affected | Tokenizer service status |
| `cache-aware prefill: tokenize failed, fallback to process_tokens: {error}` | Tokenization failed completely (e.g., empty input); falling back to the process_tokens strategy | Prefill scheduling temporarily does not use the cache_aware strategy; normal request processing is not affected | Request content, Tokenizer service status |
| `cache-aware prefill: final strategy: process_tokens, reason: tokenize failed: {error}. ts_ms={ts}` | Tokenization failed (new log format); falling back to the process_tokens strategy | Prefill scheduling temporarily does not use the cache_aware strategy; normal request processing is not affected | Request content, Tokenizer service status |

### Info-Level Logs

| Log Message | Meaning | Description |
| --- | --- | --- |
| `Starting server on {host:port}` | The Router service is starting | Normal startup log |
| `Server {url} is healthy` | The instance passed its health check | Normal operation log |
| `Successfully registered instance from index {index}` | The instance from the config file registered successfully | Normal startup log |
| `No instances found in config file {path}` | No instances were found in the registration config file | Check whether `register.yaml` is empty |
| `Request completed successfully.` | Request processing completed | Normal operation log |
| `Request failed, retrying...` | The request failed and is being retried | Router retries up to 3 times |
| `select worker (prefill): {url}, tokens: {tokens}` | The Prefill scheduler selected a worker, showing its current token processing count | Normal operation log |
| `select worker ({type}): {url}, count: {count}` | The Decode/Mixed scheduler selected a worker, showing its current request concurrency | Normal operation log |
| `release worker: {url}, count: {count}` | The request ended; the worker counter was released | Normal operation log |
| `release prefill tokens: {url}, tokens: {tokens}` | The Prefill request ended; the token load was released | Normal operation log |
| `cleanup unhealthy worker counter: {url}` | Cleaned up the counter of an unhealthy worker | Normal operation log |
| `removed counters for {count} unhealthy workers: {urls}` | Batch cleanup of counters for unhealthy workers | Normal operation log |
| `[stats] total_running={n}, workers: [{loads}], cache_hit_rate={rate}% (hits={hits}/total={total})` | Periodic stats: total running requests, per-worker loads, cache hit rate | Normal operation log, useful for monitoring and tuning |
| `Parsing completed; starting worker selection.` | Request parsing completed; worker selection is starting | Normal operation log |
| `Request completed with an error.` | Request processing completed with an error | Check backend instance status |
| `[SelectWorkerPair] decode selection failed, releasing prefill counter url={url}` | Decode selection failed in PD disaggregated mode; the Prefill counter is released | Error handling log |
| `[prefill] first chunk received, release counter url={url}` | The Prefill streaming response received its first chunk; the counter was released | Normal operation log |
| `[prefill] non-stream prefill response done, release counter url={url}` | The Prefill non-streaming response completed; the counter was released | Normal operation log |
| `[prefill] backendResp is nil or backendResp.Body is nil, url={url}` | The Prefill backend response is nil | May indicate a backend connection issue |
| `[prefill] release in defer (fallback) url={url}, isStream={bool}` | Fallback resource release when a Prefill request exits abnormally | Error handling log |
| `[prefill] release in CommonCompletions defer (error path) url={url}` | Prefill resource release on the error path | Error handling log |
| `cache-aware prefill: final strategy: process_tokens, reason: strategy not initialized` | The cache_aware strategy is not initialized; falling back to process_tokens | Check the cache_aware configuration |
| `cache-aware prefill: final strategy: process_tokens, reason: load imbalanced, loads={loads}. ts_ms={ts}` | Load is imbalanced across instances; falling back to the process_tokens strategy | Normal operation log, automatic load-balancing switch |
| `cache-aware prefill: final strategy: cache_aware_scoring, selected={url}, loads={loads}, hitRatios={ratios}. ts_ms={ts}` | The cache_aware scoring strategy selected a worker | Normal operation log, showing loads and hit ratios |
| `[{method}] {path} {proto} {status} {latency} {clientIP}` | HTTP request access log | Normal operation log, records basic info for each request |
| `before SelectWorker prefill. ts_ms={ts}` | Starting Prefill worker selection in PD disaggregated mode | Normal operation log, for performance tracing |
| `before SelectWorker decode, after prefill. ts_ms={ts}` | Starting Decode worker selection after Prefill selection | Normal operation log, for performance tracing |
| `after SelectWorker decode, before return. ts_ms={ts}` | Decode worker selection completed | Normal operation log, for performance tracing |
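The `select worker` / `release worker` lines above reflect a per-worker in-flight request counter: the request_num policy picks the least-loaded worker and releases the count when the request ends. A minimal Python sketch of that bookkeeping (the actual Router is written in Go, and all names here are illustrative):

```python
import threading

class WorkerCounter:
    """Illustrative sketch of the per-worker counter behind the
    'select worker' / 'release worker' log lines (names are made up;
    the real Router is written in Go)."""

    def __init__(self, urls):
        self.counts = {url: 0 for url in urls}
        self.lock = threading.Lock()

    def select(self):
        # request_num policy: pick the worker with the fewest in-flight requests.
        with self.lock:
            url = min(self.counts, key=self.counts.get)
            self.counts[url] += 1
            print(f"select worker (mixed): {url}, count: {self.counts[url]}")
            return url

    def release(self, url):
        # Mirrors 'release worker: {url}, count: {count}' when a request ends.
        with self.lock:
            self.counts[url] = max(0, self.counts[url] - 1)
            print(f"release worker: {url}, count: {self.counts[url]}")

counter = WorkerCounter(["http://10.0.0.1:8000", "http://10.0.0.2:8000"])
first = counter.select()   # least-loaded worker
second = counter.select()  # the other worker, since `first` now has count 1
counter.release(first)     # counter drops back to 0
```

A counter that leaks (no release on an error path) would skew scheduling, which is why the error-path release logs above exist.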

### Debug-Level Logs

Debug-level logs are output only when the log level is set to debug, and are typically used for development debugging.

| Log Message | Meaning | Description |
| --- | --- | --- |
| `Healthy instances: prefill={urls}, decode={urls}, mixed={urls}` | Lists healthy instances for each role | Useful for verifying instance discovery |
| `cache-aware prefill: hashes={n} workers={n} load={loads} hit={hits}` | Hash count, worker count, and load info for the cache_aware strategy | Useful for debugging cache hits |
| `cache-aware prefill: tokenizer tokens={tokens}` | Tokenizer tokenization result | Useful for debugging tokenization results |
| `cache-aware score: worker={url} hit={hit} loadRatio={ratio} score={score}` | Scoring details for the cache_aware strategy | Useful for debugging scheduling decisions |
| `radix match: hashes={n} matched_len={n} node_children={n}` | Radix tree match details | Useful for debugging cache matching |
| `radix record: worker={url} hashes={n} node_depth={n}` | Radix tree record details | Useful for debugging cache recording |
| `radix eviction: removed={n} nodeCount={n}` | Radix tree eviction details | Useful for debugging cache eviction |
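The `radix match` / `radix record` lines track prefix-cache state per worker. A simplified sketch of the underlying idea: hash the token sequence in fixed-size chunks so that a match count is a prefix-match length. The chunk size and flat-set storage below are simplifications for illustration, not the Router's actual radix tree implementation:

```python
import hashlib

CHUNK = 4  # illustrative chunk size, not the Router's actual block size

def chunk_hashes(tokens):
    """Hash the token sequence in fixed-size chunks. Each hash covers the
    whole prefix up to that chunk, so equal hashes imply equal prefixes."""
    hashes, prefix = [], b""
    for i in range(0, len(tokens) - len(tokens) % CHUNK, CHUNK):
        prefix += ",".join(map(str, tokens[i:i + CHUNK])).encode() + b"|"
        hashes.append(hashlib.sha256(prefix).hexdigest())
    return hashes

def matched_len(cache, hashes):
    """Longest prefix of `hashes` already recorded, mirroring
    'radix match: hashes={n} matched_len={n} ...'."""
    n = 0
    for h in hashes:
        if h not in cache:
            break
        n += 1
    return n

cache = set()
a = chunk_hashes(list(range(12)))                    # 3 chunks
cache.update(a)                                      # 'radix record' for a worker
b = chunk_hashes(list(range(8)) + [99, 98, 97, 96])  # first 2 chunks shared with `a`
print("matched_len:", matched_len(cache, b))         # 2
```

The per-worker hit ratio in the `cache-aware score` line is then `matched_len / len(hashes)` for that worker's recorded cache.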

## Common Response Output Analysis

### Inference Request Errors (`/v1/chat/completions`, `/v1/completions`)

| Output | HTTP Status | Meaning | What to Check |
| --- | --- | --- | --- |
| `{"error": "No available prefill/decode workers"}` | 503 | All Prefill or Decode instances are unavailable; no healthy instances are registered | Health status |
| `{"error": "Failed to select worker pair"}` | 502 | Failed to select a worker pair in PD disaggregated mode | Health status, scheduling strategy |
| `{"error": "Failed to select worker"}` | 502 | Failed to select a worker in centralized mode | Health status, scheduling strategy |
| `{"error": "Failed to connect to backend service"}` | 502 | Failed to connect to the backend inference instance (after 3 retries) | Backend instance status, network connectivity |
| `{"error": "Failed to build disaggregate_info"}` | 500 | Failed to build PD disaggregation communication info | Registration parameters (`connector_port`, `device_ids`, etc.) |
| `{"error": "Invalid request body"}` | 400 | Failed to read the request body | Request format |
| `{"error": "Invalid JSON format"}` | 400 | Failed to parse the request body as JSON | Request format |

### Registration Request Errors (`/register`)

| Output | HTTP Status | Meaning | What to Check |
| --- | --- | --- | --- |
| `{"code": 503, "msg": "{url} service is not healthy"}` | 503 | The instance failed its health check and cannot be registered | Health status |
| `{"code": 400, "msg": "Invalid request body"}` | 400 | Failed to read the registration request body | Request format |
| `{"code": 400, "msg": "Invalid InstanceInfo JSON format: {error}"}` | 400 | The registration request has an invalid JSON format | Request format |
| `{"code": 400, "msg": "splitwise mode only supports PREFILL/DECODE instances"}` | 400 | MIXED instances are not allowed in PD disaggregated mode | Deployment mode, instance role |
| `{"code": 400, "msg": "only MIXED instances are allowed"}` | 400 | Only MIXED instances are allowed in centralized mode | Deployment mode, instance role |
| `{"code": 400, "msg": "invalid InstanceInfo format: {error}"}` | 400 | Instance registration info failed validation | Registration parameters |
| `{"code": 200, "msg": "Register success"}` | 200 | Registration succeeded | — |

### Common Registration Parameter Validation Errors

| Error Message | Meaning | Solution |
| --- | --- | --- |
| `role is required` | The `role` field is missing | Add the `role` field with value `prefill` / `decode` / `mixed` |
| `invalid role: {role}` | The `role` value is invalid | Use a valid role value: `prefill` / `decode` / `mixed` |
| `host_ip is required` | The `host_ip` field is missing | Add the `host_ip` field |
| `invalid host_ip: {ip}` | `host_ip` is not a valid IP address | Provide a valid IP address |
| `port is required` | The `port` field is missing | Add the `port` field |
| `invalid port: {port}` | `port` is not a valid port number | Provide a port number in the range 1-65535 |
| `invalid protocol: {protocol}` | The transfer protocol is invalid | Use a valid protocol value: `ipc` / `rdma` |
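The rules in this table are simple enough to mirror locally before sending a registration request. The sketch below is an illustrative Python re-implementation producing the same error strings; the real validation lives in the Go Router:

```python
import ipaddress

VALID_ROLES = {"prefill", "decode", "mixed"}
VALID_PROTOCOLS = {"ipc", "rdma"}

def validate_instance(info):
    """Illustrative re-implementation of the validation rules in the
    table above (the real checks live in the Go Router)."""
    errors = []
    role = info.get("role")
    if not role:
        errors.append("role is required")
    elif role not in VALID_ROLES:
        errors.append(f"invalid role: {role}")

    host_ip = info.get("host_ip")
    if not host_ip:
        errors.append("host_ip is required")
    else:
        try:
            ipaddress.ip_address(host_ip)
        except ValueError:
            errors.append(f"invalid host_ip: {host_ip}")

    port = info.get("port")
    if port is None:
        errors.append("port is required")
    elif not isinstance(port, int) or not 1 <= port <= 65535:
        errors.append(f"invalid port: {port}")

    protocol = info.get("transfer_protocol")
    if protocol is not None and protocol not in VALID_PROTOCOLS:
        errors.append(f"invalid protocol: {protocol}")
    return errors

print(validate_instance({"role": "prefill", "host_ip": "10.0.0.1", "port": 8000}))  # []
print(validate_instance({"role": "worker", "host_ip": "bad", "port": 99999}))
```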

## Troubleshooting Guide

### Health Status

Instance health checking is fundamental to Router operation. When instances fail to register or are removed, follow these steps:

1. Check instance registration status

View the currently registered instances and their count:

```bash
# View registered instance list
curl -X GET http://{router_url}/registered

# View registered instance count
curl -X GET http://{router_url}/registered_number
```

Verify that all expected instances are registered. If the count does not match, some instances may have failed to register or been removed by health checks.

2. Check instance health and network connectivity

Directly access the inference instance's health endpoint from the Router's host:

```bash
curl -X GET http://{server_url}/health
```

- An HTTP 200 response indicates the instance is healthy and the network is reachable.
- If the endpoint is unreachable or returns a non-200 status code, investigate further:
  - Whether the instance has started and is listening on the correct port
  - Whether a proxy is interfering with the connection (try disabling it: `unset http_proxy && unset https_proxy`)
  - Whether firewall rules are blocking the connection

Common solutions:

- Disable the network proxy: `unset http_proxy && unset https_proxy`
- gunicorn version compatibility: if the registered instance count is incomplete, it may be caused by a version incompatibility between gunicorn and FastDeploy; downgrading to `gunicorn==25.0.3` can resolve the issue

### Scheduling Strategy

When encountering `Failed to select worker` or `Failed to select worker pair` errors:

1. Verify registered instance count

```bash
curl -X GET http://{router_url}/registered_number
```

If the returned count is 0, there are no available healthy instances. Please refer to Health Status for troubleshooting.

2. Check scheduling strategy configuration

Verify that the scheduling strategy in `config.yaml` matches your deployment mode. The default scheduling strategies are:

| Deployment Mode | Config Field | Default Strategy |
| --- | --- | --- |
| Centralized mode | `policy` | `request_num` |
| PD disaggregated mode (Prefill) | `prefill-policy` | `process_tokens` |
| PD disaggregated mode (Decode) | `decode-policy` | `request_num` |

If no strategy is specified in the config file, the Router will use the defaults listed above. To use advanced strategies such as cache_aware or fd_metrics_score, specify them explicitly in the config file. For detailed descriptions of each strategy, see Scheduling Strategies.
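As an illustration, a `config.yaml` fragment for PD disaggregated mode that opts into the cache_aware Prefill strategy might look like this. Only the field names come from the table above; treat the fragment as a sketch, not a complete configuration:

```yaml
# Illustrative config.yaml fragment for PD disaggregated mode.
prefill-policy: cache_aware   # overrides the default (process_tokens)
decode-policy: request_num    # same as the default, shown for completeness
```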

3. Check fd_metrics_score strategy dependencies

When using the fd_metrics_score strategy, the Router fetches running/waiting request counts in real time from the /metrics endpoint of inference instances. When the /metrics endpoint is unavailable (e.g., metrics_port is not configured or the metrics service is down), the Router automatically falls back to the internal request counter for scheduling. This does not affect normal request processing, but scheduling accuracy may be reduced.

To ensure optimal scheduling with fd_metrics_score, verify that the inference instance's metrics endpoint is responding correctly:

```bash
curl -X GET http://{server_url}/metrics
```
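The fallback behavior described above can be sketched as follows. The metric names and the shape of the internal counter are assumptions for illustration, not the Router's exact code:

```python
def parse_metric(metrics_text, name):
    """Pull a single value out of Prometheus-style /metrics text."""
    for line in metrics_text.splitlines():
        if line.startswith(name):
            return float(line.rsplit(" ", 1)[1])
    return None  # metric missing -> treat the endpoint as unavailable

def instance_load(metrics_text, internal_count):
    """fd_metrics_score-style load: prefer /metrics data, fall back to
    the internal request counter when the metrics are unavailable."""
    running = parse_metric(metrics_text, "num_requests_running")  # assumed metric name
    waiting = parse_metric(metrics_text, "num_requests_waiting")  # assumed metric name
    if running is None or waiting is None:
        return internal_count  # documented fallback path
    return running + waiting

healthy = "num_requests_running 3\nnum_requests_waiting 1\n"
print(instance_load(healthy, internal_count=7))  # 4.0, from /metrics
print(instance_load("", internal_count=7))       # 7, fallback to the counter
```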

### Registration Parameters

When registration fails with parameter validation errors:

1. Verify that the deployment mode and instance role match

- PD disaggregated mode (`--splitwise`): only `prefill` and `decode` roles can be registered
- Centralized mode (default): only the `mixed` role can be registered

2. Check required parameters

Registration requests must include the following fields:

- `role`: Instance role (`prefill` / `decode` / `mixed`)
- `host_ip`: Instance IP address
- `port`: Instance port number

3. Check optional parameters for PD disaggregated mode

In PD disaggregated mode, the following parameters should be fully configured to ensure proper KV Cache transfer:

- `connector_port`: PD communication port
- `transfer_protocol`: Transfer protocol (`ipc` / `rdma`)
- `device_ids`: GPU device IDs
- `rdma_ports`: RDMA ports (required when using the `rdma` protocol)
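Putting the required and PD-specific fields together, a registration request body might look like the following. All values, and the exact types of `device_ids` / `rdma_ports`, are illustrative assumptions; only the field names come from this guide:

```json
{
  "role": "prefill",
  "host_ip": "10.0.0.1",
  "port": 8000,
  "connector_port": 9000,
  "transfer_protocol": "rdma",
  "device_ids": [0, 1],
  "rdma_ports": [10001, 10002]
}
```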

### Startup Failures

1. Configuration file loading failure

If `Failed to load config` appears in the startup logs, check:

- Whether the file path specified by `--config_path` is correct
- Whether the configuration file is valid YAML
- Whether the configuration parameter values are valid

2. Port already in use

If `Failed to start server` appears in the startup logs, check:

- Whether the port specified by `--port` is already occupied by another process
- Use `lsof -i:{port}` or `netstat -tlnp | grep {port}` to check port usage

### Tokenizer Service (cache_aware Strategy)

When using the cache_aware scheduling strategy, the Router calls a Tokenizer service to tokenize requests for cache hit ratio computation. When the Tokenizer service is unavailable, the Router has a two-level degradation mechanism:

1. Fallback to character-based tokenization (common case): the log shows `tokenizer failed, fallback to char tokens`. The cache_aware strategy remains active, using character-based tokenization for cache matching instead of the Tokenizer. Cache hit accuracy may decrease, but normal request processing is not affected.
2. Fallback to the process_tokens strategy (extreme case): when tokenization fails completely (e.g., empty request content), the log shows `tokenize failed, fallback to process_tokens`. The cache_aware strategy temporarily becomes inactive and scheduling falls back to token processing volume. Normal request processing is not affected.
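The two-level degradation above can be sketched as below. `tokenize` stands in for the remote Tokenizer call, and the function names are illustrative, not the Router's actual code:

```python
def char_tokens(text):
    """Level 1 fallback: character-based pseudo-tokens."""
    return list(text)

def choose_tokens(text, tokenize):
    """Two-level degradation sketch: try the Tokenizer service, fall back
    to char tokens, and return None when nothing tokenizes (level 2:
    the caller then falls back to the process_tokens strategy)."""
    try:
        tokens = tokenize(text)
    except Exception as err:
        print(f"tokenizer failed, fallback to char tokens: {err}")
        tokens = char_tokens(text)
    if not tokens:
        print("tokenize failed, fallback to process_tokens")
        return None
    return tokens

def broken_tokenizer(text):
    # Stand-in for an unreachable Tokenizer service.
    raise ConnectionError("tokenizer service unreachable")

print(choose_tokens("hello", broken_tokenizer))  # ['h', 'e', 'l', 'l', 'o']
print(choose_tokens("", broken_tokenizer))       # None
```

In both branches the request itself still succeeds; only the cache-matching quality degrades.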

To restore full cache_aware functionality:

1. Check if the Tokenizer service is running

```bash
curl -X POST http://{tokenizer_url}/tokenize \
  -H "Content-Type: application/json" \
  -d '{"text": "hello"}'
```

2. Check related configuration

- Verify that `tokenizer-url` in `config.yaml` is set correctly
- If the Tokenizer service responds slowly, consider increasing `tokenizer-timeout-secs` (default: 2 seconds)