docs(runtime-fallback): clarify timeout_seconds=0 disables auto-retry detection
parent eef80a4e23
commit f82e65fdd1
@@ -738,15 +738,33 @@ Automatically switch to backup models when the primary model encounters retryabl
 | `retry_on_errors` | `[400, 429, 503, 529]` | HTTP status codes that trigger fallback (rate limit, service unavailable). Also supports certain classified provider errors (for example, missing API key) that do not expose HTTP status codes. |
 | `max_fallback_attempts` | `3` | Maximum fallback attempts per session (1-20) |
 | `cooldown_seconds` | `60` | Cooldown in seconds before retrying a failed model |
-| `timeout_seconds` | `30` | Timeout in seconds for an in-flight fallback request before forcing the next fallback model. Set to `0` to disable timeout-based fallback and provider quota retry signal detection. |
+| `timeout_seconds` | `30` | Timeout in seconds for an in-flight fallback request before forcing the next fallback model. **⚠️ Set to `0` to disable auto-retry signal detection** (see below). |
 | `notify_on_fallback` | `true` | Show toast notification when switching to a fallback model |
+
+### timeout_seconds: Understanding the 0 Value
+
+**⚠️ IMPORTANT**: Setting `timeout_seconds: 0` **disables auto-retry signal detection**. This is a critical behavior change:
+
+| Setting | Behavior |
+|---------|----------|
+| `timeout_seconds: 30` (default) | ✅ **Full fallback coverage**: Error-based fallback (429, 503, etc.) + auto-retry signal detection (provider messages like "retrying in 8h") |
+| `timeout_seconds: 0` | ⚠️ **Limited fallback**: Only error-based fallback works. Provider retry messages are **completely ignored**. Timeout-based escalation is **disabled**. |
+
+**When `timeout_seconds: 0`:**
+- ✅ HTTP errors (429, 503, 529) still trigger fallback
+- ✅ Provider key errors (missing API key) still trigger fallback
+- ❌ Provider retry messages ("retrying in Xh") are **ignored**
+- ❌ Timeout-based escalation is **disabled**
+- ❌ Hanging requests do **not** advance to the next fallback model
+
+**Recommendation**: Use a non-zero value (e.g., `30` seconds) to enable full fallback coverage. Only set to `0` if you explicitly want to disable auto-retry signal detection.
 
 ### How It Works
 
 1. When an API error matching `retry_on_errors` occurs (or a classified provider key error such as missing API key), the hook intercepts it
 2. The next request automatically uses the next available model from `fallback_models`
 3. Failed models enter a cooldown period before being retried
-4. If a fallback provider hangs, timeout advances to the next fallback model
+4. If `timeout_seconds > 0` and a fallback provider hangs, timeout advances to the next fallback model
 5. Toast notification (optional) informs you of the model switch
 
 ### Configuring Fallback Models
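
The "How It Works" steps in the hunk above (take the next available model from `fallback_models`, keep failed models in cooldown, stop after `max_fallback_attempts`) can be sketched roughly as follows. This is an illustrative sketch only, not the plugin's implementation: `pickNextModel`, `FallbackState`, and `FallbackSettings` are hypothetical names, and only the option names and their documented defaults come from the table.

```typescript
// Hypothetical sketch: these types and this function are NOT from the plugin source.
interface FallbackState {
  /** Model id -> timestamp in ms at which that model last failed. */
  failedAt: Map<string, number>;
  /** Fallback attempts already made in this session. */
  attempts: number;
}

interface FallbackSettings {
  fallback_models: string[];
  max_fallback_attempts: number; // documented default: 3
  cooldown_seconds: number; // documented default: 60
}

/** Returns the next model to try, or undefined if none is currently eligible. */
function pickNextModel(
  failedModel: string,
  settings: FallbackSettings,
  state: FallbackState,
  now: number,
): string | undefined {
  // Record the failure so the model enters its cooldown window.
  state.failedAt.set(failedModel, now);

  // Respect the per-session attempt limit.
  if (state.attempts >= settings.max_fallback_attempts) return undefined;

  const cooldownMs = settings.cooldown_seconds * 1000;
  const next = settings.fallback_models.find((model) => {
    if (model === failedModel) return false;
    const failed = state.failedAt.get(model);
    // Eligible if it never failed, or if its cooldown has elapsed.
    return failed === undefined || now - failed >= cooldownMs;
  });

  if (next !== undefined) state.attempts += 1;
  return next;
}
```

Passing `now` in explicitly, rather than calling `Date.now()` inside the function, keeps the cooldown logic deterministic and easy to test.
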
@@ -352,7 +352,7 @@ Hooks intercept and modify behavior at key points in the agent lifecycle.
 | **session-recovery** | Stop | Recovers from session errors - missing tool results, thinking block issues, empty messages. |
 | **anthropic-context-window-limit-recovery** | Stop | Handles Claude context window limits gracefully. |
 | **background-compaction** | Stop | Auto-compacts sessions hitting token limits. |
-| **runtime-fallback** | Event | Automatically switches to backup models on retryable API errors (e.g., 429, 503, 529) and provider key misconfiguration errors (e.g., missing API key). Configurable retry logic with per-model cooldown. |
+| **runtime-fallback** | Event | Automatically switches to backup models on retryable API errors (e.g., 429, 503, 529), provider key misconfiguration errors (e.g., missing API key), and auto-retry signals (when `timeout_seconds > 0`). Configurable retry logic with per-model cooldown. See [Runtime Fallback Configuration](configurations.md#runtime-fallback) for details on `timeout_seconds` behavior. |
 
 #### Truncation & Context Management
 
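
The updated **runtime-fallback** row lists three kinds of trigger, with auto-retry signals active only when `timeout_seconds > 0`. A minimal sketch of that gating, assuming a hypothetical event shape (`FallbackEvent` and `shouldFallback` are invented for illustration and do not come from the hook's code):

```typescript
// Hypothetical event model; only the gating rules are taken from the documentation.
type FallbackEvent =
  | { kind: "http_error"; status: number }
  | { kind: "provider_key_error" }
  | { kind: "auto_retry_signal"; message: string }
  | { kind: "request_timeout" };

function shouldFallback(
  event: FallbackEvent,
  retryOnErrors: number[],
  timeoutSeconds: number,
): boolean {
  switch (event.kind) {
    case "http_error":
      // Error-based fallback does not depend on timeout_seconds.
      return retryOnErrors.includes(event.status);
    case "provider_key_error":
      // Classified key errors (e.g., missing API key) always trigger fallback.
      return true;
    case "auto_retry_signal":
    case "request_timeout":
      // Both are switched off when timeout_seconds is 0.
      return timeoutSeconds > 0;
  }
}
```

With `timeoutSeconds` set to `0`, a provider message such as "retrying in 8h" would be ignored under this sketch, while a 429 response would still trigger fallback.
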
@@ -9,7 +9,7 @@ export const RuntimeFallbackConfigSchema = z.object({
   max_fallback_attempts: z.number().min(1).max(20).optional(),
   /** Cooldown in seconds before retrying a failed model (default: 60) */
   cooldown_seconds: z.number().min(0).optional(),
-  /** Session-level timeout in seconds to advance fallback when provider hangs (default: 30, 0 to disable) */
+  /** Session-level timeout in seconds to advance fallback when provider hangs (default: 30). Set to 0 to disable auto-retry signal detection (only error-based fallback remains active). */
   timeout_seconds: z.number().min(0).optional(),
   /** Show toast notification when switching to fallback model (default: true) */
   notify_on_fallback: z.boolean().optional(),
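
Every field in the schema above is optional, so the documented defaults have to be applied somewhere after parsing. The sketch below shows one way that could look without depending on zod. `resolveOptions` and `RuntimeFallbackOptions` are hypothetical names, and the defaults (3, 60, 30, `true`) are the ones stated in the docs and schema comments. The point it illustrates is that an explicit `timeout_seconds: 0` must survive default resolution, which is why it uses `??` rather than `||`.

```typescript
// Hypothetical helper; it mirrors the documented defaults and schema ranges only.
interface RuntimeFallbackOptions {
  max_fallback_attempts?: number;
  cooldown_seconds?: number;
  timeout_seconds?: number;
  notify_on_fallback?: boolean;
}

function resolveOptions(
  raw: RuntimeFallbackOptions,
): Required<RuntimeFallbackOptions> {
  const resolved = {
    max_fallback_attempts: raw.max_fallback_attempts ?? 3,
    cooldown_seconds: raw.cooldown_seconds ?? 60,
    // `??` keeps an explicit 0; `||` would silently turn 0 back into 30.
    timeout_seconds: raw.timeout_seconds ?? 30,
    notify_on_fallback: raw.notify_on_fallback ?? true,
  };

  // Same bounds as the schema: 1-20 attempts, non-negative durations.
  if (resolved.max_fallback_attempts < 1 || resolved.max_fallback_attempts > 20) {
    throw new RangeError("max_fallback_attempts must be between 1 and 20");
  }
  if (resolved.cooldown_seconds < 0) {
    throw new RangeError("cooldown_seconds must be >= 0");
  }
  if (resolved.timeout_seconds < 0) {
    throw new RangeError("timeout_seconds must be >= 0");
  }
  return resolved;
}
```
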