diff --git a/docs/configurations.md b/docs/configurations.md index ef0c3617..51cd5af5 100644 --- a/docs/configurations.md +++ b/docs/configurations.md @@ -738,15 +738,33 @@ Automatically switch to backup models when the primary model encounters retryabl | `retry_on_errors` | `[400, 429, 503, 529]` | HTTP status codes that trigger fallback (rate limit, service unavailable). Also supports certain classified provider errors (for example, missing API key) that do not expose HTTP status codes. | | `max_fallback_attempts` | `3` | Maximum fallback attempts per session (1-20) | | `cooldown_seconds` | `60` | Cooldown in seconds before retrying a failed model | -| `timeout_seconds` | `30` | Timeout in seconds for an in-flight fallback request before forcing the next fallback model. Set to `0` to disable timeout-based fallback and provider quota retry signal detection. | +| `timeout_seconds` | `30` | Timeout in seconds for an in-flight fallback request before forcing the next fallback model. **⚠️ Set to `0` to disable auto-retry signal detection** (see below). | | `notify_on_fallback` | `true` | Show toast notification when switching to a fallback model | +### timeout_seconds: Understanding the 0 Value + +**⚠️ IMPORTANT**: Setting `timeout_seconds: 0` **disables auto-retry signal detection**. This is a critical behavior change: + +| Setting | Behavior | +|---------|----------| +| `timeout_seconds: 30` (default) | ✅ **Full fallback coverage**: Error-based fallback (429, 503, etc.) + auto-retry signal detection (provider messages like "retrying in 8h") | +| `timeout_seconds: 0` | ⚠️ **Limited fallback**: Only error-based fallback works. Provider retry messages are **completely ignored**. Timeout-based escalation is **disabled**. | + +**When `timeout_seconds: 0`:** +- ✅ HTTP errors (429, 503, 529) still trigger fallback +- ✅ Provider key errors (missing API key) still trigger fallback +- ❌ Provider retry messages ("retrying in Xh") are **ignored** +- ❌ Timeout-based escalation is **disabled** +- ❌ Hanging requests do **not** advance to the next fallback model + +**Recommendation**: Use a non-zero value (e.g., `30` seconds) to enable full fallback coverage. Only set to `0` if you explicitly want to disable auto-retry signal detection. + ### How It Works 1. When an API error matching `retry_on_errors` occurs (or a classified provider key error such as missing API key), the hook intercepts it 2. The next request automatically uses the next available model from `fallback_models` 3. Failed models enter a cooldown period before being retried -4. If a fallback provider hangs, timeout advances to the next fallback model +4. If `timeout_seconds > 0` and a fallback provider hangs, timeout advances to the next fallback model 5. Toast notification (optional) informs you of the model switch ### Configuring Fallback Models diff --git a/docs/features.md b/docs/features.md index 5d4643fe..60c59e00 100644 --- a/docs/features.md +++ b/docs/features.md @@ -352,7 +352,7 @@ Hooks intercept and modify behavior at key points in the agent lifecycle. | **session-recovery** | Stop | Recovers from session errors - missing tool results, thinking block issues, empty messages. | | **anthropic-context-window-limit-recovery** | Stop | Handles Claude context window limits gracefully. | | **background-compaction** | Stop | Auto-compacts sessions hitting token limits. | -| **runtime-fallback** | Event | Automatically switches to backup models on retryable API errors (e.g., 429, 503, 529) and provider key misconfiguration errors (e.g., missing API key). Configurable retry logic with per-model cooldown. | +| **runtime-fallback** | Event | Automatically switches to backup models on retryable API errors (e.g., 429, 503, 529), provider key misconfiguration errors (e.g., missing API key), and auto-retry signals (when `timeout_seconds > 0`). Configurable retry logic with per-model cooldown. See [Runtime Fallback Configuration](configurations.md#runtime-fallback) for details on `timeout_seconds` behavior. | #### Truncation & Context Management diff --git a/src/config/schema/runtime-fallback.ts b/src/config/schema/runtime-fallback.ts index 78ae06bb..53219611 100644 --- a/src/config/schema/runtime-fallback.ts +++ b/src/config/schema/runtime-fallback.ts @@ -9,7 +9,7 @@ export const RuntimeFallbackConfigSchema = z.object({ max_fallback_attempts: z.number().min(1).max(20).optional(), /** Cooldown in seconds before retrying a failed model (default: 60) */ cooldown_seconds: z.number().min(0).optional(), - /** Session-level timeout in seconds to advance fallback when provider hangs (default: 30, 0 to disable) */ + /** Session-level timeout in seconds to advance fallback when provider hangs (default: 30). Set to 0 to disable auto-retry signal detection (only error-based fallback remains active). */ timeout_seconds: z.number().min(0).optional(), /** Show toast notification when switching to fallback model (default: true) */ notify_on_fallback: z.boolean().optional(),