docs(runtime-fallback): document retry classes and timeout behavior

2026-02-12 13:11:06 -05:00 · 2026-02-12 13:11:06 -05:00 · 31f61078b1
commit 31f61078b1
parent 6a97f00a22
2 changed files with 67 additions and 5 deletions
--- a/docs/configurations.md
+++ b/docs/configurations.md
@ -717,7 +717,7 @@ Configure concurrency limits for background agent tasks. This controls how many

 ## Runtime Fallback

-Automatically switch to backup models when the primary model encounters transient API errors (rate limits, overload, etc.). This keeps conversations running without manual intervention.
+Automatically switch to backup models when the primary model encounters retryable API errors (rate limits, overload, etc.) or provider key misconfiguration errors (for example, missing API key). This keeps conversations running without manual intervention.

 ```json
 {
@ -726,6 +726,7 @@ Automatically switch to backup models when the primary model encounters transien
    "retry_on_errors": [429, 503, 529],
    "max_fallback_attempts": 3,
    "cooldown_seconds": 60,
+    "timeout_seconds": 30,
    "notify_on_fallback": true
  }
 }
@ -734,17 +735,19 @@ Automatically switch to backup models when the primary model encounters transien
 | Option                  | Default           | Description                                                                 |
 | ----------------------- | ----------------- | --------------------------------------------------------------------------- |
 | `enabled`               | `true`            | Enable runtime fallback                                                     |
-| `retry_on_errors`       | `[429, 503, 529]` | HTTP status codes that trigger fallback (rate limit, service unavailable)   |
+| `retry_on_errors`       | `[429, 503, 529]` | HTTP status codes that trigger fallback (rate limit, service unavailable). Also supports certain classified provider errors (for example, missing API key) that do not expose HTTP status codes.   |
 | `max_fallback_attempts` | `3`               | Maximum fallback attempts per session (1-10)                                |
 | `cooldown_seconds`      | `60`              | Cooldown in seconds before retrying a failed model                          |
+| `timeout_seconds`       | `30`              | Timeout in seconds for an in-flight fallback request before forcing the next fallback model                          |
 | `notify_on_fallback`    | `true`            | Show toast notification when switching to a fallback model                  |

 ### How It Works

-1. When an API error matching `retry_on_errors` occurs, the hook intercepts it
+1. When an API error matching `retry_on_errors` occurs (or a classified provider key error such as missing API key), the hook intercepts it
 2. The next request automatically uses the next available model from `fallback_models`
 3. Failed models enter a cooldown period before being retried
-4. Toast notification (optional) informs you of the model switch
+4. If a fallback provider hangs, timeout advances to the next fallback model
+5. Toast notification (optional) informs you of the model switch

 ### Configuring Fallback Models

@ -898,6 +901,65 @@ Each category supports: `model`, `fallback_models`, `temperature`, `top_p`, `max
 | `description`       | string       | -       | Human-readable description of the category's purpose. Shown in delegate_task prompt.                |
 | `is_unstable_agent` | boolean      | `false` | Mark agent as unstable - forces background mode for monitoring. Auto-enabled for gemini models.    |

+## Runtime Fallback
+
+Automatically switch to backup models when the primary model encounters retryable API errors (rate limits, overload, etc.) or provider key misconfiguration errors (for example, missing API key). This keeps conversations running without manual intervention.
+
+```json
+{
+  "runtime_fallback": {
+    "enabled": true,
+    "retry_on_errors": [429, 503, 529],
+    "max_fallback_attempts": 3,
+    "cooldown_seconds": 60,
+    "timeout_seconds": 30,
+    "notify_on_fallback": true
+  }
+}
+```
+
+| Option                  | Default           | Description                                                                 |
+| ----------------------- | ----------------- | --------------------------------------------------------------------------- |
+| `enabled`               | `true`            | Enable runtime fallback                                                     |
+| `retry_on_errors`       | `[429, 503, 529]` | HTTP status codes that trigger fallback (rate limit, service unavailable). Also supports certain classified provider errors (for example, missing API key) that do not expose HTTP status codes.   |
+| `max_fallback_attempts` | `3`               | Maximum fallback attempts per session (1-10)                                |
+| `cooldown_seconds`      | `60`              | Cooldown in seconds before retrying a failed model                          |
+| `timeout_seconds`       | `30`              | Timeout in seconds for an in-flight fallback request before forcing the next fallback model                          |
+| `notify_on_fallback`    | `true`            | Show toast notification when switching to a fallback model                  |
+
+### How It Works
+
+1. When an API error matching `retry_on_errors` occurs (or a classified provider key error such as missing API key), the hook intercepts it
+2. The next request automatically uses the next available model from `fallback_models`
+3. Failed models enter a cooldown period before being retried
+4. If a fallback provider hangs, timeout advances to the next fallback model
+5. Toast notification (optional) informs you of the model switch
+
+### Configuring Fallback Models
+
+Define `fallback_models` at the agent or category level:
+
+```json
+{
+  "agents": {
+    "sisyphus": {
+      "model": "anthropic/claude-opus-4-5",
+      "fallback_models": ["openai/gpt-5.2", "google/gemini-3-pro"]
+    }
+  },
+  "categories": {
+    "ultrabrain": {
+      "model": "openai/gpt-5.2-codex",
+      "fallback_models": ["anthropic/claude-opus-4-5", "google/gemini-3-pro"]
+    }
+  }
+}
+```
+
+When the primary model fails:
+1. First fallback: `openai/gpt-5.2`
+2. Second fallback: `google/gemini-3-pro`
+3. After `max_fallback_attempts`, returns to primary model
 ## Model Resolution System

 At runtime, Oh My OpenCode uses a 3-step resolution process to determine which model to use for each agent and category. This happens dynamically based on your configuration and available models.
--- a/docs/features.md
+++ b/docs/features.md
@ -352,7 +352,7 @@ Hooks intercept and modify behavior at key points in the agent lifecycle.
 | **session-recovery** | Stop | Recovers from session errors - missing tool results, thinking block issues, empty messages. |
 | **anthropic-context-window-limit-recovery** | Stop | Handles Claude context window limits gracefully. |
 | **background-compaction** | Stop | Auto-compacts sessions hitting token limits. |
-| **runtime-fallback** | Event | Automatically switches to fallback models on API errors (429, 503, 529). Configurable via `runtime_fallback` and `fallback_models`, with retry logic and cooldown. |
+| **runtime-fallback** | Event | Automatically switches to backup models on retryable API errors (e.g., 429, 503, 529) and provider key misconfiguration errors (e.g., missing API key). Configurable retry logic with per-model cooldown. |

 #### Truncation & Context Management