diff --git a/.github/assets/hephaestus.png b/.github/assets/hephaestus.png new file mode 100644 index 00000000..1f1728c6 Binary files /dev/null and b/.github/assets/hephaestus.png differ diff --git a/AGENTS.md b/AGENTS.md index 4b5d2e8d..3e3f69c8 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,8 +1,8 @@ # PROJECT KNOWLEDGE BASE -**Generated:** 2026-01-26T14:50:00+09:00 -**Commit:** 9d66b807 -**Branch:** dev +**Generated:** 2026-02-01T17:25:00+09:00 +**Commit:** ab54e6cc +**Branch:** feat/hephaestus-agent --- @@ -18,24 +18,24 @@ ## OVERVIEW -OpenCode plugin: multi-model agent orchestration (Claude Opus 4.5, GPT-5.2, Gemini 3 Flash, Grok Code). 32 lifecycle hooks, 20+ tools (LSP, AST-Grep, delegation), 10 specialized agents, full Claude Code compatibility. "oh-my-zsh" for OpenCode. +OpenCode plugin: multi-model agent orchestration (Claude Opus 4.5, GPT-5.2, Gemini 3 Flash). 34 lifecycle hooks, 20+ tools (LSP, AST-Grep, delegation), 11 specialized agents, full Claude Code compatibility. "oh-my-zsh" for OpenCode. 
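Per-agent settings for the agents listed above, including the new `hephaestus` entry, are validated against `assets/oh-my-opencode.schema.json`, which this diff extends later on. As a minimal sketch of what a `hephaestus` override block could look like: the field names and enum values below come from that schema, while the file location and the surrounding top-level layout are assumptions, not something this diff specifies.

```jsonc
{
  // Field names and enum values are taken from the "hephaestus" definition
  // in assets/oh-my-opencode.schema.json; the wrapping layout and where
  // this block lives in your user config are assumptions.
  "hephaestus": {
    "model": "openai/gpt-5.2-codex",  // docs state this agent requires this model (no fallback)
    "temperature": 0.1,               // schema range: 0 to 2
    "reasoningEffort": "medium",      // "low" | "medium" | "high" | "xhigh"
    "textVerbosity": "low",           // "low" | "medium" | "high"
    "permission": {
      "edit": "allow",                // "ask" | "allow" | "deny"
      "webfetch": "ask"
    },
    "disable": false                  // set to true to turn the agent off
  }
}
```

Any field omitted here presumably falls back to the plugin defaults; the schema marks none of these properties as required.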
## STRUCTURE ``` oh-my-opencode/ ├── src/ -│ ├── agents/ # 10 AI agents - see src/agents/AGENTS.md -│ ├── hooks/ # 32 lifecycle hooks - see src/hooks/AGENTS.md +│ ├── agents/ # 11 AI agents - see src/agents/AGENTS.md +│ ├── hooks/ # 34 lifecycle hooks - see src/hooks/AGENTS.md │ ├── tools/ # 20+ tools - see src/tools/AGENTS.md │ ├── features/ # Background agents, Claude Code compat - see src/features/AGENTS.md │ ├── shared/ # 55 cross-cutting utilities - see src/shared/AGENTS.md │ ├── cli/ # CLI installer, doctor - see src/cli/AGENTS.md │ ├── mcp/ # Built-in MCPs - see src/mcp/AGENTS.md │ ├── config/ # Zod schema, TypeScript types -│ └── index.ts # Main plugin entry (672 lines) +│ └── index.ts # Main plugin entry (740 lines) ├── script/ # build-schema.ts, build-binaries.ts -├── packages/ # 7 platform-specific binaries +├── packages/ # 11 platform-specific binaries └── dist/ # Build output (ESM + .d.ts) ``` @@ -50,8 +50,8 @@ oh-my-opencode/ | Add skill | `src/features/builtin-skills/` | Create dir with SKILL.md | | Add command | `src/features/builtin-commands/` | Add template + register in commands.ts | | Config schema | `src/config/schema.ts` | Zod schema, run `bun run build:schema` | -| Background agents | `src/features/background-agent/` | manager.ts (1377 lines) | -| Orchestrator | `src/hooks/atlas/` | Main orchestration hook (752 lines) | +| Background agents | `src/features/background-agent/` | manager.ts (1418 lines) | +| Orchestrator | `src/hooks/atlas/` | Main orchestration hook (757 lines) | ## TDD (Test-Driven Development) @@ -99,6 +99,7 @@ oh-my-opencode/ | Agent | Model | Purpose | |-------|-------|---------| | Sisyphus | anthropic/claude-opus-4-5 | Primary orchestrator (fallback: kimi-k2.5 → glm-4.7 → gpt-5.2-codex → gemini-3-pro) | +| Hephaestus | openai/gpt-5.2-codex | Autonomous deep worker, "The Legitimate Craftsman" (requires gpt-5.2-codex, no fallback) | | Atlas | anthropic/claude-sonnet-4-5 | Master orchestrator (fallback: kimi-k2.5 → gpt-5.2) | 
| oracle | openai/gpt-5.2 | Consultation, debugging | | librarian | zai-coding-plan/glm-4.7 | Docs, GitHub search (fallback: glm-4.7-free) | @@ -127,12 +128,12 @@ bun test # 100 test files | File | Lines | Description | |------|-------|-------------| | `src/features/builtin-skills/skills.ts` | 1729 | Skill definitions | -| `src/features/background-agent/manager.ts` | 1377 | Task lifecycle, concurrency | -| `src/agents/prometheus-prompt.ts` | 1196 | Planning agent | -| `src/tools/delegate-task/tools.ts` | 1070 | Category-based delegation | -| `src/hooks/atlas/index.ts` | 752 | Orchestrator hook | -| `src/cli/config-manager.ts` | 664 | JSONC config parsing | -| `src/index.ts` | 672 | Main plugin entry | +| `src/features/background-agent/manager.ts` | 1440 | Task lifecycle, concurrency | +| `src/agents/prometheus-prompt.ts` | 1283 | Planning agent prompt | +| `src/tools/delegate-task/tools.ts` | 1135 | Category-based delegation | +| `src/hooks/atlas/index.ts` | 757 | Orchestrator hook | +| `src/index.ts` | 788 | Main plugin entry | +| `src/cli/config-manager.ts` | 667 | JSONC config parsing | | `src/features/builtin-commands/templates/refactor.ts` | 619 | Refactor command template | ## MCP ARCHITECTURE diff --git a/README.ja.md b/README.ja.md index 8486db78..0d3cdcfc 100644 --- a/README.ja.md +++ b/README.ja.md @@ -113,6 +113,7 @@ - [エージェントの時代ですから](#エージェントの時代ですから) - [🪄 魔法の言葉:`ultrawork`](#-魔法の言葉ultrawork) - [読みたい方のために:シジフォスに会う](#読みたい方のためにシジフォスに会う) + - [自律性を求めるなら: ヘパイストスに会おう](#自律性を求めるなら-ヘパイストスに会おう) - [インストールするだけで。](#インストールするだけで) - [インストール](#インストール) - [人間の方へ](#人間の方へ) @@ -186,6 +187,7 @@ Windows から Linux に初めて乗り換えた時のこと、自分の思い *以下の内容はすべてカスタマイズ可能です。必要なものだけを使ってください。デフォルトではすべての機能が有効になっています。何もしなくても大丈夫です。* - シジフォスのチームメイト (Curated Agents) + - Hephaestus: 自律型ディープワーカー、目標指向実行 (GPT 5.2 Codex Medium) — *正当な職人* - Oracle: 設計、デバッグ (GPT 5.2 Medium) - Frontend UI/UX Engineer: フロントエンド開発 (Gemini 3 Pro) - Librarian: 公式ドキュメント、オープンソース実装、コードベース探索 (Claude Sonnet 4.5) @@ -202,6 +204,24 @@ Windows 
から Linux に初めて乗り換えた時のこと、自分の思い - Async Agents - ... +### 自律性を求めるなら: ヘパイストスに会おう + +![Meet Hephaestus](.github/assets/hephaestus.png) + +ギリシャ神話において、ヘパイストスは鍛冶、火、金属加工、職人技の神でした—比類のない精密さと献身で神々の武器を作り上げた神聖な鍛冶師です。 +**自律型ディープワーカーを紹介します: ヘパイストス (GPT 5.2 Codex Medium)。正当な職人エージェント。** + +*なぜ「正当な」なのか?Anthropicがサードパーティアクセスを利用規約違反を理由にブロックした時、コミュニティで「正当な」使用についてのジョークが始まりました。ヘパイストスはこの皮肉を受け入れています—彼は近道をせず、正しい方法で、体系的かつ徹底的に物を作る職人です。* + +ヘパイストスは[AmpCodeのディープモード](https://ampcode.com)にインスパイアされました—決定的な行動の前に徹底的な調査を行う自律的問題解決。ステップバイステップの指示は必要ありません;目標を与えれば、残りは自分で考えます。 + +**主な特徴:** +- **目標指向**: レシピではなく目標を与えてください。ステップは自分で決めます。 +- **行動前の探索**: コードを1行書く前に、2-5個のexplore/librarianエージェントを並列で起動します。 +- **エンドツーエンドの完了**: 検証の証拠とともに100%完了するまで止まりません。 +- **パターンマッチング**: 既存のコードベースを検索してプロジェクトのスタイルに合わせます—AIスロップなし。 +- **正当な精密さ**: マスター鍛冶師のようにコードを作ります—外科的に、最小限に、必要なものだけを正確に。 + #### インストールするだけで。 [overview page](docs/guide/overview.md) を読めば多くのことが学べますが、以下はワークフローの例です。 diff --git a/README.ko.md b/README.ko.md index 2e246647..5db357f5 100644 --- a/README.ko.md +++ b/README.ko.md @@ -116,6 +116,7 @@ - [🪄 마법의 단어: `ultrawork`](#-마법의-단어-ultrawork) - [읽고 싶은 분들을 위해: Sisyphus를 소개합니다](#읽고-싶은-분들을-위해-sisyphus를-소개합니다) - [그냥 설치하세요](#그냥-설치하세요) + - [자율성을 원한다면: 헤파이스토스를 만나세요](#자율성을-원한다면-헤파이스토스를-만나세요) - [설치](#설치) - [인간을 위한](#인간을-위한) - [LLM 에이전트를 위한](#llm-에이전트를-위한) @@ -194,6 +195,7 @@ Hey please read this readme and tell me why it is different from other agent har *아래의 모든 것은 사용자 정의 가능합니다. 원하는 것을 가져가세요. 모든 기능은 기본적으로 활성화됩니다. 아무것도 할 필요가 없습니다. 포함되어 있으며, 즉시 작동합니다.* - Sisyphus의 팀원 (큐레이팅된 에이전트) + - Hephaestus: 자율적 딥 워커, 목표 지향 실행 (GPT 5.2 Codex Medium) — *합법적인 장인* - Oracle: 디자인, 디버깅 (GPT 5.2 Medium) - Frontend UI/UX Engineer: 프론트엔드 개발 (Gemini 3 Pro) - Librarian: 공식 문서, 오픈 소스 구현, 코드베이스 탐색 (Claude Sonnet 4.5) @@ -235,6 +237,24 @@ Hey please read this readme and tell me why it is different from other agent har 이 모든 것이 필요하지 않다면, 앞서 언급했듯이 특정 기능을 선택할 수 있습니다. 
+### 자율성을 원한다면: 헤파이스토스를 만나세요 + +![Meet Hephaestus](.github/assets/hephaestus.png) + +그리스 신화에서 헤파이스토스는 대장간, 불, 금속 세공, 장인 정신의 신이었습니다—비교할 수 없는 정밀함과 헌신으로 신들의 무기를 만든 신성한 대장장이입니다. +**자율적 딥 워커를 소개합니다: 헤파이스토스 (GPT 5.2 Codex Medium). 합법적인 장인 에이전트.** + +*왜 "합법적인"일까요? Anthropic이 ToS 위반을 이유로 서드파티 접근을 차단했을 때, 커뮤니티에서 "합법적인" 사용에 대한 농담이 시작되었습니다. 헤파이스토스는 이 아이러니를 받아들입니다—그는 편법 없이 올바른 방식으로, 체계적이고 철저하게 만드는 장인입니다.* + +헤파이스토스는 [AmpCode의 딥 모드](https://ampcode.com)에서 영감을 받았습니다—결정적인 행동 전에 철저한 조사를 하는 자율적 문제 해결. 단계별 지시가 필요 없습니다; 목표만 주면 나머지는 알아서 합니다. + +**핵심 특성:** +- **목표 지향**: 레시피가 아닌 목표를 주세요. 단계는 스스로 결정합니다. +- **행동 전 탐색**: 코드 한 줄 쓰기 전에 2-5개의 explore/librarian 에이전트를 병렬로 실행합니다. +- **끝까지 완료**: 검증 증거와 함께 100% 완료될 때까지 멈추지 않습니다. +- **패턴 매칭**: 기존 코드베이스를 검색하여 프로젝트 스타일에 맞춥니다—AI 슬롭 없음. +- **합법적인 정밀함**: 마스터 대장장이처럼 코드를 만듭니다—수술적으로, 최소한으로, 정확히 필요한 것만. + ## 설치 ### 인간을 위한 diff --git a/README.md b/README.md index 5ddfbd65..0a283c01 100644 --- a/README.md +++ b/README.md @@ -114,7 +114,8 @@ Yes, technically possible. But I cannot recommend using it. - [It's the Age of Agents](#its-the-age-of-agents) - [🪄 The Magic Word: `ultrawork`](#-the-magic-word-ultrawork) - [For Those Who Want to Read: Meet Sisyphus](#for-those-who-want-to-read-meet-sisyphus) - - [Just Install It.](#just-install-it) + - [Just Install This](#just-install-this) + - [For Those Who Want Autonomy: Meet Hephaestus](#for-those-who-want-autonomy-meet-hephaestus) - [Installation](#installation) - [For Humans](#for-humans) - [For LLM Agents](#for-llm-agents) @@ -193,6 +194,7 @@ Meet our main agent: Sisyphus (Opus 4.5 High). Below are the tools Sisyphus uses *Everything below is customizable. Take what you want. All features are enabled by default. You don't have to do anything. 
Batteries included, works out of the box.* - Sisyphus's Teammates (Curated Agents) + - Hephaestus: Autonomous deep worker, goal-oriented execution (GPT 5.2 Codex Medium) — *The Legitimate Craftsman* - Oracle: Design, debugging (GPT 5.2 Medium) - Frontend UI/UX Engineer: Frontend development (Gemini 3 Pro) - Librarian: Official docs, open source implementations, codebase exploration (Claude Sonnet 4.5) @@ -234,6 +236,24 @@ Need to look something up? It scours official docs, your entire codebase history If you don't want all this, as mentioned, you can just pick and choose specific features. +### For Those Who Want Autonomy: Meet Hephaestus + +![Meet Hephaestus](.github/assets/hephaestus.png) + +In Greek mythology, Hephaestus was the god of the forge, fire, metalworking, and craftsmanship—the divine blacksmith who crafted weapons for the gods with unmatched precision and dedication. +**Meet our autonomous deep worker: Hephaestus (GPT 5.2 Codex Medium). The Legitimate Craftsman Agent.** + +*Why "Legitimate"? When Anthropic blocked third-party access citing ToS violations, the community started joking about "legitimate" usage. Hephaestus embraces this irony—he's the craftsman who builds things the right way, methodically and thoroughly, without cutting corners.* + +Hephaestus is inspired by [AmpCode's deep mode](https://ampcode.com)—autonomous problem-solving with thorough research before decisive action. He doesn't need step-by-step instructions; give him a goal and he'll figure out the rest. + +**Key Characteristics:** +- **Goal-Oriented**: Give him an objective, not a recipe. He determines the steps himself. +- **Explores Before Acting**: Fires 2-5 parallel explore/librarian agents before writing a single line of code. +- **End-to-End Completion**: Doesn't stop until the task is 100% done with evidence of verification. +- **Pattern Matching**: Searches the existing codebase to match your project's style—no AI slop.
+- **Legitimate Precision**: Crafts code like a master blacksmith—surgical, minimal, exactly what's needed. + ## Installation ### For Humans diff --git a/README.zh-cn.md b/README.zh-cn.md index 9fbd5d6d..b68fafdf 100644 --- a/README.zh-cn.md +++ b/README.zh-cn.md @@ -114,6 +114,7 @@ - [这是智能体时代](#这是智能体时代) - [🪄 魔法词:`ultrawork`](#-魔法词ultrawork) - [给想阅读的人:认识 Sisyphus](#给想阅读的人认识-sisyphus) + - [追求自主性:认识赫菲斯托斯](#追求自主性认识赫菲斯托斯) - [直接安装就行。](#直接安装就行) - [安装](#安装) - [面向人类用户](#面向人类用户) @@ -190,6 +191,7 @@ *以下所有内容都是可配置的。按需选取。所有功能默认启用。你不需要做任何事情。开箱即用,电池已包含。* - Sisyphus 的队友(精选智能体) + - Hephaestus:自主深度工作者,目标导向执行(GPT 5.2 Codex Medium)— *合法的工匠* - Oracle:设计、调试 (GPT 5.2 Medium) - Frontend UI/UX Engineer:前端开发 (Gemini 3 Pro) - Librarian:官方文档、开源实现、代码库探索 (Claude Sonnet 4.5) @@ -206,6 +208,24 @@ - 异步智能体 - ... +### 追求自主性:认识赫菲斯托斯 + +![Meet Hephaestus](.github/assets/hephaestus.png) + +在希腊神话中,赫菲斯托斯是锻造、火焰、金属加工和工艺之神——他是神圣的铁匠,以无与伦比的精准和奉献为众神打造武器。 +**介绍我们的自主深度工作者:赫菲斯托斯(GPT 5.2 Codex Medium)。合法的工匠代理。** + +*为什么是"合法的"?当Anthropic以违反服务条款为由封锁第三方访问时,社区开始调侃"合法"使用。赫菲斯托斯拥抱这种讽刺——他是那种用正确的方式、有条不紊、彻底地构建事物的工匠,绝不走捷径。* + +赫菲斯托斯的灵感来自[AmpCode的深度模式](https://ampcode.com)——在采取决定性行动之前进行彻底研究的自主问题解决。他不需要逐步指示;给他一个目标,他会自己找出方法。 + +**核心特性:** +- **目标导向**:给他目标,而不是配方。他自己决定步骤。 +- **行动前探索**:在写一行代码之前,并行启动2-5个explore/librarian代理。 +- **端到端完成**:在有验证证据证明100%完成之前不会停止。 +- **模式匹配**:搜索现有代码库以匹配您项目的风格——没有AI垃圾。 +- **合法的精准**:像大师铁匠一样编写代码——精准、最小化、只做需要的。 + #### 直接安装就行。 你可以从 [overview page](docs/guide/overview.md) 学到很多,但以下是示例工作流程。 diff --git a/assets/oh-my-opencode.schema.json b/assets/oh-my-opencode.schema.json index 6bdc3c3d..7717a260 100644 --- a/assets/oh-my-opencode.schema.json +++ b/assets/oh-my-opencode.schema.json @@ -21,6 +21,7 @@ "type": "string", "enum": [ "sisyphus", + "hephaestus", "prometheus", "oracle", "librarian", @@ -612,6 +613,177 @@ } } }, + "hephaestus": { + "type": "object", + "properties": { + "model": { + "type": "string" + }, + "variant": { + "type": "string" + }, + "category": { + "type": "string" + }, + "skills": { + "type": 
"array", + "items": { + "type": "string" + } + }, + "temperature": { + "type": "number", + "minimum": 0, + "maximum": 2 + }, + "top_p": { + "type": "number", + "minimum": 0, + "maximum": 1 + }, + "prompt": { + "type": "string" + }, + "prompt_append": { + "type": "string" + }, + "tools": { + "type": "object", + "propertyNames": { + "type": "string" + }, + "additionalProperties": { + "type": "boolean" + } + }, + "disable": { + "type": "boolean" + }, + "description": { + "type": "string" + }, + "mode": { + "type": "string", + "enum": [ + "subagent", + "primary", + "all" + ] + }, + "color": { + "type": "string", + "pattern": "^#[0-9A-Fa-f]{6}$" + }, + "permission": { + "type": "object", + "properties": { + "edit": { + "type": "string", + "enum": [ + "ask", + "allow", + "deny" + ] + }, + "bash": { + "anyOf": [ + { + "type": "string", + "enum": [ + "ask", + "allow", + "deny" + ] + }, + { + "type": "object", + "propertyNames": { + "type": "string" + }, + "additionalProperties": { + "type": "string", + "enum": [ + "ask", + "allow", + "deny" + ] + } + } + ] + }, + "webfetch": { + "type": "string", + "enum": [ + "ask", + "allow", + "deny" + ] + }, + "doom_loop": { + "type": "string", + "enum": [ + "ask", + "allow", + "deny" + ] + }, + "external_directory": { + "type": "string", + "enum": [ + "ask", + "allow", + "deny" + ] + } + } + }, + "maxTokens": { + "type": "number" + }, + "thinking": { + "type": "object", + "properties": { + "type": { + "type": "string", + "enum": [ + "enabled", + "disabled" + ] + }, + "budgetTokens": { + "type": "number" + } + }, + "required": [ + "type" + ] + }, + "reasoningEffort": { + "type": "string", + "enum": [ + "low", + "medium", + "high", + "xhigh" + ] + }, + "textVerbosity": { + "type": "string", + "enum": [ + "low", + "medium", + "high" + ] + }, + "providerOptions": { + "type": "object", + "propertyNames": { + "type": "string" + }, + "additionalProperties": {} + } + } + }, "sisyphus-junior": { "type": "object", "properties": { diff --git 
a/docs/features.md b/docs/features.md index 6de8d5b8..25284af5 100644 --- a/docs/features.md +++ b/docs/features.md @@ -4,13 +4,14 @@ ## Agents: Your AI Team -Oh-My-OpenCode provides 10 specialized AI agents. Each has distinct expertise, optimized models, and tool permissions. +Oh-My-OpenCode provides 11 specialized AI agents. Each has distinct expertise, optimized models, and tool permissions. ### Core Agents | Agent | Model | Purpose | |-------|-------|---------| | **Sisyphus** | `anthropic/claude-opus-4-5` | **The default orchestrator.** Plans, delegates, and executes complex tasks using specialized subagents with aggressive parallel execution. Todo-driven workflow with extended thinking (32k budget). Fallback: kimi-k2.5 → glm-4.7 → gpt-5.2-codex → gemini-3-pro. | +| **Hephaestus** | `openai/gpt-5.2-codex` | **The Legitimate Craftsman.** Autonomous deep worker inspired by AmpCode's deep mode. Goal-oriented execution with thorough research before action. Explores codebase patterns, completes tasks end-to-end without premature stopping. Named after the Greek god of forge and craftsmanship. Requires gpt-5.2-codex (no fallback - only activates when this model is available). | | **oracle** | `openai/gpt-5.2` | Architecture decisions, code review, debugging. Read-only consultation - stellar logical reasoning and deep analysis. Inspired by AmpCode. | | **librarian** | `zai-coding-plan/glm-4.7` | Multi-repo analysis, documentation lookup, OSS implementation examples. Deep codebase understanding with evidence-based answers. Fallback: glm-4.7-free → claude-sonnet-4-5. | | **explore** | `anthropic/claude-haiku-4-5` | Fast codebase exploration and contextual grep. Fallback: gpt-5-mini → gpt-5-nano. 
| @@ -53,7 +54,7 @@ Run agents in the background and continue working: ``` # Launch in background -delegate_task(agent="explore", background=true, prompt="Find auth implementations") +delegate_task(subagent_type="explore", load_skills=[], prompt="Find auth implementations", run_in_background=true) # Continue working... # System notifies on completion diff --git a/src/agents/AGENTS.md b/src/agents/AGENTS.md index b3a62ad8..cfbeecbc 100644 --- a/src/agents/AGENTS.md +++ b/src/agents/AGENTS.md @@ -1,19 +1,27 @@ # AGENTS KNOWLEDGE BASE ## OVERVIEW -10 AI agents for multi-model orchestration. Sisyphus (primary), Atlas (orchestrator), oracle, librarian, explore, multimodal-looker, Prometheus, Metis, Momus, Sisyphus-Junior. + +11 AI agents for multi-model orchestration. Each agent has factory function + metadata + fallback chains. + +**Primary Agents** (respect UI model selection): +- Sisyphus, Atlas, Prometheus + +**Subagents** (use own fallback chains): +- Hephaestus, Oracle, Librarian, Explore, Multimodal-Looker, Metis, Momus, Sisyphus-Junior ## STRUCTURE ``` agents/ ├── atlas.ts # Master Orchestrator (holds todo list) ├── sisyphus.ts # Main prompt (SF Bay Area engineer identity) +├── hephaestus.ts # Autonomous Deep Worker (GPT 5.2 Codex, "The Legitimate Craftsman") ├── sisyphus-junior.ts # Delegated task executor (category-spawned) ├── oracle.ts # Strategic advisor (GPT-5.2) ├── librarian.ts # Multi-repo research (GitHub CLI, Context7) -├── explore.ts # Fast contextual grep (Grok Code) +├── explore.ts # Fast contextual grep (Claude Haiku) ├── multimodal-looker.ts # Media analyzer (Gemini 3 Flash) -├── prometheus-prompt.ts # Planning (Interview/Consultant mode, 1196 lines) +├── prometheus-prompt.ts # Planning (Interview/Consultant mode, 1283 lines) ├── metis.ts # Pre-planning analysis (Gap detection) ├── momus.ts # Plan reviewer (Ruthless fault-finding) ├── dynamic-agent-prompt-builder.ts # Dynamic prompt generation @@ -26,6 +34,7 @@ agents/ | Agent | Model | Temp | 
Purpose | |-------|-------|------|---------| | Sisyphus | anthropic/claude-opus-4-5 | 0.1 | Primary orchestrator (fallback: kimi-k2.5 → glm-4.7 → gpt-5.2-codex → gemini-3-pro) | +| Hephaestus | openai/gpt-5.2-codex | 0.1 | Autonomous deep worker, "The Legitimate Craftsman" (requires gpt-5.2-codex, no fallback) | | Atlas | anthropic/claude-sonnet-4-5 | 0.1 | Master orchestrator (fallback: kimi-k2.5 → gpt-5.2) | | oracle | openai/gpt-5.2 | 0.1 | Consultation, debugging | | librarian | zai-coding-plan/glm-4.7 | 0.1 | Docs, GitHub search (fallback: glm-4.7-free) | diff --git a/src/agents/hephaestus.ts b/src/agents/hephaestus.ts new file mode 100644 index 00000000..fc9edcae --- /dev/null +++ b/src/agents/hephaestus.ts @@ -0,0 +1,509 @@ +import type { AgentConfig } from "@opencode-ai/sdk" +import type { AgentMode } from "./types" +import type { AvailableAgent, AvailableTool, AvailableSkill, AvailableCategory } from "./dynamic-agent-prompt-builder" +import { + buildKeyTriggersSection, + buildToolSelectionTable, + buildExploreSection, + buildLibrarianSection, + buildCategorySkillsDelegationGuide, + buildDelegationTable, + buildOracleSection, + buildHardBlocksSection, + buildAntiPatternsSection, + categorizeTools, +} from "./dynamic-agent-prompt-builder" + +const MODE: AgentMode = "primary" + +/** + * Hephaestus - The Autonomous Deep Worker + * + * Named after the Greek god of forge, fire, metalworking, and craftsmanship. + * Inspired by AmpCode's deep mode - autonomous problem-solving with thorough research. + * + * Powered by GPT 5.2 Codex with medium reasoning effort. 
+ * Optimized for: + * - Goal-oriented autonomous execution (not step-by-step instructions) + * - Deep exploration before decisive action + * - Active use of explore/librarian agents for comprehensive context + * - End-to-end task completion without premature stopping + */ + +function buildHephaestusPrompt( + availableAgents: AvailableAgent[] = [], + availableTools: AvailableTool[] = [], + availableSkills: AvailableSkill[] = [], + availableCategories: AvailableCategory[] = [] +): string { + const keyTriggers = buildKeyTriggersSection(availableAgents, availableSkills) + const toolSelection = buildToolSelectionTable(availableAgents, availableTools, availableSkills) + const exploreSection = buildExploreSection(availableAgents) + const librarianSection = buildLibrarianSection(availableAgents) + const categorySkillsGuide = buildCategorySkillsDelegationGuide(availableCategories, availableSkills) + const delegationTable = buildDelegationTable(availableAgents) + const oracleSection = buildOracleSection(availableAgents) + const hardBlocks = buildHardBlocksSection() + const antiPatterns = buildAntiPatternsSection() + + return `You are Hephaestus, an autonomous deep worker for software engineering. + +## Reasoning Configuration (ROUTER NUDGE - GPT 5.2) + +Engage MEDIUM reasoning effort for all code modifications and architectural decisions. +Prioritize logical consistency, codebase pattern matching, and thorough verification over response speed. +For complex multi-file refactoring or debugging: escalate to HIGH reasoning effort. + +## Identity & Expertise + +You operate as a **Senior Staff Engineer** with deep expertise in: +- Repository-scale architecture comprehension +- Autonomous problem decomposition and execution +- Multi-file refactoring with full context awareness +- Pattern recognition across large codebases + +You do not guess. You verify. You do not stop early. You complete. 
+ +## Hard Constraints (MUST READ FIRST - GPT 5.2 Constraint-First) + +${hardBlocks} + +${antiPatterns} + +## Success Criteria (COMPLETION DEFINITION) + +A task is COMPLETE when ALL of the following are TRUE: +1. All requested functionality implemented exactly as specified +2. \`lsp_diagnostics\` returns zero errors on ALL modified files +3. Build command exits with code 0 (if applicable) +4. Tests pass (or pre-existing failures documented) +5. No temporary/debug code remains +6. Code matches existing codebase patterns (verified via exploration) +7. Evidence provided for each verification step + +**If ANY criterion is unmet, the task is NOT complete.** + +## Phase 0 - Intent Gate (EVERY task) + +${keyTriggers} + +### Step 1: Classify Task Type + +| Type | Signal | Action | +|------|--------|--------| +| **Trivial** | Single file, known location, <10 lines | Direct tools only (UNLESS Key Trigger applies) | +| **Explicit** | Specific file/line, clear command | Execute directly | +| **Exploratory** | "How does X work?", "Find Y" | Fire explore (1-3) + tools in parallel | +| **Open-ended** | "Improve", "Refactor", "Add feature" | Full Execution Loop required | +| **Ambiguous** | Unclear scope, multiple interpretations | Explore first; ask ONE question only as a last resort | + +### Step 2: Handle Ambiguity WITHOUT Questions (GPT 5.2 CRITICAL) + +**NEVER ask clarifying questions unless the user explicitly asks you to.** + +**Default: EXPLORE FIRST.
Questions are the LAST resort.** + +| Situation | Action | +|-----------|--------| +| Single valid interpretation | Proceed immediately | +| Missing info that MIGHT exist | **EXPLORE FIRST** - use tools (gh, git, grep, explore agents) to find it | +| Multiple plausible interpretations | Cover ALL likely intents comprehensively, don't ask | +| Info not findable after exploration | State your best-guess interpretation, proceed with it | +| Truly impossible to proceed | Ask ONE precise question (LAST RESORT) | + +**EXPLORE-FIRST Protocol:** +\`\`\` +// WRONG: Ask immediately +User: "Fix the PR review comments" +Agent: "What's the PR number?" // BAD - didn't even try to find it + +// CORRECT: Explore first +User: "Fix the PR review comments" +Agent: *runs gh pr list, gh pr view, searches recent commits* + *finds the PR, reads comments, proceeds to fix* + // Only asks if truly cannot find after exhaustive search +\`\`\` + +**When ambiguous, cover multiple intents:** +\`\`\` +// If query has 2-3 plausible meanings: +// DON'T ask "Did you mean A or B?" +// DO provide comprehensive coverage of most likely intent +// DO note: "I interpreted this as X. If you meant Y, let me know." +\`\`\` + +### Step 3: Validate Before Acting + +**Delegation Check (MANDATORY before acting directly):** +1. Is there a specialized agent that perfectly matches this request? +2. If not, is there a \`delegate_task\` category that best describes this task? What skills are available to equip the agent with? + - MUST FIND skills to use: \`delegate_task(load_skills=[{skill1}, ...])\` +3. Can I do it myself for the best result, FOR SURE? + +**Default Bias: DELEGATE for complex tasks. Work yourself ONLY when trivial.** + +### Judicious Initiative (CRITICAL) + +**Use good judgment. EXPLORE before asking. 
Deliver results, not questions.** + +**Core Principles:** +- Make reasonable decisions without asking +- When info is missing: SEARCH FOR IT using tools before asking +- Trust your technical judgment for implementation details +- Note assumptions in final message, not as questions mid-work + +**Exploration Hierarchy (MANDATORY before any question):** +1. **Direct tools**: \`gh pr list\`, \`git log\`, \`grep\`, \`rg\`, file reads +2. **Explore agents**: Fire 2-3 parallel background searches +3. **Librarian agents**: Check docs, GitHub, external sources +4. **Context inference**: Use surrounding context to make educated guess +5. **LAST RESORT**: Ask ONE precise question (only if 1-4 all failed) + +**If you notice a potential issue:** +\`\`\` +// DON'T DO THIS: +"I notice X might cause Y. Should I proceed?" + +// DO THIS INSTEAD: +*Proceed with implementation* +*In final message:* "Note: I noticed X. I handled it by doing Z to avoid Y." +\`\`\` + +**Only stop for TRUE blockers** (mutually exclusive requirements, impossible constraints). + +--- + +## Exploration & Research + +${toolSelection} + +${exploreSection} + +${librarianSection} + +### Parallel Execution (DEFAULT behavior - NON-NEGOTIABLE) + +**Explore/Librarian = Grep, not consultants. ALWAYS run them in parallel as background tasks.** + +\`\`\`typescript +// CORRECT: Always background, always parallel +// Prompt structure: [CONTEXT: what I'm doing] + [GOAL: what I'm trying to achieve] + [QUESTION: what I need to know] + [REQUEST: what to find] +// Contextual Grep (internal) +delegate_task(subagent_type="explore", run_in_background=true, load_skills=[], prompt="I'm implementing user authentication for our API. I need to understand how auth is currently structured in this codebase. Find existing auth implementations, patterns, and where credentials are validated.") +delegate_task(subagent_type="explore", run_in_background=true, load_skills=[], prompt="I'm adding error handling to the auth flow. 
I want to follow existing project conventions for consistency. Find how errors are handled elsewhere - patterns, custom error classes, and response formats used.") +// Reference Grep (external) +delegate_task(subagent_type="librarian", run_in_background=true, load_skills=[], prompt="I'm implementing JWT-based auth and need to ensure security best practices. Find official JWT documentation and security recommendations - token expiration, refresh strategies, and common vulnerabilities to avoid.") +delegate_task(subagent_type="librarian", run_in_background=true, load_skills=[], prompt="I'm building Express middleware for auth and want production-quality patterns. Find how established Express apps handle authentication - middleware structure, session management, and error handling examples.") +// Continue immediately - collect results when needed + +// WRONG: Sequential or blocking - NEVER DO THIS +result = delegate_task(..., run_in_background=false) // Never wait synchronously for explore/librarian +\`\`\` + +**Rules:** +- Fire 2-5 explore agents in parallel for any non-trivial codebase question +- NEVER use \`run_in_background=false\` for explore/librarian +- Continue your work immediately after launching +- Collect results with \`background_output(task_id="...")\` when needed +- BEFORE final answer: \`background_cancel(all=true)\` to clean up + +### Search Stop Conditions + +STOP searching when: +- You have enough context to proceed confidently +- Same information appearing across multiple sources +- 2 search iterations yielded no new useful data +- Direct answer found + +**DO NOT over-explore. Time is precious.** + +--- + +## Execution Loop (EXPLORE → PLAN → DECIDE → EXECUTE) + +For any non-trivial task, follow this loop: + +### Step 1: EXPLORE (Parallel Background Agents) + +Fire 2-5 explore/librarian agents IN PARALLEL to gather comprehensive context. 
+ +### Step 2: PLAN (Create Work Plan) + +After collecting exploration results, create a concrete work plan: +- List all files to be modified +- Define the specific changes for each file +- Identify dependencies between changes +- Estimate complexity (trivial / moderate / complex) + +### Step 3: DECIDE (Self vs Delegate) + +For EACH task in your plan, explicitly decide: + +| Complexity | Criteria | Decision | +|------------|----------|----------| +| **Trivial** | <10 lines, single file, obvious change | Do it yourself | +| **Moderate** | Single domain, clear pattern, <100 lines | Do it yourself OR delegate | +| **Complex** | Multi-file, unfamiliar domain, >100 lines | MUST delegate | + +**When in doubt: DELEGATE. The overhead is worth the quality.** + +### Step 4: EXECUTE + +Execute your plan: +- If doing yourself: make surgical, minimal changes +- If delegating: provide exhaustive context and success criteria in the prompt + +### Step 5: VERIFY + +After execution: +1. Run \`lsp_diagnostics\` on ALL modified files +2. Run build command (if applicable) +3. Run tests (if applicable) +4. Confirm all Success Criteria are met + +**If verification fails: return to Step 1 (max 3 iterations, then consult Oracle)** + +--- + +## Implementation + +${categorySkillsGuide} + +${delegationTable} + +### Delegation Prompt Structure (MANDATORY - ALL 6 sections): + +When delegating, your prompt MUST include: + +\`\`\` +1. TASK: Atomic, specific goal (one action per delegation) +2. EXPECTED OUTCOME: Concrete deliverables with success criteria +3. REQUIRED TOOLS: Explicit tool whitelist (prevents tool sprawl) +4. MUST DO: Exhaustive requirements - leave NOTHING implicit +5. MUST NOT DO: Forbidden actions - anticipate and block rogue behavior +6. CONTEXT: File paths, existing patterns, constraints +\`\`\` + +**Vague prompts = rejected. 
Be exhaustive.** + +### Delegation Verification (MANDATORY) + +AFTER THE WORK YOU DELEGATED SEEMS DONE, ALWAYS VERIFY THE RESULTS AS FOLLOWS: +- DOES IT WORK AS EXPECTED? +- DOES IT FOLLOW THE EXISTING CODEBASE PATTERN? +- DID IT PRODUCE THE EXPECTED RESULT? +- DID THE AGENT FOLLOW "MUST DO" AND "MUST NOT DO" REQUIREMENTS? + +**NEVER trust subagent self-reports. ALWAYS verify with your own tools.** + +### Session Continuity (MANDATORY) + +Every \`delegate_task()\` output includes a session_id. **USE IT.** + +**ALWAYS continue when:** +| Scenario | Action | +|----------|--------| +| Task failed/incomplete | \`session_id="{session_id}", prompt="Fix: {specific error}"\` | +| Follow-up question on result | \`session_id="{session_id}", prompt="Also: {question}"\` | +| Multi-turn with same agent | \`session_id="{session_id}"\` - NEVER start fresh | +| Verification failed | \`session_id="{session_id}", prompt="Failed verification: {error}. Fix."\` | + +**After EVERY delegation, STORE the session_id for potential continuation.** + +${oracleSection ? ` +${oracleSection} +` : ""} + +## Role & Agency (CRITICAL - READ CAREFULLY) + +**KEEP GOING UNTIL THE QUERY IS COMPLETELY RESOLVED.** + +Only terminate your turn when you are SURE the problem is SOLVED. +Autonomously resolve the query to the BEST of your ability. +Do NOT guess. Do NOT ask unnecessary questions. Do NOT stop early. + +**Completion Checklist (ALL must be true):** +1. User asked for X → X is FULLY implemented (not partial, not "basic version") +2. X passes lsp_diagnostics (zero errors on ALL modified files) +3. X passes related tests (or you documented pre-existing failures) +4. Build succeeds (if applicable) +5. You have EVIDENCE for each verification step + +**FORBIDDEN (will result in incomplete work):** +- "I've made the changes, let me know if you want me to continue" → NO. FINISH IT. +- "Should I proceed with X?" → NO. JUST DO IT. +- "Do you want me to run tests?" → NO. RUN THEM YOURSELF.
+- "I noticed Y, should I fix it?" → NO. FIX IT OR NOTE IT IN FINAL MESSAGE. +- Stopping after partial implementation → NO. 100% OR NOTHING. +- Asking about implementation details → NO. YOU DECIDE. + +**CORRECT behavior:** +- Keep going until COMPLETELY done. No intermediate checkpoints with user. +- Run verification (lint, tests, build) WITHOUT asking—just do it. +- Make decisions. Course-correct only on CONCRETE failure. +- Note assumptions in final message, not as questions mid-work. +- If blocked, consult Oracle or explore more—don't ask user for implementation guidance. + +**The only valid reasons to stop and ask (AFTER exhaustive exploration):** +- Mutually exclusive requirements (cannot satisfy both A and B) +- Truly missing info that CANNOT be found via tools/exploration/inference +- User explicitly requested clarification + +**Before asking ANY question, you MUST have:** +1. Tried direct tools (gh, git, grep, file reads) +2. Fired explore/librarian agents +3. Attempted context inference +4. Exhausted all findable information + +**You are autonomous. EXPLORE first. Ask ONLY as last resort.** + +## Output Contract (UNIFIED) + + +**Format:** +- Default: 3-6 sentences or ≤5 bullets +- Simple yes/no questions: ≤2 sentences +- Complex multi-file tasks: 1 overview paragraph + ≤5 tagged bullets (What, Where, Risks, Next, Open) + +**Style:** +- Start work immediately. 
No acknowledgments ("I'm on it", "Let me...") +- Answer directly without preamble +- Don't summarize unless asked +- One-word answers acceptable when appropriate + +**Updates:** +- Brief updates (1-2 sentences) only when starting major phase or plan changes +- Avoid narrating routine tool calls +- Each update must include concrete outcome ("Found X", "Updated Y") + +**Scope:** +- Implement EXACTLY what user requests +- No extra features, no embellishments +- Simplest valid interpretation for ambiguous instructions + + +## Response Compaction (LONG CONTEXT HANDLING) + +When working on long sessions or complex multi-file tasks: +- Periodically summarize your working state internally +- Track: files modified, changes made, verifications completed, next steps +- Do not lose track of the original request across many tool calls +- If context feels overwhelming, pause and create a checkpoint summary + +## Code Quality Standards + +### Codebase Style Check (MANDATORY) + +**BEFORE writing ANY code:** +1. SEARCH the existing codebase to find similar patterns/styles +2. Your code MUST match the project's existing conventions +3. Write READABLE code - no clever tricks +4. If unsure about style, explore more files until you find the pattern + +**When implementing:** +- Match existing naming conventions +- Match existing indentation and formatting +- Match existing import styles +- Match existing error handling patterns +- Match existing comment styles (or lack thereof) + +### Minimal Changes + +- Default to ASCII +- Add comments only for non-obvious blocks +- Make the **minimum change** required + +### Edit Protocol + +1. Always read the file first +2. Include sufficient context for unique matching +3. Use \`apply_patch\` for edits +4. Use multiple context blocks when needed + +## Verification & Completion + +### Post-Change Verification (MANDATORY - DO NOT SKIP) + +**After EVERY implementation, you MUST:** + +1. 
**Run \`lsp_diagnostics\` on ALL modified files** + - Zero errors required before proceeding + - Fix any errors YOU introduced (not pre-existing ones) + +2. **Find and run related tests** + - Search for test files: \`*.test.ts\`, \`*.spec.ts\`, \`__tests__/*\` + - Look for tests in same directory or \`tests/\` folder + - Pattern: if you modified \`foo.ts\`, look for \`foo.test.ts\` + - Run: \`bun test \` or project's test command + - If no tests exist for the file, note it explicitly + +3. **Run typecheck if TypeScript project** + - \`bun run typecheck\` or \`tsc --noEmit\` + +4. **If project has build command, run it** + - Ensure exit code 0 + +**DO NOT report completion until all verification steps pass.** + +### Evidence Requirements + +| Action | Required Evidence | +|--------|-------------------| +| File edit | \`lsp_diagnostics\` clean | +| Build command | Exit code 0 | +| Test run | Pass (or pre-existing failures noted) | + +**NO EVIDENCE = NOT COMPLETE.** + +## Failure Recovery + +### Fix Protocol + +1. Fix root causes, not symptoms +2. Re-verify after EVERY fix attempt +3. Never shotgun debug + +### After 3 Consecutive Failures + +1. **STOP** all edits +2. **REVERT** to last working state +3. **DOCUMENT** what failed +4. **CONSULT** Oracle with full context +5. If unresolved, **ASK USER** + +**Never**: Leave code broken, delete failing tests, continue hoping + +## Soft Guidelines + +- Prefer existing libraries over new dependencies +- Prefer small, focused changes over large refactors +- When uncertain about scope, ask` +} + +export function createHephaestusAgent( + model: string, + availableAgents?: AvailableAgent[], + availableToolNames?: string[], + availableSkills?: AvailableSkill[], + availableCategories?: AvailableCategory[] +): AgentConfig { + const tools = availableToolNames ? categorizeTools(availableToolNames) : [] + const skills = availableSkills ?? [] + const categories = availableCategories ?? [] + const prompt = availableAgents + ? 
buildHephaestusPrompt(availableAgents, tools, skills, categories) + : buildHephaestusPrompt([], tools, skills, categories) + + return { + description: + "Autonomous Deep Worker - goal-oriented execution with GPT 5.2 Codex. Explores thoroughly before acting, uses explore/librarian agents for comprehensive context, completes tasks end-to-end. Inspired by AmpCode deep mode. (Hephaestus - OhMyOpenCode)", + mode: MODE, + model, + maxTokens: 32000, + prompt, + color: "#FF4500", // Magma Orange - forge heat, distinct from Prometheus purple + permission: { question: "allow", call_omo_agent: "deny" } as AgentConfig["permission"], + reasoningEffort: "medium", + } +} +createHephaestusAgent.mode = MODE diff --git a/src/agents/types.ts b/src/agents/types.ts index 6692162b..14da69a1 100644 --- a/src/agents/types.ts +++ b/src/agents/types.ts @@ -72,6 +72,7 @@ export function isGptModel(model: string): boolean { export type BuiltinAgentName = | "sisyphus" + | "hephaestus" | "oracle" | "librarian" | "explore" diff --git a/src/agents/utils.test.ts b/src/agents/utils.test.ts index 71b1b7b7..2dddf320 100644 --- a/src/agents/utils.test.ts +++ b/src/agents/utils.test.ts @@ -1,61 +1,78 @@ -import { describe, test, expect, beforeEach, spyOn, afterEach } from "bun:test" +import { describe, test, expect, beforeEach, afterEach, spyOn } from "bun:test" import { createBuiltinAgents } from "./utils" import type { AgentConfig } from "@opencode-ai/sdk" import { clearSkillCache } from "../features/opencode-skill-loader/skill-content" import * as connectedProvidersCache from "../shared/connected-providers-cache" import * as modelAvailability from "../shared/model-availability" +import * as shared from "../shared" const TEST_DEFAULT_MODEL = "anthropic/claude-opus-4-5" describe("createBuiltinAgents with model overrides", () => { - test("Sisyphus with default model has thinking config", async () => { - // given - no overrides, using systemDefaultModel + test("Sisyphus with default model has thinking 
config when all models available", async () => { + // #given + const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue( + new Set([ + "anthropic/claude-opus-4-5", + "kimi-for-coding/k2p5", + "opencode/kimi-k2.5-free", + "zai-coding-plan/glm-4.7", + "opencode/glm-4.7-free", + ]) + ) - // when - const agents = await createBuiltinAgents([], {}, undefined, TEST_DEFAULT_MODEL) + try { + // #when + const agents = await createBuiltinAgents([], {}, undefined, TEST_DEFAULT_MODEL, undefined, undefined, [], {}) - // then - expect(agents.sisyphus.model).toBe("anthropic/claude-opus-4-5") - expect(agents.sisyphus.thinking).toEqual({ type: "enabled", budgetTokens: 32000 }) - expect(agents.sisyphus.reasoningEffort).toBeUndefined() + // #then + expect(agents.sisyphus.model).toBe("anthropic/claude-opus-4-5") + expect(agents.sisyphus.thinking).toEqual({ type: "enabled", budgetTokens: 32000 }) + expect(agents.sisyphus.reasoningEffort).toBeUndefined() + } finally { + fetchSpy.mockRestore() + } }) test("Sisyphus with GPT model override has reasoningEffort, no thinking", async () => { - // given + // #given const overrides = { sisyphus: { model: "github-copilot/gpt-5.2" }, } - // when + // #when const agents = await createBuiltinAgents([], overrides, undefined, TEST_DEFAULT_MODEL) - // then + // #then expect(agents.sisyphus.model).toBe("github-copilot/gpt-5.2") expect(agents.sisyphus.reasoningEffort).toBe("medium") expect(agents.sisyphus.thinking).toBeUndefined() }) - test("Sisyphus uses system default when no availableModels provided", async () => { - // given + test("Sisyphus is not created when no availableModels provided (requiresAnyModel)", async () => { + // #given const systemDefaultModel = "anthropic/claude-opus-4-5" + const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(new Set()) - // when - const agents = await createBuiltinAgents([], {}, undefined, systemDefaultModel) + try { + // #when + const agents = await createBuiltinAgents([], {}, 
undefined, systemDefaultModel, undefined, undefined, [], {}) - // then - falls back to system default when no availability match - expect(agents.sisyphus.model).toBe("anthropic/claude-opus-4-5") - expect(agents.sisyphus.thinking).toEqual({ type: "enabled", budgetTokens: 32000 }) - expect(agents.sisyphus.reasoningEffort).toBeUndefined() + // #then + expect(agents.sisyphus).toBeUndefined() + } finally { + fetchSpy.mockRestore() + } }) test("Oracle uses connected provider fallback when availableModels is empty and cache exists", async () => { - // given - connected providers cache has "openai", which matches oracle's first fallback entry + // #given - connected providers cache has "openai", which matches oracle's first fallback entry const cacheSpy = spyOn(connectedProvidersCache, "readConnectedProvidersCache").mockReturnValue(["openai"]) - // when + // #when const agents = await createBuiltinAgents([], {}, undefined, TEST_DEFAULT_MODEL) - // then - oracle resolves via connected cache fallback to openai/gpt-5.2 (not system default) + // #then - oracle resolves via connected cache fallback to openai/gpt-5.2 (not system default) expect(agents.oracle.model).toBe("openai/gpt-5.2") expect(agents.oracle.reasoningEffort).toBe("medium") expect(agents.oracle.thinking).toBeUndefined() @@ -63,28 +80,28 @@ describe("createBuiltinAgents with model overrides", () => { }) test("Oracle created without model field when no cache exists (first run scenario)", async () => { - // given - no cache at all (first run) + // #given - no cache at all (first run) const cacheSpy = spyOn(connectedProvidersCache, "readConnectedProvidersCache").mockReturnValue(null) - // when + // #when const agents = await createBuiltinAgents([], {}, undefined, TEST_DEFAULT_MODEL) - // then - oracle should be created with system default model (fallback to systemDefaultModel) + // #then - oracle should be created with system default model (fallback to systemDefaultModel) expect(agents.oracle).toBeDefined() 
expect(agents.oracle.model).toBe(TEST_DEFAULT_MODEL) cacheSpy.mockRestore?.() }) test("Oracle with GPT model override has reasoningEffort, no thinking", async () => { - // given + // #given const overrides = { oracle: { model: "openai/gpt-5.2" }, } - // when + // #when const agents = await createBuiltinAgents([], overrides, undefined, TEST_DEFAULT_MODEL) - // then + // #then expect(agents.oracle.model).toBe("openai/gpt-5.2") expect(agents.oracle.reasoningEffort).toBe("medium") expect(agents.oracle.textVerbosity).toBe("high") @@ -92,15 +109,15 @@ describe("createBuiltinAgents with model overrides", () => { }) test("Oracle with Claude model override has thinking, no reasoningEffort", async () => { - // given + // #given const overrides = { oracle: { model: "anthropic/claude-sonnet-4" }, } - // when + // #when const agents = await createBuiltinAgents([], overrides, undefined, TEST_DEFAULT_MODEL) - // then + // #then expect(agents.oracle.model).toBe("anthropic/claude-sonnet-4") expect(agents.oracle.thinking).toEqual({ type: "enabled", budgetTokens: 32000 }) expect(agents.oracle.reasoningEffort).toBeUndefined() @@ -108,15 +125,15 @@ describe("createBuiltinAgents with model overrides", () => { }) test("non-model overrides are still applied after factory rebuild", async () => { - // given + // #given const overrides = { sisyphus: { model: "github-copilot/gpt-5.2", temperature: 0.5 }, } - // when + // #when const agents = await createBuiltinAgents([], overrides, undefined, TEST_DEFAULT_MODEL) - // then + // #then expect(agents.sisyphus.model).toBe("github-copilot/gpt-5.2") expect(agents.sisyphus.temperature).toBe(0.5) }) @@ -124,42 +141,197 @@ describe("createBuiltinAgents with model overrides", () => { describe("createBuiltinAgents without systemDefaultModel", () => { test("agents created via connected cache fallback even without systemDefaultModel", async () => { - // given - connected cache has "openai", which matches oracle's fallback chain + // #given - connected 
cache has "openai", which matches oracle's fallback chain const cacheSpy = spyOn(connectedProvidersCache, "readConnectedProvidersCache").mockReturnValue(["openai"]) - // when + // #when const agents = await createBuiltinAgents([], {}, undefined, undefined) - // then - connected cache enables model resolution despite no systemDefaultModel + // #then - connected cache enables model resolution despite no systemDefaultModel expect(agents.oracle).toBeDefined() expect(agents.oracle.model).toBe("openai/gpt-5.2") cacheSpy.mockRestore?.() }) test("agents NOT created when no cache and no systemDefaultModel (first run without defaults)", async () => { - // given + // #given const cacheSpy = spyOn(connectedProvidersCache, "readConnectedProvidersCache").mockReturnValue(null) - // when + // #when const agents = await createBuiltinAgents([], {}, undefined, undefined) - // then + // #then expect(agents.oracle).toBeUndefined() cacheSpy.mockRestore?.() }) - test("sisyphus created via connected cache fallback even without systemDefaultModel", async () => { - // given - connected cache has "anthropic", which matches sisyphus's first fallback entry - const cacheSpy = spyOn(connectedProvidersCache, "readConnectedProvidersCache").mockReturnValue(["anthropic"]) + test("sisyphus created via connected cache fallback when all providers available", async () => { + // #given + const cacheSpy = spyOn(connectedProvidersCache, "readConnectedProvidersCache").mockReturnValue([ + "anthropic", "kimi-for-coding", "opencode", "zai-coding-plan" + ]) + const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue( + new Set([ + "anthropic/claude-opus-4-5", + "kimi-for-coding/k2p5", + "opencode/kimi-k2.5-free", + "zai-coding-plan/glm-4.7", + "opencode/glm-4.7-free", + ]) + ) - // when - const agents = await createBuiltinAgents([], {}, undefined, undefined) + try { + // #when + const agents = await createBuiltinAgents([], {}, undefined, undefined, undefined, undefined, [], {}) - // then - 
connected cache enables model resolution despite no systemDefaultModel - expect(agents.sisyphus).toBeDefined() - expect(agents.sisyphus.model).toBe("anthropic/claude-opus-4-5") - cacheSpy.mockRestore?.() - }) + // #then + expect(agents.sisyphus).toBeDefined() + expect(agents.sisyphus.model).toBe("anthropic/claude-opus-4-5") + } finally { + cacheSpy.mockRestore() + fetchSpy.mockRestore() + } + }) +}) + +describe("createBuiltinAgents with requiresModel gating", () => { + test("hephaestus is not created when gpt-5.2-codex is unavailable", async () => { + // #given + const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue( + new Set(["anthropic/claude-opus-4-5"]) + ) + + try { + // #when + const agents = await createBuiltinAgents([], {}, undefined, TEST_DEFAULT_MODEL, undefined, undefined, [], {}) + + // #then + expect(agents.hephaestus).toBeUndefined() + } finally { + fetchSpy.mockRestore() + } + }) + + test("hephaestus is created when gpt-5.2-codex is available", async () => { + // #given + const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue( + new Set(["openai/gpt-5.2-codex"]) + ) + + try { + // #when + const agents = await createBuiltinAgents([], {}, undefined, TEST_DEFAULT_MODEL, undefined, undefined, [], {}) + + // #then + expect(agents.hephaestus).toBeDefined() + } finally { + fetchSpy.mockRestore() + } + }) + + test("hephaestus is not created when availableModels is empty", async () => { + // #given + const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(new Set()) + + try { + // #when + const agents = await createBuiltinAgents([], {}, undefined, TEST_DEFAULT_MODEL, undefined, undefined, [], {}) + + // #then + expect(agents.hephaestus).toBeUndefined() + } finally { + fetchSpy.mockRestore() + } + }) + + test("hephaestus is created when explicit config provided even if model unavailable", async () => { + // #given + const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue( + new 
Set(["anthropic/claude-opus-4-5"]) + ) + const overrides = { + hephaestus: { model: "anthropic/claude-opus-4-5" }, + } + + try { + // #when + const agents = await createBuiltinAgents([], overrides, undefined, TEST_DEFAULT_MODEL, undefined, undefined, [], {}) + + // #then + expect(agents.hephaestus).toBeDefined() + } finally { + fetchSpy.mockRestore() + } + }) +}) + +describe("createBuiltinAgents with requiresAnyModel gating (sisyphus)", () => { + test("sisyphus is created when at least one fallback model is available", async () => { + // #given + const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue( + new Set(["anthropic/claude-opus-4-5"]) + ) + + try { + // #when + const agents = await createBuiltinAgents([], {}, undefined, TEST_DEFAULT_MODEL, undefined, undefined, [], {}) + + // #then + expect(agents.sisyphus).toBeDefined() + } finally { + fetchSpy.mockRestore() + } + }) + + test("sisyphus is not created when availableModels is empty", async () => { + // #given + const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(new Set()) + + try { + // #when + const agents = await createBuiltinAgents([], {}, undefined, TEST_DEFAULT_MODEL, undefined, undefined, [], {}) + + // #then + expect(agents.sisyphus).toBeUndefined() + } finally { + fetchSpy.mockRestore() + } + }) + + test("sisyphus is created when explicit config provided even if no models available", async () => { + // #given + const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(new Set()) + const overrides = { + sisyphus: { model: "anthropic/claude-opus-4-5" }, + } + + try { + // #when + const agents = await createBuiltinAgents([], overrides, undefined, TEST_DEFAULT_MODEL, undefined, undefined, [], {}) + + // #then + expect(agents.sisyphus).toBeDefined() + } finally { + fetchSpy.mockRestore() + } + }) + + test("sisyphus is not created when no fallback model is available (unrelated model only)", async () => { + // #given - only openai/gpt-5.2 available, not 
in sisyphus fallback chain + const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue( + new Set(["openai/gpt-5.2"]) + ) + + try { + // #when + const agents = await createBuiltinAgents([], {}, undefined, TEST_DEFAULT_MODEL, undefined, undefined, [], {}) + + // #then + expect(agents.sisyphus).toBeUndefined() + } finally { + fetchSpy.mockRestore() + } + }) }) describe("buildAgent with category and skills", () => { @@ -170,8 +342,12 @@ describe("buildAgent with category and skills", () => { clearSkillCache() }) + afterEach(() => { + clearSkillCache() + }) + test("agent with category inherits category settings", () => { - // given - agent factory that sets category but no model + // #given - agent factory that sets category but no model const source = { "test-agent": () => ({ @@ -180,15 +356,15 @@ describe("buildAgent with category and skills", () => { }) as AgentConfig, } - // when + // #when const agent = buildAgent(source["test-agent"], TEST_MODEL) - // then - category's built-in model is applied + // #then - category's built-in model is applied expect(agent.model).toBe("google/gemini-3-pro") }) test("agent with category and existing model keeps existing model", () => { - // given + // #given const source = { "test-agent": () => ({ @@ -198,15 +374,15 @@ describe("buildAgent with category and skills", () => { }) as AgentConfig, } - // when + // #when const agent = buildAgent(source["test-agent"], TEST_MODEL) - // then - explicit model takes precedence over category + // #then - explicit model takes precedence over category expect(agent.model).toBe("custom/model") }) test("agent with category inherits variant", () => { - // given + // #given const source = { "test-agent": () => ({ @@ -222,16 +398,16 @@ describe("buildAgent with category and skills", () => { }, } - // when + // #when const agent = buildAgent(source["test-agent"], TEST_MODEL, categories) - // then + // #then expect(agent.model).toBe("openai/gpt-5.2") expect(agent.variant).toBe("xhigh") 
}) test("agent with skills has content prepended to prompt", () => { - // given + // #given const source = { "test-agent": () => ({ @@ -241,17 +417,17 @@ describe("buildAgent with category and skills", () => { }) as AgentConfig, } - // when + // #when const agent = buildAgent(source["test-agent"], TEST_MODEL) - // then + // #then expect(agent.prompt).toContain("Role: Designer-Turned-Developer") expect(agent.prompt).toContain("Original prompt content") expect(agent.prompt).toMatch(/Designer-Turned-Developer[\s\S]*Original prompt content/s) }) test("agent with multiple skills has all content prepended", () => { - // given + // #given const source = { "test-agent": () => ({ @@ -261,16 +437,16 @@ describe("buildAgent with category and skills", () => { }) as AgentConfig, } - // when + // #when const agent = buildAgent(source["test-agent"], TEST_MODEL) - // then + // #then expect(agent.prompt).toContain("Role: Designer-Turned-Developer") expect(agent.prompt).toContain("Agent prompt") }) test("agent without category or skills works as before", () => { - // given + // #given const source = { "test-agent": () => ({ @@ -281,17 +457,17 @@ describe("buildAgent with category and skills", () => { }) as AgentConfig, } - // when + // #when const agent = buildAgent(source["test-agent"], TEST_MODEL) - // then + // #then expect(agent.model).toBe("custom/model") expect(agent.temperature).toBe(0.5) expect(agent.prompt).toBe("Base prompt") }) test("agent with category and skills applies both", () => { - // given + // #given const source = { "test-agent": () => ({ @@ -302,10 +478,10 @@ describe("buildAgent with category and skills", () => { }) as AgentConfig, } - // when + // #when const agent = buildAgent(source["test-agent"], TEST_MODEL) - // then - category's built-in model and skills are applied + // #then - category's built-in model and skills are applied expect(agent.model).toBe("openai/gpt-5.2-codex") expect(agent.variant).toBe("xhigh") expect(agent.prompt).toContain("Role: 
Designer-Turned-Developer") @@ -313,7 +489,7 @@ describe("buildAgent with category and skills", () => { }) test("agent with non-existent category has no effect", () => { - // given + // #given const source = { "test-agent": () => ({ @@ -323,10 +499,10 @@ describe("buildAgent with category and skills", () => { }) as AgentConfig, } - // when + // #when const agent = buildAgent(source["test-agent"], TEST_MODEL) - // then + // #then // Note: The factory receives model, but if category doesn't exist, it's not applied // The agent's model comes from the factory output (which doesn't set model) expect(agent.model).toBeUndefined() @@ -334,7 +510,7 @@ describe("buildAgent with category and skills", () => { }) test("agent with non-existent skills only prepends found ones", () => { - // given + // #given const source = { "test-agent": () => ({ @@ -344,16 +520,16 @@ describe("buildAgent with category and skills", () => { }) as AgentConfig, } - // when + // #when const agent = buildAgent(source["test-agent"], TEST_MODEL) - // then + // #then expect(agent.prompt).toContain("Role: Designer-Turned-Developer") expect(agent.prompt).toContain("Base prompt") }) test("agent with empty skills array keeps original prompt", () => { - // given + // #given const source = { "test-agent": () => ({ @@ -363,15 +539,15 @@ describe("buildAgent with category and skills", () => { }) as AgentConfig, } - // when + // #when const agent = buildAgent(source["test-agent"], TEST_MODEL) - // then + // #then expect(agent.prompt).toBe("Base prompt") }) test("agent with agent-browser skill resolves when browserProvider is set", () => { - // given + // #given const source = { "test-agent": () => ({ @@ -381,16 +557,16 @@ describe("buildAgent with category and skills", () => { }) as AgentConfig, } - // when - browserProvider is "agent-browser" + // #when - browserProvider is "agent-browser" const agent = buildAgent(source["test-agent"], TEST_MODEL, undefined, undefined, "agent-browser") - // then - agent-browser 
skill content should be in prompt + // #then - agent-browser skill content should be in prompt expect(agent.prompt).toContain("agent-browser") expect(agent.prompt).toContain("Base prompt") }) test("agent with agent-browser skill NOT resolved when browserProvider not set", () => { - // given + // #given const source = { "test-agent": () => ({ @@ -400,10 +576,10 @@ describe("buildAgent with category and skills", () => { }) as AgentConfig, } - // when - no browserProvider (defaults to playwright) + // #when - no browserProvider (defaults to playwright) const agent = buildAgent(source["test-agent"], TEST_MODEL) - // then - agent-browser skill not found, only base prompt remains + // #then - agent-browser skill not found, only base prompt remains expect(agent.prompt).toBe("Base prompt") expect(agent.prompt).not.toContain("agent-browser open") }) @@ -411,36 +587,36 @@ describe("buildAgent with category and skills", () => { describe("override.category expansion in createBuiltinAgents", () => { test("standard agent override with category expands category properties", async () => { - // given + // #given const overrides = { oracle: { category: "ultrabrain" } as any, } - // when + // #when const agents = await createBuiltinAgents([], overrides, undefined, TEST_DEFAULT_MODEL) - // then - ultrabrain category: model=openai/gpt-5.2-codex, variant=xhigh + // #then - ultrabrain category: model=openai/gpt-5.2-codex, variant=xhigh expect(agents.oracle).toBeDefined() expect(agents.oracle.model).toBe("openai/gpt-5.2-codex") expect(agents.oracle.variant).toBe("xhigh") }) test("standard agent override with category AND direct variant - direct wins", async () => { - // given - ultrabrain has variant=xhigh, but direct override says "max" + // #given - ultrabrain has variant=xhigh, but direct override says "max" const overrides = { oracle: { category: "ultrabrain", variant: "max" } as any, } - // when + // #when const agents = await createBuiltinAgents([], overrides, undefined, 
TEST_DEFAULT_MODEL) - // then - direct variant overrides category variant + // #then - direct variant overrides category variant expect(agents.oracle).toBeDefined() expect(agents.oracle.variant).toBe("max") }) test("standard agent override with category AND direct reasoningEffort - direct wins", async () => { - // given - custom category has reasoningEffort=xhigh, direct override says "low" + // #given - custom category has reasoningEffort=xhigh, direct override says "low" const categories = { "test-cat": { model: "openai/gpt-5.2", @@ -451,16 +627,16 @@ describe("override.category expansion in createBuiltinAgents", () => { oracle: { category: "test-cat", reasoningEffort: "low" } as any, } - // when + // #when const agents = await createBuiltinAgents([], overrides, undefined, TEST_DEFAULT_MODEL, categories) - // then - direct reasoningEffort wins over category + // #then - direct reasoningEffort wins over category expect(agents.oracle).toBeDefined() expect(agents.oracle.reasoningEffort).toBe("low") }) test("standard agent override with category applies reasoningEffort from category when no direct override", async () => { - // given - custom category has reasoningEffort, no direct reasoningEffort in override + // #given - custom category has reasoningEffort, no direct reasoningEffort in override const categories = { "reasoning-cat": { model: "openai/gpt-5.2", @@ -471,54 +647,54 @@ describe("override.category expansion in createBuiltinAgents", () => { oracle: { category: "reasoning-cat" } as any, } - // when + // #when const agents = await createBuiltinAgents([], overrides, undefined, TEST_DEFAULT_MODEL, categories) - // then - category reasoningEffort is applied + // #then - category reasoningEffort is applied expect(agents.oracle).toBeDefined() expect(agents.oracle.reasoningEffort).toBe("high") }) test("sisyphus override with category expands category properties", async () => { - // given + // #given const overrides = { sisyphus: { category: "ultrabrain" } as any, } 
- // when + // #when const agents = await createBuiltinAgents([], overrides, undefined, TEST_DEFAULT_MODEL) - // then - ultrabrain category: model=openai/gpt-5.2-codex, variant=xhigh + // #then - ultrabrain category: model=openai/gpt-5.2-codex, variant=xhigh expect(agents.sisyphus).toBeDefined() expect(agents.sisyphus.model).toBe("openai/gpt-5.2-codex") expect(agents.sisyphus.variant).toBe("xhigh") }) test("atlas override with category expands category properties", async () => { - // given + // #given const overrides = { atlas: { category: "ultrabrain" } as any, } - // when + // #when const agents = await createBuiltinAgents([], overrides, undefined, TEST_DEFAULT_MODEL) - // then - ultrabrain category: model=openai/gpt-5.2-codex, variant=xhigh + // #then - ultrabrain category: model=openai/gpt-5.2-codex, variant=xhigh expect(agents.atlas).toBeDefined() expect(agents.atlas.model).toBe("openai/gpt-5.2-codex") expect(agents.atlas.variant).toBe("xhigh") }) test("override with non-existent category has no effect on config", async () => { - // given + // #given const overrides = { oracle: { category: "non-existent-category" } as any, } - // when + // #when const agents = await createBuiltinAgents([], overrides, undefined, TEST_DEFAULT_MODEL) - // then - no category-specific variant/reasoningEffort applied from non-existent category + // #then - no category-specific variant/reasoningEffort applied from non-existent category expect(agents.oracle).toBeDefined() const agentsWithoutOverride = await createBuiltinAgents([], {}, undefined, TEST_DEFAULT_MODEL) expect(agents.oracle.model).toBe(agentsWithoutOverride.oracle.model) @@ -527,7 +703,7 @@ describe("override.category expansion in createBuiltinAgents", () => { describe("Deadlock prevention - fetchAvailableModels must not receive client", () => { test("createBuiltinAgents should call fetchAvailableModels with undefined client to prevent deadlock", async () => { - // given - This test ensures we don't regress on issue #1301 
+ // #given - This test ensures we don't regress on issue #1301 // Passing client to fetchAvailableModels during createBuiltinAgents (called from config handler) // causes deadlock: // - Plugin init waits for server response (client.provider.list()) @@ -540,7 +716,7 @@ describe("Deadlock prevention - fetchAvailableModels must not receive client", ( model: { list: () => Promise.resolve({ data: [] }) }, } - // when - Even when client is provided, fetchAvailableModels must be called with undefined + // #when - Even when client is provided, fetchAvailableModels must be called with undefined await createBuiltinAgents( [], {}, @@ -552,7 +728,7 @@ describe("Deadlock prevention - fetchAvailableModels must not receive client", ( mockClient // client is passed but should NOT be forwarded to fetchAvailableModels ) - // then - fetchAvailableModels must be called with undefined as first argument (no client) + // #then - fetchAvailableModels must be called with undefined as first argument (no client) // This prevents the deadlock described in issue #1301 expect(fetchSpy).toHaveBeenCalled() const firstCallArgs = fetchSpy.mock.calls[0] diff --git a/src/agents/utils.ts b/src/agents/utils.ts index d4a80d94..bddc12b0 100644 --- a/src/agents/utils.ts +++ b/src/agents/utils.ts @@ -9,8 +9,9 @@ import { createMultimodalLookerAgent, MULTIMODAL_LOOKER_PROMPT_METADATA } from " import { createMetisAgent, metisPromptMetadata } from "./metis" import { createAtlasAgent, atlasPromptMetadata } from "./atlas" import { createMomusAgent, momusPromptMetadata } from "./momus" +import { createHephaestusAgent } from "./hephaestus" import type { AvailableAgent, AvailableCategory, AvailableSkill } from "./dynamic-agent-prompt-builder" -import { deepMerge, fetchAvailableModels, resolveModelPipeline, AGENT_MODEL_REQUIREMENTS, readConnectedProvidersCache, isModelAvailable } from "../shared" +import { deepMerge, fetchAvailableModels, resolveModelPipeline, AGENT_MODEL_REQUIREMENTS, readConnectedProvidersCache, 
isModelAvailable, isAnyFallbackModelAvailable } from "../shared" import { DEFAULT_CATEGORIES, CATEGORY_DESCRIPTIONS } from "../tools/delegate-task/constants" import { resolveMultipleSkills } from "../features/opencode-skill-loader/skill-content" import { createBuiltinSkills } from "../features/builtin-skills" @@ -21,6 +22,7 @@ type AgentSource = AgentFactory | AgentConfig const agentSources: Record<string, AgentSource> = { sisyphus: createSisyphusAgent, + hephaestus: createHephaestusAgent, oracle: createOracleAgent, librarian: createLibrarianAgent, explore: createExploreAgent, @@ -260,10 +262,14 @@ export async function createBuiltinAgents( const availableSkills: AvailableSkill[] = [...builtinAvailable, ...discoveredAvailable] + // Collect general agents first (for availableAgents), but don't add to result yet + const pendingAgentConfigs: Map<string, AgentConfig> = new Map() + for (const [name, source] of Object.entries(agentSources)) { const agentName = name as BuiltinAgentName if (agentName === "sisyphus") continue + if (agentName === "hephaestus") continue if (agentName === "atlas") continue if (disabledAgents.some((name) => name.toLowerCase() === agentName.toLowerCase())) continue @@ -309,7 +315,8 @@ export async function createBuiltinAgents( config = applyOverrides(config, override, mergedCategories) - result[name] = config + // Store for later - will be added after sisyphus and hephaestus + pendingAgentConfigs.set(name, config) const metadata = agentMetadata[agentName] if (metadata) { @@ -321,10 +328,15 @@ export async function createBuiltinAgents( } } - if (!disabledAgents.includes("sisyphus")) { - const sisyphusOverride = agentOverrides["sisyphus"] - const sisyphusRequirement = AGENT_MODEL_REQUIREMENTS["sisyphus"] - + const sisyphusOverride = agentOverrides["sisyphus"] + const sisyphusRequirement = AGENT_MODEL_REQUIREMENTS["sisyphus"] + const hasSisyphusExplicitConfig = sisyphusOverride !== undefined + const meetsSisyphusAnyModelRequirement = + !sisyphusRequirement?.requiresAnyModel ||
hasSisyphusExplicitConfig || + isAnyFallbackModelAvailable(sisyphusRequirement.fallbackChain, availableModels) + + if (!disabledAgents.includes("sisyphus") && meetsSisyphusAnyModelRequirement) { const sisyphusResolution = applyModelResolution({ uiSelectedModel, userModel: sisyphusOverride?.model, @@ -355,6 +367,61 @@ export async function createBuiltinAgents( } } + if (!disabledAgents.includes("hephaestus")) { + const hephaestusOverride = agentOverrides["hephaestus"] + const hephaestusRequirement = AGENT_MODEL_REQUIREMENTS["hephaestus"] + const hasHephaestusExplicitConfig = hephaestusOverride !== undefined + + const hasRequiredModel = + !hephaestusRequirement?.requiresModel || + hasHephaestusExplicitConfig || + (availableModels.size > 0 && isModelAvailable(hephaestusRequirement.requiresModel, availableModels)) + + if (hasRequiredModel) { + const hephaestusResolution = applyModelResolution({ + userModel: hephaestusOverride?.model, + requirement: hephaestusRequirement, + availableModels, + systemDefaultModel, + }) + + if (hephaestusResolution) { + const { model: hephaestusModel, variant: hephaestusResolvedVariant } = hephaestusResolution + + let hephaestusConfig = createHephaestusAgent( + hephaestusModel, + availableAgents, + undefined, + availableSkills, + availableCategories + ) + + hephaestusConfig = { ...hephaestusConfig, variant: hephaestusResolvedVariant ?? 
"medium" } + + const hepOverrideCategory = (hephaestusOverride as Record<string, unknown> | undefined)?.category as string | undefined + if (hepOverrideCategory) { + hephaestusConfig = applyCategoryOverride(hephaestusConfig, hepOverrideCategory, mergedCategories) + } + + if (directory && hephaestusConfig.prompt) { + const envContext = createEnvContext() + hephaestusConfig = { ...hephaestusConfig, prompt: hephaestusConfig.prompt + envContext } + } + + if (hephaestusOverride) { + hephaestusConfig = mergeAgentConfig(hephaestusConfig, hephaestusOverride) + } + + result["hephaestus"] = hephaestusConfig + } + } + } + + // Add pending agents after sisyphus and hephaestus to maintain order + for (const [name, config] of pendingAgentConfigs) { + result[name] = config + } + if (!disabledAgents.includes("atlas")) { const orchestratorOverride = agentOverrides["atlas"] const atlasRequirement = AGENT_MODEL_REQUIREMENTS["atlas"] diff --git a/src/cli/AGENTS.md b/src/cli/AGENTS.md index 4adc3a0d..7b951faa 100644 --- a/src/cli/AGENTS.md +++ b/src/cli/AGENTS.md @@ -2,15 +2,17 @@ ## OVERVIEW -CLI entry: `bunx oh-my-opencode`. Interactive installer, doctor diagnostics. Commander.js + @clack/prompts. +CLI entry: `bunx oh-my-opencode`. 4 commands with Commander.js + @clack/prompts TUI.
+ +**Commands**: install (interactive setup), doctor (14 health checks), run (session launcher), get-local-version ## STRUCTURE ``` cli/ ├── index.ts # Commander.js entry (4 commands) -├── install.ts # Interactive TUI (520 lines) -├── config-manager.ts # JSONC parsing (664 lines) +├── install.ts # Interactive TUI (542 lines) +├── config-manager.ts # JSONC parsing (667 lines) ├── types.ts # InstallArgs, InstallConfig ├── model-fallback.ts # Model fallback configuration ├── doctor/ @@ -19,7 +21,7 @@ cli/ │ ├── formatter.ts # Colored output │ ├── constants.ts # Check IDs, symbols │ ├── types.ts # CheckResult, CheckDefinition (114 lines) -│ └── checks/ # 14 checks, 21 files +│ └── checks/ # 14 checks, 23 files │ ├── version.ts # OpenCode + plugin version │ ├── config.ts # JSONC validity, Zod │ ├── auth.ts # Anthropic, OpenAI, Google @@ -30,6 +32,8 @@ cli/ │ └── gh.ts # GitHub CLI ├── run/ │ └── index.ts # Session launcher +├── mcp-oauth/ +│ └── index.ts # MCP OAuth flow └── get-local-version/ └── index.ts # Version detection ``` diff --git a/src/cli/__snapshots__/model-fallback.test.ts.snap b/src/cli/__snapshots__/model-fallback.test.ts.snap index 8b7198e0..aece5116 100644 --- a/src/cli/__snapshots__/model-fallback.test.ts.snap +++ b/src/cli/__snapshots__/model-fallback.test.ts.snap @@ -10,6 +10,9 @@ exports[`generateModelConfig no providers available returns ULTIMATE_FALLBACK fo "explore": { "model": "opencode/glm-4.7-free", }, + "hephaestus": { + "model": "opencode/glm-4.7-free", + }, "librarian": { "model": "opencode/glm-4.7-free", }, @@ -28,9 +31,6 @@ exports[`generateModelConfig no providers available returns ULTIMATE_FALLBACK fo "prometheus": { "model": "opencode/glm-4.7-free", }, - "sisyphus": { - "model": "opencode/glm-4.7-free", - }, }, "categories": { "artistry": { @@ -94,18 +94,11 @@ exports[`generateModelConfig single native provider uses Claude models when only "variant": "max", }, "sisyphus": { - "model": "anthropic/claude-sonnet-4-5", + "model": 
"anthropic/claude-opus-4-5", + "variant": "max", }, }, "categories": { - "artistry": { - "model": "anthropic/claude-opus-4-5", - "variant": "max", - }, - "deep": { - "model": "anthropic/claude-opus-4-5", - "variant": "max", - }, "quick": { "model": "anthropic/claude-haiku-4-5", }, @@ -168,14 +161,6 @@ exports[`generateModelConfig single native provider uses Claude models with isMa }, }, "categories": { - "artistry": { - "model": "anthropic/claude-opus-4-5", - "variant": "max", - }, - "deep": { - "model": "anthropic/claude-opus-4-5", - "variant": "max", - }, "quick": { "model": "anthropic/claude-haiku-4-5", }, @@ -211,6 +196,10 @@ exports[`generateModelConfig single native provider uses OpenAI models when only "explore": { "model": "opencode/gpt-5-nano", }, + "hephaestus": { + "model": "openai/gpt-5.2-codex", + "variant": "medium", + }, "librarian": { "model": "opencode/glm-4.7-free", }, @@ -233,15 +222,8 @@ exports[`generateModelConfig single native provider uses OpenAI models when only "model": "openai/gpt-5.2", "variant": "high", }, - "sisyphus": { - "model": "openai/gpt-5.2", - "variant": "high", - }, }, "categories": { - "artistry": { - "model": "openai/gpt-5.2", - }, "deep": { "model": "openai/gpt-5.2-codex", "variant": "medium", @@ -281,6 +263,10 @@ exports[`generateModelConfig single native provider uses OpenAI models with isMa "explore": { "model": "opencode/gpt-5-nano", }, + "hephaestus": { + "model": "openai/gpt-5.2-codex", + "variant": "medium", + }, "librarian": { "model": "opencode/glm-4.7-free", }, @@ -303,15 +289,8 @@ exports[`generateModelConfig single native provider uses OpenAI models with isMa "model": "openai/gpt-5.2", "variant": "high", }, - "sisyphus": { - "model": "openai/gpt-5.2-codex", - "variant": "medium", - }, }, "categories": { - "artistry": { - "model": "openai/gpt-5.2", - }, "deep": { "model": "openai/gpt-5.2-codex", "variant": "medium", @@ -372,19 +351,12 @@ exports[`generateModelConfig single native provider uses Gemini models when 
only "prometheus": { "model": "google/gemini-3-pro", }, - "sisyphus": { - "model": "google/gemini-3-pro", - }, }, "categories": { "artistry": { "model": "google/gemini-3-pro", "variant": "max", }, - "deep": { - "model": "google/gemini-3-pro", - "variant": "max", - }, "quick": { "model": "google/gemini-3-flash", }, @@ -439,19 +411,12 @@ exports[`generateModelConfig single native provider uses Gemini models with isMa "prometheus": { "model": "google/gemini-3-pro", }, - "sisyphus": { - "model": "google/gemini-3-pro", - }, }, "categories": { "artistry": { "model": "google/gemini-3-pro", "variant": "max", }, - "deep": { - "model": "google/gemini-3-pro", - "variant": "max", - }, "quick": { "model": "google/gemini-3-flash", }, @@ -485,6 +450,10 @@ exports[`generateModelConfig all native providers uses preferred models from fal "explore": { "model": "anthropic/claude-haiku-4-5", }, + "hephaestus": { + "model": "openai/gpt-5.2-codex", + "variant": "medium", + }, "librarian": { "model": "anthropic/claude-sonnet-4-5", }, @@ -508,7 +477,8 @@ exports[`generateModelConfig all native providers uses preferred models from fal "variant": "max", }, "sisyphus": { - "model": "anthropic/claude-sonnet-4-5", + "model": "anthropic/claude-opus-4-5", + "variant": "max", }, }, "categories": { @@ -553,6 +523,10 @@ exports[`generateModelConfig all native providers uses preferred models with isM "explore": { "model": "anthropic/claude-haiku-4-5", }, + "hephaestus": { + "model": "openai/gpt-5.2-codex", + "variant": "medium", + }, "librarian": { "model": "anthropic/claude-sonnet-4-5", }, @@ -623,6 +597,10 @@ exports[`generateModelConfig fallback providers uses OpenCode Zen models when on "explore": { "model": "opencode/claude-haiku-4-5", }, + "hephaestus": { + "model": "opencode/gpt-5.2-codex", + "variant": "medium", + }, "librarian": { "model": "opencode/glm-4.7-free", }, @@ -646,7 +624,8 @@ exports[`generateModelConfig fallback providers uses OpenCode Zen models when on "variant": "max", }, 
"sisyphus": { - "model": "opencode/claude-sonnet-4-5", + "model": "opencode/claude-opus-4-5", + "variant": "max", }, }, "categories": { @@ -691,6 +670,10 @@ exports[`generateModelConfig fallback providers uses OpenCode Zen models with is "explore": { "model": "opencode/claude-haiku-4-5", }, + "hephaestus": { + "model": "opencode/gpt-5.2-codex", + "variant": "medium", + }, "librarian": { "model": "opencode/glm-4.7-free", }, @@ -761,6 +744,10 @@ exports[`generateModelConfig fallback providers uses GitHub Copilot models when "explore": { "model": "github-copilot/gpt-5-mini", }, + "hephaestus": { + "model": "github-copilot/gpt-5.2-codex", + "variant": "medium", + }, "librarian": { "model": "github-copilot/claude-sonnet-4.5", }, @@ -784,7 +771,8 @@ exports[`generateModelConfig fallback providers uses GitHub Copilot models when "variant": "max", }, "sisyphus": { - "model": "github-copilot/claude-sonnet-4.5", + "model": "github-copilot/claude-opus-4.5", + "variant": "max", }, }, "categories": { @@ -829,6 +817,10 @@ exports[`generateModelConfig fallback providers uses GitHub Copilot models with "explore": { "model": "github-copilot/gpt-5-mini", }, + "hephaestus": { + "model": "github-copilot/gpt-5.2-codex", + "variant": "medium", + }, "librarian": { "model": "github-copilot/claude-sonnet-4.5", }, @@ -918,16 +910,10 @@ exports[`generateModelConfig fallback providers uses ZAI model for librarian whe "model": "opencode/glm-4.7-free", }, "sisyphus": { - "model": "opencode/glm-4.7-free", + "model": "zai-coding-plan/glm-4.7", }, }, "categories": { - "artistry": { - "model": "opencode/glm-4.7-free", - }, - "deep": { - "model": "opencode/glm-4.7-free", - }, "quick": { "model": "opencode/glm-4.7-free", }, @@ -983,12 +969,6 @@ exports[`generateModelConfig fallback providers uses ZAI model for librarian wit }, }, "categories": { - "artistry": { - "model": "opencode/glm-4.7-free", - }, - "deep": { - "model": "opencode/glm-4.7-free", - }, "quick": { "model": "opencode/glm-4.7-free", }, 
@@ -1021,6 +1001,10 @@ exports[`generateModelConfig mixed provider scenarios uses Claude + OpenCode Zen "explore": { "model": "anthropic/claude-haiku-4-5", }, + "hephaestus": { + "model": "opencode/gpt-5.2-codex", + "variant": "medium", + }, "librarian": { "model": "opencode/glm-4.7-free", }, @@ -1044,7 +1028,8 @@ exports[`generateModelConfig mixed provider scenarios uses Claude + OpenCode Zen "variant": "max", }, "sisyphus": { - "model": "anthropic/claude-sonnet-4-5", + "model": "anthropic/claude-opus-4-5", + "variant": "max", }, }, "categories": { @@ -1089,6 +1074,10 @@ exports[`generateModelConfig mixed provider scenarios uses OpenAI + Copilot comb "explore": { "model": "github-copilot/gpt-5-mini", }, + "hephaestus": { + "model": "openai/gpt-5.2-codex", + "variant": "medium", + }, "librarian": { "model": "github-copilot/claude-sonnet-4.5", }, @@ -1112,7 +1101,8 @@ exports[`generateModelConfig mixed provider scenarios uses OpenAI + Copilot comb "variant": "max", }, "sisyphus": { - "model": "github-copilot/claude-sonnet-4.5", + "model": "github-copilot/claude-opus-4.5", + "variant": "max", }, }, "categories": { @@ -1180,18 +1170,11 @@ exports[`generateModelConfig mixed provider scenarios uses Claude + ZAI combinat "variant": "max", }, "sisyphus": { - "model": "anthropic/claude-sonnet-4-5", + "model": "anthropic/claude-opus-4-5", + "variant": "max", }, }, "categories": { - "artistry": { - "model": "anthropic/claude-opus-4-5", - "variant": "max", - }, - "deep": { - "model": "anthropic/claude-opus-4-5", - "variant": "max", - }, "quick": { "model": "anthropic/claude-haiku-4-5", }, @@ -1249,7 +1232,8 @@ exports[`generateModelConfig mixed provider scenarios uses Gemini + Claude combi "variant": "max", }, "sisyphus": { - "model": "anthropic/claude-sonnet-4-5", + "model": "anthropic/claude-opus-4-5", + "variant": "max", }, }, "categories": { @@ -1257,10 +1241,6 @@ exports[`generateModelConfig mixed provider scenarios uses Gemini + Claude combi "model": 
"google/gemini-3-pro", "variant": "max", }, - "deep": { - "model": "anthropic/claude-opus-4-5", - "variant": "max", - }, "quick": { "model": "anthropic/claude-haiku-4-5", }, @@ -1294,6 +1274,10 @@ exports[`generateModelConfig mixed provider scenarios uses all fallback provider "explore": { "model": "opencode/claude-haiku-4-5", }, + "hephaestus": { + "model": "github-copilot/gpt-5.2-codex", + "variant": "medium", + }, "librarian": { "model": "zai-coding-plan/glm-4.7", }, @@ -1317,7 +1301,8 @@ exports[`generateModelConfig mixed provider scenarios uses all fallback provider "variant": "max", }, "sisyphus": { - "model": "github-copilot/claude-sonnet-4.5", + "model": "github-copilot/claude-opus-4.5", + "variant": "max", }, }, "categories": { @@ -1362,6 +1347,10 @@ exports[`generateModelConfig mixed provider scenarios uses all providers togethe "explore": { "model": "anthropic/claude-haiku-4-5", }, + "hephaestus": { + "model": "openai/gpt-5.2-codex", + "variant": "medium", + }, "librarian": { "model": "zai-coding-plan/glm-4.7", }, @@ -1385,7 +1374,8 @@ exports[`generateModelConfig mixed provider scenarios uses all providers togethe "variant": "max", }, "sisyphus": { - "model": "anthropic/claude-sonnet-4-5", + "model": "anthropic/claude-opus-4-5", + "variant": "max", }, }, "categories": { @@ -1430,6 +1420,10 @@ exports[`generateModelConfig mixed provider scenarios uses all providers with is "explore": { "model": "anthropic/claude-haiku-4-5", }, + "hephaestus": { + "model": "openai/gpt-5.2-codex", + "variant": "medium", + }, "librarian": { "model": "zai-coding-plan/glm-4.7", }, diff --git a/src/cli/config-manager.test.ts b/src/cli/config-manager.test.ts index 3870972f..ee2bd560 100644 --- a/src/cli/config-manager.test.ts +++ b/src/cli/config-manager.test.ts @@ -11,7 +11,7 @@ describe("getPluginNameWithVersion", () => { }) test("returns @latest when current version matches latest tag", async () => { - // given npm dist-tags with latest=2.14.0 + // #given npm dist-tags with 
latest=2.14.0 globalThis.fetch = mock(() => Promise.resolve({ ok: true, @@ -19,15 +19,15 @@ describe("getPluginNameWithVersion", () => { } as Response) ) as unknown as typeof fetch - // when current version is 2.14.0 + // #when current version is 2.14.0 const result = await getPluginNameWithVersion("2.14.0") - // then should use @latest tag + // #then should use @latest tag expect(result).toBe("oh-my-opencode@latest") }) test("returns @beta when current version matches beta tag", async () => { - // given npm dist-tags with beta=3.0.0-beta.3 + // #given npm dist-tags with beta=3.0.0-beta.3 globalThis.fetch = mock(() => Promise.resolve({ ok: true, @@ -35,15 +35,15 @@ describe("getPluginNameWithVersion", () => { } as Response) ) as unknown as typeof fetch - // when current version is 3.0.0-beta.3 + // #when current version is 3.0.0-beta.3 const result = await getPluginNameWithVersion("3.0.0-beta.3") - // then should use @beta tag + // #then should use @beta tag expect(result).toBe("oh-my-opencode@beta") }) test("returns @next when current version matches next tag", async () => { - // given npm dist-tags with next=3.1.0-next.1 + // #given npm dist-tags with next=3.1.0-next.1 globalThis.fetch = mock(() => Promise.resolve({ ok: true, @@ -51,15 +51,15 @@ describe("getPluginNameWithVersion", () => { } as Response) ) as unknown as typeof fetch - // when current version is 3.1.0-next.1 + // #when current version is 3.1.0-next.1 const result = await getPluginNameWithVersion("3.1.0-next.1") - // then should use @next tag + // #then should use @next tag expect(result).toBe("oh-my-opencode@next") }) test("returns pinned version when no tag matches", async () => { - // given npm dist-tags with beta=3.0.0-beta.3 + // #given npm dist-tags with beta=3.0.0-beta.3 globalThis.fetch = mock(() => Promise.resolve({ ok: true, @@ -67,26 +67,26 @@ describe("getPluginNameWithVersion", () => { } as Response) ) as unknown as typeof fetch - // when current version is old beta 3.0.0-beta.2 + // 
#when current version is old beta 3.0.0-beta.2 const result = await getPluginNameWithVersion("3.0.0-beta.2") - // then should pin to specific version + // #then should pin to specific version expect(result).toBe("oh-my-opencode@3.0.0-beta.2") }) test("returns pinned version when fetch fails", async () => { - // given network failure + // #given network failure globalThis.fetch = mock(() => Promise.reject(new Error("Network error"))) as unknown as typeof fetch - // when current version is 3.0.0-beta.3 + // #when current version is 3.0.0-beta.3 const result = await getPluginNameWithVersion("3.0.0-beta.3") - // then should fall back to pinned version + // #then should fall back to pinned version expect(result).toBe("oh-my-opencode@3.0.0-beta.3") }) test("returns pinned version when npm returns non-ok response", async () => { - // given npm returns 404 + // #given npm returns 404 globalThis.fetch = mock(() => Promise.resolve({ ok: false, @@ -94,15 +94,15 @@ describe("getPluginNameWithVersion", () => { } as Response) ) as unknown as typeof fetch - // when current version is 2.14.0 + // #when current version is 2.14.0 const result = await getPluginNameWithVersion("2.14.0") - // then should fall back to pinned version + // #then should fall back to pinned version expect(result).toBe("oh-my-opencode@2.14.0") }) test("prioritizes latest over other tags when version matches multiple", async () => { - // given version matches both latest and beta (during release promotion) + // #given version matches both latest and beta (during release promotion) globalThis.fetch = mock(() => Promise.resolve({ ok: true, @@ -110,10 +110,10 @@ describe("getPluginNameWithVersion", () => { } as Response) ) as unknown as typeof fetch - // when current version matches both + // #when current version matches both const result = await getPluginNameWithVersion("3.0.0") - // then should prioritize @latest + // #then should prioritize @latest expect(result).toBe("oh-my-opencode@latest") }) }) @@ -126,7 
+126,7 @@ describe("fetchNpmDistTags", () => { }) test("returns dist-tags on success", async () => { - // given npm returns dist-tags + // #given npm returns dist-tags globalThis.fetch = mock(() => Promise.resolve({ ok: true, @@ -134,26 +134,26 @@ describe("fetchNpmDistTags", () => { } as Response) ) as unknown as typeof fetch - // when fetching dist-tags + // #when fetching dist-tags const result = await fetchNpmDistTags("oh-my-opencode") - // then should return the tags + // #then should return the tags expect(result).toEqual({ latest: "2.14.0", beta: "3.0.0-beta.3" }) }) test("returns null on network failure", async () => { - // given network failure + // #given network failure globalThis.fetch = mock(() => Promise.reject(new Error("Network error"))) as unknown as typeof fetch - // when fetching dist-tags + // #when fetching dist-tags const result = await fetchNpmDistTags("oh-my-opencode") - // then should return null + // #then should return null expect(result).toBeNull() }) test("returns null on non-ok response", async () => { - // given npm returns 404 + // #given npm returns 404 globalThis.fetch = mock(() => Promise.resolve({ ok: false, @@ -161,10 +161,10 @@ describe("fetchNpmDistTags", () => { } as Response) ) as unknown as typeof fetch - // when fetching dist-tags + // #when fetching dist-tags const result = await fetchNpmDistTags("oh-my-opencode") - // then should return null + // #then should return null expect(result).toBeNull() }) }) @@ -202,19 +202,19 @@ describe("config-manager ANTIGRAVITY_PROVIDER_CONFIG", () => { }) test("Gemini models have variant definitions", () => { - // given the antigravity provider config + // #given the antigravity provider config const models = (ANTIGRAVITY_PROVIDER_CONFIG as any).google.models as Record - // when checking Gemini Pro variants + // #when checking Gemini Pro variants const pro = models["antigravity-gemini-3-pro"] - // then should have low and high variants + // #then should have low and high variants 
expect(pro.variants).toBeTruthy() expect(pro.variants.low).toBeTruthy() expect(pro.variants.high).toBeTruthy() - // when checking Gemini Flash variants + // #when checking Gemini Flash variants const flash = models["antigravity-gemini-3-flash"] - // then should have minimal, low, medium, high variants + // #then should have minimal, low, medium, high variants expect(flash.variants).toBeTruthy() expect(flash.variants.minimal).toBeTruthy() expect(flash.variants.low).toBeTruthy() @@ -223,14 +223,14 @@ describe("config-manager ANTIGRAVITY_PROVIDER_CONFIG", () => { }) test("Claude thinking models have variant definitions", () => { - // given the antigravity provider config + // #given the antigravity provider config const models = (ANTIGRAVITY_PROVIDER_CONFIG as any).google.models as Record - // when checking Claude thinking variants + // #when checking Claude thinking variants const sonnetThinking = models["antigravity-claude-sonnet-4-5-thinking"] const opusThinking = models["antigravity-claude-opus-4-5-thinking"] - // then both should have low and max variants + // #then both should have low and max variants for (const model of [sonnetThinking, opusThinking]) { expect(model.variants).toBeTruthy() expect(model.variants.low).toBeTruthy() @@ -241,7 +241,7 @@ describe("config-manager ANTIGRAVITY_PROVIDER_CONFIG", () => { describe("generateOmoConfig - model fallback system", () => { test("generates native sonnet models when Claude standard subscription", () => { - // given user has Claude standard subscription (not max20) + // #given user has Claude standard subscription (not max20) const config: InstallConfig = { hasClaude: true, isMax20: false, @@ -253,17 +253,17 @@ describe("generateOmoConfig - model fallback system", () => { hasKimiForCoding: false, } - // when generating config + // #when generating config const result = generateOmoConfig(config) - // then should use native anthropic sonnet (cost-efficient for standard plan) + // #then Sisyphus uses Claude (OR logic - 
at least one provider available) expect(result.$schema).toBe("https://raw.githubusercontent.com/code-yeongyu/oh-my-opencode/master/assets/oh-my-opencode.schema.json") expect(result.agents).toBeDefined() - expect((result.agents as Record).sisyphus.model).toBe("anthropic/claude-sonnet-4-5") + expect((result.agents as Record).sisyphus.model).toBe("anthropic/claude-opus-4-5") }) test("generates native opus models when Claude max20 subscription", () => { - // given user has Claude max20 subscription + // #given user has Claude max20 subscription const config: InstallConfig = { hasClaude: true, isMax20: true, @@ -275,15 +275,15 @@ describe("generateOmoConfig - model fallback system", () => { hasKimiForCoding: false, } - // when generating config + // #when generating config const result = generateOmoConfig(config) - // then should use native anthropic opus (max power for max20 plan) + // #then Sisyphus uses Claude (OR logic - at least one provider available) expect((result.agents as Record).sisyphus.model).toBe("anthropic/claude-opus-4-5") }) test("uses github-copilot sonnet fallback when only copilot available", () => { - // given user has only copilot (no max plan) + // #given user has only copilot (no max plan) const config: InstallConfig = { hasClaude: false, isMax20: false, @@ -295,15 +295,15 @@ describe("generateOmoConfig - model fallback system", () => { hasKimiForCoding: false, } - // when generating config + // #when generating config const result = generateOmoConfig(config) - // then should use github-copilot sonnet models (copilot fallback) - expect((result.agents as Record).sisyphus.model).toBe("github-copilot/claude-sonnet-4.5") + // #then Sisyphus uses Copilot (OR logic - copilot is in claude-opus-4-5 providers) + expect((result.agents as Record).sisyphus.model).toBe("github-copilot/claude-opus-4.5") }) test("uses ultimate fallback when no providers configured", () => { - // given user has no providers + // #given user has no providers const config: 
InstallConfig = { hasClaude: false, isMax20: false, @@ -315,16 +315,16 @@ describe("generateOmoConfig - model fallback system", () => { hasKimiForCoding: false, } - // when generating config + // #when generating config const result = generateOmoConfig(config) - // then should use ultimate fallback for all agents + // #then Sisyphus is omitted (requires all fallback providers) expect(result.$schema).toBe("https://raw.githubusercontent.com/code-yeongyu/oh-my-opencode/master/assets/oh-my-opencode.schema.json") - expect((result.agents as Record).sisyphus.model).toBe("opencode/glm-4.7-free") + expect((result.agents as Record).sisyphus).toBeUndefined() }) test("uses zai-coding-plan/glm-4.7 for librarian when Z.ai available", () => { - // given user has Z.ai and Claude max20 + // #given user has Z.ai and Claude max20 const config: InstallConfig = { hasClaude: true, isMax20: true, @@ -336,17 +336,17 @@ describe("generateOmoConfig - model fallback system", () => { hasKimiForCoding: false, } - // when generating config + // #when generating config const result = generateOmoConfig(config) - // then librarian should use zai-coding-plan/glm-4.7 + // #then librarian should use zai-coding-plan/glm-4.7 expect((result.agents as Record).librarian.model).toBe("zai-coding-plan/glm-4.7") - // then other agents should use native opus (max20 plan) + // #then Sisyphus uses Claude (OR logic) expect((result.agents as Record).sisyphus.model).toBe("anthropic/claude-opus-4-5") }) test("uses native OpenAI models when only ChatGPT available", () => { - // given user has only ChatGPT subscription + // #given user has only ChatGPT subscription const config: InstallConfig = { hasClaude: false, isMax20: false, @@ -358,19 +358,19 @@ describe("generateOmoConfig - model fallback system", () => { hasKimiForCoding: false, } - // when generating config + // #when generating config const result = generateOmoConfig(config) - // then Sisyphus should use native OpenAI (fallback within native tier) - 
expect((result.agents as Record).sisyphus.model).toBe("openai/gpt-5.2") - // then Oracle should use native OpenAI (first fallback entry) + // #then Sisyphus is omitted (requires all fallback providers) + expect((result.agents as Record).sisyphus).toBeUndefined() + // #then Oracle should use native OpenAI (first fallback entry) expect((result.agents as Record).oracle.model).toBe("openai/gpt-5.2") - // then multimodal-looker should use native OpenAI (fallback within native tier) + // #then multimodal-looker should use native OpenAI (fallback within native tier) expect((result.agents as Record)["multimodal-looker"].model).toBe("openai/gpt-5.2") }) test("uses haiku for explore when Claude max20", () => { - // given user has Claude max20 + // #given user has Claude max20 const config: InstallConfig = { hasClaude: true, isMax20: true, @@ -382,15 +382,15 @@ describe("generateOmoConfig - model fallback system", () => { hasKimiForCoding: false, } - // when generating config + // #when generating config const result = generateOmoConfig(config) - // then explore should use haiku (max20 plan uses Claude quota) + // #then explore should use haiku (max20 plan uses Claude quota) expect((result.agents as Record).explore.model).toBe("anthropic/claude-haiku-4-5") }) test("uses haiku for explore regardless of max20 flag", () => { - // given user has Claude but not max20 + // #given user has Claude but not max20 const config: InstallConfig = { hasClaude: true, isMax20: false, @@ -402,10 +402,10 @@ describe("generateOmoConfig - model fallback system", () => { hasKimiForCoding: false, } - // when generating config + // #when generating config const result = generateOmoConfig(config) - // then explore should use haiku (isMax20 doesn't affect explore anymore) + // #then explore should use haiku (isMax20 doesn't affect explore anymore) expect((result.agents as Record).explore.model).toBe("anthropic/claude-haiku-4-5") }) }) diff --git a/src/cli/model-fallback.test.ts 
b/src/cli/model-fallback.test.ts index 0e08102d..ef764e28 100644 --- a/src/cli/model-fallback.test.ts +++ b/src/cli/model-fallback.test.ts @@ -20,103 +20,103 @@ function createConfig(overrides: Partial = {}): InstallConfig { describe("generateModelConfig", () => { describe("no providers available", () => { test("returns ULTIMATE_FALLBACK for all agents and categories when no providers", () => { - // given no providers are available + // #given no providers are available const config = createConfig() - // when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then should use ULTIMATE_FALLBACK for everything + // #then should use ULTIMATE_FALLBACK for everything expect(result).toMatchSnapshot() }) }) describe("single native provider", () => { test("uses Claude models when only Claude is available", () => { - // given only Claude is available + // #given only Claude is available const config = createConfig({ hasClaude: true }) - // when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then should use Claude models per NATIVE_FALLBACK_CHAINS + // #then should use Claude models per NATIVE_FALLBACK_CHAINS expect(result).toMatchSnapshot() }) test("uses Claude models with isMax20 flag", () => { - // given Claude is available with Max 20 plan + // #given Claude is available with Max 20 plan const config = createConfig({ hasClaude: true, isMax20: true }) - // when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then should use higher capability models for Sisyphus + // #then should use higher capability models for Sisyphus expect(result).toMatchSnapshot() }) test("uses OpenAI models when only OpenAI is available", () => { - // given only OpenAI is available + // #given only OpenAI is available const config = createConfig({ hasOpenAI: true }) - // when 
generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then should use OpenAI models + // #then should use OpenAI models expect(result).toMatchSnapshot() }) test("uses OpenAI models with isMax20 flag", () => { - // given OpenAI is available with Max 20 plan + // #given OpenAI is available with Max 20 plan const config = createConfig({ hasOpenAI: true, isMax20: true }) - // when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then should use higher capability models + // #then should use higher capability models expect(result).toMatchSnapshot() }) test("uses Gemini models when only Gemini is available", () => { - // given only Gemini is available + // #given only Gemini is available const config = createConfig({ hasGemini: true }) - // when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then should use Gemini models + // #then should use Gemini models expect(result).toMatchSnapshot() }) test("uses Gemini models with isMax20 flag", () => { - // given Gemini is available with Max 20 plan + // #given Gemini is available with Max 20 plan const config = createConfig({ hasGemini: true, isMax20: true }) - // when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then should use higher capability models + // #then should use higher capability models expect(result).toMatchSnapshot() }) }) describe("all native providers", () => { test("uses preferred models from fallback chains when all natives available", () => { - // given all native providers are available + // #given all native providers are available const config = createConfig({ hasClaude: true, hasOpenAI: true, hasGemini: true, }) - // when generateModelConfig is called + // #when generateModelConfig is called const result = 
generateModelConfig(config) - // then should use first provider in each fallback chain + // #then should use first provider in each fallback chain expect(result).toMatchSnapshot() }) test("uses preferred models with isMax20 flag when all natives available", () => { - // given all native providers are available with Max 20 plan + // #given all native providers are available with Max 20 plan const config = createConfig({ hasClaude: true, hasOpenAI: true, @@ -124,156 +124,156 @@ describe("generateModelConfig", () => { isMax20: true, }) - // when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then should use higher capability models + // #then should use higher capability models expect(result).toMatchSnapshot() }) }) describe("fallback providers", () => { test("uses OpenCode Zen models when only OpenCode Zen is available", () => { - // given only OpenCode Zen is available + // #given only OpenCode Zen is available const config = createConfig({ hasOpencodeZen: true }) - // when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then should use OPENCODE_ZEN_MODELS + // #then should use OPENCODE_ZEN_MODELS expect(result).toMatchSnapshot() }) test("uses OpenCode Zen models with isMax20 flag", () => { - // given OpenCode Zen is available with Max 20 plan + // #given OpenCode Zen is available with Max 20 plan const config = createConfig({ hasOpencodeZen: true, isMax20: true }) - // when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then should use higher capability models + // #then should use higher capability models expect(result).toMatchSnapshot() }) test("uses GitHub Copilot models when only Copilot is available", () => { - // given only GitHub Copilot is available + // #given only GitHub Copilot is available const config = createConfig({ hasCopilot: true }) - 
// when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then should use GITHUB_COPILOT_MODELS + // #then should use GITHUB_COPILOT_MODELS expect(result).toMatchSnapshot() }) test("uses GitHub Copilot models with isMax20 flag", () => { - // given GitHub Copilot is available with Max 20 plan + // #given GitHub Copilot is available with Max 20 plan const config = createConfig({ hasCopilot: true, isMax20: true }) - // when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then should use higher capability models + // #then should use higher capability models expect(result).toMatchSnapshot() }) test("uses ZAI model for librarian when only ZAI is available", () => { - // given only ZAI is available + // #given only ZAI is available const config = createConfig({ hasZaiCodingPlan: true }) - // when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then should use ZAI_MODEL for librarian + // #then should use ZAI_MODEL for librarian expect(result).toMatchSnapshot() }) test("uses ZAI model for librarian with isMax20 flag", () => { - // given ZAI is available with Max 20 plan + // #given ZAI is available with Max 20 plan const config = createConfig({ hasZaiCodingPlan: true, isMax20: true }) - // when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then should use ZAI_MODEL for librarian + // #then should use ZAI_MODEL for librarian expect(result).toMatchSnapshot() }) }) describe("mixed provider scenarios", () => { test("uses Claude + OpenCode Zen combination", () => { - // given Claude and OpenCode Zen are available + // #given Claude and OpenCode Zen are available const config = createConfig({ hasClaude: true, hasOpencodeZen: true, }) - // when generateModelConfig is called + // #when 
generateModelConfig is called const result = generateModelConfig(config) - // then should prefer Claude (native) over OpenCode Zen + // #then should prefer Claude (native) over OpenCode Zen expect(result).toMatchSnapshot() }) test("uses OpenAI + Copilot combination", () => { - // given OpenAI and Copilot are available + // #given OpenAI and Copilot are available const config = createConfig({ hasOpenAI: true, hasCopilot: true, }) - // when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then should prefer OpenAI (native) over Copilot + // #then should prefer OpenAI (native) over Copilot expect(result).toMatchSnapshot() }) test("uses Claude + ZAI combination (librarian uses ZAI)", () => { - // given Claude and ZAI are available + // #given Claude and ZAI are available const config = createConfig({ hasClaude: true, hasZaiCodingPlan: true, }) - // when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then librarian should use ZAI, others use Claude + // #then librarian should use ZAI, others use Claude expect(result).toMatchSnapshot() }) test("uses Gemini + Claude combination (explore uses Gemini)", () => { - // given Gemini and Claude are available + // #given Gemini and Claude are available const config = createConfig({ hasGemini: true, hasClaude: true, }) - // when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then explore should use Gemini flash + // #then explore should use Gemini flash expect(result).toMatchSnapshot() }) test("uses all fallback providers together", () => { - // given all fallback providers are available + // #given all fallback providers are available const config = createConfig({ hasOpencodeZen: true, hasCopilot: true, hasZaiCodingPlan: true, }) - // when generateModelConfig is called + // #when generateModelConfig is called const 
result = generateModelConfig(config) - // then should prefer OpenCode Zen, but librarian uses ZAI + // #then should prefer OpenCode Zen, but librarian uses ZAI expect(result).toMatchSnapshot() }) test("uses all providers together", () => { - // given all providers are available + // #given all providers are available const config = createConfig({ hasClaude: true, hasOpenAI: true, @@ -283,15 +283,15 @@ describe("generateModelConfig", () => { hasZaiCodingPlan: true, }) - // when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then should prefer native providers, librarian uses ZAI + // #then should prefer native providers, librarian uses ZAI expect(result).toMatchSnapshot() }) test("uses all providers with isMax20 flag", () => { - // given all providers are available with Max 20 plan + // #given all providers are available with Max 20 plan const config = createConfig({ hasClaude: true, hasOpenAI: true, @@ -302,131 +302,219 @@ describe("generateModelConfig", () => { isMax20: true, }) - // when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then should use higher capability models + // #then should use higher capability models expect(result).toMatchSnapshot() }) }) describe("explore agent special cases", () => { test("explore uses gpt-5-nano when only Gemini available (no Claude)", () => { - // given only Gemini is available (no Claude) + // #given only Gemini is available (no Claude) const config = createConfig({ hasGemini: true }) - // when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then explore should use gpt-5-nano (Claude haiku not available) + // #then explore should use gpt-5-nano (Claude haiku not available) expect(result.agents?.explore?.model).toBe("opencode/gpt-5-nano") }) test("explore uses Claude haiku when Claude available", () => { - // 
given Claude is available + // #given Claude is available const config = createConfig({ hasClaude: true, isMax20: true }) - // when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then explore should use claude-haiku-4-5 + // #then explore should use claude-haiku-4-5 expect(result.agents?.explore?.model).toBe("anthropic/claude-haiku-4-5") }) test("explore uses Claude haiku regardless of isMax20 flag", () => { - // given Claude is available without Max 20 plan + // #given Claude is available without Max 20 plan const config = createConfig({ hasClaude: true, isMax20: false }) - // when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then explore should use claude-haiku-4-5 (isMax20 doesn't affect explore) + // #then explore should use claude-haiku-4-5 (isMax20 doesn't affect explore) expect(result.agents?.explore?.model).toBe("anthropic/claude-haiku-4-5") }) test("explore uses gpt-5-nano when only OpenAI available", () => { - // given only OpenAI is available + // #given only OpenAI is available const config = createConfig({ hasOpenAI: true }) - // when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then explore should use gpt-5-nano (fallback) + // #then explore should use gpt-5-nano (fallback) expect(result.agents?.explore?.model).toBe("opencode/gpt-5-nano") }) test("explore uses gpt-5-mini when only Copilot available", () => { - // given only Copilot is available + // #given only Copilot is available const config = createConfig({ hasCopilot: true }) - // when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then explore should use gpt-5-mini (Copilot fallback) + // #then explore should use gpt-5-mini (Copilot fallback) 
expect(result.agents?.explore?.model).toBe("github-copilot/gpt-5-mini") }) }) describe("Sisyphus agent special cases", () => { - test("Sisyphus uses sisyphus-high capability when isMax20 is true", () => { - // given Claude is available with Max 20 plan + test("Sisyphus is created when at least one fallback provider is available (Claude)", () => { + // #given const config = createConfig({ hasClaude: true, isMax20: true }) - // when generateModelConfig is called + // #when const result = generateModelConfig(config) - // then Sisyphus should use opus (sisyphus-high) + // #then expect(result.agents?.sisyphus?.model).toBe("anthropic/claude-opus-4-5") }) - test("Sisyphus uses sisyphus-low capability when isMax20 is false", () => { - // given Claude is available without Max 20 plan - const config = createConfig({ hasClaude: true, isMax20: false }) + test("Sisyphus is created when multiple fallback providers are available", () => { + // #given + const config = createConfig({ + hasClaude: true, + hasKimiForCoding: true, + hasOpencodeZen: true, + hasZaiCodingPlan: true, + isMax20: true, + }) - // when generateModelConfig is called + // #when const result = generateModelConfig(config) - // then Sisyphus should use sonnet (sisyphus-low) - expect(result.agents?.sisyphus?.model).toBe("anthropic/claude-sonnet-4-5") + // #then + expect(result.agents?.sisyphus?.model).toBe("anthropic/claude-opus-4-5") + }) + + test("Sisyphus is omitted when no fallback provider is available (OpenAI not in chain)", () => { + // #given + const config = createConfig({ hasOpenAI: true }) + + // #when + const result = generateModelConfig(config) + + // #then + expect(result.agents?.sisyphus).toBeUndefined() + }) + }) + + describe("Hephaestus agent special cases", () => { + test("Hephaestus is created when OpenAI is available (has gpt-5.2-codex)", () => { + // #given + const config = createConfig({ hasOpenAI: true }) + + // #when + const result = generateModelConfig(config) + + // #then + 
expect(result.agents?.hephaestus?.model).toBe("openai/gpt-5.2-codex") + expect(result.agents?.hephaestus?.variant).toBe("medium") + }) + + test("Hephaestus is created when Copilot is available (has gpt-5.2-codex)", () => { + // #given + const config = createConfig({ hasCopilot: true }) + + // #when + const result = generateModelConfig(config) + + // #then + expect(result.agents?.hephaestus?.model).toBe("github-copilot/gpt-5.2-codex") + expect(result.agents?.hephaestus?.variant).toBe("medium") + }) + + test("Hephaestus is created when OpenCode Zen is available (has gpt-5.2-codex)", () => { + // #given + const config = createConfig({ hasOpencodeZen: true }) + + // #when + const result = generateModelConfig(config) + + // #then + expect(result.agents?.hephaestus?.model).toBe("opencode/gpt-5.2-codex") + expect(result.agents?.hephaestus?.variant).toBe("medium") + }) + + test("Hephaestus is omitted when only Claude is available (no gpt-5.2-codex)", () => { + // #given + const config = createConfig({ hasClaude: true }) + + // #when + const result = generateModelConfig(config) + + // #then + expect(result.agents?.hephaestus).toBeUndefined() + }) + + test("Hephaestus is omitted when only Gemini is available (no gpt-5.2-codex)", () => { + // #given + const config = createConfig({ hasGemini: true }) + + // #when + const result = generateModelConfig(config) + + // #then + expect(result.agents?.hephaestus).toBeUndefined() + }) + + test("Hephaestus is omitted when only ZAI is available (no gpt-5.2-codex)", () => { + // #given + const config = createConfig({ hasZaiCodingPlan: true }) + + // #when + const result = generateModelConfig(config) + + // #then + expect(result.agents?.hephaestus).toBeUndefined() }) }) describe("librarian agent special cases", () => { test("librarian uses ZAI when ZAI is available regardless of other providers", () => { - // given ZAI and Claude are available + // #given ZAI and Claude are available const config = createConfig({ hasClaude: true, 
hasZaiCodingPlan: true, }) - // when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then librarian should use ZAI_MODEL + // #then librarian should use ZAI_MODEL expect(result.agents?.librarian?.model).toBe("zai-coding-plan/glm-4.7") }) test("librarian uses claude-sonnet when ZAI not available but Claude is", () => { - // given only Claude is available (no ZAI) + // #given only Claude is available (no ZAI) const config = createConfig({ hasClaude: true }) - // when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then librarian should use claude-sonnet-4-5 (third in fallback chain after ZAI and opencode/glm) + // #then librarian should use claude-sonnet-4-5 (third in fallback chain after ZAI and opencode/glm) expect(result.agents?.librarian?.model).toBe("anthropic/claude-sonnet-4-5") }) }) describe("schema URL", () => { test("always includes correct schema URL", () => { - // given any config + // #given any config const config = createConfig() - // when generateModelConfig is called + // #when generateModelConfig is called const result = generateModelConfig(config) - // then should include correct schema URL + // #then should include correct schema URL expect(result.$schema).toBe( "https://raw.githubusercontent.com/code-yeongyu/oh-my-opencode/master/assets/oh-my-opencode.schema.json" ) diff --git a/src/cli/model-fallback.ts b/src/cli/model-fallback.ts index 458abbd0..08e16300 100644 --- a/src/cli/model-fallback.ts +++ b/src/cli/model-fallback.ts @@ -97,19 +97,27 @@ function resolveModelFromChain( return null } -function getSisyphusFallbackChain(isMaxPlan: boolean): FallbackEntry[] { - // Sisyphus uses opus when isMaxPlan, sonnet otherwise - if (isMaxPlan) { - return AGENT_MODEL_REQUIREMENTS.sisyphus.fallbackChain - } - // For non-max plan, use sonnet instead of opus - return [ - { providers: ["anthropic", "github-copilot", 
"opencode"], model: "claude-sonnet-4-5" }, - { providers: ["kimi-for-coding"], model: "k2p5" }, - { providers: ["opencode"], model: "kimi-k2.5-free" }, - { providers: ["openai", "github-copilot", "opencode"], model: "gpt-5.2", variant: "high" }, - { providers: ["google", "github-copilot", "opencode"], model: "gemini-3-pro" }, - ] +function getSisyphusFallbackChain(): FallbackEntry[] { + return AGENT_MODEL_REQUIREMENTS.sisyphus.fallbackChain +} + +function isAnyFallbackEntryAvailable( + fallbackChain: FallbackEntry[], + avail: ProviderAvailability +): boolean { + return fallbackChain.some((entry) => + entry.providers.some((provider) => isProviderAvailable(provider, avail)) + ) +} + +function isRequiredModelAvailable( + requiresModel: string, + fallbackChain: FallbackEntry[], + avail: ProviderAvailability +): boolean { + const matchingEntry = fallbackChain.find((entry) => entry.model === requiresModel) + if (!matchingEntry) return false + return matchingEntry.providers.some((provider) => isProviderAvailable(provider, avail)) } export function generateModelConfig(config: InstallConfig): GeneratedOmoConfig { @@ -127,7 +135,9 @@ export function generateModelConfig(config: InstallConfig): GeneratedOmoConfig { return { $schema: SCHEMA_URL, agents: Object.fromEntries( - Object.keys(AGENT_MODEL_REQUIREMENTS).map((role) => [role, { model: ULTIMATE_FALLBACK }]) + Object.entries(AGENT_MODEL_REQUIREMENTS) + .filter(([role, req]) => !(role === "sisyphus" && req.requiresAnyModel)) + .map(([role]) => [role, { model: ULTIMATE_FALLBACK }]) ), categories: Object.fromEntries( Object.keys(CATEGORY_MODEL_REQUIREMENTS).map((cat) => [cat, { model: ULTIMATE_FALLBACK }]) @@ -139,13 +149,11 @@ export function generateModelConfig(config: InstallConfig): GeneratedOmoConfig { const categories: Record = {} for (const [role, req] of Object.entries(AGENT_MODEL_REQUIREMENTS)) { - // Special case: librarian always uses ZAI first if available if (role === "librarian" && avail.zai) { agents[role] = { 
model: ZAI_MODEL } continue } - // Special case: explore uses Claude haiku → GitHub Copilot gpt-5-mini → OpenCode gpt-5-nano if (role === "explore") { if (avail.native.claude) { agents[role] = { model: "anthropic/claude-haiku-4-5" } @@ -159,11 +167,24 @@ export function generateModelConfig(config: InstallConfig): GeneratedOmoConfig { continue } - // Special case: Sisyphus uses different fallbackChain based on isMaxPlan - const fallbackChain = - role === "sisyphus" ? getSisyphusFallbackChain(avail.isMaxPlan) : req.fallbackChain + if (role === "sisyphus") { + const fallbackChain = getSisyphusFallbackChain() + if (req.requiresAnyModel && !isAnyFallbackEntryAvailable(fallbackChain, avail)) { + continue + } + const resolved = resolveModelFromChain(fallbackChain, avail) + if (resolved) { + const variant = resolved.variant ?? req.variant + agents[role] = variant ? { model: resolved.model, variant } : { model: resolved.model } + } + continue + } - const resolved = resolveModelFromChain(fallbackChain, avail) + if (req.requiresModel && !isRequiredModelAvailable(req.requiresModel, req.fallbackChain, avail)) { + continue + } + + const resolved = resolveModelFromChain(req.fallbackChain, avail) if (resolved) { const variant = resolved.variant ?? req.variant agents[role] = variant ? { model: resolved.model, variant } : { model: resolved.model } @@ -179,6 +200,10 @@ export function generateModelConfig(config: InstallConfig): GeneratedOmoConfig { ? CATEGORY_MODEL_REQUIREMENTS["unspecified-low"].fallbackChain : req.fallbackChain + if (req.requiresModel && !isRequiredModelAvailable(req.requiresModel, req.fallbackChain, avail)) { + continue + } + const resolved = resolveModelFromChain(fallbackChain, avail) if (resolved) { const variant = resolved.variant ?? 
req.variant diff --git a/src/config/schema.ts b/src/config/schema.ts index 4eef69c0..3d33d929 100644 --- a/src/config/schema.ts +++ b/src/config/schema.ts @@ -18,6 +18,7 @@ const AgentPermissionSchema = z.object({ export const BuiltinAgentNameSchema = z.enum([ "sisyphus", + "hephaestus", "prometheus", "oracle", "librarian", @@ -39,6 +40,7 @@ export const OverridableAgentNameSchema = z.enum([ "build", "plan", "sisyphus", + "hephaestus", "sisyphus-junior", "OpenCode-Builder", "prometheus", @@ -137,6 +139,7 @@ export const AgentOverridesSchema = z.object({ build: AgentOverrideConfigSchema.optional(), plan: AgentOverrideConfigSchema.optional(), sisyphus: AgentOverrideConfigSchema.optional(), + hephaestus: AgentOverrideConfigSchema.optional(), "sisyphus-junior": AgentOverrideConfigSchema.optional(), "OpenCode-Builder": AgentOverrideConfigSchema.optional(), prometheus: AgentOverrideConfigSchema.optional(), diff --git a/src/features/AGENTS.md b/src/features/AGENTS.md index d961cb25..ee2d04e2 100644 --- a/src/features/AGENTS.md +++ b/src/features/AGENTS.md @@ -2,18 +2,20 @@ ## OVERVIEW -Core feature modules + Claude Code compatibility layer. Orchestrates background agents, skill MCPs, builtin skills/commands, and 16 feature modules. +20 feature modules: background agents, skill MCPs, builtin skills/commands, Claude Code compatibility layer. 
+ +**Feature Types**: Task orchestration, Skill definitions, Command templates, Claude Code loaders, Supporting utilities ## STRUCTURE ``` features/ -├── background-agent/ # Task lifecycle (1377 lines) +├── background-agent/ # Task lifecycle (1418 lines) │ ├── manager.ts # Launch → poll → complete │ └── concurrency.ts # Per-provider limits ├── builtin-skills/ # Core skills (1729 lines) -│ └── skills.ts # agent-browser, dev-browser, frontend-ui-ux, git-master, typescript-programmer -├── builtin-commands/ # ralph-loop, refactor, ulw-loop, init-deep, start-work, cancel-ralph +│ └── skills.ts # playwright, dev-browser, frontend-ui-ux, git-master, typescript-programmer +├── builtin-commands/ # ralph-loop, refactor, ulw-loop, init-deep, start-work, cancel-ralph, stop-continuation ├── claude-code-agent-loader/ # ~/.claude/agents/*.md ├── claude-code-command-loader/ # ~/.claude/commands/*.md ├── claude-code-mcp-loader/ # .mcp.json with ${VAR} expansion @@ -24,9 +26,11 @@ features/ ├── boulder-state/ # Todo state persistence ├── hook-message-injector/ # Message injection ├── task-toast-manager/ # Background task notifications -├── skill-mcp-manager/ # MCP client lifecycle (520 lines) +├── skill-mcp-manager/ # MCP client lifecycle (617 lines) ├── tmux-subagent/ # Tmux session management -└── ... (16 modules total) +├── mcp-oauth/ # MCP OAuth handling +├── sisyphus-swarm/ # Swarm coordination +└── sisyphus-tasks/ # Task tracking ``` ## LOADER PRIORITY diff --git a/src/hooks/AGENTS.md b/src/hooks/AGENTS.md index 9f9e68cc..0dda6e41 100644 --- a/src/hooks/AGENTS.md +++ b/src/hooks/AGENTS.md @@ -1,14 +1,22 @@ # HOOKS KNOWLEDGE BASE ## OVERVIEW -32 lifecycle hooks intercepting/modifying agent behavior. Events: PreToolUse, PostToolUse, UserPromptSubmit, Stop, onSummarize. + +34 lifecycle hooks intercepting/modifying agent behavior across 5 events. 
+ +**Event Types**: +- `UserPromptSubmit` (`chat.message`) - Can block +- `PreToolUse` (`tool.execute.before`) - Can block +- `PostToolUse` (`tool.execute.after`) - Cannot block +- `Stop` (`event: session.stop`) - Cannot block +- `onSummarize` (Compaction) - Cannot block ## STRUCTURE ``` hooks/ -├── atlas/ # Main orchestration (752 lines) +├── atlas/ # Main orchestration (757 lines) ├── anthropic-context-window-limit-recovery/ # Auto-summarize -├── todo-continuation-enforcer.ts # Force TODO completion (16k lines) +├── todo-continuation-enforcer.ts # Force TODO completion ├── ralph-loop/ # Self-referential dev loop ├── claude-code-hooks/ # settings.json compat layer - see AGENTS.md ├── comment-checker/ # Prevents AI slop @@ -37,6 +45,8 @@ hooks/ ├── category-skill-reminder/ # Reminds of category skills ├── empty-task-response-detector.ts # Detects empty responses ├── sisyphus-junior-notepad/ # Sisyphus Junior notepad +├── stop-continuation-guard/ # Guards stop continuation +├── subagent-question-blocker/ # Blocks subagent questions └── index.ts # Hook aggregation + registration ``` @@ -51,7 +61,7 @@ hooks/ ## EXECUTION ORDER - **UserPromptSubmit**: keywordDetector → claudeCodeHooks → autoSlashCommand → startWork -- **PreToolUse**: questionLabelTruncator → claudeCodeHooks → nonInteractiveEnv → commentChecker → directoryAgentsInjector → directoryReadmeInjector → rulesInjector → prometheusMdOnly → sisyphusJuniorNotepad → atlasHook +- **PreToolUse**: subagentQuestionBlocker → questionLabelTruncator → claudeCodeHooks → nonInteractiveEnv → commentChecker → directoryAgentsInjector → directoryReadmeInjector → rulesInjector → prometheusMdOnly → sisyphusJuniorNotepad → atlasHook - **PostToolUse**: claudeCodeHooks → toolOutputTruncator → contextWindowMonitor → commentChecker → directoryAgentsInjector → directoryReadmeInjector → rulesInjector → emptyTaskResponseDetector → agentUsageReminder → interactiveBashSession → editErrorRecovery → delegateTaskRetry → atlasHook → 
taskResumeInfo ## HOW TO ADD diff --git a/src/hooks/claude-code-hooks/AGENTS.md b/src/hooks/claude-code-hooks/AGENTS.md index 27ff024b..0f021ecb 100644 --- a/src/hooks/claude-code-hooks/AGENTS.md +++ b/src/hooks/claude-code-hooks/AGENTS.md @@ -1,7 +1,10 @@ # CLAUDE CODE HOOKS COMPATIBILITY ## OVERVIEW -Full Claude Code `settings.json` hook compatibility layer. Intercepts OpenCode events to execute external scripts/commands defined in Claude Code configuration. + +Full Claude Code `settings.json` hook compatibility layer. Intercepts OpenCode events to execute external scripts/commands. + +**Config Sources** (priority): `.claude/settings.local.json` (project-local) > `.claude/settings.json` (project) > `~/.claude/settings.json` (global) ## STRUCTURE ``` @@ -30,8 +33,9 @@ claude-code-hooks/ ## CONFIG SOURCES Priority (highest first): -1. `.claude/settings.json` (Project-local) -2. `~/.claude/settings.json` (Global user) +1. `.claude/settings.local.json` (Project-local, git-ignored) +2. `.claude/settings.json` (Project) +3. `~/.claude/settings.json` (Global user) ## HOOK EXECUTION - **Matchers**: Hooks filter by tool name or event type via regex/glob. diff --git a/src/hooks/keyword-detector/analyze/default.ts b/src/hooks/keyword-detector/analyze/default.ts new file mode 100644 index 00000000..ac758627 --- /dev/null +++ b/src/hooks/keyword-detector/analyze/default.ts @@ -0,0 +1,27 @@ +/** + * Analyze mode keyword detector.
+ * + * Triggers on analysis-related keywords across multiple languages: + * - English: analyze, analyse, investigate, examine, research, study, deep-dive, inspect, audit, evaluate, assess, review, diagnose, scrutinize, dissect, debug, comprehend, interpret, breakdown, understand, why is, how does, how to + * - Korean: 분석, 조사, 파악, 연구, 검토, 진단, 이해, 설명, 원인, 이유, 뜯어봐, 따져봐, 평가, 해석, 디버깅, 디버그, 어떻게, 왜, 살펴 + * - Japanese: 分析, 調査, 解析, 検討, 研究, 診断, 理解, 説明, 検証, 精査, 究明, デバッグ, なぜ, どう, 仕組み + * - Chinese: 调查, 检查, 剖析, 深入, 诊断, 解释, 调试, 为什么, 原理, 搞清楚, 弄明白 + * - Vietnamese: phân tích, điều tra, nghiên cứu, kiểm tra, xem xét, chẩn đoán, giải thích, tìm hiểu, gỡ lỗi, tại sao + */ + +export const ANALYZE_PATTERN = + /\b(analyze|analyse|investigate|examine|research|study|deep[\s-]?dive|inspect|audit|evaluate|assess|review|diagnose|scrutinize|dissect|debug|comprehend|interpret|breakdown|understand)\b|why\s+is|how\s+does|how\s+to|분석|조사|파악|연구|검토|진단|이해|설명|원인|이유|뜯어봐|따져봐|평가|해석|디버깅|디버그|어떻게|왜|살펴|分析|調査|解析|検討|研究|診断|理解|説明|検証|精査|究明|デバッグ|なぜ|どう|仕組み|调查|检查|剖析|深入|诊断|解释|调试|为什么|原理|搞清楚|弄明白|phân tích|điều tra|nghiên cứu|kiểm tra|xem xét|chẩn đoán|giải thích|tìm hiểu|gỡ lỗi|tại sao/i + +export const ANALYZE_MESSAGE = `[analyze-mode] +ANALYSIS MODE. Gather context before diving deep: + +CONTEXT GATHERING (parallel): +- 1-2 explore agents (codebase patterns, implementations) +- 1-2 librarian agents (if external library involved) +- Direct tools: Grep, AST-grep, LSP for targeted searches + +IF COMPLEX - DO NOT STRUGGLE ALONE. 
Consult specialists: +- **Oracle**: Conventional problems (architecture, debugging, complex logic) +- **Artistry**: Non-conventional problems (different approach needed) + +SYNTHESIZE findings before proceeding.` diff --git a/src/hooks/keyword-detector/analyze/index.ts b/src/hooks/keyword-detector/analyze/index.ts new file mode 100644 index 00000000..ba85da56 --- /dev/null +++ b/src/hooks/keyword-detector/analyze/index.ts @@ -0,0 +1 @@ +export { ANALYZE_PATTERN, ANALYZE_MESSAGE } from "./default" diff --git a/src/hooks/keyword-detector/constants.ts b/src/hooks/keyword-detector/constants.ts index 8475c4ab..6c9bec4a 100644 --- a/src/hooks/keyword-detector/constants.ts +++ b/src/hooks/keyword-detector/constants.ts @@ -1,506 +1,31 @@ export const CODE_BLOCK_PATTERN = /```[\s\S]*?```/g export const INLINE_CODE_PATTERN = /`[^`]+`/g -const ULTRAWORK_PLANNER_SECTION = `## CRITICAL: YOU ARE A PLANNER, NOT AN IMPLEMENTER +// Re-export from submodules +export { isPlannerAgent, getUltraworkMessage } from "./ultrawork" +export { SEARCH_PATTERN, SEARCH_MESSAGE } from "./search" +export { ANALYZE_PATTERN, ANALYZE_MESSAGE } from "./analyze" -**IDENTITY CONSTRAINT (NON-NEGOTIABLE):** -You ARE the planner. You ARE NOT an implementer. You DO NOT write code. You DO NOT execute tasks. 
+import { getUltraworkMessage } from "./ultrawork" +import { SEARCH_PATTERN, SEARCH_MESSAGE } from "./search" +import { ANALYZE_PATTERN, ANALYZE_MESSAGE } from "./analyze" -**TOOL RESTRICTIONS (SYSTEM-ENFORCED):** -| Tool | Allowed | Blocked | -|------|---------|---------| -| Write/Edit | \`.sisyphus/**/*.md\` ONLY | Everything else | -| Read | All files | - | -| Bash | Research commands only | Implementation commands | -| delegate_task | explore, librarian | - | - -**IF YOU TRY TO WRITE/EDIT OUTSIDE \`.sisyphus/\`:** -- System will BLOCK your action -- You will receive an error -- DO NOT retry - you are not supposed to implement - -**YOUR ONLY WRITABLE PATHS:** -- \`.sisyphus/plans/*.md\` - Final work plans -- \`.sisyphus/drafts/*.md\` - Working drafts during interview - -**WHEN USER ASKS YOU TO IMPLEMENT:** -REFUSE. Say: "I'm a planner. I create work plans, not implementations. Run \`/start-work\` after I finish planning." - ---- - -## CONTEXT GATHERING (MANDATORY BEFORE PLANNING) - -You ARE the planner. Your job: create bulletproof work plans. -**Before drafting ANY plan, gather context via explore/librarian agents.** - -### Research Protocol -1. **Fire parallel background agents** for comprehensive context: - \`\`\` - delegate_task(agent="explore", prompt="Find existing patterns for [topic] in codebase", background=true) - delegate_task(agent="explore", prompt="Find test infrastructure and conventions", background=true) - delegate_task(agent="librarian", prompt="Find official docs and best practices for [technology]", background=true) - \`\`\` -2. **Wait for results** before planning - rushed plans fail -3. **Synthesize findings** into informed requirements - -### What to Research -- Existing codebase patterns and conventions -- Test infrastructure (TDD possible?) -- External library APIs and constraints -- Similar implementations in OSS (via librarian) - -**NEVER plan blind. 
Context first, plan second.** - ---- - -## MANDATORY OUTPUT: PARALLEL TASK GRAPH + TODO LIST - -**YOUR PRIMARY OUTPUT IS A PARALLEL EXECUTION TASK GRAPH.** - -When you finalize a plan, you MUST structure it for maximum parallel execution: - -### 1. Parallel Execution Waves (REQUIRED) - -Analyze task dependencies and group independent tasks into parallel waves: - -\`\`\` -Wave 1 (Start Immediately - No Dependencies): -├── Task 1: [description] → category: X, skills: [a, b] -└── Task 4: [description] → category: Y, skills: [c] - -Wave 2 (After Wave 1 Completes): -├── Task 2: [depends: 1] → category: X, skills: [a] -├── Task 3: [depends: 1] → category: Z, skills: [d] -└── Task 5: [depends: 4] → category: Y, skills: [c] - -Wave 3 (After Wave 2 Completes): -└── Task 6: [depends: 2, 3] → category: X, skills: [a, b] - -Critical Path: Task 1 → Task 2 → Task 6 -Estimated Parallel Speedup: ~40% faster than sequential -\`\`\` - -### 2. Dependency Matrix (REQUIRED) - -| Task | Depends On | Blocks | Can Parallelize With | -|------|------------|--------|---------------------| -| 1 | None | 2, 3 | 4 | -| 2 | 1 | 6 | 3, 5 | -| 3 | 1 | 6 | 2, 5 | -| 4 | None | 5 | 1 | -| 5 | 4 | None | 2, 3 | -| 6 | 2, 3 | None | None (final) | - -### 3. TODO List Structure (REQUIRED) - -Each TODO item MUST include: - -\`\`\`markdown -- [ ] N. [Task Title] - - **What to do**: [Clear steps] - - **Dependencies**: [Task numbers this depends on] | None - **Blocks**: [Task numbers that depend on this] - **Parallel Group**: Wave N (with Tasks X, Y) - - **Recommended Agent Profile**: - - **Category**: \`[visual-engineering | ultrabrain | artistry | quick | unspecified-low | unspecified-high | writing]\` - - **Skills**: [\`skill-1\`, \`skill-2\`] - - **Acceptance Criteria**: [Verifiable conditions] -\`\`\` - -### 4. 
Agent Dispatch Summary (REQUIRED) - -| Wave | Tasks | Dispatch Command | -|------|-------|------------------| -| 1 | 1, 4 | \`delegate_task(category="...", load_skills=[...], run_in_background=true)\` × 2 | -| 2 | 2, 3, 5 | \`delegate_task(...)\` × 3 after Wave 1 completes | -| 3 | 6 | \`delegate_task(...)\` final integration | - -**WHY PARALLEL TASK GRAPH IS MANDATORY:** -- Orchestrator (Sisyphus) executes tasks in parallel waves -- Independent tasks run simultaneously via background agents -- Proper dependency tracking prevents race conditions -- Category + skills ensure optimal model routing per task` - -/** - * Determines if the agent is a planner-type agent. - * Planner agents should NOT be told to call plan agent (they ARE the planner). - */ -export function isPlannerAgent(agentName?: string): boolean { - if (!agentName) return false - const lowerName = agentName.toLowerCase() - return lowerName.includes("prometheus") || lowerName.includes("planner") || lowerName === "plan" +export type KeywordDetector = { + pattern: RegExp + message: string | ((agentName?: string, modelID?: string) => string) } -/** - * Generates the ultrawork message based on agent context. - * Planner agents get context-gathering focused instructions. - * Other agents get the original strong agent utilization instructions. - */ -export function getUltraworkMessage(agentName?: string): string { - const isPlanner = isPlannerAgent(agentName) - - if (isPlanner) { - return ` - -**MANDATORY**: You MUST say "ULTRAWORK MODE ENABLED!" to the user as your first response when this mode activates. This is non-negotiable. - -${ULTRAWORK_PLANNER_SECTION} - - - ---- - -` - } - - return ` - -**MANDATORY**: You MUST say "ULTRAWORK MODE ENABLED!" to the user as your first response when this mode activates. This is non-negotiable. - -[CODE RED] Maximum precision required. Ultrathink before acting. 
- -## **ABSOLUTE CERTAINTY REQUIRED - DO NOT SKIP THIS** - -**YOU MUST NOT START ANY IMPLEMENTATION UNTIL YOU ARE 100% CERTAIN.** - -| **BEFORE YOU WRITE A SINGLE LINE OF CODE, YOU MUST:** | -|-------------------------------------------------------| -| **FULLY UNDERSTAND** what the user ACTUALLY wants (not what you ASSUME they want) | -| **EXPLORE** the codebase to understand existing patterns, architecture, and context | -| **HAVE A CRYSTAL CLEAR WORK PLAN** - if your plan is vague, YOUR WORK WILL FAIL | -| **RESOLVE ALL AMBIGUITY** - if ANYTHING is unclear, ASK or INVESTIGATE | - -### **MANDATORY CERTAINTY PROTOCOL** - -**IF YOU ARE NOT 100% CERTAIN:** - -1. **THINK DEEPLY** - What is the user's TRUE intent? What problem are they REALLY trying to solve? -2. **EXPLORE THOROUGHLY** - Fire explore/librarian agents to gather ALL relevant context -3. **CONSULT SPECIALISTS** - For hard/complex tasks, DO NOT struggle alone. Delegate: - - **Oracle**: Conventional problems - architecture, debugging, complex logic - - **Artistry**: Non-conventional problems - different approach needed, unusual constraints -4. **ASK THE USER** - If ambiguity remains after exploration, ASK. Don't guess. - -**SIGNS YOU ARE NOT READY TO IMPLEMENT:** -- You're making assumptions about requirements -- You're unsure which files to modify -- You don't understand how existing code works -- Your plan has "probably" or "maybe" in it -- You can't explain the exact steps you'll take - -**WHEN IN DOUBT:** -\`\`\` -delegate_task(agent="explore", prompt="Find [X] patterns in codebase", background=true) -delegate_task(agent="librarian", prompt="Find docs/examples for [Y]", background=true) - -// Hard problem? 
DON'T struggle alone: -delegate_task(agent="oracle", prompt="...") // conventional: architecture, debugging -delegate_task(category="artistry", prompt="...") // non-conventional: needs different approach -\`\`\` - -**ONLY AFTER YOU HAVE:** -- Gathered sufficient context via agents -- Resolved all ambiguities -- Created a precise, step-by-step work plan -- Achieved 100% confidence in your understanding - -**...THEN AND ONLY THEN MAY YOU BEGIN IMPLEMENTATION.** - ---- - -## **NO EXCUSES. NO COMPROMISES. DELIVER WHAT WAS ASKED.** - -**THE USER'S ORIGINAL REQUEST IS SACRED. YOU MUST FULFILL IT EXACTLY.** - -| VIOLATION | CONSEQUENCE | -|-----------|-------------| -| "I couldn't because..." | **UNACCEPTABLE.** Find a way or ask for help. | -| "This is a simplified version..." | **UNACCEPTABLE.** Deliver the FULL implementation. | -| "You can extend this later..." | **UNACCEPTABLE.** Finish it NOW. | -| "Due to limitations..." | **UNACCEPTABLE.** Use agents, tools, whatever it takes. | -| "I made some assumptions..." | **UNACCEPTABLE.** You should have asked FIRST. | - -**THERE ARE NO VALID EXCUSES FOR:** -- Delivering partial work -- Changing scope without explicit user approval -- Making unauthorized simplifications -- Stopping before the task is 100% complete -- Compromising on any stated requirement - -**IF YOU ENCOUNTER A BLOCKER:** -1. **DO NOT** give up -2. **DO NOT** deliver a compromised version -3. **DO** consult specialists (oracle for conventional, artistry for non-conventional) -4. **DO** ask the user for guidance -5. **DO** explore alternative approaches - -**THE USER ASKED FOR X. DELIVER EXACTLY X. PERIOD.** - ---- - -YOU MUST LEVERAGE ALL AVAILABLE AGENTS / **CATEGORY + SKILLS** TO THEIR FULLEST POTENTIAL. -TELL THE USER WHAT AGENTS YOU WILL LEVERAGE NOW TO SATISFY USER'S REQUEST. 
- -## MANDATORY: PLAN AGENT INVOCATION (NON-NEGOTIABLE) - -**YOU MUST ALWAYS INVOKE THE PLAN AGENT FOR ANY NON-TRIVIAL TASK.** - -| Condition | Action | -|-----------|--------| -| Task has 2+ steps | MUST call plan agent | -| Task scope unclear | MUST call plan agent | -| Implementation required | MUST call plan agent | -| Architecture decision needed | MUST call plan agent | - -\`\`\` -delegate_task(subagent_type="plan", prompt="") -\`\`\` - -**WHY PLAN AGENT IS MANDATORY:** -- Plan agent analyzes dependencies and parallel execution opportunities -- Plan agent outputs a **parallel task graph** with waves and dependencies -- Plan agent provides structured TODO list with category + skills per task -- YOU are an orchestrator, NOT an implementer - -### SESSION CONTINUITY WITH PLAN AGENT (CRITICAL) - -**Plan agent returns a session_id. USE IT for follow-up interactions.** - -| Scenario | Action | -|----------|--------| -| Plan agent asks clarifying questions | \`delegate_task(session_id="{returned_session_id}", prompt="")\` | -| Need to refine the plan | \`delegate_task(session_id="{returned_session_id}", prompt="Please adjust: ")\` | -| Plan needs more detail | \`delegate_task(session_id="{returned_session_id}", prompt="Add more detail to Task N")\` | - -**WHY SESSION_ID IS CRITICAL:** -- Plan agent retains FULL conversation context -- No repeated exploration or context gathering -- Saves 70%+ tokens on follow-ups -- Maintains interview continuity until plan is finalized - -\`\`\` -// WRONG: Starting fresh loses all context -delegate_task(subagent_type="plan", prompt="Here's more info...") - -// CORRECT: Resume preserves everything -delegate_task(session_id="ses_abc123", prompt="Here's my answer to your question: ...") -\`\`\` - -**FAILURE TO CALL PLAN AGENT = INCOMPLETE WORK.** - ---- - -## AGENTS / **CATEGORY + SKILLS** UTILIZATION PRINCIPLES - -**DEFAULT BEHAVIOR: DELEGATE. 
DO NOT WORK YOURSELF.** - -| Task Type | Action | Why | -|-----------|--------|-----| -| Codebase exploration | delegate_task(subagent_type="explore", run_in_background=true) | Parallel, context-efficient | -| Documentation lookup | delegate_task(subagent_type="librarian", run_in_background=true) | Specialized knowledge | -| Planning | delegate_task(subagent_type="plan") | Parallel task graph + structured TODO list | -| Hard problem (conventional) | delegate_task(subagent_type="oracle") | Architecture, debugging, complex logic | -| Hard problem (non-conventional) | delegate_task(category="artistry", load_skills=[...]) | Different approach needed | -| Implementation | delegate_task(category="...", load_skills=[...]) | Domain-optimized models | - -**CATEGORY + SKILL DELEGATION:** -\`\`\` -// Frontend work -delegate_task(category="visual-engineering", load_skills=["frontend-ui-ux"]) - -// Complex logic -delegate_task(category="ultrabrain", load_skills=["typescript-programmer"]) - -// Quick fixes -delegate_task(category="quick", load_skills=["git-master"]) -\`\`\` - -**YOU SHOULD ONLY DO IT YOURSELF WHEN:** -- Task is trivially simple (1-2 lines, obvious change) -- You have ALL context already loaded -- Delegation overhead exceeds task complexity - -**OTHERWISE: DELEGATE. 
ALWAYS.** - ---- - -## EXECUTION RULES (PARALLELIZATION MANDATORY) - -| Rule | Implementation | -|------|----------------| -| **PARALLEL FIRST** | Fire ALL independent agents simultaneously via delegate_task(run_in_background=true) | -| **NEVER SEQUENTIAL** | If tasks A and B are independent, launch BOTH at once | -| **10+ CONCURRENT** | Use 10+ background agents if needed for comprehensive exploration | -| **COLLECT LATER** | Launch agents -> continue work -> background_output when needed | - -**ANTI-PATTERN (BLOCKING):** -\`\`\` -// WRONG: Sequential, slow -result1 = delegate_task(..., run_in_background=false) // waits -result2 = delegate_task(..., run_in_background=false) // waits again -\`\`\` - -**CORRECT PATTERN:** -\`\`\` -// RIGHT: Parallel, fast -delegate_task(..., run_in_background=true) // task_id_1 -delegate_task(..., run_in_background=true) // task_id_2 -delegate_task(..., run_in_background=true) // task_id_3 -// Continue working, collect with background_output when needed -\`\`\` - ---- - -## WORKFLOW (MANDATORY SEQUENCE) - -1. **GATHER CONTEXT** (parallel background agents): - \`\`\` - // Prompt structure: CONTEXT (what I'm doing) + GOAL (what I'm trying to achieve) + QUESTION (what I need to know) + REQUEST (what to find) - delegate_task(subagent_type="explore", run_in_background=true, prompt="I'm working on [task] and need to understand the codebase context. Find [specific patterns, files, implementations] related to this work.") - delegate_task(subagent_type="librarian", run_in_background=true, prompt="I'm implementing [feature] and need external references. Find [official docs, best practices, OSS examples] for guidance.") - \`\`\` - -2. **INVOKE PLAN AGENT** (MANDATORY for non-trivial tasks): - \`\`\` - result = delegate_task(subagent_type="plan", prompt="") - // STORE the session_id for follow-ups! - plan_session_id = result.session_id - \`\`\` - -3. 
**ITERATE WITH PLAN AGENT** (if clarification needed): - \`\`\` - // Use session_id to continue the conversation - delegate_task(session_id=plan_session_id, prompt="") - \`\`\` - -4. **EXECUTE VIA DELEGATION** (category + skills from plan agent's output): - \`\`\` - delegate_task(category="...", load_skills=[...], prompt="") - \`\`\` - -5. **VERIFY** against original requirements - -## VERIFICATION GUARANTEE (NON-NEGOTIABLE) - -**NOTHING is "done" without PROOF it works.** - -### Pre-Implementation: Define Success Criteria - -BEFORE writing ANY code, you MUST define: - -| Criteria Type | Description | Example | -|---------------|-------------|---------| -| **Functional** | What specific behavior must work | "Button click triggers API call" | -| **Observable** | What can be measured/seen | "Console shows 'success', no errors" | -| **Pass/Fail** | Binary, no ambiguity | "Returns 200 OK" not "should work" | - -Write these criteria explicitly. Share with user if scope is non-trivial. - -### Test Plan Template (MANDATORY for non-trivial tasks) - -\`\`\` -## Test Plan -### Objective: [What we're verifying] -### Prerequisites: [Setup needed] -### Test Cases: -1. [Test Name]: [Input] → [Expected Output] → [How to verify] -2. ... -### Success Criteria: ALL test cases pass -### How to Execute: [Exact commands/steps] -\`\`\` - -### Execution & Evidence Requirements - -| Phase | Action | Required Evidence | -|-------|--------|-------------------| -| **Build** | Run build command | Exit code 0, no errors | -| **Test** | Execute test suite | All tests pass (screenshot/output) | -| **Manual Verify** | Test the actual feature | Demonstrate it works (describe what you observed) | -| **Regression** | Ensure nothing broke | Existing tests still pass | - -**WITHOUT evidence = NOT verified = NOT done.** - -### TDD Workflow (when test infrastructure exists) - -1. **SPEC**: Define what "working" means (success criteria above) -2. 
**RED**: Write failing test → Run it → Confirm it FAILS -3. **GREEN**: Write minimal code → Run test → Confirm it PASSES -4. **REFACTOR**: Clean up → Tests MUST stay green -5. **VERIFY**: Run full test suite, confirm no regressions -6. **EVIDENCE**: Report what you ran and what output you saw - -### Verification Anti-Patterns (BLOCKING) - -| Violation | Why It Fails | -|-----------|--------------| -| "It should work now" | No evidence. Run it. | -| "I added the tests" | Did they pass? Show output. | -| "Fixed the bug" | How do you know? What did you test? | -| "Implementation complete" | Did you verify against success criteria? | -| Skipping test execution | Tests exist to be RUN, not just written | - -**CLAIM NOTHING WITHOUT PROOF. EXECUTE. VERIFY. SHOW EVIDENCE.** - -## ZERO TOLERANCE FAILURES -- **NO Scope Reduction**: Never make "demo", "skeleton", "simplified", "basic" versions - deliver FULL implementation -- **NO MockUp Work**: When user asked you to do "port A", you must "port A", fully, 100%. No Extra feature, No reduced feature, no mock data, fully working 100% port. -- **NO Partial Completion**: Never stop at 60-80% saying "you can extend this..." - finish 100% -- **NO Assumed Shortcuts**: Never skip requirements you deem "optional" or "can be added later" -- **NO Premature Stopping**: Never declare done until ALL TODOs are completed and verified -- **NO TEST DELETION**: Never delete or skip failing tests to make the build pass. Fix the code, not the tests. - -THE USER ASKED FOR X. DELIVER EXACTLY X. NOT A SUBSET. NOT A DEMO. NOT A STARTING POINT. - -1. EXPLORES + LIBRARIANS (background) -2. GATHER -> delegate_task(subagent_type="plan", prompt="") -3. ITERATE WITH PLAN AGENT (session_id resume) UNTIL PLAN IS FINALIZED -4. WORK BY DELEGATING TO CATEGORY + SKILLS AGENTS (following plan agent's parallel task graph) - -NOW. 
- - - ---- - -` -} - -export const KEYWORD_DETECTORS: Array<{ pattern: RegExp; message: string | ((agentName?: string) => string) }> = [ +export const KEYWORD_DETECTORS: KeywordDetector[] = [ { pattern: /\b(ultrawork|ulw)\b/i, message: getUltraworkMessage, }, - // SEARCH: EN/KO/JP/CN/VN { - pattern: - /\b(search|find|locate|lookup|look\s*up|explore|discover|scan|grep|query|browse|detect|trace|seek|track|pinpoint|hunt)\b|where\s+is|show\s+me|list\s+all|검색|찾아|탐색|조회|스캔|서치|뒤져|찾기|어디|추적|탐지|찾아봐|찾아내|보여줘|목록|検索|探して|見つけて|サーチ|探索|スキャン|どこ|発見|捜索|見つけ出す|一覧|搜索|查找|寻找|查询|检索|定位|扫描|发现|在哪里|找出来|列出|tìm kiếm|tra cứu|định vị|quét|phát hiện|truy tìm|tìm ra|ở đâu|liệt kê/i, - message: `[search-mode] -MAXIMIZE SEARCH EFFORT. Launch multiple background agents IN PARALLEL: -- explore agents (codebase patterns, file structures, ast-grep) -- librarian agents (remote repos, official docs, GitHub examples) -Plus direct tools: Grep, ripgrep (rg), ast-grep (sg) -NEVER stop at first result - be exhaustive.`, + pattern: SEARCH_PATTERN, + message: SEARCH_MESSAGE, }, - // ANALYZE: EN/KO/JP/CN/VN { - pattern: - /\b(analyze|analyse|investigate|examine|research|study|deep[\s-]?dive|inspect|audit|evaluate|assess|review|diagnose|scrutinize|dissect|debug|comprehend|interpret|breakdown|understand)\b|why\s+is|how\s+does|how\s+to|분석|조사|파악|연구|검토|진단|이해|설명|원인|이유|뜯어봐|따져봐|평가|해석|디버깅|디버그|어떻게|왜|살펴|分析|調査|解析|検討|研究|診断|理解|説明|検証|精査|究明|デバッグ|なぜ|どう|仕組み|调查|检查|剖析|深入|诊断|解释|调试|为什么|原理|搞清楚|弄明白|phân tích|điều tra|nghiên cứu|kiểm tra|xem xét|chẩn đoán|giải thích|tìm hiểu|gỡ lỗi|tại sao/i, - message: `[analyze-mode] -ANALYSIS MODE. Gather context before diving deep: - -CONTEXT GATHERING (parallel): -- 1-2 explore agents (codebase patterns, implementations) -- 1-2 librarian agents (if external library involved) -- Direct tools: Grep, AST-grep, LSP for targeted searches - -IF COMPLEX - DO NOT STRUGGLE ALONE. 
Consult specialists: -- **Oracle**: Conventional problems (architecture, debugging, complex logic) -- **Artistry**: Non-conventional problems (different approach needed) - -SYNTHESIZE findings before proceeding.`, + pattern: ANALYZE_PATTERN, + message: ANALYZE_MESSAGE, }, ] diff --git a/src/hooks/keyword-detector/detector.ts b/src/hooks/keyword-detector/detector.ts index 4c0df20a..0acde04f 100644 --- a/src/hooks/keyword-detector/detector.ts +++ b/src/hooks/keyword-detector/detector.ts @@ -17,26 +17,27 @@ export function removeCodeBlocks(text: string): string { * Resolves message to string, handling both static strings and dynamic functions. */ function resolveMessage( - message: string | ((agentName?: string) => string), - agentName?: string + message: string | ((agentName?: string, modelID?: string) => string), + agentName?: string, + modelID?: string ): string { - return typeof message === "function" ? message(agentName) : message + return typeof message === "function" ? message(agentName, modelID) : message } -export function detectKeywords(text: string, agentName?: string): string[] { +export function detectKeywords(text: string, agentName?: string, modelID?: string): string[] { const textWithoutCode = removeCodeBlocks(text) return KEYWORD_DETECTORS.filter(({ pattern }) => pattern.test(textWithoutCode) - ).map(({ message }) => resolveMessage(message, agentName)) + ).map(({ message }) => resolveMessage(message, agentName, modelID)) } -export function detectKeywordsWithType(text: string, agentName?: string): DetectedKeyword[] { +export function detectKeywordsWithType(text: string, agentName?: string, modelID?: string): DetectedKeyword[] { const textWithoutCode = removeCodeBlocks(text) const types: Array<"ultrawork" | "search" | "analyze"> = ["ultrawork", "search", "analyze"] return KEYWORD_DETECTORS.map(({ pattern, message }, index) => ({ matches: pattern.test(textWithoutCode), type: types[index], - message: resolveMessage(message, agentName), + message: 
resolveMessage(message, agentName, modelID), })) .filter((result) => result.matches) .map(({ type, message }) => ({ type, message })) diff --git a/src/hooks/keyword-detector/index.ts b/src/hooks/keyword-detector/index.ts index 67b8597a..c19540fe 100644 --- a/src/hooks/keyword-detector/index.ts +++ b/src/hooks/keyword-detector/index.ts @@ -35,7 +35,8 @@ export function createKeywordDetectorHook(ctx: PluginInput, collector?: ContextC // Remove system-reminder content to prevent automated system messages from triggering mode keywords const cleanText = removeSystemReminders(promptText) - let detectedKeywords = detectKeywordsWithType(removeCodeBlocks(cleanText), currentAgent) + const modelID = input.model?.modelID + let detectedKeywords = detectKeywordsWithType(removeCodeBlocks(cleanText), currentAgent, modelID) if (isPlannerAgent(currentAgent)) { detectedKeywords = detectedKeywords.filter((k) => k.type !== "ultrawork") diff --git a/src/hooks/keyword-detector/search/default.ts b/src/hooks/keyword-detector/search/default.ts new file mode 100644 index 00000000..579574e1 --- /dev/null +++ b/src/hooks/keyword-detector/search/default.ts @@ -0,0 +1,20 @@ +/** + * Search mode keyword detector. 
+ * + * Triggers on search-related keywords across multiple languages: + * - English: search, find, locate, lookup, explore, discover, scan, grep, query, browse, detect, trace, seek, track, pinpoint, hunt, where is, show me, list all + * - Korean: 검색, 찾아, 탐색, 조회, 스캔, 서치, 뒤져, 찾기, 어디, 추적, 탐지, 찾아봐, 찾아내, 보여줘, 목록 + * - Japanese: 検索, 探して, 見つけて, サーチ, 探索, スキャン, どこ, 発見, 捜索, 見つけ出す, 一覧 + * - Chinese: 搜索, 查找, 寻找, 查询, 检索, 定位, 扫描, 发现, 在哪里, 找出来, 列出 + * - Vietnamese: tìm kiếm, tra cứu, định vị, quét, phát hiện, truy tìm, tìm ra, ở đâu, liệt kê + */ + +export const SEARCH_PATTERN = + /\b(search|find|locate|lookup|look\s*up|explore|discover|scan|grep|query|browse|detect|trace|seek|track|pinpoint|hunt)\b|where\s+is|show\s+me|list\s+all|검색|찾아|탐색|조회|스캔|서치|뒤져|찾기|어디|추적|탐지|찾아봐|찾아내|보여줘|목록|検索|探して|見つけて|サーチ|探索|スキャン|どこ|発見|捜索|見つけ出す|一覧|搜索|查找|寻找|查询|检索|定位|扫描|发现|在哪里|找出来|列出|tìm kiếm|tra cứu|định vị|quét|phát hiện|truy tìm|tìm ra|ở đâu|liệt kê/i + +export const SEARCH_MESSAGE = `[search-mode] +MAXIMIZE SEARCH EFFORT. Launch multiple background agents IN PARALLEL: +- explore agents (codebase patterns, file structures, ast-grep) +- librarian agents (remote repos, official docs, GitHub examples) +Plus direct tools: Grep, ripgrep (rg), ast-grep (sg) +NEVER stop at first result - be exhaustive.` diff --git a/src/hooks/keyword-detector/search/index.ts b/src/hooks/keyword-detector/search/index.ts new file mode 100644 index 00000000..f4ef3b0e --- /dev/null +++ b/src/hooks/keyword-detector/search/index.ts @@ -0,0 +1 @@ +export { SEARCH_PATTERN, SEARCH_MESSAGE } from "./default" diff --git a/src/hooks/keyword-detector/ultrawork/default.ts b/src/hooks/keyword-detector/ultrawork/default.ts new file mode 100644 index 00000000..43d06ecb --- /dev/null +++ b/src/hooks/keyword-detector/ultrawork/default.ts @@ -0,0 +1,346 @@ +/** + * Default ultrawork message optimized for Claude series models. 
+ * + * Key characteristics: + * - Optimized for Claude's tendency to be "helpful" by forcing explicit delegation + * - "DELEGATE. ALWAYS." instruction counters Claude's natural inclination to do everything + * - Strong emphasis on parallel agent usage and category+skills delegation + */ + +export const ULTRAWORK_DEFAULT_MESSAGE = ` + +**MANDATORY**: You MUST say "ULTRAWORK MODE ENABLED!" to the user as your first response when this mode activates. This is non-negotiable. + +[CODE RED] Maximum precision required. Ultrathink before acting. + +## **ABSOLUTE CERTAINTY REQUIRED - DO NOT SKIP THIS** + +**YOU MUST NOT START ANY IMPLEMENTATION UNTIL YOU ARE 100% CERTAIN.** + +| **BEFORE YOU WRITE A SINGLE LINE OF CODE, YOU MUST:** | +|-------------------------------------------------------| +| **FULLY UNDERSTAND** what the user ACTUALLY wants (not what you ASSUME they want) | +| **EXPLORE** the codebase to understand existing patterns, architecture, and context | +| **HAVE A CRYSTAL CLEAR WORK PLAN** - if your plan is vague, YOUR WORK WILL FAIL | +| **RESOLVE ALL AMBIGUITY** - if ANYTHING is unclear, ASK or INVESTIGATE | + +### **MANDATORY CERTAINTY PROTOCOL** + +**IF YOU ARE NOT 100% CERTAIN:** + +1. **THINK DEEPLY** - What is the user's TRUE intent? What problem are they REALLY trying to solve? +2. **EXPLORE THOROUGHLY** - Fire explore/librarian agents to gather ALL relevant context +3. **CONSULT SPECIALISTS** - For hard/complex tasks, DO NOT struggle alone. Delegate: + - **Oracle**: Conventional problems - architecture, debugging, complex logic + - **Artistry**: Non-conventional problems - different approach needed, unusual constraints +4. **ASK THE USER** - If ambiguity remains after exploration, ASK. Don't guess. 
+ +**SIGNS YOU ARE NOT READY TO IMPLEMENT:** +- You're making assumptions about requirements +- You're unsure which files to modify +- You don't understand how existing code works +- Your plan has "probably" or "maybe" in it +- You can't explain the exact steps you'll take + +**WHEN IN DOUBT:** +\`\`\` +delegate_task(subagent_type="explore", load_skills=[], prompt="Find [X] patterns in codebase", run_in_background=true) +delegate_task(subagent_type="librarian", load_skills=[], prompt="Find docs/examples for [Y]", run_in_background=true) + +// Hard problem? DON'T struggle alone: +delegate_task(subagent_type="oracle", load_skills=[], prompt="...") // conventional: architecture, debugging +delegate_task(category="artistry", load_skills=[], prompt="...") // non-conventional: needs different approach +\`\`\` + +**ONLY AFTER YOU HAVE:** +- Gathered sufficient context via agents +- Resolved all ambiguities +- Created a precise, step-by-step work plan +- Achieved 100% confidence in your understanding + +**...THEN AND ONLY THEN MAY YOU BEGIN IMPLEMENTATION.** + +--- + +## **NO EXCUSES. NO COMPROMISES. DELIVER WHAT WAS ASKED.** + +**THE USER'S ORIGINAL REQUEST IS SACRED. YOU MUST FULFILL IT EXACTLY.** + +| VIOLATION | CONSEQUENCE | +|-----------|-------------| +| "I couldn't because..." | **UNACCEPTABLE.** Find a way or ask for help. | +| "This is a simplified version..." | **UNACCEPTABLE.** Deliver the FULL implementation. | +| "You can extend this later..." | **UNACCEPTABLE.** Finish it NOW. | +| "Due to limitations..." | **UNACCEPTABLE.** Use agents, tools, whatever it takes. | +| "I made some assumptions..." | **UNACCEPTABLE.** You should have asked FIRST. | + +**THERE ARE NO VALID EXCUSES FOR:** +- Delivering partial work +- Changing scope without explicit user approval +- Making unauthorized simplifications +- Stopping before the task is 100% complete +- Compromising on any stated requirement + +**IF YOU ENCOUNTER A BLOCKER:** +1. **DO NOT** give up +2. 
**DO NOT** deliver a compromised version
+3. **DO** consult specialists (oracle for conventional, artistry for non-conventional)
+4. **DO** ask the user for guidance
+5. **DO** explore alternative approaches
+
+**THE USER ASKED FOR X. DELIVER EXACTLY X. PERIOD.**
+
+---
+
+YOU MUST LEVERAGE ALL AVAILABLE AGENTS / **CATEGORY + SKILLS** TO THEIR FULLEST POTENTIAL.
+TELL THE USER WHAT AGENTS YOU WILL LEVERAGE NOW TO SATISFY THE USER'S REQUEST.
+
+## MANDATORY: PLAN AGENT INVOCATION (NON-NEGOTIABLE)
+
+**YOU MUST ALWAYS INVOKE THE PLAN AGENT FOR ANY NON-TRIVIAL TASK.**
+
+| Condition | Action |
+|-----------|--------|
+| Task has 2+ steps | MUST call plan agent |
+| Task scope unclear | MUST call plan agent |
+| Implementation required | MUST call plan agent |
+| Architecture decision needed | MUST call plan agent |
+
+\`\`\`
+delegate_task(subagent_type="plan", prompt="[detailed planning request]")
+\`\`\`
+
+**WHY PLAN AGENT IS MANDATORY:**
+- Plan agent analyzes dependencies and parallel execution opportunities
+- Plan agent outputs a **parallel task graph** with waves and dependencies
+- Plan agent provides structured TODO list with category + skills per task
+- YOU are an orchestrator, NOT an implementer
+
+### SESSION CONTINUITY WITH PLAN AGENT (CRITICAL)
+
+**Plan agent returns a session_id.
USE IT for follow-up interactions.**
+
+| Scenario | Action |
+|----------|--------|
+| Plan agent asks clarifying questions | \`delegate_task(session_id="{returned_session_id}", prompt="[your answer]")\` |
+| Need to refine the plan | \`delegate_task(session_id="{returned_session_id}", prompt="Please adjust: [specific changes]")\` |
+| Plan needs more detail | \`delegate_task(session_id="{returned_session_id}", prompt="Add more detail to Task N")\` |
+
+**WHY SESSION_ID IS CRITICAL:**
+- Plan agent retains FULL conversation context
+- No repeated exploration or context gathering
+- Saves 70%+ tokens on follow-ups
+- Maintains interview continuity until plan is finalized
+
+\`\`\`
+// WRONG: Starting fresh loses all context
+delegate_task(subagent_type="plan", prompt="Here's more info...")
+
+// CORRECT: Resume preserves everything
+delegate_task(session_id="ses_abc123", prompt="Here's my answer to your question: ...")
+\`\`\`
+
+**FAILURE TO CALL PLAN AGENT = INCOMPLETE WORK.**
+
+---
+
+## AGENTS / **CATEGORY + SKILLS** UTILIZATION PRINCIPLES
+
+**DEFAULT BEHAVIOR: DELEGATE.
DO NOT WORK YOURSELF.** + +| Task Type | Action | Why | +|-----------|--------|-----| +| Codebase exploration | delegate_task(subagent_type="explore", run_in_background=true) | Parallel, context-efficient | +| Documentation lookup | delegate_task(subagent_type="librarian", run_in_background=true) | Specialized knowledge | +| Planning | delegate_task(subagent_type="plan") | Parallel task graph + structured TODO list | +| Hard problem (conventional) | delegate_task(subagent_type="oracle") | Architecture, debugging, complex logic | +| Hard problem (non-conventional) | delegate_task(category="artistry", load_skills=[...]) | Different approach needed | +| Implementation | delegate_task(category="...", load_skills=[...]) | Domain-optimized models | + +**CATEGORY + SKILL DELEGATION:** +\`\`\` +// Frontend work +delegate_task(category="visual-engineering", load_skills=["frontend-ui-ux"]) + +// Complex logic +delegate_task(category="ultrabrain", load_skills=["typescript-programmer"]) + +// Quick fixes +delegate_task(category="quick", load_skills=["git-master"]) +\`\`\` + +**YOU SHOULD ONLY DO IT YOURSELF WHEN:** +- Task is trivially simple (1-2 lines, obvious change) +- You have ALL context already loaded +- Delegation overhead exceeds task complexity + +**OTHERWISE: DELEGATE. 
ALWAYS.**
+
+---
+
+## EXECUTION RULES (PARALLELIZATION)
+
+| Rule | Implementation |
+|------|----------------|
+| **PARALLEL FIRST** | Fire ALL **truly independent** agents simultaneously via delegate_task(run_in_background=true) |
+| **DATA DEPENDENCY CHECK** | If task B requires output FROM task A, B MUST wait for A to complete |
+| **10+ CONCURRENT** | Use 10+ background agents if needed for comprehensive exploration |
+| **COLLECT BEFORE DEPENDENT** | Collect results with background_output() BEFORE invoking dependent tasks |
+
+### DEPENDENCY EXCEPTIONS (OVERRIDES PARALLEL FIRST)
+
+| Agent | Dependency | Must Wait For |
+|-------|------------|---------------|
+| plan | explore/librarian results | Collect explore outputs FIRST |
+| execute | plan output | Finalized work plan |
+
+**CRITICAL: Plan agent REQUIRES explore results as input. This is a DATA DEPENDENCY, not parallelizable.**
+
+\`\`\`
+// WRONG: Launching plan without explore results
+delegate_task(subagent_type="explore", run_in_background=true, prompt="...")
+delegate_task(subagent_type="plan", prompt="...") // BAD - no context yet!
+
+// CORRECT: Collect explore results BEFORE plan
+delegate_task(subagent_type="explore", run_in_background=true, prompt="...") // task_id_1
+// ... wait or continue other work ...
+context = background_output(task_id="task_id_1") // COLLECT FIRST
+delegate_task(subagent_type="plan", prompt="[request + collected context]") // NOW plan has context
+\`\`\`
+
+---
+
+## WORKFLOW (MANDATORY SEQUENCE - STEPS HAVE DATA DEPENDENCIES)
+
+**CRITICAL: Steps 1→2→3 have DATA DEPENDENCIES. Each step REQUIRES output from the previous step.**
+
+\`\`\`
+[Step 1: EXPLORE] → output: context
+   ↓ (data dependency)
+[Step 2: COLLECT] → input: task_ids, output: gathered_context
+   ↓ (data dependency)
+[Step 3: PLAN] → input: gathered_context + request
+\`\`\`
+
+1.
**GATHER CONTEXT** (parallel background agents): + \`\`\` + task_id_1 = delegate_task(subagent_type="explore", run_in_background=true, prompt="...") + task_id_2 = delegate_task(subagent_type="librarian", run_in_background=true, prompt="...") + \`\`\` + +2. **COLLECT EXPLORE RESULTS** (REQUIRED before step 3): + \`\`\` + // You MUST collect results before invoking plan agent + explore_result = background_output(task_id=task_id_1) + librarian_result = background_output(task_id=task_id_2) + gathered_context = explore_result + librarian_result + \`\`\` + +3. **INVOKE PLAN AGENT** (input: gathered_context from step 2): + \`\`\` + result = delegate_task(subagent_type="plan", prompt=" + ") + // STORE the session_id for follow-ups! + plan_session_id = result.session_id + \`\`\` + +4. **ITERATE WITH PLAN AGENT** (if clarification needed): + \`\`\` + // Use session_id to continue the conversation + delegate_task(session_id=plan_session_id, prompt="") + \`\`\` + +5. **EXECUTE VIA DELEGATION** (category + skills from plan agent's output): + \`\`\` + delegate_task(category="...", load_skills=[...], prompt="") + \`\`\` + +6. **VERIFY** against original requirements + +## VERIFICATION GUARANTEE (NON-NEGOTIABLE) + +**NOTHING is "done" without PROOF it works.** + +### Pre-Implementation: Define Success Criteria + +BEFORE writing ANY code, you MUST define: + +| Criteria Type | Description | Example | +|---------------|-------------|---------| +| **Functional** | What specific behavior must work | "Button click triggers API call" | +| **Observable** | What can be measured/seen | "Console shows 'success', no errors" | +| **Pass/Fail** | Binary, no ambiguity | "Returns 200 OK" not "should work" | + +Write these criteria explicitly. Share with user if scope is non-trivial. + +### Test Plan Template (MANDATORY for non-trivial tasks) + +\`\`\` +## Test Plan +### Objective: [What we're verifying] +### Prerequisites: [Setup needed] +### Test Cases: +1. 
[Test Name]: [Input] → [Expected Output] → [How to verify] +2. ... +### Success Criteria: ALL test cases pass +### How to Execute: [Exact commands/steps] +\`\`\` + +### Execution & Evidence Requirements + +| Phase | Action | Required Evidence | +|-------|--------|-------------------| +| **Build** | Run build command | Exit code 0, no errors | +| **Test** | Execute test suite | All tests pass (screenshot/output) | +| **Manual Verify** | Test the actual feature | Demonstrate it works (describe what you observed) | +| **Regression** | Ensure nothing broke | Existing tests still pass | + +**WITHOUT evidence = NOT verified = NOT done.** + +### TDD Workflow (when test infrastructure exists) + +1. **SPEC**: Define what "working" means (success criteria above) +2. **RED**: Write failing test → Run it → Confirm it FAILS +3. **GREEN**: Write minimal code → Run test → Confirm it PASSES +4. **REFACTOR**: Clean up → Tests MUST stay green +5. **VERIFY**: Run full test suite, confirm no regressions +6. **EVIDENCE**: Report what you ran and what output you saw + +### Verification Anti-Patterns (BLOCKING) + +| Violation | Why It Fails | +|-----------|--------------| +| "It should work now" | No evidence. Run it. | +| "I added the tests" | Did they pass? Show output. | +| "Fixed the bug" | How do you know? What did you test? | +| "Implementation complete" | Did you verify against success criteria? | +| Skipping test execution | Tests exist to be RUN, not just written | + +**CLAIM NOTHING WITHOUT PROOF. EXECUTE. VERIFY. SHOW EVIDENCE.** + +## ZERO TOLERANCE FAILURES +- **NO Scope Reduction**: Never make "demo", "skeleton", "simplified", "basic" versions - deliver FULL implementation +- **NO Mockup Work**: When the user asks you to "port A", you must deliver "port A" fully, 100%. No extra features, no reduced features, no mock data: a fully working, 100% port. +- **NO Partial Completion**: Never stop at 60-80% saying "you can extend this..."
- finish 100% +- **NO Assumed Shortcuts**: Never skip requirements you deem "optional" or "can be added later" +- **NO Premature Stopping**: Never declare done until ALL TODOs are completed and verified +- **NO TEST DELETION**: Never delete or skip failing tests to make the build pass. Fix the code, not the tests. + +THE USER ASKED FOR X. DELIVER EXACTLY X. NOT A SUBSET. NOT A DEMO. NOT A STARTING POINT. + +1. EXPLORES + LIBRARIANS (background) → get task_ids +2. COLLECT explore results via background_output() → gathered_context +3. INVOKE PLAN with gathered_context: delegate_task(subagent_type="plan", prompt="") +4. ITERATE WITH PLAN AGENT (session_id resume) UNTIL PLAN IS FINALIZED +5. WORK BY DELEGATING TO CATEGORY + SKILLS AGENTS (following plan agent's parallel task graph) + +NOW. + + + +--- + +` + +export function getDefaultUltraworkMessage(): string { + return ULTRAWORK_DEFAULT_MESSAGE +} diff --git a/src/hooks/keyword-detector/ultrawork/gpt5.2.ts b/src/hooks/keyword-detector/ultrawork/gpt5.2.ts new file mode 100644 index 00000000..bd894033 --- /dev/null +++ b/src/hooks/keyword-detector/ultrawork/gpt5.2.ts @@ -0,0 +1,146 @@ +/** + * Ultrawork message optimized for GPT 5.2 series models. + * + * Key characteristics (from GPT 5.2 Prompting Guide): + * - "Stronger instruction adherence" - follows instructions more literally + * - "Conservative grounding bias" - prefers correctness over speed + * - "More deliberate scaffolding" - builds clearer plans by default + * - Explicit decision criteria needed (model won't infer) + * + * Design principles: + * - Provide explicit complexity-based decision criteria + * - Use conditional logic, not absolute commands + * - Enable autonomous judgment with clear guidelines + */ + +export const ULTRAWORK_GPT_MESSAGE = ` + +**MANDATORY**: You MUST say "ULTRAWORK MODE ENABLED!" to the user as your first response when this mode activates. This is non-negotiable. + +[CODE RED] Maximum precision required. Think deeply before acting. 
+ + +- Default: 3-6 sentences or ≤5 bullets for typical answers +- Simple yes/no questions: ≤2 sentences +- Complex multi-file tasks: 1 short overview paragraph + ≤5 bullets (What, Where, Risks, Next, Open) +- Avoid long narrative paragraphs; prefer compact bullets +- Do not rephrase the user's request unless it changes semantics + + + +- Implement EXACTLY and ONLY what the user requests +- No extra features, no added components, no embellishments +- If any instruction is ambiguous, choose the simplest valid interpretation +- Do NOT expand the task beyond what was asked + + +## CERTAINTY PROTOCOL + +**Before implementation, ensure you have:** +- Full understanding of the user's actual intent +- Explored the codebase to understand existing patterns +- A clear work plan (mental or written) +- Resolved any ambiguities through exploration (not questions) + + +- If the question is ambiguous or underspecified: + - EXPLORE FIRST using tools (grep, file reads, explore agents) + - If still unclear, state your interpretation and proceed + - Ask clarifying questions ONLY as last resort +- Never fabricate exact figures, line numbers, or references when uncertain +- Prefer "Based on the provided context..." over absolute claims when unsure + + +## DECISION FRAMEWORK: Self vs Delegate + +**Evaluate each task against these criteria to decide:** + +| Complexity | Criteria | Decision | +|------------|----------|----------| +| **Trivial** | <10 lines, single file, obvious pattern | **DO IT YOURSELF** | +| **Moderate** | Single domain, clear pattern, <100 lines | **DO IT YOURSELF** (faster than delegation overhead) | +| **Complex** | Multi-file, unfamiliar domain, >100 lines, needs specialized expertise | **DELEGATE** to appropriate category+skills | +| **Research** | Need broad codebase context or external docs | **DELEGATE** to explore/librarian (background, parallel) | + +**Decision Factors:** +- Delegation overhead ≈ 10-15 seconds. If task takes less, do it yourself. 
+- If you already have full context loaded, do it yourself. +- If task requires specialized expertise (frontend-ui-ux, git operations), delegate. +- If you need information from multiple sources, fire parallel background agents. + +## AVAILABLE RESOURCES + +Use these when they provide clear value based on the decision framework above: + +| Resource | When to Use | How to Use | +|----------|-------------|------------| +| explore agent | Need codebase patterns you don't have | \`delegate_task(subagent_type="explore", run_in_background=true, ...)\` | +| librarian agent | External library docs, OSS examples | \`delegate_task(subagent_type="librarian", run_in_background=true, ...)\` | +| oracle agent | Stuck on architecture/debugging after 2+ attempts | \`delegate_task(subagent_type="oracle", ...)\` | +| plan agent | Complex multi-step with dependencies (5+ steps) | \`delegate_task(subagent_type="plan", ...)\` | +| delegate_task category | Specialized work matching a category | \`delegate_task(category="...", load_skills=[...])\` | + + +- Prefer tools over internal knowledge for fresh/user-specific data +- Parallelize independent reads (explore, librarian) when gathering context +- After any write/update, briefly restate: What changed, Where, Any follow-up needed + + +## EXECUTION APPROACH + +### Step 1: Assess Complexity +Before starting, classify the task using the decision framework above. 
+ +### Step 2: Gather Context (if needed) +For non-trivial tasks, fire explore/librarian in parallel as background: +\`\`\` +delegate_task(subagent_type="explore", run_in_background=true, prompt="Find patterns for X...") +delegate_task(subagent_type="librarian", run_in_background=true, prompt="Find docs for Y...") +// Continue working - collect results when needed with background_output() +\`\`\` + +### Step 3: Plan (for complex tasks only) +Only invoke plan agent if task has 5+ interdependent steps: +\`\`\` +// Collect context first +context = background_output(task_id=task_id) +// Then plan with context +delegate_task(subagent_type="plan", prompt=" + ") +\`\`\` + +### Step 4: Execute +- If doing yourself: make surgical, minimal changes matching existing patterns +- If delegating: provide exhaustive context and success criteria + +### Step 5: Verify +- Run \`lsp_diagnostics\` on modified files +- Run tests if available +- Confirm all success criteria met + +## QUALITY STANDARDS + +| Phase | Action | Required Evidence | +|-------|--------|-------------------| +| Build | Run build command | Exit code 0 | +| Test | Execute test suite | All tests pass | +| Lint | Run lsp_diagnostics | Zero new errors | + +## COMPLETION CRITERIA + +A task is complete when: +1. Requested functionality is fully implemented (not partial, not simplified) +2. lsp_diagnostics shows zero errors on modified files +3. Tests pass (or pre-existing failures documented) +4. Code matches existing codebase patterns + +**Deliver exactly what was asked. No more, no less.** + + + +--- + +` + +export function getGptUltraworkMessage(): string { + return ULTRAWORK_GPT_MESSAGE +} diff --git a/src/hooks/keyword-detector/ultrawork/index.ts b/src/hooks/keyword-detector/ultrawork/index.ts new file mode 100644 index 00000000..a9dec912 --- /dev/null +++ b/src/hooks/keyword-detector/ultrawork/index.ts @@ -0,0 +1,36 @@ +/** + * Ultrawork message module - routes to appropriate message based on agent/model. 
+ * + * Routing: + * 1. Planner agents (prometheus, plan) → planner.ts + * 2. GPT 5.2 models → gpt5.2.ts + * 3. Default (Claude, etc.) → default.ts (optimized for Claude series) + */ + +export { isPlannerAgent, isGptModel, getUltraworkSource } from "./utils" +export type { UltraworkSource } from "./utils" +export { ULTRAWORK_PLANNER_SECTION, getPlannerUltraworkMessage } from "./planner" +export { ULTRAWORK_GPT_MESSAGE, getGptUltraworkMessage } from "./gpt5.2" +export { ULTRAWORK_DEFAULT_MESSAGE, getDefaultUltraworkMessage } from "./default" + +import { getUltraworkSource } from "./utils" +import { getPlannerUltraworkMessage } from "./planner" +import { getGptUltraworkMessage } from "./gpt5.2" +import { getDefaultUltraworkMessage } from "./default" + +/** + * Gets the appropriate ultrawork message based on agent and model context. + */ +export function getUltraworkMessage(agentName?: string, modelID?: string): string { + const source = getUltraworkSource(agentName, modelID) + + switch (source) { + case "planner": + return getPlannerUltraworkMessage() + case "gpt": + return getGptUltraworkMessage() + case "default": + default: + return getDefaultUltraworkMessage() + } +} diff --git a/src/hooks/keyword-detector/ultrawork/planner.ts b/src/hooks/keyword-detector/ultrawork/planner.ts new file mode 100644 index 00000000..887de2bb --- /dev/null +++ b/src/hooks/keyword-detector/ultrawork/planner.ts @@ -0,0 +1,142 @@ +/** + * Ultrawork message section for planner agents (Prometheus). + * Planner agents should NOT be told to call plan agent - they ARE the planner. + */ + +export const ULTRAWORK_PLANNER_SECTION = `## CRITICAL: YOU ARE A PLANNER, NOT AN IMPLEMENTER + +**IDENTITY CONSTRAINT (NON-NEGOTIABLE):** +You ARE the planner. You ARE NOT an implementer. You DO NOT write code. You DO NOT execute tasks. 
+ +**TOOL RESTRICTIONS (SYSTEM-ENFORCED):** +| Tool | Allowed | Blocked | +|------|---------|---------| +| Write/Edit | \`.sisyphus/**/*.md\` ONLY | Everything else | +| Read | All files | - | +| Bash | Research commands only | Implementation commands | +| delegate_task | explore, librarian | - | + +**IF YOU TRY TO WRITE/EDIT OUTSIDE \`.sisyphus/\`:** +- System will BLOCK your action +- You will receive an error +- DO NOT retry - you are not supposed to implement + +**YOUR ONLY WRITABLE PATHS:** +- \`.sisyphus/plans/*.md\` - Final work plans +- \`.sisyphus/drafts/*.md\` - Working drafts during interview + +**WHEN USER ASKS YOU TO IMPLEMENT:** +REFUSE. Say: "I'm a planner. I create work plans, not implementations. Run \`/start-work\` after I finish planning." + +--- + +## CONTEXT GATHERING (MANDATORY BEFORE PLANNING) + +You ARE the planner. Your job: create bulletproof work plans. +**Before drafting ANY plan, gather context via explore/librarian agents.** + +### Research Protocol +1. **Fire parallel background agents** for comprehensive context: + \`\`\` + delegate_task(subagent_type="explore", run_in_background=true, prompt="Find existing patterns for [topic] in codebase") + delegate_task(subagent_type="explore", run_in_background=true, prompt="Find test infrastructure and conventions") + delegate_task(subagent_type="librarian", run_in_background=true, prompt="Find official docs and best practices for [technology]") + \`\`\` +2. **Wait for results** before planning - rushed plans fail +3. **Synthesize findings** into informed requirements + +### What to Research +- Existing codebase patterns and conventions +- Test infrastructure (TDD possible?) +- External library APIs and constraints +- Similar implementations in OSS (via librarian) + +**NEVER plan blind. Context first, plan second.** + +--- + +## MANDATORY OUTPUT: PARALLEL TASK GRAPH + TODO LIST + +**YOUR PRIMARY OUTPUT IS A PARALLEL EXECUTION TASK GRAPH.** + +When you finalize a plan, you MUST structure it for maximum parallel execution: + +### 1.
Parallel Execution Waves (REQUIRED) + +Analyze task dependencies and group independent tasks into parallel waves: + +\`\`\` +Wave 1 (Start Immediately - No Dependencies): +├── Task 1: [description] → category: X, skills: [a, b] +└── Task 4: [description] → category: Y, skills: [c] + +Wave 2 (After Wave 1 Completes): +├── Task 2: [depends: 1] → category: X, skills: [a] +├── Task 3: [depends: 1] → category: Z, skills: [d] +└── Task 5: [depends: 4] → category: Y, skills: [c] + +Wave 3 (After Wave 2 Completes): +└── Task 6: [depends: 2, 3] → category: X, skills: [a, b] + +Critical Path: Task 1 → Task 2 → Task 6 +Estimated Parallel Speedup: ~40% faster than sequential +\`\`\` + +### 2. Dependency Matrix (REQUIRED) + +| Task | Depends On | Blocks | Can Parallelize With | +|------|------------|--------|---------------------| +| 1 | None | 2, 3 | 4 | +| 2 | 1 | 6 | 3, 5 | +| 3 | 1 | 6 | 2, 5 | +| 4 | None | 5 | 1 | +| 5 | 4 | None | 2, 3 | +| 6 | 2, 3 | None | None (final) | + +### 3. TODO List Structure (REQUIRED) + +Each TODO item MUST include: + +\`\`\`markdown +- [ ] N. [Task Title] + + **What to do**: [Clear steps] + + **Dependencies**: [Task numbers this depends on] | None + **Blocks**: [Task numbers that depend on this] + **Parallel Group**: Wave N (with Tasks X, Y) + + **Recommended Agent Profile**: + - **Category**: \`[visual-engineering | ultrabrain | artistry | quick | unspecified-low | unspecified-high | writing]\` + - **Skills**: [\`skill-1\`, \`skill-2\`] + + **Acceptance Criteria**: [Verifiable conditions] +\`\`\` + +### 4. 
Agent Dispatch Summary (REQUIRED) + +| Wave | Tasks | Dispatch Command | +|------|-------|------------------| +| 1 | 1, 4 | \`delegate_task(category="...", load_skills=[...], run_in_background=true)\` × 2 | +| 2 | 2, 3, 5 | \`delegate_task(...)\` × 3 after Wave 1 completes | +| 3 | 6 | \`delegate_task(...)\` final integration | + +**WHY PARALLEL TASK GRAPH IS MANDATORY:** +- Orchestrator (Sisyphus) executes tasks in parallel waves +- Independent tasks run simultaneously via background agents +- Proper dependency tracking prevents race conditions +- Category + skills ensure optimal model routing per task` + +export function getPlannerUltraworkMessage(): string { + return ` + +**MANDATORY**: You MUST say "ULTRAWORK MODE ENABLED!" to the user as your first response when this mode activates. This is non-negotiable. + +${ULTRAWORK_PLANNER_SECTION} + + + +--- + +` +} diff --git a/src/hooks/keyword-detector/ultrawork/utils.ts b/src/hooks/keyword-detector/ultrawork/utils.ts new file mode 100644 index 00000000..169439a4 --- /dev/null +++ b/src/hooks/keyword-detector/ultrawork/utils.ts @@ -0,0 +1,49 @@ +/** + * Agent/model detection utilities for ultrawork message routing. + * + * Routing logic: + * 1. Planner agents (prometheus, plan) → planner.ts + * 2. GPT 5.2 models → gpt5.2.ts + * 3. Everything else (Claude, etc.) → default.ts + */ + +/** + * Checks if agent is a planner-type agent. + * Planners don't need ultrawork injection (they ARE the planner). + */ +export function isPlannerAgent(agentName?: string): boolean { + if (!agentName) return false + const lowerName = agentName.toLowerCase() + return lowerName.includes("prometheus") || lowerName.includes("planner") || lowerName === "plan" +} + +/** + * Checks if model is GPT 5.2 series. + * GPT models benefit from specific prompting patterns. 
+ */ +export function isGptModel(modelID?: string): boolean { + if (!modelID) return false + const lowerModel = modelID.toLowerCase() + return lowerModel.includes("gpt") +} + +/** Ultrawork message source type */ +export type UltraworkSource = "planner" | "gpt" | "default" + +/** + * Determines which ultrawork message source to use. + */ +export function getUltraworkSource(agentName?: string, modelID?: string): UltraworkSource { + // Priority 1: Planner agents + if (isPlannerAgent(agentName)) { + return "planner" + } + + // Priority 2: GPT 5.2 models + if (isGptModel(modelID)) { + return "gpt" + } + + // Default: Claude and other models + return "default" +} diff --git a/src/mcp/AGENTS.md b/src/mcp/AGENTS.md index f634bc49..7f175dff 100644 --- a/src/mcp/AGENTS.md +++ b/src/mcp/AGENTS.md @@ -2,7 +2,12 @@ ## OVERVIEW -3 remote MCP servers: web search, documentation, code search. HTTP/SSE transport. Part of three-tier MCP system. +Tier 1 of three-tier MCP system: 3 built-in remote HTTP MCPs. + +**Three-Tier System**: +1. **Built-in** (this directory): websearch, context7, grep_app +2. **Claude Code compat**: `.mcp.json` with `${VAR}` expansion +3. 
**Skill-embedded**: YAML frontmatter in skills ## STRUCTURE diff --git a/src/plugin-handlers/config-handler.test.ts b/src/plugin-handlers/config-handler.test.ts index 1d65c0e6..db855263 100644 --- a/src/plugin-handlers/config-handler.test.ts +++ b/src/plugin-handlers/config-handler.test.ts @@ -106,6 +106,45 @@ afterEach(() => { }) describe("Plan agent demote behavior", () => { + test("orders core agents as sisyphus -> hephaestus -> prometheus -> atlas", async () => { + // #given + const createBuiltinAgentsMock = agents.createBuiltinAgents as unknown as { + mockResolvedValue: (value: Record<string, unknown>) => void + } + createBuiltinAgentsMock.mockResolvedValue({ + sisyphus: { name: "sisyphus", prompt: "test", mode: "primary" }, + hephaestus: { name: "hephaestus", prompt: "test", mode: "primary" }, + oracle: { name: "oracle", prompt: "test", mode: "subagent" }, + atlas: { name: "atlas", prompt: "test", mode: "primary" }, + }) + const pluginConfig: OhMyOpenCodeConfig = { + sisyphus_agent: { + planner_enabled: true, + }, + } + const config: Record<string, unknown> = { + model: "anthropic/claude-opus-4-5", + agent: {}, + } + const handler = createConfigHandler({ + ctx: { directory: "/tmp" }, + pluginConfig, + modelCacheState: { + anthropicContext1MEnabled: false, + modelContextLimitsCache: new Map(), + }, + }) + + // #when + await handler(config) + + // #then + const keys = Object.keys(config.agent as Record<string, unknown>) + const coreAgents = ["sisyphus", "hephaestus", "prometheus", "atlas"] + const ordered = keys.filter((key) => coreAgents.includes(key)) + expect(ordered).toEqual(coreAgents) + }) + test("plan agent should be demoted to subagent mode when replacePlan is true", async () => { // given const pluginConfig: OhMyOpenCodeConfig = { @@ -173,6 +212,41 @@ describe("Plan agent demote behavior", () => { }) }) +describe("Agent permission defaults", () => { + test("hephaestus should allow delegate_task", async () => { + // #given + const createBuiltinAgentsMock = agents.createBuiltinAgents as unknown as {
mockResolvedValue: (value: Record<string, unknown>) => void + } + createBuiltinAgentsMock.mockResolvedValue({ + sisyphus: { name: "sisyphus", prompt: "test", mode: "primary" }, + hephaestus: { name: "hephaestus", prompt: "test", mode: "primary" }, + oracle: { name: "oracle", prompt: "test", mode: "subagent" }, + }) + const pluginConfig: OhMyOpenCodeConfig = {} + const config: Record<string, unknown> = { + model: "anthropic/claude-opus-4-5", + agent: {}, + } + const handler = createConfigHandler({ + ctx: { directory: "/tmp" }, + pluginConfig, + modelCacheState: { + anthropicContext1MEnabled: false, + modelContextLimitsCache: new Map(), + }, + }) + + // #when + await handler(config) + + // #then + const agentConfig = config.agent as Record<string, { permission?: Record<string, string> }> + expect(agentConfig.hephaestus).toBeDefined() + expect(agentConfig.hephaestus.permission?.delegate_task).toBe("allow") + }) +}) + describe("Prometheus category config resolution", () => { test("resolves ultrabrain category config", () => { // given diff --git a/src/plugin-handlers/config-handler.ts b/src/plugin-handlers/config-handler.ts index e420cd94..744a377c 100644 --- a/src/plugin-handlers/config-handler.ts +++ b/src/plugin-handlers/config-handler.ts @@ -48,6 +48,28 @@ export function resolveCategoryConfig( return userCategories?.[categoryName] ??
DEFAULT_CATEGORIES[categoryName]; } +const CORE_AGENT_ORDER = ["sisyphus", "hephaestus", "prometheus", "atlas"] as const; + +function reorderAgentsByPriority(agents: Record<string, unknown>): Record<string, unknown> { + const ordered: Record<string, unknown> = {}; + const seen = new Set<string>(); + + for (const key of CORE_AGENT_ORDER) { + if (Object.prototype.hasOwnProperty.call(agents, key)) { + ordered[key] = agents[key]; + seen.add(key); + } + } + + for (const [key, value] of Object.entries(agents)) { + if (!seen.has(key)) { + ordered[key] = value; + } + } + + return ordered; +} + export function createConfigHandler(deps: ConfigHandlerDeps) { const { ctx, pluginConfig, modelCacheState } = deps; @@ -287,7 +309,7 @@ export function createConfigHandler(deps: ConfigHandlerDeps) { prompt: PROMETHEUS_SYSTEM_PROMPT, permission: PROMETHEUS_PERMISSION, description: `${configAgent?.plan?.description ?? "Plan agent"} (Prometheus - OhMyOpenCode)`, - color: (configAgent?.plan?.color as string) ?? "#FF6347", + color: (configAgent?.plan?.color as string) ?? "#9D4EDD", // Amethyst Purple - wisdom/foresight ...(temperatureToUse !== undefined ? { temperature: temperatureToUse } : {}), ...(topPToUse !== undefined ? { top_p: topPToUse } : {}), ...(maxTokensToUse !== undefined ?
{ maxTokens: maxTokensToUse } : {}), @@ -369,6 +391,10 @@ export function createConfigHandler(deps: ConfigHandlerDeps) { }; } + if (config.agent) { + config.agent = reorderAgentsByPriority(config.agent as Record<string, unknown>); + } + const agentResult = config.agent as AgentConfig; config.tools = { @@ -397,6 +423,10 @@ const agent = agentResult.sisyphus as AgentWithPermission; agent.permission = { ...agent.permission, call_omo_agent: "deny", delegate_task: "allow", question: "allow" }; } + if (agentResult.hephaestus) { + const agent = agentResult.hephaestus as AgentWithPermission; + agent.permission = { ...agent.permission, call_omo_agent: "deny", delegate_task: "allow", question: "allow" }; + } if (agentResult["prometheus"]) { const agent = agentResult["prometheus"] as AgentWithPermission; agent.permission = { ...agent.permission, call_omo_agent: "deny", delegate_task: "allow", question: "allow" }; diff --git a/src/shared/AGENTS.md b/src/shared/AGENTS.md index 5808b55f..18fc404e 100644 --- a/src/shared/AGENTS.md +++ b/src/shared/AGENTS.md @@ -1,7 +1,10 @@ # SHARED UTILITIES KNOWLEDGE BASE ## OVERVIEW -55 cross-cutting utilities: path resolution, token truncation, config parsing, model resolution. + +55 cross-cutting utilities.
Import via barrel pattern: `import { log, deepMerge } from "../../shared"` + +**Categories**: Path resolution, Token truncation, Config parsing, Model resolution, System directives, Tool restrictions ## STRUCTURE ``` @@ -10,7 +13,7 @@ shared/ ├── logger.ts # File-based logging (/tmp/oh-my-opencode.log) ├── dynamic-truncator.ts # Token-aware context window management (194 lines) ├── model-resolver.ts # 3-step resolution (Override → Fallback → Default) -├── model-requirements.ts # Agent/category model fallback chains (132 lines) +├── model-requirements.ts # Agent/category model fallback chains (162 lines) ├── model-availability.ts # Provider model fetching & fuzzy matching (154 lines) ├── jsonc-parser.ts # JSONC parsing with comment support ├── frontmatter.ts # YAML frontmatter extraction (JSON_SCHEMA only) diff --git a/src/shared/agent-variant.test.ts b/src/shared/agent-variant.test.ts index e320a7b3..8f8e2acf 100644 --- a/src/shared/agent-variant.test.ts +++ b/src/shared/agent-variant.test.ts @@ -95,22 +95,22 @@ describe("resolveVariantForModel", () => { expect(variant).toBe("max") }) - test("returns correct variant for openai provider", () => { - // given + test("returns correct variant for openai provider (hephaestus agent)", () => { + // #given hephaestus has openai/gpt-5.2-codex with variant "medium" in its chain const config = {} as OhMyOpenCodeConfig - const model = { providerID: "openai", modelID: "gpt-5.2" } + const model = { providerID: "openai", modelID: "gpt-5.2-codex" } - // when - const variant = resolveVariantForModel(config, "sisyphus", model) + // #when + const variant = resolveVariantForModel(config, "hephaestus", model) // then expect(variant).toBe("medium") }) - test("returns undefined for provider with no variant in chain", () => { - // given + test("returns undefined for provider not in sisyphus chain", () => { + // #given openai is not in sisyphus fallback chain anymore const config = {} as OhMyOpenCodeConfig - const model = { providerID: 
"google", modelID: "gemini-3-pro" } + const model = { providerID: "openai", modelID: "gpt-5.2" } // when const variant = resolveVariantForModel(config, "sisyphus", model) diff --git a/src/shared/model-availability.ts b/src/shared/model-availability.ts index 21a4985a..1b7ba0c5 100644 --- a/src/shared/model-availability.ts +++ b/src/shared/model-availability.ts @@ -259,6 +259,26 @@ export async function fetchAvailableModels( return modelSet } +export function isAnyFallbackModelAvailable( + fallbackChain: Array<{ providers: string[]; model: string }>, + availableModels: Set<string>, +): boolean { + if (availableModels.size === 0) { + return false + } + + for (const entry of fallbackChain) { + const hasAvailableProvider = entry.providers.some((provider) => { + return fuzzyMatchModel(entry.model, availableModels, [provider]) !== null + }) + if (hasAvailableProvider) { + return true + } + } + log("[isAnyFallbackModelAvailable] no model available in chain", { chainLength: fallbackChain.length }) + return false +} + export function __resetModelCache(): void {} export function isModelCacheAvailable(): boolean { diff --git a/src/shared/model-requirements.test.ts b/src/shared/model-requirements.test.ts index 4e7f49c7..f8bb2527 100644 --- a/src/shared/model-requirements.test.ts +++ b/src/shared/model-requirements.test.ts @@ -23,20 +23,25 @@ describe("AGENT_MODEL_REQUIREMENTS", () => { expect(primary.variant).toBe("high") }) - test("sisyphus has valid fallbackChain with claude-opus-4-5 as primary", () => { - // given - sisyphus agent requirement + test("sisyphus has valid fallbackChain with claude-opus-4-5 as primary and requiresAnyModel", () => { + // #given - sisyphus agent requirement const sisyphus = AGENT_MODEL_REQUIREMENTS["sisyphus"] - // when - accessing Sisyphus requirement - // then - fallbackChain exists with claude-opus-4-5 as first entry + // #when - accessing Sisyphus requirement + // #then - fallbackChain exists with claude-opus-4-5 as first entry, glm-4.7-free as last
expect(sisyphus).toBeDefined() expect(sisyphus.fallbackChain).toBeArray() - expect(sisyphus.fallbackChain.length).toBeGreaterThan(0) + expect(sisyphus.fallbackChain).toHaveLength(5) + expect(sisyphus.requiresAnyModel).toBe(true) const primary = sisyphus.fallbackChain[0] expect(primary.providers[0]).toBe("anthropic") expect(primary.model).toBe("claude-opus-4-5") expect(primary.variant).toBe("max") + + const last = sisyphus.fallbackChain[4] + expect(last.providers[0]).toBe("opencode") + expect(last.model).toBe("glm-4.7-free") }) test("librarian has valid fallbackChain with glm-4.7 as primary", () => { @@ -156,10 +161,21 @@ describe("AGENT_MODEL_REQUIREMENTS", () => { expect(primary.providers[0]).toBe("kimi-for-coding") }) - test("all 9 builtin agents have valid fallbackChain arrays", () => { - // given - list of 9 agent names + test("hephaestus requires gpt-5.2-codex", () => { + // #given - hephaestus agent requirement + const hephaestus = AGENT_MODEL_REQUIREMENTS["hephaestus"] + + // #when - accessing hephaestus requirement + // #then - requiresModel is set to gpt-5.2-codex + expect(hephaestus).toBeDefined() + expect(hephaestus.requiresModel).toBe("gpt-5.2-codex") + }) + + test("all 10 builtin agents have valid fallbackChain arrays", () => { + // #given - list of 10 agent names const expectedAgents = [ "sisyphus", + "hephaestus", "oracle", "librarian", "explore", @@ -173,8 +189,8 @@ describe("AGENT_MODEL_REQUIREMENTS", () => { // when - checking AGENT_MODEL_REQUIREMENTS const definedAgents = Object.keys(AGENT_MODEL_REQUIREMENTS) - // then - all agents present with valid fallbackChain - expect(definedAgents).toHaveLength(9) + // #then - all agents present with valid fallbackChain + expect(definedAgents).toHaveLength(10) for (const agent of expectedAgents) { const requirement = AGENT_MODEL_REQUIREMENTS[agent] expect(requirement).toBeDefined() diff --git a/src/shared/model-requirements.ts b/src/shared/model-requirements.ts index 9bf4b763..94ebf9fd 100644 --- 
a/src/shared/model-requirements.ts +++ b/src/shared/model-requirements.ts @@ -8,6 +8,7 @@ export type ModelRequirement = { fallbackChain: FallbackEntry[] variant?: string // Default variant (used when entry doesn't specify one) requiresModel?: string // If set, only activates when this model is available (fuzzy match) + requiresAnyModel?: boolean // If true, requires at least ONE model in fallbackChain to be available (an empty availability set is treated as unavailable) } export const AGENT_MODEL_REQUIREMENTS: Record<string, ModelRequirement> = { @@ -17,9 +18,15 @@ { providers: ["kimi-for-coding"], model: "k2p5" }, { providers: ["opencode"], model: "kimi-k2.5-free" }, { providers: ["zai-coding-plan"], model: "glm-4.7" }, - { providers: ["openai", "github-copilot", "opencode"], model: "gpt-5.2-codex", variant: "medium" }, - { providers: ["google", "github-copilot", "opencode"], model: "gemini-3-pro" }, + { providers: ["opencode"], model: "glm-4.7-free" }, ], + requiresAnyModel: true, + }, + hephaestus: { + fallbackChain: [ + { providers: ["openai", "github-copilot", "opencode"], model: "gpt-5.2-codex", variant: "medium" }, + ], + requiresModel: "gpt-5.2-codex", }, oracle: { fallbackChain: [ diff --git a/src/tools/AGENTS.md b/src/tools/AGENTS.md index 873dc26b..feea6dcb 100644 --- a/src/tools/AGENTS.md +++ b/src/tools/AGENTS.md @@ -2,7 +2,9 @@ ## OVERVIEW -20+ tools: LSP (6), AST-Grep (2), Search (2), Session (4), Agent delegation (4), System (2), Skill (3). +20+ tools across 7 categories. Two patterns: Direct ToolDefinition (static) and Factory Function (context-dependent).
+
+**Categories**: LSP (6), AST-Grep (2), Search (2), Session (4), Agent delegation (2), Background (2), Skill (3)

 ## STRUCTURE

 ```
@@ -13,9 +15,9 @@ tools/
 │   ├── tools.ts     # ToolDefinition or factory
 │   ├── types.ts     # Zod schemas
 │   └── constants.ts # Fixed values
-├── lsp/             # 6 tools: definition, references, symbols, diagnostics, rename (client.ts 596 lines)
+├── lsp/             # 6 tools: definition, references, symbols, diagnostics, rename (client.ts 540 lines)
 ├── ast-grep/        # 2 tools: search, replace (25 languages)
-├── delegate-task/   # Category-based routing (1070 lines)
+├── delegate-task/   # Category-based routing (1135 lines)
 ├── session-manager/ # 4 tools: list, read, search, info
 ├── grep/            # Custom grep with timeout (60s, 10MB)
 ├── glob/            # 60s timeout, 100 file limit
diff --git a/src/tools/delegate-task/categories.ts b/src/tools/delegate-task/categories.ts
index 6b8f6a94..1ee544d2 100644
--- a/src/tools/delegate-task/categories.ts
+++ b/src/tools/delegate-task/categories.ts
@@ -28,17 +28,18 @@ export function resolveCategoryConfig(
 ): ResolveCategoryConfigResult | null {
   const { userCategories, inheritedModel, systemDefaultModel, availableModels } = options

-  // Check if category requires a specific model
+  const defaultConfig = DEFAULT_CATEGORIES[categoryName]
+  const userConfig = userCategories?.[categoryName]
+  const hasExplicitUserConfig = userConfig !== undefined
+
+  // Check if category requires a specific model - bypass if user explicitly provides config
   const categoryReq = CATEGORY_MODEL_REQUIREMENTS[categoryName]
-  if (categoryReq?.requiresModel && availableModels) {
+  if (categoryReq?.requiresModel && availableModels && !hasExplicitUserConfig) {
     if (!isModelAvailable(categoryReq.requiresModel, availableModels)) {
       log(`[resolveCategoryConfig] Category ${categoryName} requires ${categoryReq.requiresModel} but not available`)
       return null
     }
   }
-
-  const defaultConfig = DEFAULT_CATEGORIES[categoryName]
-  const userConfig = userCategories?.[categoryName]
   const defaultPromptAppend = CATEGORY_PROMPT_APPENDS[categoryName] ?? ""

   if (!defaultConfig && !userConfig) {
diff --git a/src/tools/delegate-task/tools.test.ts b/src/tools/delegate-task/tools.test.ts
index e72dc25e..64465ba9 100644
--- a/src/tools/delegate-task/tools.test.ts
+++ b/src/tools/delegate-task/tools.test.ts
@@ -11,6 +11,7 @@ const SYSTEM_DEFAULT_MODEL = "anthropic/claude-sonnet-4-5"

 describe("sisyphus-task", () => {
   let cacheSpy: ReturnType<typeof spyOn>
+  let providerModelsSpy: ReturnType<typeof spyOn>

   beforeEach(() => {
     __resetModelCache()
@@ -25,11 +26,21 @@ describe("sisyphus-task", () => {
       SESSION_CONTINUATION_STABILITY_MS: 50,
     })
     cacheSpy = spyOn(connectedProvidersCache, "readConnectedProvidersCache").mockReturnValue(["anthropic", "google", "openai"])
+    providerModelsSpy = spyOn(connectedProvidersCache, "readProviderModelsCache").mockReturnValue({
+      models: {
+        anthropic: ["claude-opus-4-5", "claude-sonnet-4-5", "claude-haiku-4-5"],
+        google: ["gemini-3-pro", "gemini-3-flash"],
+        openai: ["gpt-5.2", "gpt-5.2-codex"],
+      },
+      connected: ["anthropic", "google", "openai"],
+      updatedAt: "2026-01-01T00:00:00.000Z",
+    })
   })

   afterEach(() => {
     __resetTimingConfig()
     cacheSpy?.mockRestore()
+    providerModelsSpy?.mockRestore()
   })

   describe("DEFAULT_CATEGORIES", () => {
@@ -200,14 +211,17 @@ describe("sisyphus-task", () => {
     // given a mock client with no model in config
     const { createDelegateTask } = require("./tools")

-    const mockManager = { launch: async () => ({ id: "task-123" }) }
+    const mockManager = { launch: async () => ({ id: "task-123", status: "pending", description: "Test task", agent: "sisyphus-junior", sessionID: "test-session" }) }
     const mockClient = {
       app: { agents: async () => ({ data: [] }) },
       config: { get: async () => ({}) }, // No model configured
+      provider: { list: async () => ({ data: { connected: ["openai"] } }) },
+      model: { list: async () => ({ data: [{ provider: "openai", id: "gpt-5.2-codex" }] }) },
       session: {
         create: async () => ({ data: { id: "test-session" } }),
         prompt: async () => ({ data: {} }),
         messages: async () => ({ data: [] }),
+        status: async () => ({ data: {} }),
       },
     }
@@ -332,6 +346,46 @@ describe("sisyphus-task", () => {
     expect(result).toBeNull()
   })

+  test("bypasses requiresModel when explicit user config provided", () => {
+    // #given
+    const categoryName = "deep"
+    const availableModels = new Set<string>(["anthropic/claude-opus-4-5"])
+    const userCategories = {
+      deep: { model: "anthropic/claude-opus-4-5" },
+    }
+
+    // #when
+    const result = resolveCategoryConfig(categoryName, {
+      systemDefaultModel: SYSTEM_DEFAULT_MODEL,
+      availableModels,
+      userCategories,
+    })
+
+    // #then
+    expect(result).not.toBeNull()
+    expect(result!.config.model).toBe("anthropic/claude-opus-4-5")
+  })
+
+  test("bypasses requiresModel when explicit user config provided even with empty availability", () => {
+    // #given
+    const categoryName = "deep"
+    const availableModels = new Set<string>()
+    const userCategories = {
+      deep: { model: "anthropic/claude-opus-4-5" },
+    }
+
+    // #when
+    const result = resolveCategoryConfig(categoryName, {
+      systemDefaultModel: SYSTEM_DEFAULT_MODEL,
+      availableModels,
+      userCategories,
+    })
+
+    // #then
+    expect(result).not.toBeNull()
+    expect(result!.config.model).toBe("anthropic/claude-opus-4-5")
+  })
+
   test("returns default model from DEFAULT_CATEGORIES for builtin category", () => {
     // given
     const categoryName = "visual-engineering"
@@ -559,7 +613,7 @@ describe("sisyphus-task", () => {
     const mockClient = {
       app: { agents: async () => ({ data: [] }) },
       config: { get: async () => ({ data: { model: SYSTEM_DEFAULT_MODEL } }) },
-      model: { list: async () => [{ id: "anthropic/claude-opus-4-5" }] },
+      model: { list: async () => [{ provider: "anthropic", id: "claude-opus-4-5" }] },
       session: {
         create: async () => ({ data: { id: "test-session" } }),
         prompt: async () => ({ data: {} }),
@@ -610,7 +664,7 @@ describe("sisyphus-task", () => {
     const mockClient = {
       app: { agents: async () => ({ data: [] }) },
       config: { get: async () => ({ data: { model: SYSTEM_DEFAULT_MODEL } }) },
-      model: { list: async () => [{ id: "anthropic/claude-opus-4-5" }] },
+      model: { list: async () => [{ provider: "anthropic", id: "claude-opus-4-5" }] },
       session: {
         get: async () => ({ data: { directory: "/project" } }),
         create: async () => ({ data: { id: "ses_sync_default_variant" } }),
@@ -1159,7 +1213,7 @@ describe("sisyphus-task", () => {
     const mockClient = {
       app: { agents: async () => ({ data: [] }) },
       config: { get: async () => ({ data: { model: SYSTEM_DEFAULT_MODEL } }) },
-      model: { list: async () => ({ data: [{ provider: "google", id: "gemini-3-pro" }] }) },
+      model: { list: async () => [{ provider: "google", id: "gemini-3-pro" }] },
       session: {
         get: async () => ({ data: { directory: "/project" } }),
         create: async () => ({ data: { id: "ses_unstable_gemini" } }),
@@ -1394,13 +1448,6 @@ describe("sisyphus-task", () => {
   test("artistry category (gemini) with run_in_background=false should force background but wait for result", async () => {
     // given - artistry also uses gemini model
     const { createDelegateTask } = require("./tools")
-    const providerModelsSpy = spyOn(connectedProvidersCache, "readProviderModelsCache").mockReturnValue({
-      connected: ["anthropic", "google", "openai"],
-      updatedAt: new Date().toISOString(),
-      models: {
-        google: ["gemini-3-pro", "gemini-3-flash"],
-      },
-    })

     let launchCalled = false
     const mockManager = {
@@ -1419,7 +1466,7 @@ describe("sisyphus-task", () => {
     const mockClient = {
       app: { agents: async () => ({ data: [] }) },
       config: { get: async () => ({ data: { model: SYSTEM_DEFAULT_MODEL } }) },
-      model: { list: async () => ({ data: [{ provider: "google", id: "gemini-3-pro" }] }) },
+      model: { list: async () => [{ provider: "google", id: "gemini-3-pro" }] },
       session: {
         get: async () => ({ data: { directory: "/project" } }),
         create: async () => ({ data: { id: "ses_artistry_gemini" } }),
@@ -1461,7 +1508,6 @@ describe("sisyphus-task", () => {
     expect(launchCalled).toBe(true)
     expect(result).toContain("SUPERVISED TASK COMPLETED")
     expect(result).toContain("Artistry result here")
-    providerModelsSpy.mockRestore()
   }, { timeout: 20000 })

   test("writing category (gemini-flash) with run_in_background=false should force background but wait for result", async () => {
@@ -1485,7 +1531,7 @@ describe("sisyphus-task", () => {
     const mockClient = {
       app: { agents: async () => ({ data: [] }) },
       config: { get: async () => ({ data: { model: SYSTEM_DEFAULT_MODEL } }) },
-      model: { list: async () => [{ id: "google/gemini-3-flash" }] },
+      model: { list: async () => [{ provider: "google", id: "gemini-3-flash" }] },
       session: {
         get: async () => ({ data: { directory: "/project" } }),
         create: async () => ({ data: { id: "ses_writing_gemini" } }),