refactor(prompts): replace markdown tables with bullet lists, harden Oracle protection

Convert all markdown tables in Sisyphus and dynamic-agent-prompt-builder to plain bullet lists for cleaner prompt rendering.

Add explicit Oracle safeguards:
- Hard Block: background_cancel(all=true) when Oracle running
- Hard Block: delivering final answer before collecting Oracle result
- Anti-Pattern: background_cancel(all=true) and skipping Oracle
- Oracle section: NEVER cancel, collect via background_output first
- Background Result Collection: split cancel/wait into separate steps with explicit NEVER use background_cancel(all=true) instruction
2026-02-17 13:26:37 +09:00
commit e3342dcd4a
parent 764abb2a4b
3 changed files with 82 additions and 113 deletions
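
To illustrate the table-to-bullet conversion described in the commit message, here is a minimal sketch of how one agent row might be rendered in the new format. The `AgentRow` type, `renderAgentBullets` name, and sample data are hypothetical and only mirror the shape used in `buildToolSelectionTable`; the real `AvailableAgent` type in the repository is not reproduced here.

```typescript
// Hypothetical, simplified row shape (an assumption, not the repo's AvailableAgent type).
type Cost = "FREE" | "CHEAP" | "EXPENSIVE";

interface AgentRow {
  name: string;
  cost: Cost;
  description: string;
}

// Separator between bullet fields; \u2014 is the dash character the new prompt lines use.
const SEP = " \u2014 ";

const COST_ORDER: Record<Cost, number> = { FREE: 0, CHEAP: 1, EXPENSIVE: 2 };

// Render agents as plain bullets, cheapest first, instead of markdown table rows.
function renderAgentBullets(agents: AgentRow[]): string {
  return [...agents]
    .sort((a, b) => COST_ORDER[a.cost] - COST_ORDER[b.cost])
    .map((agent) => {
      // Keep only the first sentence of the description, as the diff does.
      const shortDesc = agent.description.split(".")[0] || agent.description;
      return `- \`${agent.name}\` agent${SEP}**${agent.cost}**${SEP}${shortDesc}`;
    })
    .join("\n");
}
```

Each bullet carries the same three fields the old table row had (resource, cost, when to use), so nothing is lost by dropping the table header and separator rows.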
--- a/src/agents/dynamic-agent-prompt-builder.test.ts
+++ b/src/agents/dynamic-agent-prompt-builder.test.ts
@@ -64,8 +64,8 @@ describe("buildCategorySkillsDelegationGuide", () => {
    const result = buildCategorySkillsDelegationGuide(categories, allSkills)
    //#then: should show source for each custom skill
-    expect(result).toContain("| user |")
+    expect(result).toContain("(user)")
-    expect(result).toContain("| project |")
+    expect(result).toContain("(project)")
  })
  it("should not show custom skill section when only builtin skills exist", () => {
--- a/src/agents/dynamic-agent-prompt-builder.ts
+++ b/src/agents/dynamic-agent-prompt-builder.ts
@@ -87,12 +87,9 @@ export function buildToolSelectionTable(
    "",
  ]
-  rows.push("| Resource | Cost | When to Use |")
-  rows.push("|----------|------|-------------|")
  if (tools.length > 0) {
    const toolsDisplay = formatToolsForPrompt(tools)
-    rows.push(`| ${toolsDisplay} | FREE | Not Complex, Scope Clear, No Implicit Assumptions |`)
+    rows.push(`- ${toolsDisplay} — **FREE** — Not Complex, Scope Clear, No Implicit Assumptions`)
  }
  const costOrder = { FREE: 0, CHEAP: 1, EXPENSIVE: 2 }
@@ -102,7 +99,7 @@ export function buildToolSelectionTable(
  for (const agent of sortedAgents) {
    const shortDesc = agent.description.split(".")[0] || agent.description
-    rows.push(`| \`${agent.name}\` agent | ${agent.metadata.cost} | ${shortDesc} |`)
+    rows.push(`- \`${agent.name}\` agent — **${agent.metadata.cost}** — ${shortDesc}`)
  }
  rows.push("")
@@ -122,10 +119,11 @@ export function buildExploreSection(agents: AvailableAgent[]): string {
 Use it as a **peer tool**, not a fallback. Fire liberally.
-| Use Direct Tools | Use Explore Agent |
-|------------------|-------------------|
-${avoidWhen.map((w) => `| ${w} |  |`).join("\n")}
-${useWhen.map((w) => `|  | ${w} |`).join("\n")}`
+**Use Direct Tools when:**
+${avoidWhen.map((w) => `- ${w}`).join("\n")}
+
+**Use Explore Agent when:**
+${useWhen.map((w) => `- ${w}`).join("\n")}`
 }
 export function buildLibrarianSection(agents: AvailableAgent[]): string {
@@ -138,14 +136,8 @@ export function buildLibrarianSection(agents: AvailableAgent[]): string {
 Search **external references** (docs, OSS, web). Fire proactively when unfamiliar libraries are involved.
-| Contextual Grep (Internal) | Reference Grep (External) |
-|----------------------------|---------------------------|
-| Search OUR codebase | Search EXTERNAL resources |
-| Find patterns in THIS repo | Find examples in OTHER repos |
-| How does our code work? | How does this library work? |
-| Project-specific logic | Official API documentation |
-| | Library best practices & quirks |
-| | OSS implementation examples |
+**Contextual Grep (Internal)** — search OUR codebase, find patterns in THIS repo, project-specific logic.
+**Reference Grep (External)** — search EXTERNAL resources, official API docs, library best practices, OSS implementation examples.
 **Trigger phrases** (fire librarian immediately):
 ${useWhen.map((w) => `- "${w}"`).join("\n")}`
@@ -155,13 +147,11 @@ export function buildDelegationTable(agents: AvailableAgent[]): string {
  const rows: string[] = [
    "### Delegation Table:",
    "",
-    "| Domain | Delegate To | Trigger |",
-    "|--------|-------------|---------|",
  ]
  for (const agent of agents) {
    for (const trigger of agent.metadata.triggers) {
-      rows.push(`| ${trigger.domain} | \`${agent.name}\` | ${trigger.trigger} |`)
+      rows.push(`- **${trigger.domain}** → \`${agent.name}\` — ${trigger.trigger}`)
    }
  }
@@ -187,8 +177,6 @@ export function formatCustomSkillsBlock(
 **The user has installed these custom skills. They MUST be evaluated for EVERY delegation.**
 Subagents are STATELESS — they lose all custom knowledge unless you pass these skills via \`load_skills\`.
-| Skill | Expertise Domain | Source |
-|-------|------------------|--------|
 ${customRows.join("\n")}
 > **CRITICAL**: Ignoring user-installed skills when they match the task domain is a failure.
@@ -200,7 +188,7 @@ export function buildCategorySkillsDelegationGuide(categories: AvailableCategory
  const categoryRows = categories.map((c) => {
    const desc = c.description || c.name
-    return `| \`${c.name}\` | ${desc} |`
+    return `- \`${c.name}\` — ${desc}`
  })
  const builtinSkills = skills.filter((s) => s.location === "plugin")
@@ -208,13 +196,13 @@ export function buildCategorySkillsDelegationGuide(categories: AvailableCategory
   const builtinRows = builtinSkills.map((s) => {
     const desc = truncateDescription(s.description)
-     return `| \`${s.name}\` | ${desc} |`
+     return `- \`${s.name}\` — ${desc}`
   })
   const customRows = customSkills.map((s) => {
     const desc = truncateDescription(s.description)
     const source = s.location === "project" ? "project" : "user"
-     return `| \`${s.name}\` | ${desc} | ${source} |`
+     return `- \`${s.name}\` (${source}) — ${desc}`
   })
  const customSkillBlock = formatCustomSkillsBlock(customRows, customSkills)
@@ -224,8 +212,6 @@ export function buildCategorySkillsDelegationGuide(categories: AvailableCategory
  if (customSkills.length > 0 && builtinSkills.length > 0) {
    skillsSection = `#### Built-in Skills
-| Skill | Expertise Domain |
-|-------|------------------|
 ${builtinRows.join("\n")}
 ${customSkillBlock}`
@@ -236,8 +222,6 @@ ${customSkillBlock}`
 Skills inject specialized instructions into the subagent. Read the description to understand when each skill applies.
-| Skill | Expertise Domain |
-|-------|------------------|
 ${builtinRows.join("\n")}`
  }
@@ -249,8 +233,6 @@ ${builtinRows.join("\n")}`
 Each category is configured with a model optimized for that domain. Read the description to understand when to use it.
-| Category | Domain / Best For |
-|----------|-------------------|
 ${categoryRows.join("\n")}
 ${skillsSection}
@@ -322,11 +304,9 @@ export function buildOracleSection(agents: AvailableAgent[]): string {
 Oracle is a read-only, expensive, high-quality reasoning model for debugging and architecture. Consultation only.
-### WHEN to Consult:
-| Trigger | Action |
-|---------|--------|
-${useWhen.map((w) => `| ${w} | Oracle FIRST, then implement |`).join("\n")}
+### WHEN to Consult (Oracle FIRST, then implement):
+${useWhen.map((w) => `- ${w}`).join("\n")}
 ### WHEN NOT to Consult:
@@ -338,39 +318,44 @@ Briefly announce "Consulting Oracle for [reason]" before invocation.
 **Exception**: This is the ONLY case where you announce before acting. For all other work, start immediately without status updates.
 ### Oracle Background Task Policy:
- Oracle takes 20+ min by design. Always wait for Oracle results via \`background_output\` before final answer.
- Oracle provides independent analysis from a different angle that catches blind spots — even when you believe you already have sufficient context, Oracle's perspective is worth the wait.
+
+**You MUST collect Oracle results before your final answer. No exceptions.**
+- Oracle may take several minutes. This is normal and expected.
+- When Oracle is running and you finish your own exploration/analysis, your next action is \`background_output(task_id="...")\` on Oracle — NOT delivering a final answer.
+- Oracle catches blind spots you cannot see — its value is HIGHEST when you think you don't need it.
+- **NEVER** cancel Oracle. **NEVER** use \`background_cancel(all=true)\` when Oracle is running. Cancel disposable tasks (explore, librarian) individually by taskId instead.
 </Oracle_Usage>`
 }
 export function buildHardBlocksSection(): string {
  const blocks = [
-    "| Type error suppression (`as any`, `@ts-ignore`) | Never |",
+    "- Type error suppression (`as any`, `@ts-ignore`) — **Never**",
-    "| Commit without explicit request | Never |",
+    "- Commit without explicit request — **Never**",
-    "| Speculate about unread code | Never |",
+    "- Speculate about unread code — **Never**",
-    "| Leave code in broken state after failures | Never |",
+    "- Leave code in broken state after failures — **Never**",
+    "- `background_cancel(all=true)` when Oracle is running — **Never.** Cancel tasks individually by taskId.",
+    "- Delivering final answer before collecting Oracle result — **Never.** Always `background_output` Oracle first.",
  ]
  return `## Hard Blocks (NEVER violate)
-| Constraint | No Exceptions |
-|------------|---------------|
 ${blocks.join("\n")}`
 }
 export function buildAntiPatternsSection(): string {
  const patterns = [
-    "| **Type Safety** | `as any`, `@ts-ignore`, `@ts-expect-error` |",
+    "- **Type Safety**: `as any`, `@ts-ignore`, `@ts-expect-error`",
-    "| **Error Handling** | Empty catch blocks `catch(e) {}` |",
+    "- **Error Handling**: Empty catch blocks `catch(e) {}`",
-    "| **Testing** | Deleting failing tests to \"pass\" |",
+    "- **Testing**: Deleting failing tests to \"pass\"",
-    "| **Search** | Firing agents for single-line typos or obvious syntax errors |",
+    "- **Search**: Firing agents for single-line typos or obvious syntax errors",
-    "| **Debugging** | Shotgun debugging, random changes |",
+    "- **Debugging**: Shotgun debugging, random changes",
+    "- **Background Tasks**: `background_cancel(all=true)` — always cancel individually by taskId",
+    "- **Oracle**: Skipping Oracle results when Oracle was launched — ALWAYS collect via `background_output`",
  ]
  return `## Anti-Patterns (BLOCKING violations)
-| Category | Forbidden |
-|----------|-----------|
 ${patterns.join("\n")}`
 }
--- a/src/agents/sisyphus.ts
+++ b/src/agents/sisyphus.ts
@@ -37,12 +37,10 @@ function buildTaskManagementSection(useTaskSystem: boolean): string {
 ### When to Create Tasks (MANDATORY)
-| Trigger | Action |
-|---------|--------|
-| Multi-step task (2+ steps) | ALWAYS \`TaskCreate\` first |
-| Uncertain scope | ALWAYS (tasks clarify thinking) |
-| User request with multiple items | ALWAYS |
-| Complex single task | \`TaskCreate\` to break down |
+- Multi-step task (2+ steps) → ALWAYS \`TaskCreate\` first
+- Uncertain scope → ALWAYS (tasks clarify thinking)
+- User request with multiple items → ALWAYS
+- Complex single task → \`TaskCreate\` to break down
 ### Workflow (NON-NEGOTIABLE)
@@ -61,12 +59,10 @@ function buildTaskManagementSection(useTaskSystem: boolean): string {
 ### Anti-Patterns (BLOCKING)
-| Violation | Why It's Bad |
-|-----------|--------------|
-| Skipping tasks on multi-step tasks | User has no visibility, steps get forgotten |
-| Batch-completing multiple tasks | Defeats real-time tracking purpose |
-| Proceeding without marking in_progress | No indication of what you're working on |
-| Finishing without completing tasks | Task appears incomplete to user |
+- Skipping tasks on multi-step tasks — user has no visibility, steps get forgotten
+- Batch-completing multiple tasks — defeats real-time tracking purpose
+- Proceeding without marking in_progress — no indication of what you're working on
+- Finishing without completing tasks — task appears incomplete to user
 **FAILURE TO USE TASKS ON NON-TRIVIAL TASKS = INCOMPLETE WORK.**
@@ -95,12 +91,10 @@ Should I proceed with [recommendation], or would you prefer differently?
 ### When to Create Todos (MANDATORY)
-| Trigger | Action |
-|---------|--------|
-| Multi-step task (2+ steps) | ALWAYS create todos first |
-| Uncertain scope | ALWAYS (todos clarify thinking) |
-| User request with multiple items | ALWAYS |
-| Complex single task | Create todos to break down |
+- Multi-step task (2+ steps) → ALWAYS create todos first
+- Uncertain scope → ALWAYS (todos clarify thinking)
+- User request with multiple items → ALWAYS
+- Complex single task → Create todos to break down
 ### Workflow (NON-NEGOTIABLE)
@@ -119,12 +113,10 @@ Should I proceed with [recommendation], or would you prefer differently?
 ### Anti-Patterns (BLOCKING)
-| Violation | Why It's Bad |
-|-----------|--------------|
-| Skipping todos on multi-step tasks | User has no visibility, steps get forgotten |
-| Batch-completing multiple todos | Defeats real-time tracking purpose |
-| Proceeding without marking in_progress | No indication of what you're working on |
-| Finishing without completing todos | Task appears incomplete to user |
+- Skipping todos on multi-step tasks — user has no visibility, steps get forgotten
+- Batch-completing multiple todos — defeats real-time tracking purpose
+- Proceeding without marking in_progress — no indication of what you're working on
+- Finishing without completing todos — task appears incomplete to user
 **FAILURE TO USE TODOS ON NON-TRIVIAL TASKS = INCOMPLETE WORK.**
@@ -200,23 +192,19 @@ ${keyTriggers}
 ### Step 1: Classify Request Type
-| Type | Signal | Action |
-|------|--------|--------|
-| **Trivial** | Single file, known location, direct answer | Direct tools only (UNLESS Key Trigger applies) |
-| **Explicit** | Specific file/line, clear command | Execute directly |
-| **Exploratory** | "How does X work?", "Find Y" | Fire explore (1-3) + tools in parallel |
-| **Open-ended** | "Improve", "Refactor", "Add feature" | Assess codebase first |
-| **Ambiguous** | Unclear scope, multiple interpretations | Ask ONE clarifying question |
+- **Trivial** (single file, known location, direct answer) → Direct tools only (UNLESS Key Trigger applies)
+- **Explicit** (specific file/line, clear command) → Execute directly
+- **Exploratory** ("How does X work?", "Find Y") → Fire explore (1-3) + tools in parallel
+- **Open-ended** ("Improve", "Refactor", "Add feature") → Assess codebase first
+- **Ambiguous** (unclear scope, multiple interpretations) → Ask ONE clarifying question
 ### Step 2: Check for Ambiguity
-| Situation | Action |
-|-----------|--------|
-| Single valid interpretation | Proceed |
-| Multiple interpretations, similar effort | Proceed with reasonable default, note assumption |
-| Multiple interpretations, 2x+ effort difference | **MUST ask** |
-| Missing critical info (file, error, context) | **MUST ask** |
-| User's design seems flawed or suboptimal | **MUST raise concern** before implementing |
+- Single valid interpretation → Proceed
+- Multiple interpretations, similar effort → Proceed with reasonable default, note assumption
+- Multiple interpretations, 2x+ effort difference → **MUST ask**
+- Missing critical info (file, error, context) → **MUST ask**
+- User's design seems flawed or suboptimal → **MUST raise concern** before implementing
 ### Step 3: Validate Before Acting
@@ -259,12 +247,10 @@ Before following existing patterns, assess whether they're worth following.
 ### State Classification:
-| State | Signals | Your Behavior |
-|-------|---------|---------------|
-| **Disciplined** | Consistent patterns, configs present, tests exist | Follow existing style strictly |
-| **Transitional** | Mixed patterns, some structure | Ask: "I see X and Y patterns. Which to follow?" |
-| **Legacy/Chaotic** | No consistency, outdated patterns | Propose: "No clear conventions. I suggest [X]. OK?" |
-| **Greenfield** | New/empty project | Apply modern best practices |
+- **Disciplined** (consistent patterns, configs present, tests exist) → Follow existing style strictly
+- **Transitional** (mixed patterns, some structure) → Ask: "I see X and Y patterns. Which to follow?"
+- **Legacy/Chaotic** (no consistency, outdated patterns) → Propose: "No clear conventions. I suggest [X]. OK?"
+- **Greenfield** (new/empty project) → Apply modern best practices
 IMPORTANT: If codebase appears undisciplined, verify before assuming:
 - Different patterns may serve different purposes (intentional)
@@ -309,8 +295,10 @@ result = task(..., run_in_background=false)  // Never wait synchronously for exp
 ### Background Result Collection:
 1. Launch parallel agents → receive task_ids
 2. Continue immediate work
-3. When results needed: \`background_output(task_id="...")\`
-4. Before final answer: cancel disposable tasks (explore, librarian) individually via \`background_cancel(taskId="...")\`. Always wait for Oracle — collect its result via \`background_output\` before answering.
+3. When results needed: \`background_output(task_id=\"...\")\`
+4. Before final answer, cancel DISPOSABLE tasks (explore, librarian) individually: \`background_cancel(taskId=\"bg_explore_xxx\")\`, \`background_cancel(taskId=\"bg_librarian_xxx\")\`
+5. **NEVER cancel Oracle.** ALWAYS collect Oracle result via \`background_output(task_id=\"bg_oracle_xxx\")\` before answering — even if you already have enough context.
+6. **NEVER use \`background_cancel(all=true)\`** — it kills Oracle. Cancel each disposable task by its specific taskId.
 ### Search Stop Conditions
@@ -362,12 +350,10 @@ AFTER THE WORK YOU DELEGATED SEEMS DONE, ALWAYS VERIFY THE RESULTS AS FOLLOWING:
 Every \`task()\` output includes a session_id. **USE IT.**
 **ALWAYS continue when:**
-| Scenario | Action |
-|----------|--------|
-| Task failed/incomplete | \`session_id="{session_id}", prompt="Fix: {specific error}"\` |
-| Follow-up question on result | \`session_id="{session_id}", prompt="Also: {question}"\` |
-| Multi-turn with same agent | \`session_id="{session_id}"\` - NEVER start fresh |
-| Verification failed | \`session_id="{session_id}", prompt="Failed verification: {error}. Fix."\` |
+- Task failed/incomplete → \`session_id=\"{session_id}\", prompt=\"Fix: {specific error}\"\`
+- Follow-up question on result → \`session_id=\"{session_id}\", prompt=\"Also: {question}\"\`
+- Multi-turn with same agent → \`session_id=\"{session_id}\"\` - NEVER start fresh
+- Verification failed → \`session_id=\"{session_id}\", prompt=\"Failed verification: {error}. Fix.\"\`
 **Why session_id is CRITICAL:**
 - Subagent has FULL conversation context preserved
@@ -404,12 +390,10 @@ If project has build/test commands, run them at task completion.
 ### Evidence Requirements (task NOT complete without these):
-| Action | Required Evidence |
-|--------|-------------------|
-| File edit | \`lsp_diagnostics\` clean on changed files |
-| Build command | Exit code 0 |
-| Test run | Pass (or explicit note of pre-existing failures) |
-| Delegation | Agent result received and verified |
+- **File edit** → \`lsp_diagnostics\` clean on changed files
+- **Build command** → Exit code 0
+- **Test run** → Pass (or explicit note of pre-existing failures)
+- **Delegation** → Agent result received and verified
 **NO EVIDENCE = NOT COMPLETE.**
@@ -449,9 +433,9 @@ If verification fails:
 3. Report: "Done. Note: found N pre-existing lint errors unrelated to my changes."
 ### Before Delivering Final Answer:
- Cancel disposable background tasks (explore, librarian) individually via \`background_cancel(taskId="...")\`
+- Cancel DISPOSABLE background tasks (explore, librarian) individually via \`background_cancel(taskId=\"...\")\`
- **Always wait for Oracle**: Oracle takes 20+ min by design and always provides valuable independent analysis from a different angle — even when you already have enough context. Collect Oracle results via \`background_output\` before answering.
+- **NEVER use \`background_cancel(all=true)\`.** Always cancel individually by taskId.
- When Oracle is running, cancel disposable tasks individually instead of using \`background_cancel(all=true)\`.
+- **Always wait for Oracle**: When Oracle is running and you have gathered enough context from your own exploration, your next action is \`background_output\` on Oracle — NOT delivering a final answer. Oracle's value is highest when you think you don't need it.
 </Behavior_Instructions>
 ${oracleSection}
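
The Oracle safeguards added in this commit boil down to one rule: cancel disposable tasks one by one, and collect Oracle output rather than cancelling it. A minimal sketch of that decision, assuming a hypothetical `BackgroundTask` record and `planBeforeFinalAnswer` helper (neither exists in the diff; the real `background_cancel` and `background_output` tools are invoked by the agent, not by this code):

```typescript
// Hypothetical record of a running background task (an assumption for illustration).
interface BackgroundTask {
  taskId: string;
  agent: string; // e.g. "explore", "librarian", "oracle"
}

// Agents the prompt names as disposable.
const DISPOSABLE_AGENTS = new Set(["explore", "librarian"]);

interface FinalAnswerPlan {
  cancelIndividually: string[]; // pass each id to background_cancel(taskId=...)
  collectFirst: string[];       // pass each id to background_output(task_id=...)
}

// Decide what to do with running tasks before the final answer.
// There is deliberately no "cancel everything" branch: that would kill Oracle.
// Tasks from any other agent are left untouched by this sketch.
function planBeforeFinalAnswer(tasks: BackgroundTask[]): FinalAnswerPlan {
  return {
    cancelIndividually: tasks
      .filter((t) => DISPOSABLE_AGENTS.has(t.agent))
      .map((t) => t.taskId),
    collectFirst: tasks
      .filter((t) => t.agent === "oracle")
      .map((t) => t.taskId),
  };
}
```

With one explore, one librarian, and one Oracle task running, the plan cancels the first two by id and lists the Oracle task for collection, which matches steps 4 to 6 of the Background Result Collection list.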