feat(hooks): add mandatory hands-on verification enforcement for orchestrated tasks

- sisyphus-orchestrator: Add verification reminder with tool matrix (playwright/interactive_bash/curl)
- start-work: Inject detailed verification workflow with deliverable-specific guidance

🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode) assistance
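The deliverable-specific guidance can be illustrated as typed data that is rendered back into the markdown table. Below is a minimal sketch of the four-row matrix from the start-work prompt. `DeliverableType`, `VerificationRule`, `VERIFICATION_MATRIX`, and `renderMatrix` are hypothetical names. The hooks in this commit embed their tables as literal template strings.

```typescript
// Hypothetical model of the verification tool matrix; these names are
// illustrative and do not exist in the hooks' code.
type DeliverableType = "Frontend/UI" | "TUI/CLI" | "API/Backend" | "Library/Module"

interface VerificationRule {
  tool: string   // tool the agent is told to use
  action: string // what the agent should do with that tool
}

// Rows copied from the start-work prompt's "Verification by Deliverable Type" table.
const VERIFICATION_MATRIX: Record<DeliverableType, VerificationRule> = {
  "Frontend/UI": { tool: "`/playwright` skill", action: "Navigate, click, verify visual state, take screenshots" },
  "TUI/CLI": { tool: "`interactive_bash` (tmux)", action: "Run commands interactively, verify output" },
  "API/Backend": { tool: "`bash` with curl/httpie", action: "Send requests, verify responses" },
  "Library/Module": { tool: "REPL via `interactive_bash`", action: "Import, call functions, verify results" },
}

// Render the matrix as a markdown table: header, separator, then one row per
// deliverable type in insertion order.
function renderMatrix(matrix: Record<DeliverableType, VerificationRule>): string {
  const rows = (Object.keys(matrix) as DeliverableType[]).map(
    (type) => `| **${type}** | ${matrix[type].tool} | ${matrix[type].action} |`,
  )
  return ["| Type | Tool | How to Verify |", "|------|------|---------------|", ...rows].join("\n")
}
```

A single source like this would keep the orchestrator's three-row table and start-work's four-row table from drifting apart. They already differ on `curl` versus `curl/httpie`. The commit itself keeps the two tables as separate literals.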
2026-01-06 13:16:51 +09:00 · 2026-01-06 13:16:51 +09:00 · 39e92b1900
commit 39e92b1900
parent 7567c40a81
2 changed files with 66 additions and 2 deletions
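Both hooks direct TUI/CLI verification to tmux through `interactive_bash`. The start-work prompt outlines the flow as `new-session -d -s qa` → `send-keys` → `capture-pane`. The hypothetical helper `tmuxQaCommands` below writes that sequence out as argument lists for the standard tmux subcommands. It only builds the arrays and never runs tmux. It also adds a `kill-session` cleanup step that the prompt does not mention.

```typescript
// Hypothetical helper: spells out the tmux steps behind the prompt's
// "new-session -d -s qa → send-keys → capture-pane" workflow.
// It only builds argument lists; nothing here invokes tmux.
type TmuxArgs = string[]

function tmuxQaCommands(session: string, command: string): TmuxArgs[] {
  return [
    // Start a detached session so the agent can drive it without a terminal.
    ["new-session", "-d", "-s", session],
    // Type the command into the session, then press Enter.
    ["send-keys", "-t", session, command, "Enter"],
    // Print the pane's visible contents to stdout for inspection.
    ["capture-pane", "-p", "-t", session],
    // Cleanup step (not in the prompt): remove the QA session afterwards.
    ["kill-session", "-t", session],
  ]
}
```

You could run each array with Node's `execFileSync("tmux", args)` from `node:child_process`. A short wait between `send-keys` and `capture-pane` is usually needed so the command has time to print its output.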
--- a/src/hooks/sisyphus-orchestrator/index.ts
+++ b/src/hooks/sisyphus-orchestrator/index.ts
@@ -42,7 +42,19 @@ Subagents FREQUENTLY claim completion when:
 5. Verify notepad was updated - Must have substantive content
 DO NOT TRUST THE AGENT'S SELF-REPORT.
-VERIFY EACH CLAIM WITH YOUR OWN TOOL CALLS.`
+They are non-deterministic and not exceptional - they CANNOT distinguish between completed and incomplete states.
 VERIFY EACH CLAIM WITH YOUR OWN TOOL CALLS.
 **HANDS-ON QA REQUIRED (after ALL tasks complete):**
 | Deliverable Type | Verification Tool | Action |
 |------------------|-------------------|--------|
 | **Frontend/UI** | \`/playwright\` skill | Navigate, interact, screenshot evidence |
 | **TUI/CLI** | \`interactive_bash\` (tmux) | Run interactively, verify output |
 | **API/Backend** | \`bash\` with curl | Send requests, verify responses |
 Static analysis CANNOT catch: visual bugs, animation issues, user flow breakages, integration problems.
 **FAILURE TO DO HANDS-ON QA = INCOMPLETE WORK.**`
 function buildOrchestratorReminder(planName: string, progress: { total: number; completed: number }): string {
  const remaining = progress.total - progress.completed
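The start-work change that follows only alters the final append, but the full injection path is easier to follow in isolation. Below is a minimal sketch that assumes a simplified `Part` shape, because the real type is not shown in the diff. `injectStartWork` is a hypothetical name. The function finds the first non-empty text part and substitutes `$SESSION_ID` and `$TIMESTAMP`. It then appends `contextInfo` followed by `verificationEnforcement`, which opens with its own `---` rule.

```typescript
// Simplified stand-in for a message part; the hook's real part type is an
// assumption here, not taken from the diff.
interface Part {
  type: string
  text?: string
}

// Sketch of the start-work injection. Mutates `parts` in place and reports
// whether a non-empty text part was found and updated.
function injectStartWork(
  parts: Part[],
  sessionId: string,
  timestamp: string,
  contextInfo: string,
  verificationEnforcement: string,
): boolean {
  // First part that is text and has non-empty content.
  const idx = parts.findIndex((p) => p.type === "text" && p.text)
  if (idx < 0) return false
  const part = parts[idx]
  if (!part.text) return false

  // Replacer functions stop "$&"-style patterns in the values from being expanded.
  part.text = part.text
    .replace(/\$SESSION_ID/g, () => sessionId)
    .replace(/\$TIMESTAMP/g, () => timestamp)

  // Same order as the hook: separator, context block, then enforcement block.
  part.text += `\n\n---\n${contextInfo}${verificationEnforcement}`
  return true
}
```

The sketch passes replacer functions to `replace` instead of raw strings. With a raw string, a `$` inside a session ID could be read as a replacement pattern. The hook in the diff passes the strings directly.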
--- a/src/hooks/start-work/index.ts
+++ b/src/hooks/start-work/index.ts
@@ -126,13 +126,65 @@ Which plan would you like to work on? Reply with the number or plan name.`
        }
      }
      const verificationEnforcement = `
 ---
 ## MANDATORY VERIFICATION ENFORCEMENT (NON-NEGOTIABLE)
 **CRITICAL: You MUST perform hands-on verification after completing ALL tasks. Static analysis alone is NOT sufficient.**
 ### Verification by Deliverable Type
 | Type | Tool | How to Verify |
 |------|------|---------------|
 | **Frontend/UI** | \`/playwright\` skill | Navigate, click, verify visual state, take screenshots |
 | **TUI/CLI** | \`interactive_bash\` (tmux) | Run commands interactively, verify output |
 | **API/Backend** | \`bash\` with curl/httpie | Send requests, verify responses |
 | **Library/Module** | REPL via \`interactive_bash\` | Import, call functions, verify results |
 ### Verification Workflow
 1. **After ALL tasks complete** (not after each task):
   - Start dev server if needed: \`bun run dev\` / \`npm run dev\`
   - Wait for server to be ready
 2. **For Frontend changes**:
   \`\`\`
   Load /playwright skill → Navigate to page → Interact with UI → Verify expected behavior → Screenshot evidence
   \`\`\`
 3. **For TUI/CLI changes**:
   \`\`\`
   interactive_bash(tmux_command="new-session -d -s qa") → send-keys with commands → capture-pane output → verify
   \`\`\`
 4. **Evidence required**:
   - Screenshots for visual changes (saved to \`.sisyphus/evidence/\`)
   - Terminal output for CLI changes
   - Response bodies for API changes
 ### What Static Analysis CANNOT Catch
 - Visual rendering issues (wrong colors, broken layouts)
 - Animation/transition bugs
 - Race conditions in UI interactions
 - User flow breakages
 - Integration issues between components
 ### FAILURE TO VERIFY = INCOMPLETE WORK
 **Do NOT mark tasks complete or report "done" without hands-on verification.**
 If you skip this step, the user will find bugs you could have caught.
 `
      const idx = output.parts.findIndex((p) => p.type === "text" && p.text)
      if (idx >= 0 && output.parts[idx].text) {
        output.parts[idx].text = output.parts[idx].text
          .replace(/\$SESSION_ID/g, sessionId)
          .replace(/\$TIMESTAMP/g, timestamp)
-        output.parts[idx].text += `\n\n---\n${contextInfo}`
+        output.parts[idx].text += `\n\n---\n${contextInfo}${verificationEnforcement}`
      }
      log(`[${HOOK_NAME}] Context injected`, {