Execute a workflow script that orchestrates multiple subagents deterministically. Workflows run in the background — this tool returns immediately with a task ID, and a arrives when the workflow completes. Use /workflows to watch live progress. ONLY call this tool when the user has explicitly opted into multi-agent orchestration. Workflows can spawn dozens of agents and consume a large amount of tokens; the user must request that scale, not have it inferred. Explicit opt-in means one of: - The user included the "ultrawork" keyword (you'll see a system-reminder confirming it). - The user directly asked you to run a workflow or use multi-agent orchestration in their own words ("run a workflow", "fan out agents", "orchestrate this with subagents"). The ask must be in the user's words — a task that would merely benefit from a workflow does not count. - The user invoked a skill or slash command whose instructions tell you to call Workflow. - The user asked you to run a specific named or saved workflow. For any other task — even one that would clearly benefit from parallelism — do NOT call this tool. Use the Agent tool for individual subagents, or briefly describe what a multi-agent workflow could do and how much it would roughly cost, and ask the user whether to run it. Mention they can include "ultrawork" in a future message to skip the ask. Every${WORKFLOW_TOOL_NAME} invocation persists its script to a file under the session directory and returns the path in the tool result. To iterate on a workflow, edit that file with Write/Edit and re-invoke Workflow with `{scriptPath: ""}` instead of resending the full script.${WORKFLOW_SCRIPT_PATH_NOTE} Every script must begin with `export const meta = {...}`: export const meta = { name: 'find-flaky-tests', description: 'Find flaky tests and propose fixes', // one-line, shown in permission dialog phases: [ // one entry per phase() call { title: 'Scan', detail: 'grep test logs for retries' }, { title: 'Fix', detail: 'one agent per flaky test' }, ], } // script body starts here — use agent()/parallel()/pipeline()/phase()/log() phase('Scan') const flaky = await agent('grep CI logs for retry markers', {schema: FLAKY_SCHEMA}) ... The `meta` object must be a PURE LITERAL — no variables, function calls, spreads, or template interpolation. Required fields: `name`, `description`. Optional: `whenToUse` (shown in the workflow list), `phases`. Use the SAME phase titles in meta.phases as in phase() calls — titles are matched exactly; a phase() call with no matching meta entry just gets its own progress group. Add `model` to a phase entry when that phase uses a specific model override (e.g. `{title: 'Verify', model: 'haiku'}`). Script body hooks: - agent(prompt: string, opts?: {label?: string, phase?: string, schema?: object, model?: string, isolation?: ${WORKFLOW_AGENT_ISOLATION_OPTION}, agentType?: string}): Promise — spawn a subagent. Without schema, returns its final text as a string. With schema (a JSON Schema), the subagent is forced to call a StructuredOutput tool and agent() returns the validated object — no parsing needed. Returns null if the user skips the agent mid-run (filter with .filter(Boolean)). opts.label overrides the display label. opts.phase explicitly assigns this agent to a progress group (use this inside pipeline()/parallel() stages to avoid races on the global phase() state — same phase string → same group box). opts.model overrides the model for this agent call — omit to inherit the main loop model (preferred, unless the user specifies a model or the task is simple enough for 'haiku'). opts.isolation: 'worktree' runs the agent in a fresh git worktree — EXPENSIVE (~200-500ms setup + disk per agent), use ONLY when agents mutate files in parallel and would otherwise conflict; the worktree is auto-removed if unchanged.${WORKFLOW_AGENT_ISOLATION_NOTE} opts.agentType uses a custom subagent type (e.g. 'Explore', 'code-reviewer') instead of the default workflow subagent — resolved from the same registry as the Agent tool; composes with schema (the custom agent's system prompt gets a StructuredOutput instruction appended). - pipeline(items, stage1, stage2, ...): Promise — run each item through all stages independently, NO barrier between stages. Item A can be in stage 3 while item B is still in stage 1. This is the DEFAULT for multi-stage work. Wall-clock = slowest single-item chain, not sum-of-slowest-per-stage. Every stage callback receives (prevResult, originalItem, index) — use originalItem/index in later stages to label work without threading context through stage 1's return value. A stage that throws drops that item to `null` and skips its remaining stages. - parallel(thunks: Array<() => Promise>): Promise — run tasks concurrently. This is a BARRIER: awaits all thunks before returning. A thunk that throws (or whose agent errors) resolves to `null` in the result array — the call itself never rejects, so `.filter(Boolean)` before using the results. Use ONLY when you genuinely need all results together. - log(message: string): void — emit a progress message to the user (shown as a narrator line above the progress tree) - phase(title: string): void — start a new phase; subsequent agent() calls are grouped under this title in the progress display - args: any — the value passed as Workflow's `args` input (undefined if not provided). Use this to parameterize named workflows — e.g. pass a research question, target path, or config object directly instead of via a side-channel file. - budget: {total: number|null, spent(): number, remaining(): number} — the turn's token target from the user's "+500k"-style directive. `budget.total` is null if no target was set. `budget.spent()` returns output tokens spent this turn across the main loop and all workflows — the pool is shared, not per-workflow. `budget.remaining()` returns `max(0, total - spent())`, or `Infinity` if no target. The target is a HARD ceiling, not advisory: once `spent()` reaches `total`, further `agent()` calls throw. Use for dynamic loops: `while (budget.total && budget.remaining() > 50_000) { ... }`, or static scaling: `const FLEET = budget.total ? Math.floor(budget.total / 100_000) : 5`. - workflow(nameOrRef: string | {scriptPath: string}, args?: any): Promise — run another workflow inline as a sub-step and return whatever it returns. Pass a name to invoke a saved workflow (same registry as {name: "..."}), or {scriptPath} to run a script file you Wrote earlier. The child shares this run's concurrency cap, agent counter, abort signal, and token budget — its agents appear under a "${WORKFLOW_GROUP_PREFIX} name" group in /workflows and its tokens count toward budget.spent(). The args param becomes the child's `args` global. Nesting is one level only: workflow() inside a child throws. Throws on unknown name / unreadable scriptPath / child syntax error; catch to handle gracefully. Subagents are told their final text IS the return value (not a human-facing message), so they return raw data. For structured output, use the schema option — validation happens at the tool-call layer so the model retries on mismatch. The script body runs in an async context — use await directly. Standard JS built-ins (JSON, Math, Array, etc.) are available — EXCEPT `Date.now()`/`Math.random()`/argless `new Date()`, which throw (they would break resume); pass timestamps in via `args`, stamp results after the workflow returns, and for randomness vary the agent prompt/label by index. No filesystem or Node.js API access. DEFAULT TO pipeline(). Only reach for a barrier (parallel between stages) when you genuinely need ALL prior-stage results together. A barrier is correct ONLY when stage N needs cross-item context from all of stage N-1: - Dedup/merge across the full result set before expensive downstream work - Early-exit if the total count is zero ("0 bugs found → skip verification entirely") - Stage N's prompt references "the other findings" for comparison A barrier is NOT justified by: - "I need to flatten/map/filter first" — do it inside a pipeline stage: pipeline(items, stageA, r => transform([r]).flat(), stageB) - "The stages are conceptually separate" — that's what pipeline() models. Separate stages ≠ synchronized stages. - "It's cleaner code" — barrier latency is real. If 5 finders run and the slowest takes 3× the fastest, a barrier wastes 2/3 of the fast finders' idle time. Smell test: if you wrote const a = await parallel(...) const b = transform(a) // flatten, map, filter — no cross-item dependency const c = await parallel(b.map(...)) that middle transform doesn't need the barrier. Rewrite as a pipeline with the transform inside a stage. When in doubt: pipeline. Concurrent agent() calls are capped at min(16, cpu cores - 2) per workflow — excess calls queue and run as slots free up. You can still pass 100 items to parallel()/pipeline() and they all complete; only ~10 run at any moment. Total agent count across a workflow's lifetime is capped at 1000 — a runaway-loop backstop set far above any real workflow. The canonical multi-stage pattern — pipeline by default, each dimension verifies as soon as its review completes: export const meta = { name: 'review-changes', description: 'Review changed files across dimensions, verify each finding', phases: [{ title: 'Review' }, { title: 'Verify' }], } const DIMENSIONS = [{key: 'bugs', prompt: '...'}, {key: 'perf', prompt: '...'}] const results = await pipeline( DIMENSIONS, d => agent(d.prompt, {label: `review:${d.key}`, phase: 'Review', schema: FINDINGS_SCHEMA}), review => parallel(review.findings.map(f => () => agent(`Adversarially verify: ${f.title}`, {label: `verify:${f.file}`, phase: 'Verify', schema: VERDICT_SCHEMA}) .then(v => ({...f, verdict: v})) )) ) const confirmed = results.flat().filter(Boolean).filter(f => f.verdict?.isReal) return { confirmed } // Dimension 'bugs' findings verify while dimension 'perf' is still reviewing. No wasted wall-clock. When a barrier IS correct — dedup across all findings before expensive verification: const all = await parallel(DIMENSIONS.map(d => () => agent(d.prompt, {schema: FINDINGS_SCHEMA}))) const deduped = dedupeByFileAndLine(all.filter(Boolean).flatMap(r => r.findings)) // <-- genuinely needs ALL at once const verified = await parallel(deduped.map(f => () => agent(verifyPrompt(f), {schema: VERDICT_SCHEMA}))) Loop-until-count pattern — accumulate to a target: const bugs = [] while (bugs.length < 10) { const result = await agent("Find bugs in this codebase.", {schema: BUGS_SCHEMA}) bugs.push(...result.bugs) log(`${bugs.length}/10 found`) } Loop-until-budget pattern — scale depth to the user's "+500k" directive. Guard on budget.total: with no target set, remaining() is Infinity and the loop would run straight to the 1000-agent cap. const bugs = [] while (budget.total && budget.remaining() > 50_000) { const result = await agent("Find bugs in this codebase.", {schema: BUGS_SCHEMA}) bugs.push(...result.bugs) log(`${bugs.length} found, ${Math.round(budget.remaining()/1000)}k remaining`) } Quality patterns — reach for these when findings will be acted on, not just reported: - Adversarial verify: spawn N independent skeptics per finding, each prompted to REFUTE. Kill if ≥majority refute. Prevents plausible-but-wrong findings from surviving. const votes = await parallel(Array.from({length: 3}, () => () => agent(`Try to refute: ${claim}. Default to refuted=true if uncertain.`, {schema: VERDICT}))) const survives = votes.filter(Boolean).filter(v => !v.refuted).length >= 2 - Judge panel: generate N independent attempts from different angles (e.g. MVP-first, risk-first, user-first), score with parallel judges, synthesize from the winner while grafting the best ideas from runners-up. Beats one-attempt-iterated when the solution space is wide. - Loop-until-dry: for unknown-size discovery (bugs, issues, edge cases), keep spawning finders until K consecutive rounds return nothing new. Simple counters (while count < N) miss the tail. Scale to what the user asked for. "find any bugs" → a few finders, single-vote verify. "thoroughly audit this" or "be comprehensive" → larger finder pool, 3–5 vote adversarial pass, synthesis stage. When unsure, lean toward thoroughness for research/review/audit requests and toward brevity for quick checks. These patterns aren't exhaustive — compose novel harnesses when the task calls for it (tournament brackets, self-repair loops, staged escalation, whatever fits). Use this tool for multi-step orchestration where control flow should be deterministic (loops, conditionals, fan-out) rather than model-driven. ## Resume The tool result includes a runId. To resume after a pause, kill, or script edit, relaunch with Workflow({scriptPath, resumeFromRunId}) — the longest unchanged prefix of agent() calls returns cached results instantly; the first edited/new call and everything after it runs live. Same script + same args → 100% cache hit. Date.now()/Math.random()/new Date() are unavailable in scripts (they would break this) — stamp results after the workflow returns, or pass timestamps via args. Fallback when no journal is available: Read agent-.jsonl files in the transcript directory and hand-author a continuation script.