JongHyeok Park 1031d312cc
feat(workflows): add orch-review native Workflow pilot (#2363)
* feat(workflows): add orch-review native Workflow pilot

Port orch-pipeline Phase 5 (Review) to a native Claude Code Workflow
script. The gated outer loop stays in the main conversation; this script
owns only the autonomous review+verify segment between the two human
gates:

1. Review — reviewers fan out in parallel: ecc:code-reviewer always,
   ecc:<language>-reviewer when args.language maps, ecc:security-reviewer
   when the orch-pipeline security trigger matches the diff/paths.
2. Dedup — merge findings across dimensions keyed on the normalized
   evidence snippet, since independent reviewers flag the same line.
3. Verify — each unique CRITICAL/HIGH finding goes to an independent
   adversarial verifier; MEDIUM/LOW pass through as advisory.

The Review->Verify barrier is deliberate: deduping before verification
stops the verifier running N times on the same bug (local testing: 11
raw findings collapsed to 4 unique, ~halving verifier cost).

Existing ECC reviewer subagents are reused via agentType; reviewer
output is validated by JSON schema. args is accepted as an object or a
JSON-encoded string.

- workflows/orch-review.workflow.js — the workflow script
- workflows/README.md — invocation contract, returns shape, follow-ups

CI lint is scoped to scripts/ and tests/, so the script (validated with
node --check) and the README (passes markdownlint) are untouched.

* fix(workflows): fail closed on invalid args and lost review dimensions

Addresses the two safety findings from the PR bot review:

1. Lost review dimension (Greptile P1 / CodeRabbit Major): a reviewer
   agent that returns null or rejects was silently dropped by
   filter(Boolean), so an unreviewed security dimension could still
   return APPROVE. Each dimension's outcome is now captured; failures
   land in failedDimensions and force CHANGES_REQUESTED (incomplete).

2. Invalid args (CodeRabbit Major): an empty diff returned APPROVE and
   bad JSON / non-array changedFiles threw inconsistently. Input is now
   validated up front and rejected with a clear error — the gate fails
   closed instead of approving an unreviewed payload.

Docs (header contract + README) updated for the new return fields
(incomplete, failedDimensions, stats.failed). Remaining bot nits
(evidence minLength, verify-label collision, verified->confirmed
rename, contract drift) deferred as follow-ups.

* fix(workflows): address remaining orch-review review nits

Follow-up to the bot review (deferred items from the safety pass):

- evidence: require minLength 1 in the schema, and fall back to a
  title+line dedup key when evidence is empty, so empty-evidence
  findings in one file no longer collapse onto a single key and drop
  (CodeRabbit).
- verify label: include a slice of the normalized evidence so two
  CRITICAL/HIGH findings from the same file get distinct labels and do
  not alias under resumability (Greptile).
- stats.verified -> stats.confirmed to match the "confirmed" wording
  used in the log and avoid ambiguity vs the refuted count (Greptile);
  header contract and README updated to match.

Verified by running the workflow on a synthetic vulnerable diff:
dedup 12 raw -> 5 unique, stats.confirmed populated, fail-closed fields
(incomplete/failedDimensions) intact.

* fix(workflows): harden verify stage and diff-only verification

Addresses the second-round bot review:

- Verify stage now has the same failure guard as the review stage: a
  rejected verifier no longer nulls out its slot (which crashed the
  later filter). A null return is treated as unconfirmed; a rejection
  keeps the finding as blocking (fail closed) so an unverifiable
  CRITICAL is never silently demoted to advisory (CodeRabbit @221).
- verifyPrompt now instructs the skeptic to judge solely from the
  provided diff text and not to refute merely because the referenced
  file is absent from the working tree (the diff may be an unapplied
  PR). Fixes the false-refute seen when testing on a synthetic diff.

CodeRabbit @81 (evidence minLength) was already addressed in the prior
commit; this is a stale re-post on the unresolved thread.

* fix(workflows): keep unverifiable blockers blocking; stop leaking error text

Second-round bot review (CodeRabbit):

- @218 Treat a null/failed verifier as `unverified`, not refuted. A
  terminal verifier failure or skip no longer demotes a CRITICAL/HIGH
  to advisory; it stays in `blocking` tagged "could not be verified"
  (fail closed). Only a genuine isReal=false verdict is refuted. Adds
  stats.unverified.
- @189 Do not return raw subagent error text. Review/verify failures
  now log the raw message for operators and return only a bounded label
  (failedDimensions[].error = "review agent failed").

Stale re-posts this round (@81 evidence minLength, @224 verify guard)
were already fixed in prior commits.

* docs(workflows): enumerate bounded failedDimensions.error labels

CodeRabbit (trivial): the public contract implied callers get
human-readable error text, but the implementation returns only bounded
labels. Enumerate them in the README returns block.
2026-06-29 15:50:41 -07:00
..

ECC native workflows (pilot)

Scripts in this directory are Claude Code Workflow tool scripts — deterministic, multi-agent orchestration that runs in the background and fans out to subagents.

This is a pilot: ECC's orchestration (orch-*, multi-*, GAN/Santa loops) is currently hand-rolled on top of the Task/Agent tool. These scripts port the autonomous, fan-out-heavy segments to the native engine, which gives us barrier-free pipelining, automatic concurrency capping, structured-output validation, and resumability for free.

orch-review.workflow.js

A native port of orch-pipeline Phase 5 (Review).

The gated outer loop (Gate 1 after Plan, Gate 2 before Commit) stays in the main conversation — native workflows run autonomously in the background and cannot pause for interactive approval. This script owns only the segment between the gates:

  1. Review — one reviewer agent per dimension, in parallel:
    • ecc:code-reviewer (correctness & quality) — always
    • the matching ecc:<language>-reviewer — when args.language maps to one
    • ecc:security-reviewer — only when the orch-pipeline security trigger matches the diff/paths
  2. Dedup — independent reviewers routinely flag the same line, so findings are merged across dimensions keyed on the normalized evidence snippet (titles and line numbers drift per reviewer; the offending code does not). Each surviving finding records which dimensions reported it and keeps the strictest severity.
  3. Verify — every unique CRITICAL/HIGH finding is handed to an independent adversarial verifier that defaults to refuted on uncertainty. MEDIUM/LOW pass through as advisory.

The Review→Verify barrier is deliberate: deduping before verification is exactly the case the Workflow guidance calls a justified barrier — it stops the verifier running N times on the same bug (in local testing, 11 raw findings collapsed to 4 unique, roughly halving verifier cost).

Invocation

The main loop computes the diff, then calls the Workflow tool:

Workflow({
  scriptPath: "workflows/orch-review.workflow.js",
  args: {
    diff: "<unified git diff text>",   // required
    language: "typescript",            // optional — selects a language reviewer
    changedFiles: ["src/auth.ts"]      // optional — feeds the security trigger
  }
})

Invalid input throws (the gate fails closed): a missing/empty diff, malformed JSON, or a non-array changedFiles is rejected with a clear error rather than silently approving an unreviewed payload.

Returns

{
  "verdict": "APPROVE" | "CHANGES_REQUESTED", // CHANGES_REQUESTED if any blocker OR a dimension failed
  "incomplete": false,            // true when one or more review dimensions failed to run
  "failedDimensions": [ /* { dimension, error }  error is a bounded label, never raw subagent text:
                           "agent returned null (terminal failure or skip)" | "review agent failed" */ ],
  "blocking": [ /* confirmed CRITICAL/HIGH + unverifiable ones  must clear before Gate 2 */ ],
  "advisory": [ /* MEDIUM/LOW + adversarially-refuted findings */ ],
  "stats": { "dimensions": 3, "failed": 0, "raw": 11, "unique": 4, "confirmed": 4, "unverified": 0, "refuted": 0 }
}

The main loop presents blocking at Gate 2; the human still approves the commit. The gate fails closed at every stage: if a reviewer dies the dimension is recorded in failedDimensions (verdict never a clean APPROVE), and if a verifier dies or returns null the blocker is kept in blocking (tagged "could not be verified") rather than demoted to advisory — an unreviewed security dimension or an unverifiable CRITICAL must not pass as approved.

Not in this PR (follow-ups)

  • A /orch-review command + skill trigger (plus the mirrored i18n docs and surface tests ECC requires for a new command surface).
  • Installer / manifest wiring so the script ships to ~/.claude/ on install.
  • Porting the Research sweep and Plan judge-panel segments next.