mirror of
https://github.com/affaan-m/everything-claude-code.git
synced 2026-07-01 03:11:31 +08:00
* feat(workflows): add orch-review native Workflow pilot Port orch-pipeline Phase 5 (Review) to a native Claude Code Workflow script. The gated outer loop stays in the main conversation; this script owns only the autonomous review+verify segment between the two human gates: 1. Review — reviewers fan out in parallel: ecc:code-reviewer always, ecc:<language>-reviewer when args.language maps, ecc:security-reviewer when the orch-pipeline security trigger matches the diff/paths. 2. Dedup — merge findings across dimensions keyed on the normalized evidence snippet, since independent reviewers flag the same line. 3. Verify — each unique CRITICAL/HIGH finding goes to an independent adversarial verifier; MEDIUM/LOW pass through as advisory. The Review->Verify barrier is deliberate: deduping before verification stops the verifier running N times on the same bug (local testing: 11 raw findings collapsed to 4 unique, ~halving verifier cost). Existing ECC reviewer subagents are reused via agentType; reviewer output is validated by JSON schema. args is accepted as an object or a JSON-encoded string. - workflows/orch-review.workflow.js — the workflow script - workflows/README.md — invocation contract, returns shape, follow-ups CI lint is scoped to scripts/ and tests/, so the script (validated with node --check) and the README (passes markdownlint) are untouched. * fix(workflows): fail closed on invalid args and lost review dimensions Addresses the two safety findings from the PR bot review: 1. Lost review dimension (Greptile P1 / CodeRabbit Major): a reviewer agent that returns null or rejects was silently dropped by filter(Boolean), so an unreviewed security dimension could still return APPROVE. Each dimension's outcome is now captured; failures land in failedDimensions and force CHANGES_REQUESTED (incomplete). 2. Invalid args (CodeRabbit Major): an empty diff returned APPROVE and bad JSON / non-array changedFiles threw inconsistently. Input is now validated up front and rejected with a clear error — the gate fails closed instead of approving an unreviewed payload. Docs (header contract + README) updated for the new return fields (incomplete, failedDimensions, stats.failed). Remaining bot nits (evidence minLength, verify-label collision, verified->confirmed rename, contract drift) deferred as follow-ups. * fix(workflows): address remaining orch-review review nits Follow-up to the bot review (deferred items from the safety pass): - evidence: require minLength 1 in the schema, and fall back to a title+line dedup key when evidence is empty, so empty-evidence findings in one file no longer collapse onto a single key and drop (CodeRabbit). - verify label: include a slice of the normalized evidence so two CRITICAL/HIGH findings from the same file get distinct labels and do not alias under resumability (Greptile). - stats.verified -> stats.confirmed to match the "confirmed" wording used in the log and avoid ambiguity vs the refuted count (Greptile); header contract and README updated to match. Verified by running the workflow on a synthetic vulnerable diff: dedup 12 raw -> 5 unique, stats.confirmed populated, fail-closed fields (incomplete/failedDimensions) intact. * fix(workflows): harden verify stage and diff-only verification Addresses the second-round bot review: - Verify stage now has the same failure guard as the review stage: a rejected verifier no longer nulls out its slot (which crashed the later filter). A null return is treated as unconfirmed; a rejection keeps the finding as blocking (fail closed) so an unverifiable CRITICAL is never silently demoted to advisory (CodeRabbit @221). - verifyPrompt now instructs the skeptic to judge solely from the provided diff text and not to refute merely because the referenced file is absent from the working tree (the diff may be an unapplied PR). Fixes the false-refute seen when testing on a synthetic diff. CodeRabbit @81 (evidence minLength) was already addressed in the prior commit; this is a stale re-post on the unresolved thread. * fix(workflows): keep unverifiable blockers blocking; stop leaking error text Second-round bot review (CodeRabbit): - @218 Treat a null/failed verifier as `unverified`, not refuted. A terminal verifier failure or skip no longer demotes a CRITICAL/HIGH to advisory; it stays in `blocking` tagged "could not be verified" (fail closed). Only a genuine isReal=false verdict is refuted. Adds stats.unverified. - @189 Do not return raw subagent error text. Review/verify failures now log the raw message for operators and return only a bounded label (failedDimensions[].error = "review agent failed"). Stale re-posts this round (@81 evidence minLength, @224 verify guard) were already fixed in prior commits. * docs(workflows): enumerate bounded failedDimensions.error labels CodeRabbit (trivial): the public contract implied callers get human-readable error text, but the implementation returns only bounded labels. Enumerate them in the README returns block.
60 lines
4.0 KiB
Markdown
60 lines
4.0 KiB
Markdown
# ECC native workflows (pilot)
|
|
|
|
Scripts in this directory are [Claude Code **Workflow** tool](https://docs.claude.com/en/docs/claude-code) scripts — deterministic, multi-agent orchestration that runs in the background and fans out to subagents.
|
|
|
|
This is a **pilot**: ECC's orchestration (`orch-*`, `multi-*`, GAN/Santa loops) is currently hand-rolled on top of the `Task`/Agent tool. These scripts port the autonomous, fan-out-heavy segments to the native engine, which gives us barrier-free pipelining, automatic concurrency capping, structured-output validation, and resumability for free.
|
|
|
|
## `orch-review.workflow.js`
|
|
|
|
A native port of **orch-pipeline Phase 5 (Review)**.
|
|
|
|
The gated outer loop (Gate 1 after Plan, Gate 2 before Commit) **stays in the main conversation** — native workflows run autonomously in the background and cannot pause for interactive approval. This script owns only the segment *between* the gates:
|
|
|
|
1. **Review** — one reviewer agent per dimension, in parallel:
|
|
- `ecc:code-reviewer` (correctness & quality) — always
|
|
- the matching `ecc:<language>-reviewer` — when `args.language` maps to one
|
|
- `ecc:security-reviewer` — only when the orch-pipeline security trigger matches the diff/paths
|
|
2. **Dedup** — independent reviewers routinely flag the same line, so findings are merged across dimensions keyed on the normalized `evidence` snippet (titles and line numbers drift per reviewer; the offending code does not). Each surviving finding records which `dimensions` reported it and keeps the strictest severity.
|
|
3. **Verify** — every *unique* `CRITICAL`/`HIGH` finding is handed to an independent adversarial verifier that defaults to *refuted* on uncertainty. `MEDIUM`/`LOW` pass through as advisory.
|
|
|
|
The Review→Verify barrier is deliberate: deduping before verification is exactly the case the Workflow guidance calls a justified barrier — it stops the verifier running N times on the same bug (in local testing, 11 raw findings collapsed to 4 unique, roughly halving verifier cost).
|
|
|
|
### Invocation
|
|
|
|
The main loop computes the diff, then calls the Workflow tool:
|
|
|
|
```jsonc
|
|
Workflow({
|
|
scriptPath: "workflows/orch-review.workflow.js",
|
|
args: {
|
|
diff: "<unified git diff text>", // required
|
|
language: "typescript", // optional — selects a language reviewer
|
|
changedFiles: ["src/auth.ts"] // optional — feeds the security trigger
|
|
}
|
|
})
|
|
```
|
|
|
|
Invalid input throws (the gate **fails closed**): a missing/empty `diff`, malformed JSON, or a non-array `changedFiles` is rejected with a clear error rather than silently approving an unreviewed payload.
|
|
|
|
### Returns
|
|
|
|
```jsonc
|
|
{
|
|
"verdict": "APPROVE" | "CHANGES_REQUESTED", // CHANGES_REQUESTED if any blocker OR a dimension failed
|
|
"incomplete": false, // true when one or more review dimensions failed to run
|
|
"failedDimensions": [ /* { dimension, error } — error is a bounded label, never raw subagent text:
|
|
"agent returned null (terminal failure or skip)" | "review agent failed" */ ],
|
|
"blocking": [ /* confirmed CRITICAL/HIGH + unverifiable ones — must clear before Gate 2 */ ],
|
|
"advisory": [ /* MEDIUM/LOW + adversarially-refuted findings */ ],
|
|
"stats": { "dimensions": 3, "failed": 0, "raw": 11, "unique": 4, "confirmed": 4, "unverified": 0, "refuted": 0 }
|
|
}
|
|
```
|
|
|
|
The main loop presents `blocking` at Gate 2; the human still approves the commit. The gate fails closed at every stage: if a reviewer dies the dimension is recorded in `failedDimensions` (verdict never a clean `APPROVE`), and if a *verifier* dies or returns null the blocker is kept in `blocking` (tagged "could not be verified") rather than demoted to advisory — an unreviewed security dimension or an unverifiable CRITICAL must not pass as approved.
|
|
|
|
## Not in this PR (follow-ups)
|
|
|
|
- A `/orch-review` command + skill trigger (plus the mirrored i18n docs and surface tests ECC requires for a new command surface).
|
|
- Installer / manifest wiring so the script ships to `~/.claude/` on install.
|
|
- Porting the **Research** sweep and **Plan** judge-panel segments next.
|