mirror of
https://github.com/affaan-m/everything-claude-code.git
synced 2026-06-30 19:00:57 +08:00
1 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
1031d312cc
|
feat(workflows): add orch-review native Workflow pilot (#2363)
* feat(workflows): add orch-review native Workflow pilot Port orch-pipeline Phase 5 (Review) to a native Claude Code Workflow script. The gated outer loop stays in the main conversation; this script owns only the autonomous review+verify segment between the two human gates: 1. Review — reviewers fan out in parallel: ecc:code-reviewer always, ecc:<language>-reviewer when args.language maps, ecc:security-reviewer when the orch-pipeline security trigger matches the diff/paths. 2. Dedup — merge findings across dimensions keyed on the normalized evidence snippet, since independent reviewers flag the same line. 3. Verify — each unique CRITICAL/HIGH finding goes to an independent adversarial verifier; MEDIUM/LOW pass through as advisory. The Review->Verify barrier is deliberate: deduping before verification stops the verifier running N times on the same bug (local testing: 11 raw findings collapsed to 4 unique, ~halving verifier cost). Existing ECC reviewer subagents are reused via agentType; reviewer output is validated by JSON schema. args is accepted as an object or a JSON-encoded string. - workflows/orch-review.workflow.js — the workflow script - workflows/README.md — invocation contract, returns shape, follow-ups CI lint is scoped to scripts/ and tests/, so the script (validated with node --check) and the README (passes markdownlint) are untouched. * fix(workflows): fail closed on invalid args and lost review dimensions Addresses the two safety findings from the PR bot review: 1. Lost review dimension (Greptile P1 / CodeRabbit Major): a reviewer agent that returns null or rejects was silently dropped by filter(Boolean), so an unreviewed security dimension could still return APPROVE. Each dimension's outcome is now captured; failures land in failedDimensions and force CHANGES_REQUESTED (incomplete). 2. Invalid args (CodeRabbit Major): an empty diff returned APPROVE and bad JSON / non-array changedFiles threw inconsistently. Input is now validated up front and rejected with a clear error — the gate fails closed instead of approving an unreviewed payload. Docs (header contract + README) updated for the new return fields (incomplete, failedDimensions, stats.failed). Remaining bot nits (evidence minLength, verify-label collision, verified->confirmed rename, contract drift) deferred as follow-ups. * fix(workflows): address remaining orch-review review nits Follow-up to the bot review (deferred items from the safety pass): - evidence: require minLength 1 in the schema, and fall back to a title+line dedup key when evidence is empty, so empty-evidence findings in one file no longer collapse onto a single key and drop (CodeRabbit). - verify label: include a slice of the normalized evidence so two CRITICAL/HIGH findings from the same file get distinct labels and do not alias under resumability (Greptile). - stats.verified -> stats.confirmed to match the "confirmed" wording used in the log and avoid ambiguity vs the refuted count (Greptile); header contract and README updated to match. Verified by running the workflow on a synthetic vulnerable diff: dedup 12 raw -> 5 unique, stats.confirmed populated, fail-closed fields (incomplete/failedDimensions) intact. * fix(workflows): harden verify stage and diff-only verification Addresses the second-round bot review: - Verify stage now has the same failure guard as the review stage: a rejected verifier no longer nulls out its slot (which crashed the later filter). A null return is treated as unconfirmed; a rejection keeps the finding as blocking (fail closed) so an unverifiable CRITICAL is never silently demoted to advisory (CodeRabbit @221). - verifyPrompt now instructs the skeptic to judge solely from the provided diff text and not to refute merely because the referenced file is absent from the working tree (the diff may be an unapplied PR). Fixes the false-refute seen when testing on a synthetic diff. CodeRabbit @81 (evidence minLength) was already addressed in the prior commit; this is a stale re-post on the unresolved thread. * fix(workflows): keep unverifiable blockers blocking; stop leaking error text Second-round bot review (CodeRabbit): - @218 Treat a null/failed verifier as `unverified`, not refuted. A terminal verifier failure or skip no longer demotes a CRITICAL/HIGH to advisory; it stays in `blocking` tagged "could not be verified" (fail closed). Only a genuine isReal=false verdict is refuted. Adds stats.unverified. - @189 Do not return raw subagent error text. Review/verify failures now log the raw message for operators and return only a bounded label (failedDimensions[].error = "review agent failed"). Stale re-posts this round (@81 evidence minLength, @224 verify guard) were already fixed in prior commits. * docs(workflows): enumerate bounded failedDimensions.error labels CodeRabbit (trivial): the public contract implied callers get human-readable error text, but the implementation returns only bounded labels. Enumerate them in the README returns block. |