everything-claude-code

elias/everything-claude-code

Fork 0

mirror of https://github.com/affaan-m/everything-claude-code.git synced 2026-06-30 19:00:57 +08:00

Commit Graph

Author	SHA1	Message	Date
JongHyeok Park	1031d312cc	feat(workflows): add orch-review native Workflow pilot (#2363 ) * feat(workflows): add orch-review native Workflow pilot Port orch-pipeline Phase 5 (Review) to a native Claude Code Workflow script. The gated outer loop stays in the main conversation; this script owns only the autonomous review+verify segment between the two human gates: 1. Review — reviewers fan out in parallel: ecc:code-reviewer always, ecc:<language>-reviewer when args.language maps, ecc:security-reviewer when the orch-pipeline security trigger matches the diff/paths. 2. Dedup — merge findings across dimensions keyed on the normalized evidence snippet, since independent reviewers flag the same line. 3. Verify — each unique CRITICAL/HIGH finding goes to an independent adversarial verifier; MEDIUM/LOW pass through as advisory. The Review->Verify barrier is deliberate: deduping before verification stops the verifier running N times on the same bug (local testing: 11 raw findings collapsed to 4 unique, ~halving verifier cost). Existing ECC reviewer subagents are reused via agentType; reviewer output is validated by JSON schema. args is accepted as an object or a JSON-encoded string. - workflows/orch-review.workflow.js — the workflow script - workflows/README.md — invocation contract, returns shape, follow-ups CI lint is scoped to scripts/ and tests/, so the script (validated with node --check) and the README (passes markdownlint) are untouched. * fix(workflows): fail closed on invalid args and lost review dimensions Addresses the two safety findings from the PR bot review: 1. Lost review dimension (Greptile P1 / CodeRabbit Major): a reviewer agent that returns null or rejects was silently dropped by filter(Boolean), so an unreviewed security dimension could still return APPROVE. Each dimension's outcome is now captured; failures land in failedDimensions and force CHANGES_REQUESTED (incomplete). 2. Invalid args (CodeRabbit Major): an empty diff returned APPROVE and bad JSON / non-array changedFiles threw inconsistently. Input is now validated up front and rejected with a clear error — the gate fails closed instead of approving an unreviewed payload. Docs (header contract + README) updated for the new return fields (incomplete, failedDimensions, stats.failed). Remaining bot nits (evidence minLength, verify-label collision, verified->confirmed rename, contract drift) deferred as follow-ups. * fix(workflows): address remaining orch-review review nits Follow-up to the bot review (deferred items from the safety pass): - evidence: require minLength 1 in the schema, and fall back to a title+line dedup key when evidence is empty, so empty-evidence findings in one file no longer collapse onto a single key and drop (CodeRabbit). - verify label: include a slice of the normalized evidence so two CRITICAL/HIGH findings from the same file get distinct labels and do not alias under resumability (Greptile). - stats.verified -> stats.confirmed to match the "confirmed" wording used in the log and avoid ambiguity vs the refuted count (Greptile); header contract and README updated to match. Verified by running the workflow on a synthetic vulnerable diff: dedup 12 raw -> 5 unique, stats.confirmed populated, fail-closed fields (incomplete/failedDimensions) intact. * fix(workflows): harden verify stage and diff-only verification Addresses the second-round bot review: - Verify stage now has the same failure guard as the review stage: a rejected verifier no longer nulls out its slot (which crashed the later filter). A null return is treated as unconfirmed; a rejection keeps the finding as blocking (fail closed) so an unverifiable CRITICAL is never silently demoted to advisory (CodeRabbit @221). - verifyPrompt now instructs the skeptic to judge solely from the provided diff text and not to refute merely because the referenced file is absent from the working tree (the diff may be an unapplied PR). Fixes the false-refute seen when testing on a synthetic diff. CodeRabbit @81 (evidence minLength) was already addressed in the prior commit; this is a stale re-post on the unresolved thread. * fix(workflows): keep unverifiable blockers blocking; stop leaking error text Second-round bot review (CodeRabbit): - @218 Treat a null/failed verifier as `unverified`, not refuted. A terminal verifier failure or skip no longer demotes a CRITICAL/HIGH to advisory; it stays in `blocking` tagged "could not be verified" (fail closed). Only a genuine isReal=false verdict is refuted. Adds stats.unverified. - @189 Do not return raw subagent error text. Review/verify failures now log the raw message for operators and return only a bounded label (failedDimensions[].error = "review agent failed"). Stale re-posts this round (@81 evidence minLength, @224 verify guard) were already fixed in prior commits. * docs(workflows): enumerate bounded failedDimensions.error labels CodeRabbit (trivial): the public contract implied callers get human-readable error text, but the implementation returns only bounded labels. Enumerate them in the README returns block.	2026-06-29 15:50:41 -07:00

Author

SHA1

Message

Date

JongHyeok Park

1031d312cc

feat(workflows): add orch-review native Workflow pilot (#2363 )

* feat(workflows): add orch-review native Workflow pilot

Port orch-pipeline Phase 5 (Review) to a native Claude Code Workflow
script. The gated outer loop stays in the main conversation; this script
owns only the autonomous review+verify segment between the two human
gates:

1. Review — reviewers fan out in parallel: ecc:code-reviewer always,
   ecc:<language>-reviewer when args.language maps, ecc:security-reviewer
   when the orch-pipeline security trigger matches the diff/paths.
2. Dedup — merge findings across dimensions keyed on the normalized
   evidence snippet, since independent reviewers flag the same line.
3. Verify — each unique CRITICAL/HIGH finding goes to an independent
   adversarial verifier; MEDIUM/LOW pass through as advisory.

The Review->Verify barrier is deliberate: deduping before verification
stops the verifier running N times on the same bug (local testing: 11
raw findings collapsed to 4 unique, ~halving verifier cost).

Existing ECC reviewer subagents are reused via agentType; reviewer
output is validated by JSON schema. args is accepted as an object or a
JSON-encoded string.

- workflows/orch-review.workflow.js — the workflow script
- workflows/README.md — invocation contract, returns shape, follow-ups

CI lint is scoped to scripts/ and tests/, so the script (validated with
node --check) and the README (passes markdownlint) are untouched.

* fix(workflows): fail closed on invalid args and lost review dimensions

Addresses the two safety findings from the PR bot review:

1. Lost review dimension (Greptile P1 / CodeRabbit Major): a reviewer
   agent that returns null or rejects was silently dropped by
   filter(Boolean), so an unreviewed security dimension could still
   return APPROVE. Each dimension's outcome is now captured; failures
   land in failedDimensions and force CHANGES_REQUESTED (incomplete).

2. Invalid args (CodeRabbit Major): an empty diff returned APPROVE and
   bad JSON / non-array changedFiles threw inconsistently. Input is now
   validated up front and rejected with a clear error — the gate fails
   closed instead of approving an unreviewed payload.

Docs (header contract + README) updated for the new return fields
(incomplete, failedDimensions, stats.failed). Remaining bot nits
(evidence minLength, verify-label collision, verified->confirmed
rename, contract drift) deferred as follow-ups.

* fix(workflows): address remaining orch-review review nits

Follow-up to the bot review (deferred items from the safety pass):

- evidence: require minLength 1 in the schema, and fall back to a
  title+line dedup key when evidence is empty, so empty-evidence
  findings in one file no longer collapse onto a single key and drop
  (CodeRabbit).
- verify label: include a slice of the normalized evidence so two
  CRITICAL/HIGH findings from the same file get distinct labels and do
  not alias under resumability (Greptile).
- stats.verified -> stats.confirmed to match the "confirmed" wording
  used in the log and avoid ambiguity vs the refuted count (Greptile);
  header contract and README updated to match.

Verified by running the workflow on a synthetic vulnerable diff:
dedup 12 raw -> 5 unique, stats.confirmed populated, fail-closed fields
(incomplete/failedDimensions) intact.

* fix(workflows): harden verify stage and diff-only verification

Addresses the second-round bot review:

- Verify stage now has the same failure guard as the review stage: a
  rejected verifier no longer nulls out its slot (which crashed the
  later filter). A null return is treated as unconfirmed; a rejection
  keeps the finding as blocking (fail closed) so an unverifiable
  CRITICAL is never silently demoted to advisory (CodeRabbit @221).
- verifyPrompt now instructs the skeptic to judge solely from the
  provided diff text and not to refute merely because the referenced
  file is absent from the working tree (the diff may be an unapplied
  PR). Fixes the false-refute seen when testing on a synthetic diff.

CodeRabbit @81 (evidence minLength) was already addressed in the prior
commit; this is a stale re-post on the unresolved thread.

* fix(workflows): keep unverifiable blockers blocking; stop leaking error text

Second-round bot review (CodeRabbit):

- @218 Treat a null/failed verifier as `unverified`, not refuted. A
  terminal verifier failure or skip no longer demotes a CRITICAL/HIGH
  to advisory; it stays in `blocking` tagged "could not be verified"
  (fail closed). Only a genuine isReal=false verdict is refuted. Adds
  stats.unverified.
- @189 Do not return raw subagent error text. Review/verify failures
  now log the raw message for operators and return only a bounded label
  (failedDimensions[].error = "review agent failed").

Stale re-posts this round (@81 evidence minLength, @224 verify guard)
were already fixed in prior commits.

* docs(workflows): enumerate bounded failedDimensions.error labels

CodeRabbit (trivial): the public contract implied callers get
human-readable error text, but the implementation returns only bounded
labels. Enumerate them in the README returns block.

2026-06-29 15:50:41 -07:00

1 Commits