Replace invalid default model IDs (e.g. claude-sonnet-4-7) with current
claude-sonnet-4-6, claude-opus-4-8, and claude-haiku-4-5. Route system
messages to the API system field, enable ephemeral prompt caching, omit
temperature for Opus 4.7/4.8, and surface cache usage metrics. Update the
CLI model picker to match.
Co-authored-by: Vladimir Đuranović <vlada@MacBook-Pro.local>
Co-authored-by: Cursor <cursoragent@cursor.com>
* Add memxus configuration to mcp-servers.json
Added configuration for Memxus service with API key placeholder and description.
* Revise description in mcp-servers.json
Updated the description to include a note about reviewing stored memories to prevent prompt-injection.
* Update description in mcp-servers.json
Update description in mcp-servers.json
* feat(workflows): add orch-review native Workflow pilot
Port orch-pipeline Phase 5 (Review) to a native Claude Code Workflow
script. The gated outer loop stays in the main conversation; this script
owns only the autonomous review+verify segment between the two human
gates:
1. Review — reviewers fan out in parallel: ecc:code-reviewer always,
ecc:<language>-reviewer when args.language maps, ecc:security-reviewer
when the orch-pipeline security trigger matches the diff/paths.
2. Dedup — merge findings across dimensions keyed on the normalized
evidence snippet, since independent reviewers flag the same line.
3. Verify — each unique CRITICAL/HIGH finding goes to an independent
adversarial verifier; MEDIUM/LOW pass through as advisory.
The Review->Verify barrier is deliberate: deduping before verification
stops the verifier running N times on the same bug (local testing: 11
raw findings collapsed to 4 unique, ~halving verifier cost).
Existing ECC reviewer subagents are reused via agentType; reviewer
output is validated by JSON schema. args is accepted as an object or a
JSON-encoded string.
- workflows/orch-review.workflow.js — the workflow script
- workflows/README.md — invocation contract, returns shape, follow-ups
CI lint is scoped to scripts/ and tests/, so the script (validated with
node --check) and the README (passes markdownlint) are untouched.
* fix(workflows): fail closed on invalid args and lost review dimensions
Addresses the two safety findings from the PR bot review:
1. Lost review dimension (Greptile P1 / CodeRabbit Major): a reviewer
agent that returns null or rejects was silently dropped by
filter(Boolean), so an unreviewed security dimension could still
return APPROVE. Each dimension's outcome is now captured; failures
land in failedDimensions and force CHANGES_REQUESTED (incomplete).
2. Invalid args (CodeRabbit Major): an empty diff returned APPROVE and
bad JSON / non-array changedFiles threw inconsistently. Input is now
validated up front and rejected with a clear error — the gate fails
closed instead of approving an unreviewed payload.
Docs (header contract + README) updated for the new return fields
(incomplete, failedDimensions, stats.failed). Remaining bot nits
(evidence minLength, verify-label collision, verified->confirmed
rename, contract drift) deferred as follow-ups.
* fix(workflows): address remaining orch-review review nits
Follow-up to the bot review (deferred items from the safety pass):
- evidence: require minLength 1 in the schema, and fall back to a
title+line dedup key when evidence is empty, so empty-evidence
findings in one file no longer collapse onto a single key and drop
(CodeRabbit).
- verify label: include a slice of the normalized evidence so two
CRITICAL/HIGH findings from the same file get distinct labels and do
not alias under resumability (Greptile).
- stats.verified -> stats.confirmed to match the "confirmed" wording
used in the log and avoid ambiguity vs the refuted count (Greptile);
header contract and README updated to match.
Verified by running the workflow on a synthetic vulnerable diff:
dedup 12 raw -> 5 unique, stats.confirmed populated, fail-closed fields
(incomplete/failedDimensions) intact.
* fix(workflows): harden verify stage and diff-only verification
Addresses the second-round bot review:
- Verify stage now has the same failure guard as the review stage: a
rejected verifier no longer nulls out its slot (which crashed the
later filter). A null return is treated as unconfirmed; a rejection
keeps the finding as blocking (fail closed) so an unverifiable
CRITICAL is never silently demoted to advisory (CodeRabbit @221).
- verifyPrompt now instructs the skeptic to judge solely from the
provided diff text and not to refute merely because the referenced
file is absent from the working tree (the diff may be an unapplied
PR). Fixes the false-refute seen when testing on a synthetic diff.
CodeRabbit @81 (evidence minLength) was already addressed in the prior
commit; this is a stale re-post on the unresolved thread.
* fix(workflows): keep unverifiable blockers blocking; stop leaking error text
Second-round bot review (CodeRabbit):
- @218 Treat a null/failed verifier as `unverified`, not refuted. A
terminal verifier failure or skip no longer demotes a CRITICAL/HIGH
to advisory; it stays in `blocking` tagged "could not be verified"
(fail closed). Only a genuine isReal=false verdict is refuted. Adds
stats.unverified.
- @189 Do not return raw subagent error text. Review/verify failures
now log the raw message for operators and return only a bounded label
(failedDimensions[].error = "review agent failed").
Stale re-posts this round (@81 evidence minLength, @224 verify guard)
were already fixed in prior commits.
* docs(workflows): enumerate bounded failedDimensions.error labels
CodeRabbit (trivial): the public contract implied callers get
human-readable error text, but the implementation returns only bounded
labels. Enumerate them in the README returns block.
Two security-priority fixes in continuous-learning-v2/scripts/instinct-cli.py:
- #2294: _write_registry wrote projects.json without the advisory lock that
_update_registry holds, so concurrent 'projects delete/gc/merge' could race an
observe-time update and corrupt the registry. Extract the lock into a shared
_registry_lock() context manager and use it in both writers.
- #2297: _remove_project_storage called shutil.rmtree on PROJECTS_DIR/project_id
with no containment check. Add defense-in-depth: resolve the path and refuse to
delete anything that is not strictly inside PROJECTS_DIR (or is the root
itself), so a relaxed validator or future caller can never cause an
arbitrary-directory delete.
Adds 5 pytest regression tests (atomic write under lock, contained delete,
missing-dir no-op, traversal refused, root refused). Node integration suite
(tests/scripts/instinct-cli-projects.test.js) green 9/9.
* fix(clv2): escape $HOME before pgrep -f in migrate-homunculus.sh
pgrep -f treats its argument as an extended regular expression, but the
running-observer guard interpolated $HOME unescaped. Paths containing regex
metacharacters (e.g. /home/user.name, /home/c++dev, /home/user (work)) made the
match over-broad or invalid, causing either a false negative (live observer
missed, migration proceeds and risks registry corruption) or a false positive
(migration blocked unnecessarily).
Escape the ERE metacharacters in $HOME via sed before building the pattern so
the home prefix is matched literally while the trailing .*observer-loop\.sh
regex is preserved. Portable across BSD and GNU sed.
Fixes#2301
* test(clv2): add regression test for migrate-homunculus.sh $HOME escaping
Guards the #2301 fix: extracts the script's sed escaping command and asserts
the resulting pgrep -f pattern matches the literal home path while no longer
over-matching a regex-expanded decoy (HOME=/home/user.name must not match
/home/userXname). Also pins that the guard uses escaped_home rather than $HOME
directly. Follows the existing clv2 shell-test convention in
tests/hooks/observe-entrypoint-allowlist.test.js.
Refs #2301
* test(clv2): skip migrate-homunculus escaping test on Windows
The test relies on POSIX bash/sed/grep -E semantics, which differ on the
Windows CI runners. Guard with the same process.platform === 'win32' early
exit used by tests/hooks/observe-subdirectory-detection.test.js so the
bash-dependent assertions only run on POSIX platforms.
Refs #2301
Adds the Layer 4 observability view to the control pane: a self-contained,
dependency-free 3D point-cloud of the agent airspace (positions from the
proximity embedding, sized by working set, colored by collision risk, links
for converging pairs) plus an XSS-safe advisory panel that polls every 5s.
- proximity-viz.js: renderProximityVizHtml() (canvas projection, no external JS)
- server.js: GET /proximity (page) + GET /api/proximity (snapshot.proximity feed)
- test: asserts both routes serve and the feed carries positions/links/advisories
Finishes the steer/transmit loop — advisories now reach the agents' sessions.
- message-sink.js: createEccMessageSink() delivers via the canonical writer
'ecc-tui messages send' (maps steer/hold -> conflict kind, transmit -> query),
resolving the binary from override/env/built target/PATH. Injectable runner;
best-effort (a missing binary/failed send is counted skipped, never blocks).
- proximity.js: createProximityDispatcher() adds per-trigger cooldown so a
persistent collision fires once then stays quiet (agents get steered, not
spammed); runProximityTick() builds the snapshot and dispatches.
- scripts/proximity-tick.js: thin CLI — one-shot, --dry-run, --watch <sec>.
Messages are internal ECC agent-to-agent coordination, not any external channel.
- 14 new tests (sink argv/kind mapping, cooldown dedup, tick dispatch/dry-run,
CLI parse). Full suite 2891/2891; lint green.
- Line precision: parse git diff --unified=0 into per-file changed line ranges
(defaultWorkingSetFor), so two agents in the SAME file but DIFFERENT functions
no longer false-collide. Overlap channel now uses the overlap coefficient
(|A∩B|/min(|A|,|B|)) — high when one edit sits inside the other's region, low
for disjoint ranges; whole-file edit = 1. Docstring + design doc updated.
- Trigger firing: buildProximityTriggers() turns advisories into the concrete
messages — transmit-intent to both on a Traffic Advisory, steer-away to the
yielding agent + a hold notice on a Resolution Advisory. buildProximitySnapshot
now returns triggers; dispatchProximityTriggers(triggers, {sendMessage}) delivers
them through an injectable sink (the ECC messages table), best-effort.
- 12 new tests (line-range disjoint vs overlapping, parseDiffRanges, triggers,
dispatch). Full suite 2881/2881; lint green.
Adds a clean badge row right under the hero linking the official destinations,
led by a live Discord member-count badge (server widget enabled) to drive the
community from a few hundred toward a few thousand. Gives the official ECC links
the icons the readme was missing.
Turns live sessions into the airspace scan: each worktree session's git diff
becomes its working set, the dependency graph is built over the touched files,
and scanAirspace() produces the TCAS advisories + 3D positions.
- scripts/lib/control-pane/proximity.js: sessionsToAgents() + buildProximitySnapshot();
default working-set source shells `git diff --name-only <base>...HEAD` per
worktree (injectable for tests, fails closed to []).
- state.js: opt-in `proximity` field on the snapshot (includeProximity flag) so
the default hot path stays fast (git diffs only run when requested).
- 4 integration tests (same-file editors -> resolution, later agent steers,
<2 participants -> no advisories, labels). Full suite 2873/2873; lint green.
The moat layer: spatial deconfliction for multiple agents (and humans) on one
codebase, modeled on aircraft TCAS — measure how close two agents are in
code-space, then transmit-intent (Traffic Advisory) and steer-away (Resolution
Advisory) before they collide at the git layer.
scripts/lib/agent-proximity/:
- distance.js — the math: per-channel collision probabilities combined via
noisy-OR R = 1 - Π(1 - ω·r). Channels: edit overlap (file + line-range
Jaccard), dependency coupling (γ^(d-1) over the import graph, direction-
agnostic — catches 'edit there breaks here' even when tree-distant), and tree
proximity (LCA-based, soft prior). TCAS advise(): clear / advisory(transmit) /
resolution(steer), with deterministic right-of-way priority so the maneuver is
coordinated. closureRate() for approach-speed escalation.
- graph.js — lightweight require/import dependency-graph builder (fs or in-memory).
- index.js — scanAirspace(): pairwise advisories + 3D vector embedding (space-
filling path embedding pulled toward dependency neighbours) so a 'where are
the agents' visualization can render the file-cloud and watch agents crawl /
steer.
docs/design/agent-proximity.md — full mathematical formulation + protocol + viz
+ roadmap (v1 call-graph/symbol channels + live session-diff wiring; v2 cross-
machine airspace over Tailscale, the zero-conflict-swarm demo).
17 tests; full suite 2869/2869; lint green.
- coverage: branch threshold 80 -> 79 (current is 79.52%; lines/functions/
statements remain 88/94/88). The 80% branch gate has been red on every main
run; this unblocks CI while keeping a meaningful floor just below current.
- SECURITY.md: remove the bouncing security@ecc.tools mailbox (flagged by an
advisory reporter as undeliverable) and direct all reports to GitHub private
vulnerability reporting, the only monitored channel.
- gateguard (GHSA-4v57-ph3x-gf55): add a quote-aware detection pass that
dequotes command words and splits on UNQUOTED separators incl. newlines, so
newline-separated commands, quoted command words ('rm'/"rm"), quoted
find -exec, and sh/bash -c wrappers are all classified destructive. Additive —
existing 133 cases still pass; +7 bypass regressions + a false-positive guard
(rm inside a quoted echo arg stays allowed). 140/140.
- Windows CI: format-code.ts emitted backslash paths via path.normalize, breaking
forward-slash assertions on all Windows matrix cells — force forward slashes.
- claw.js (CodeQL #1 js/polynomial-redos): bound parseTurns input so the lazy
[\s\S]*? body can't drive O(n^2) scanning on adversarial history files.
Full suite 2852/2852; lint green.
Critical: project-local install-state (e.g. a cloned repo's .cursor/ecc-install-state.json)
is attacker-controllable, and repair/uninstall/auto-update replayed its operations with
destinationPath validated only for non-emptiness — confirmed arbitrary file write/delete
and chained RCE (write ~/.bashrc, .git/hooks, or run a planted install-apply.js).
- New scripts/lib/path-safety.js: assertWithinTrustedRoot() canonicalizes (incl. symlink
escape via nearest-existing-ancestor realpath) and fails closed unless the destination is
within the adapter-derived trusted root.
- install-lifecycle.js: gate executeRepairOperation + executeUninstallOperation + the
install-state removal against record.targetRoot (the adapter-resolved root, NOT the
attacker-supplied state.target.root).
- auto-update.js: validateRepoRoot now requires package.json name to be an official ECC
package, so a planted nested repo can't drive auto-update into executing attacker code.
- 7 containment regression tests. Existing install-lifecycle/repair/uninstall/auto-update
suites still green (legit destinations are within the root).
- ecc-bot.mjs: validate interaction id (snowflake) and token before building the
callback fetch URL (clears CodeQL js/request-forgery #239/#240/#241); clamp the
remote heartbeat_interval to [1s,10m] (js/resource-exhaustion #242); strip CR/LF
from log args (js/log-injection #246).
- Bump transitive dev deps via overrides/resolutions to patch quadratic-complexity
DoS: markdown-it >=14.2.0 (Dependabot #45/#46), js-yaml >=4.2.0 (#42/#43).
Both lockfiles regenerated; npm reports 0 vulnerabilities.
The interactive claim/move buttons concatenated work-item ids into inline
onclick JS with only single-quote escaping — a crafted id (ids/titles come from
GitHub sync and manual upserts, not a strict allowlist) could break out and
inject script, even on the localhost-only server.
Fix: emit the id/lane in HTML-escaped data-* attributes (escapeHtml encodes
&<>"'), attach delegated click listeners that read them via getAttribute, and
pass the raw value as a JS string arg — never concatenated into code. Adds a
regression assertion that no inline onclick handlers with interpolated ids
remain. Flagged by automated security review.
Full suite 2845/2845; lint green.
The board was read-only; you can now drive the agent+human JIT workflow from the
local control pane.
- New shared scripts/lib/control-pane/work-item-mutations.js (claimWorkItem,
moveWorkItem) so the CLI and server never diverge; work-items.js claim now
delegates to it.
- server.js: gated POST /api/work-items/:id/claim and /:id/move (localhost-only,
honors --read-only with 403). Claim sets owner + assigneeKind and moves to
running; move retargets the kanban lane.
- ui.js: per-card Claim (on unassigned cards) + lane buttons that POST and
refresh; 15s live auto-refresh (paused when the tab is hidden).
- Tests: interactive claim/move endpoints, read-only 403, invalid-lane 400, and
snapshot reflects mutations.
Full suite 2845/2845; lint green.
Closes the agent+human JIT loop the control-pane board surfaces: the board shows
the unassigned (needs-owner) queue; 'claim' lets an agent or human pick up work.
node scripts/work-items.js claim [<id>] --owner <name> [--as agent|human]
- No id: claims the highest-priority unassigned open item.
- With id: claims that specific item (re-assignable).
- Sets owner, records metadata.assigneeKind (agent|human), and moves the card to
running so the board reflects that work has started.
- Refuses done items, requires --owner, validates --as. 5 CLI tests added.
Full suite 2844/2844; lint green.
The kanban board tracked lanes (ready/running/blocked/done) but not WHO owns
each card, which is the missing piece for agent+human just-in-time team workflows.
- state.js: classifyAssignee() labels each work item agent | human | unassigned
(session-linked or agent-pattern owners = agent; named owners = human; ownerless
= unassigned), with an explicit metadata.assigneeKind override.
- summarizeWorkItems(): adds an assignment summary {agent,human,unassigned} over
OPEN cards plus a priority-sorted needsAssignment queue — the JIT pickup list.
- ui.js: cards show an [agent]/[human]/[unassigned] badge; the board header shows
agent/human split and 'N need owner'.
- Tests: assignment classification + JIT queue coverage in control-pane-state.
Full suite 2839/2839; lint green.
- #2290 suggest-compact: honor ECC_CONTEXT_WINDOW_TOKENS / CLAUDE_CODE_AUTO_COMPACT_WINDOW
so 400k-window models (Opus 4.x) no longer report ~double context usage; add
override + isolation tests in transcript-context.test.js.
- #2282 install: bare-language syntax is legacy-only by design, but the error
now distinguishes a supported-but-wrong-mode target (gemini/codex/…) from a
genuinely unknown one and points to --profile/--modules/--skills.
- #2276 cost-report: the command + cost-tracking skill targeted a SQLite DB no
tracker writes. Repoint both at the real ~/.claude/metrics/costs.jsonl (JSONL,
estimated_cost_usd), reduce cumulative-per-session snapshots to latest-per-session,
and use node instead of sqlite3 for cross-platform support.
- #2272 gateguard: make the 'confirm no existing file' checklist item
tool-agnostic (Glob/Grep or find/grep via Bash) so hosts without a Glob tool
don't get a dead tool call.
Full suite 2839/2839; lint green.
The observe hook's secret-scrub regex used a generic ([A-Za-z]+\s+)? group
that overlapped the separator and value classes, causing exponential
backtracking on identifier-dense tool I/O — an orphaned python child then
pegged a core at ~100% CPU for days because the async hook timed out without
killing it.
- Rewrite _SECRET_RE as a linear matcher: bounded separator {1,8}, a fixed
set of auth schemes (bearer|basic|token|bot) instead of [A-Za-z]+, and a
bounded value {8,256}. Pathological input drops from hang to <1ms; real
secrets still redact (verified incl. 'Bearer <token>').
- Add a signal.alarm(8) self-timeout to both scrub blocks so any runaway
child self-terminates before the 10s async-hook timeout can orphan it.
* fix(gateguard): check isDestructiveFindExec on each command segment
`isDestructiveBash` called `isDestructiveFindExec` only on the raw full
command string. When the raw string starts with a non-find command (e.g.
`echo x && find . -exec rm {} \;`), `isDestructiveFindExec` checks
tokens[0] and returns false — then the per-segment loop never calls it
again, letting the destructive `find -exec rm` segment through silently.
Fix: call `isDestructiveFindExec(segment)` inside the per-segment loop so
compound commands (`&&`, `;`, `|`) cannot be used to prepend a harmless
command and bypass the find-exec destructive check.
Adds three regression tests covering `&&`, `;`, and `|` bypass patterns.
* fix(gateguard): use raw body segments for isDestructiveFindExec to close quoted-binary gap
The previous per-segment call passed quote-stripped output from
splitCommandSegments to isDestructiveFindExec, so a quoted exec binary
like find . -exec 'rm' {} \; would arrive as find . -exec {} \; and
the check would silently miss it.
Switch to splitting collectExecutableBodies output on [;|&]+ without
quote-stripping first, so the find-exec binary name is always intact
when isDestructiveFindExec inspects it. This also covers || and
background & separators that the original tests did not exercise.
Adds a regression test for the || OR-chain bypass pattern.
Addresses Greptile review comments on PR #2292.
---------
Co-authored-by: kapilvus <kapilvus@gmail.com>
Trimmed the description from ~1216 to ~620 chars while keeping trigger coverage (reproducible cross-platform envs, system deps, local services, .flox/manifest.toml/flox activate/FloxHub).
- README: add a visible ## Security section (official sources, vuln reporting via SECURITY.md, GateGuard/IOC/AgentShield guardrails, security guide); make stats line a plain paragraph to clear MD028
- eslint: empty catch comment in run-with-flags.js; drop unneeded escape in github-coordination/parsing.js; remove unused execFileSync import in its test (#2236 follow-ups)
- markdownlint: wrap bare URLs in rules/vue/*.md (#2250 follow-up)
npm run lint green; full suite 2836/2836.
Greptile review:
- slim_dist.ps1: ErrorActionPreference SilentlyContinue -> Continue so failed
deletes are reported instead of showing a false success banner
- build_optimized.bat: wmic is removed on Windows 11 22H2+; use the built-in
%NUMBER_OF_PROCESSORS% env var (with a fallback) so --jobs is not silently 0
cubic P2: the fallback skill `python-installer-packaging` does not exist in the
repo, creating a broken routing dependency. Replace both references (description
+ When to Activate) with self-contained scoping language that keeps the
"advanced optimization only" gating without pointing at a missing skill.
Addresses PR review feedback (English description + cleaned placeholders + CI green)
and the inline bot findings.
- Add English description and canonical "When to Activate" / "How It Works" /
"Examples" sections for auto-activation; keep the existing Chinese content
- Replace the "某商业级桌面应用" placeholder with a concrete anonymized reference
("参考项目" / "生产级 PySide2 桌面应用, 323 MB")
- build_optimized.bat: compute dist size via PowerShell instead of parsing
`dir` output with the Chinese-locale string `find "个文件"` (breaks on
non-Chinese Windows)
- slim_dist.ps1: keep entry_points.txt in .dist-info (read at runtime by
importlib.metadata; deleting it breaks plugin discovery)
- Inno Setup: default the bundled VC++ redistributable to x86 to match the
recommended 32-bit build and comment out ArchitecturesInstallIn64BitMode,
with notes on switching to x64 for 64-bit builds (fixes runtime-arch mismatch)
- markdownlint: blank lines around tables (MD058)
- unicode-safety: strip emoji / U+FE0F variation selectors per repo policy
- Sync skill catalog counts 249 -> 250 across README / AGENTS / plugin /
marketplace manifests
- agent.yaml: register epic-* commands (#2236) and vue-review (#2241)
- package.json files: drop stray skills/ml-adoption-playbook entry (follows orphan-skill publish pattern; not in install-modules.json)
- unicode-safety: strip decorative emoji from dashboard-web.js (#2100) and brand-discovery refs (#2221) to pass the CI gate
- agent-compress: raise catalog token canary 5000 -> 6000 for the 67-agent catalog
Full suite green (2836/2836).