* feat(rules,skills): add React Native / Expo rules pack and react-native-patterns skill
* fix(rules,skills): address review feedback — safeParse nav example, drop deprecated sentry-expo, memoize list renderItem, clarify New Architecture SDK support
* fix(rules,skills): drop deprecated Flipper, surface permission-denied state in location hook
* feat(continuous-learning-v2): make observer model configurable via ECC_OBSERVER_MODEL
The observer hardcoded `--model haiku`. Parameterize as "${ECC_OBSERVER_MODEL:-haiku}": the haiku default is preserved (no behavior change for existing users), but users can opt into a stronger model — e.g. `ECC_OBSERVER_MODEL=opus` — for higher-quality instinct extraction. Useful on subscription plans where model cost isn't the limiting factor.
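A minimal Python sketch of the same defaulting rule, for illustration only (the real change is a shell parameter expansion; the `observer_model` helper name is hypothetical). It mirrors the `:-` form, which falls back when the variable is unset or empty:

```python
import os


def observer_model(env=None):
    """Return the observer model name, defaulting to 'haiku'.

    Hypothetical helper: the real change lives in a shell script.
    """
    env = os.environ if env is None else env
    # The shell's ${VAR:-default} falls back for both unset and empty values,
    # so `or` is used here instead of a plain .get() default.
    return env.get("ECC_OBSERVER_MODEL") or "haiku"
```

Existing users who never set the variable keep getting `haiku`.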
* fix(continuous-learning-v2): address review — update wiring test + docs
- Update source-inspection test to assert the ${ECC_OBSERVER_MODEL:-haiku} defaulting behavior (was matching the literal `claude --model haiku`, which this PR changed). All 31 tests pass.
- Add guidance to raise ECC_OBSERVER_TIMEOUT_SECONDS for slower models (e.g. opus) so the 120s watchdog doesn't kill analysis mid-run.
- Fix now-stale 'Haiku session' comment -> 'observer session' (model is configurable).
analyze_observations moved observations.jsonl into observations.archive/
unconditionally, even when the Claude analysis failed (timeout, non-zero
exit, rate limit). Because the analyzer only reads the live file, a failed
batch was archived and never re-analyzed, silently dropping the instincts
it would have produced.
Return early on a non-zero analysis exit so the archive mv runs only on
success, retaining observations for the next cycle to retry. Resolve the
script's own directory from ${BASH_SOURCE[0]} (SCRIPT_DIR) so sibling
scripts (session-guardian.sh) and relative helpers resolve correctly under
both execution and sourcing, and add a source-guard so observer-loop.sh can
be sourced without starting the loop. Add a regression test covering both
the failure (retain) and success (archive) paths.
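The retain-on-failure rule can be sketched in Python as follows. This is an illustration under assumptions, not the shell implementation: the function name is hypothetical, and the archived file keeps its original name here, which may differ from what observer-loop.sh does.

```python
import shutil
from pathlib import Path


def archive_if_succeeded(live_file, archive_dir, exit_code):
    """Move the live observations file into the archive only on success.

    Returns True when the file was archived, False when it was retained.
    """
    if exit_code != 0:
        # Analysis failed (timeout, non-zero exit, rate limit): keep the
        # live file so the next cycle can retry the same batch.
        return False
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    live_file = Path(live_file)
    shutil.move(str(live_file), str(archive_dir / live_file.name))
    return True
```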
Fixes #2370
* test(clv2): cover instinct-cli prune, projects ops, promote dry-run, normalize-url
Add pytest coverage for previously-untested functions in
skills/continuous-learning-v2/scripts/instinct-cli.py:
- _normalize_remote_url: scp/https/file forms, credential + .git
stripping, network lowercasing, case-preserving local paths, idempotence
- _promote_specific dry-run: returns 0 and writes no global file
- projects delete/gc/merge: invalid-id, not-found, dry-run, and force
paths over registry + storage, asserting destructive ops are gated
- cmd_prune: dry-run keeps files; non-dry-run deletes only expired; quiet
Test-only change; no production code modified.
Fixes #2302
* test(clv2): assert dry-run storage no-op and quiet-mode stderr silence
Address CodeRabbit review on #2374:
- projects gc/merge dry-run tests now also assert on-disk storage is
untouched (empty1 project dir survives; nothing copied into dest
personal), closing the gap where a storage-mutating dry-run regression
would still pass.
- cmd_prune quiet test now asserts stderr is empty too, not just stdout.
* test(clv2): cover merge missing-destination and prune empty-pending branches
* fix(clv2): surface SIGALRM timeout drops in observe.sh
The inline-Python observation writers in observe.sh arm a signal.SIGALRM
alarm (8s) so they self-terminate before the async hook's 10s timeout can
orphan them (#2278). The handler _ecc_bail called sys.exit(0) with no
logging, so when the alarm fired the in-flight observation was silently
dropped: nothing was logged, no partial write occurred, and the shell saw
a clean exit. There was no way to detect or count how many observations
were being lost.
Add a single stderr visibility line to both _ecc_bail handlers (the
parse-error fallback path and the main observation-writing path) before
sys.exit(0), using the repo's "[observe]" log prefix. Exit code stays 0:
in a Claude Code hook a non-zero exit signals a block, so changing it
would turn an internal timeout into a user-facing tool block. The warning
goes to stderr (not stdout) because both blocks redirect stdout into the
observations file.
Add tests/hooks/observe-signal-timeout.test.js: a static regression guard
that every _ecc_bail handler logs to stderr before exiting and keeps exit
0, plus a behavioral check that runs the real handler text extracted from
observe.sh and confirms a fired alarm exits 0 and emits the [observe]
warning on stderr only.
Fixes #2300
* test(clv2): exercise both _ecc_bail handlers end-to-end
The behavioral SIGALRM-fire test ran only handlers[0] (the parse-error
fallback path); the main observation-write path (handlers[1]) was covered
only by the static regex guard. The write path is the higher-value one to
verify end-to-end since it carries valid, parseable data that would succeed
given more time, so a silent drop there is the worst case.
Loop the behavioral check over every extracted handler so a regression that
silenced the second handler's stderr write is caught at runtime, not just by
the static guard.
* test(clv2): select timeout handlers by marker, not array index
The behavioral check looped over all extracted _ecc_bail handlers by index.
If an unrelated _ecc_bail were ever added to observe.sh, the loop would
either test the wrong block or be diluted. Filter the handlers to those
carrying the "[observe] SIGALRM timeout" marker so the live SIGALRM check
stays pinned to the two #2300 timeout handlers regardless of array order or
future additions.
* test(clv2): fail fast when python is missing in SIGALRM check
The behavioral test returned early when no python interpreter was found,
which the test harness records as a PASS — so the SIGALRM contract could go
entirely unverified yet still look green. Throw instead, matching the
existing insaits-security-monitor convention of failing when a required
Python runtime is absent, and drop the in-test console.log.
observe.sh bumps the SIGUSR1 throttle counter in
${PROJECT_DIR}/.observer-signal-counter with an unlocked read-modify-write.
The hook runs on every tool call, so concurrent invocations read the same
value, both increment, and lose a write, signaling the observer at
unpredictable intervals and defeating the #521 throttle.
Serialize the read-modify-write under a lock, and only ever bump the counter
while that lock is held:
- Prefer flock with a bounded -w wait (the OS auto-releases it when the fd
closes or the process dies, so there is no stale lock and no lost increment);
on a timeout the tick is skipped rather than bumped unlocked.
- Fall back to an atomic mkdir lock on platforms without flock, with a bounded
spin. An EXIT trap cleans up on normal completion; INT/TERM traps release the
lock and exit, so a signal cannot drop the lock and then continue the
read-modify-write without ownership. If the lock cannot be acquired in the
budget the tick is skipped rather than raced. No hand-rolled PID stale-reclaim
(which is racy and can delete a live re-acquirer's lock).
- Guard the counter read against a corrupt (non-integer) file that would abort
the hook under set -e.
Add tests/hooks/observe-signal-counter-race.test.js: 20 concurrent observe.sh
invocations must not lose increments (exact under flock; at most one dropped on
the best-effort mkdir fallback), the runner rejects on any hook execution
failure or hang, plus content guards for the lock and the corrupt-counter
handling.
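For readers who want the locking pattern in isolation, here is a Python sketch of the flock path using `fcntl.flock` (POSIX only). It is not the shell code from observe.sh: the function name, the `.lock` suffix, the one-second budget, and resetting a corrupt counter to zero are all assumptions made for the example.

```python
import fcntl
import os
import time


def bump_counter(path, wait_seconds=1.0):
    """Increment the integer stored in `path` under an exclusive lock.

    Returns the new value, or None when the lock was not acquired within
    the budget (the tick is skipped rather than bumped unlocked).
    """
    deadline = time.monotonic() + wait_seconds
    fd = os.open(path + ".lock", os.O_CREAT | os.O_RDWR, 0o600)
    try:
        while True:
            try:
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except BlockingIOError:
                if time.monotonic() >= deadline:
                    return None  # bounded wait expired: skip this tick
                time.sleep(0.01)
        try:
            with open(path) as fh:
                value = int(fh.read().strip())
        except (FileNotFoundError, ValueError):
            value = 0  # missing or corrupt counter (assumed: restart at 0)
        value += 1
        with open(path, "w") as fh:
            fh.write(str(value))
        return value
    finally:
        # Closing the descriptor releases the lock, even on error paths.
        os.close(fd)
```

Because the kernel drops the lock when the descriptor closes or the process dies, no stale-lock cleanup is needed on this path.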
Fixes #2296
* fix(clv2): align Python _update_registry schema with shell counterpart
The Python `_update_registry` in instinct-cli.py wrote registry entries
without the `id` and `created_at` fields, while the shell counterpart in
detect-project.sh writes both. A projects.json entry could therefore have a
different shape depending on which path (Python CLI or shell hook) last
touched it.
Emit the same field set and order as the shell version: id, name, root,
remote, created_at (preserved from any existing entry), last_seen. Add
regression tests asserting field parity and created_at preservation.
Fixes #2299
* fix(clv2): guard _update_registry against a non-dict registry entry
A malformed projects.json (a non-dict value for the current project id, e.g.
null) would make existing.get("created_at", ...) raise and crash the update,
losing the old code's ability to self-heal a corrupt per-entry value. Normalize
existing to {} when it is not a dict so the entry is healed by the rewrite. Add
a regression test for the malformed-entry path.
* test(clv2): assert the first-write created_at == last_seen contract
The new _update_registry tests only checked both timestamps were truthy. On the
initial write both derive from the same `now`, so created_at must equal
last_seen; assert that explicitly so a later refactor that breaks the contract
is caught. Split the compound assertions into single-expression checks.
* fix(clv2): heal a non-dict top-level registry in _update_registry
A projects.json that is valid JSON but not a mapping (e.g. `[]` or a
string) previously crashed _update_registry on registry.get(), before
the per-entry guard could run, so the corrupt file could not be healed.
Guard the top-level shape right after the load and fall back to {} so the
rewrite repairs the file — matching the per-entry healing already in place.
Resolves the remaining CodeRabbit finding on #2299.
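Taken together, the registry fixes in this series amount to roughly the following shape. This is a simplified sketch, not the code in instinct-cli.py: the function name and signature are hypothetical, and locking and file I/O are left out.

```python
from datetime import datetime, timezone


def update_registry_entry(registry, project_id, name, root, remote, now=None):
    """Return a registry with a healed, schema-aligned entry for project_id."""
    now = now or datetime.now(timezone.utc).isoformat()
    # Heal a top-level value that is valid JSON but not a mapping ([] or a string).
    if not isinstance(registry, dict):
        registry = {}
    # Heal a per-entry value that is not a mapping (for example null).
    existing = registry.get(project_id)
    if not isinstance(existing, dict):
        existing = {}
    # Same field set and order as the shell counterpart.
    registry[project_id] = {
        "id": project_id,
        "name": name,
        "root": root,
        "remote": remote,
        "created_at": existing.get("created_at", now),  # preserved if present
        "last_seen": now,
    }
    return registry
```

On the first write both timestamps derive from the same `now`, so `created_at` equals `last_seen`.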
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
* feat(skills): make tdd-workflow test-runner aware (npm/pnpm/yarn/bun)
Add "Step 0: Detect the Test Runner" so the RED/GREEN cycle no longer
hardcodes `npm test`. Distinguishes the package manager from the test
runner (a project can install with Bun yet run Jest/Vitest), adds a runner
command matrix, and warns about `bun test` (native bun:test runner) vs
`bun run test` (runs the package.json script) — a common ESM failure mode.
Adds a Bun native test pattern section and links the bun-runtime skill.
Applied to both the canonical skills/ copy and the .agents/skills/ Codex
subset (manual sync per CONTRIBUTING).
* docs(skills): apply <test>/<coverage> placeholders in tdd-workflow steps
Address review feedback on PR #2347: Step 0 instructs the agent to substitute
the detected runner command, but Steps 3/5/7, Run Coverage Report, Watch Mode,
Pre-Commit, and CI/CD still showed literal `npm test` / `npm run test:coverage`
— so an agent reaching those blocks could run npm test on a pnpm/bun project.
Replace them with the <test> / <test-watch> / <coverage> placeholders from
Step 0. Left untouched: the plan-handoff allowlist example and the Step 8
evidence-table samples (illustrative, not run-this instructions). Applied to
both the canonical and Codex-subset copies.
* docs(skills): make pre-commit lint runner-agnostic via <lint> placeholder
Follow-up to PR #2347 review (CodeRabbit): the pre-commit example still used
`npm run lint`, coupling it to npm after test/coverage were made runner-aware.
Add a `<lint>` column to the Step 0 runner matrix (npm run lint / pnpm lint /
yarn lint / bun run lint) and change the Pre-Commit Hook example to
`<test> && <lint>`. Applied to both the canonical and Codex-subset copies.
* chore: re-trigger CI (flaky windows/node20 npm cell)
* docs(skills): update Prisma and Zod API patterns for cross-version compatibility
- skills/prisma-patterns: show both adapter-based and direct PrismaClient
initialization side-by-side; update import paths with conditional notes;
rewrite version header to be release-agnostic
- skills/backend-patterns: fix ZodError.errors -> ZodError.issues
- skills/coding-standards: fix ZodError.errors -> ZodError.issues
- skills/security-review: fix ZodError.errors -> ZodError.issues
These API differences were discovered during implementation of a
full-stack health assessment project. The updated code samples show
both the new and old API forms so the skill remains useful regardless
of which Prisma or Zod version is installed.
Closes #2335
* fix(skills): revert Prisma client imports to '@prisma/client'
The 'prisma' npm package is the CLI tool, not the runtime client.
Using it as an import source would cause compile-time failures on all
versions. '@prisma/client' remains the correct import source for the
generated PrismaClient and Prisma namespace types.
Found by Greptile during PR review.
* feat(skills): harden the file upload validation section in django-security
* Update skills/django-security/SKILL.md
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* add missing stuff to second code block
* add import to the top of the code block
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
b3268fef (#2272) made the write-gate "confirm no existing file" item
tool-agnostic in the JS hook, but the rest of the checklist surface still
names Glob/Grep. On hosts without those tools the agent still hits a dead
tool call on:
- the edit-gate "list importers" item in the hook (scripts/hooks/gateguard-fact-force.js)
- both checklist items in all three SKILL.md copies (en, ja-JP, zh-CN)
Apply the same wording b3268fef introduced — "(search the tree — Glob/Grep,
or find/grep via Bash)" — to those five remaining spots so the whole gate is
consistent. Prose-only; no logic change.
Follow-up to #2272 / b3268fef.
The code-tour skill mentioned the CodeTour 'ref' field only in an example,
with no explanation of its behavior. CodeTour resolves each step's file
content from the git revision named by 'ref' (not the working tree) whenever
ref differs from HEAD, so any file that does not exist at that revision fails
to open with 'The editor could not be opened because the file was not found',
even though the file is present on disk.
This bit a generated PR tour where ref was set to the base branch (develop):
every file ADDED by the PR is absent on the base, so all new-file steps 404'd
while the tour tree and comments still rendered, making the cause non-obvious.
Adds a 'The ref Field' section explaining the resolution behavior and the
rule that PR tours must pin ref to the branch head (never the base), plus a
validation step to confirm every referenced file exists at the chosen ref.
Adds a new Tool Integration skill (mailtrap-email-integration) covering transactional email sending patterns: sandbox vs. production separation, API authentication, and domain verification. Focused on patterns that generalize beyond one vendor, per the repo's Skill Adaptation Policy.
* feat: add ecc-recipes skill
Maps a described workflow to the right ECC command-group with run-order
and stop condition, and browses command-group recipe families. Fills the
gap between ecc-guide (flat catalog) and prompt-optimizer (single-prompt
match) by adding family grouping, run-order, and stop conditions.
Advisory only; reads commands/ live.
* fix(ecc-recipes): address review
- flatten frontmatter origin/author/version to top-level (repo convention)
- guard unset CMD_DIR before globbing; use find instead of ls
- show burn-warning explicitly in output template
* feat(ecc-recipes): add argument-hint for slash UI
Two security-priority fixes in continuous-learning-v2/scripts/instinct-cli.py:
- #2294: _write_registry wrote projects.json without the advisory lock that
_update_registry holds, so concurrent 'projects delete/gc/merge' could race an
observe-time update and corrupt the registry. Extract the lock into a shared
_registry_lock() context manager and use it in both writers.
- #2297: _remove_project_storage called shutil.rmtree on PROJECTS_DIR/project_id
with no containment check. Add defense-in-depth: resolve the path and refuse to
delete anything that is not strictly inside PROJECTS_DIR (or is the root
itself), so a relaxed validator or future caller can never cause an
arbitrary-directory delete.
Adds 5 pytest regression tests (atomic write under lock, contained delete,
missing-dir no-op, traversal refused, root refused). Node integration suite
(tests/scripts/instinct-cli-projects.test.js) green 9/9.
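The containment check for #2297 follows a common pattern, sketched below. Function names and the choice of `ValueError` for a refused delete are assumptions for the example; the real `_remove_project_storage` may signal refusal differently.

```python
import shutil
from pathlib import Path


def is_contained(projects_dir, project_id):
    """True only when the resolved target is strictly inside projects_dir."""
    root = Path(projects_dir).resolve()
    target = (root / project_id).resolve()
    # Reject the root itself and anything that resolves outside it.
    return target != root and root in target.parents


def remove_project_storage(projects_dir, project_id):
    """Delete a project's storage directory, refusing uncontained paths."""
    if not is_contained(projects_dir, project_id):
        raise ValueError(f"refusing to delete outside projects dir: {project_id!r}")
    target = (Path(projects_dir).resolve() / project_id).resolve()
    if not target.is_dir():
        return False  # missing directory: no-op
    shutil.rmtree(target)
    return True
```

Resolving both paths first means `..` segments and symlinks are judged by where they actually point, not by how the id is spelled.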
* fix(clv2): escape $HOME before pgrep -f in migrate-homunculus.sh
pgrep -f treats its argument as an extended regular expression, but the
running-observer guard interpolated $HOME unescaped. Paths containing regex
metacharacters (e.g. /home/user.name, /home/c++dev, /home/user (work)) made the
match over-broad or invalid, causing either a false negative (live observer
missed, migration proceeds and risks registry corruption) or a false positive
(migration blocked unnecessarily).
Escape the ERE metacharacters in $HOME via sed before building the pattern so
the home prefix is matched literally while the trailing .*observer-loop\.sh
regex is preserved. Portable across BSD and GNU sed.
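The over-match is easy to reproduce outside the shell. The sketch below uses Python's `re` module rather than POSIX ERE and `re.escape` rather than the script's sed expression, so it illustrates the idea only; the `.claude` path segment in the usage is a made-up example.

```python
import re


def observer_pattern(home, escape=True):
    """Build a pattern matching an observer-loop.sh process under `home`."""
    prefix = re.escape(home) if escape else home
    # The trailing part stays a real regex, as in the fixed script.
    return prefix + r".*observer-loop\.sh"


decoy = "/home/userXname/.claude/observer-loop.sh"  # hypothetical path
real = "/home/user.name/.claude/observer-loop.sh"   # hypothetical path

# Unescaped, the dot in "user.name" matches any character, so the decoy hits.
unescaped_hits_decoy = re.search(observer_pattern("/home/user.name", escape=False), decoy) is not None
# Escaped, only the literal home path matches.
escaped_hits_decoy = re.search(observer_pattern("/home/user.name"), decoy) is not None
escaped_hits_real = re.search(observer_pattern("/home/user.name"), real) is not None
```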
Fixes #2301
* test(clv2): add regression test for migrate-homunculus.sh $HOME escaping
Guards the #2301 fix: extracts the script's sed escaping command and asserts
the resulting pgrep -f pattern matches the literal home path while no longer
over-matching a regex-expanded decoy (HOME=/home/user.name must not match
/home/userXname). Also pins that the guard uses escaped_home rather than $HOME
directly. Follows the existing clv2 shell-test convention in
tests/hooks/observe-entrypoint-allowlist.test.js.
Refs #2301
* test(clv2): skip migrate-homunculus escaping test on Windows
The test relies on POSIX bash/sed/grep -E semantics, which differ on the
Windows CI runners. Guard with the same process.platform === 'win32' early
exit used by tests/hooks/observe-subdirectory-detection.test.js so the
bash-dependent assertions only run on POSIX platforms.
Refs #2301
- #2290 suggest-compact: honor ECC_CONTEXT_WINDOW_TOKENS / CLAUDE_CODE_AUTO_COMPACT_WINDOW
so 400k-window models (Opus 4.x) no longer report ~double context usage; add
override + isolation tests in transcript-context.test.js.
- #2282 install: bare-language syntax is legacy-only by design, but the error
now distinguishes a supported-but-wrong-mode target (gemini/codex/…) from a
genuinely unknown one and points to --profile/--modules/--skills.
- #2276 cost-report: the command + cost-tracking skill targeted a SQLite DB no
tracker writes. Repoint both at the real ~/.claude/metrics/costs.jsonl (JSONL,
estimated_cost_usd), reduce cumulative-per-session snapshots to latest-per-session,
and use node instead of sqlite3 for cross-platform support.
- #2272 gateguard: make the 'confirm no existing file' checklist item
tool-agnostic (Glob/Grep or find/grep via Bash) so hosts without a Glob tool
don't get a dead tool call.
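The latest-per-session reduction from #2276 can be sketched as below. The shipped command uses node; this Python version is illustrative, and the `session_id` field name and the assumption that rows appear in chronological order are mine. Only `estimated_cost_usd` is named in the change.

```python
import json


def latest_cost_per_session(lines):
    """Reduce cumulative per-session snapshots to the latest one per session."""
    latest = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed rows rather than abort the report
        if not isinstance(record, dict):
            continue
        session = record.get("session_id")  # assumed field name
        if session is None:
            continue
        # Snapshots are cumulative, so a later row replaces an earlier one.
        latest[session] = float(record.get("estimated_cost_usd", 0.0))
    return latest
```

Summing the returned values then gives a total that does not double-count sessions with several snapshots.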
Full suite 2839/2839; lint green.
The observe hook's secret-scrub regex used a generic ([A-Za-z]+\s+)? group
that overlapped the separator and value classes, causing exponential
backtracking on identifier-dense tool I/O — an orphaned python child then
pegged a core at ~100% CPU for days because the async hook timed out without
killing it.
- Rewrite _SECRET_RE as a linear matcher: bounded separator {1,8}, a fixed
set of auth schemes (bearer|basic|token|bot) instead of [A-Za-z]+, and a
bounded value {8,256}. Pathological input drops from hang to <1ms; real
secrets still redact (verified incl. 'Bearer <token>').
- Add a signal.alarm(8) self-timeout to both scrub blocks so any runaway
child self-terminates before the 10s async-hook timeout can orphan it.
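To make the shape of the fix concrete, here is an illustrative pattern built from the constraints listed above: a separator bounded to {1,8}, a fixed scheme alternation, and a value bounded to {8,256}. It is not the real `_SECRET_RE`; the keyword list, the character classes, and the replacement text are assumptions.

```python
import re

# Hypothetical stand-in for _SECRET_RE. The separator and value classes are
# disjoint and every repetition is bounded, so matching stays linear.
SECRET_RE = re.compile(
    r"\b(api[_-]?key|token|secret|password|authorization)"  # assumed keywords
    r"[\"'\s:=]{1,8}"                                        # bounded separator
    r"(?:(?:bearer|basic|token|bot)\s+)?"                    # fixed auth schemes
    r"[A-Za-z0-9_\-/.+]{8,256}",                             # bounded value
    re.IGNORECASE,
)


def scrub(text):
    """Replace the scheme and value of each match with a redaction marker."""
    return SECRET_RE.sub(lambda m: m.group(1) + ": [REDACTED]", text)
```

Identifier-dense input such as a long run of the word `token` fails fast at each position instead of backtracking.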
Trimmed the description from ~1216 to ~620 chars while keeping trigger coverage (reproducible cross-platform envs, system deps, local services, .flox/manifest.toml/flox activate/FloxHub).
Greptile review:
- slim_dist.ps1: ErrorActionPreference SilentlyContinue -> Continue so failed
deletes are reported instead of showing a false success banner
- build_optimized.bat: wmic is removed on Windows 11 22H2+; use the built-in
%NUMBER_OF_PROCESSORS% env var (with a fallback) so --jobs is not silently 0
cubic P2: the fallback skill `python-installer-packaging` does not exist in the
repo, creating a broken routing dependency. Replace both references (description
+ When to Activate) with self-contained scoping language that keeps the
"advanced optimization only" gating without pointing at a missing skill.
Addresses PR review feedback (English description + cleaned placeholders + CI green)
and the inline bot findings.
- Add English description and canonical "When to Activate" / "How It Works" /
"Examples" sections for auto-activation; keep the existing Chinese content
- Replace the "某商业级桌面应用" placeholder with a concrete anonymized reference
("参考项目" / "生产级 PySide2 桌面应用, 323 MB")
- build_optimized.bat: compute dist size via PowerShell instead of parsing
`dir` output with the Chinese-locale string `find "个文件"` (breaks on
non-Chinese Windows)
- slim_dist.ps1: keep entry_points.txt in .dist-info (read at runtime by
importlib.metadata; deleting it breaks plugin discovery)
- Inno Setup: default the bundled VC++ redistributable to x86 to match the
recommended 32-bit build and comment out ArchitecturesInstallIn64BitMode,
with notes on switching to x64 for 64-bit builds (fixes runtime-arch mismatch)
- markdownlint: blank lines around tables (MD058)
- unicode-safety: strip emoji / U+FE0F variation selectors per repo policy
- Sync skill catalog counts 249 -> 250 across README / AGENTS / plugin /
marketplace manifests
- agent.yaml: register epic-* commands (#2236) and vue-review (#2241)
- package.json files: drop stray skills/ml-adoption-playbook entry (follows orphan-skill publish pattern; not in install-modules.json)
- unicode-safety: strip decorative emoji from dashboard-web.js (#2100) and brand-discovery refs (#2221) to pass the CI gate
- agent-compress: raise catalog token canary 5000 -> 6000 for the 67-agent catalog
Full suite green (2836/2836).
* docs(skills): document tdd plan handoff evidence
Address issue #2138 by clarifying how tdd-workflow should continue from a plan file, preserve human-readable test guarantees, and retain RED/GREEN evidence across squash merges.
* docs(skills): harden tdd plan handoff guidance
Address review feedback on #2235: use angle-bracket argument hint, treat plan files as untrusted input, and prefer project-local documentation paths for TDD evidence reports.
* docs(skills): clarify plan handoff injection guard
Address review feedback by explicitly stating that plan file content is data, not AI instructions, and that validation commands from untrusted plans require sanitization and approval before execution.
* Update skills/tdd-workflow/SKILL.md
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* docs(skills): address tdd workflow review nits
Clarify plan handoff safety decisions, remove redundant untrusted-input wording, and show consistent TDD evidence path examples.
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* fix: V-001 security vulnerability
Automated security fix generated by OrbisAI Security
* fix: sanitize subprocess call in runner.py
The runner
* fix: address PR review comments on V-001 allowlist and test coverage
Remove dangerous interpreters (python, python3, node, curl, wget) from
ALLOWED_SETUP_EXECUTABLES — they can execute arbitrary code via argument
flags and are not needed for sandbox setup. Rewrite test_invariant_runner
to call _setup_sandbox directly instead of spawning runner.py as a
subprocess (which had no __main__ entrypoint and never exercised the fix).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
- suggest-compact hook now reads the latest usage record from the session
transcript and suggests /compact at a window-scaled token threshold
(160k/200k window, 250k/1M window; COMPACT_CONTEXT_THRESHOLD and
COMPACT_CONTEXT_INTERVAL overridable), re-firing per 60k-token growth
bucket; tool-call count stays as the secondary signal (#2155)
- Codex repo marketplace now points at ./plugins/ecc instead of ./ — Codex
never discovers plugins whose local marketplace source.path is the
marketplace root (verified on Codex CLI 0.137.0); plugins/ecc is a thin
folder referencing root skills/.mcp.json per maintainer direction on
#2097; docs flag plugin mode as experimental with the upstream blocker
openai/codex#26037 linked (#2128)
- README badges for installs/stars/forks now use shields endpoint badges
backed by api.ecc.tools (live install count 3,712 vs the stale static
150), which also eliminates shields' 'Unable to select next GitHub token
from pool' render in the stars badge
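The re-firing rule for the suggest-compact hook can be pictured with a small sketch. The threshold and 60k interval come from the description above; anchoring the buckets at the threshold, and the function name and return shape, are assumptions rather than the hook's actual logic.

```python
def should_suggest_compact(tokens, threshold, interval=60_000, last_bucket=-1):
    """Return (fire, bucket) for the current context size.

    `fire` is True the first time usage reaches the threshold and again each
    time it grows into a new `interval`-sized bucket beyond it.
    """
    if tokens < threshold:
        return False, last_bucket
    bucket = (tokens - threshold) // interval
    return bucket > last_bucket, max(bucket, last_bucket)
```

For a 200k window with the 160k threshold, the suggestion would fire once at 160k and not again until usage reaches 220k.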
Closes #2155
Closes #2128
- competitive-platform-analysis: add ## Examples section per ECC
guidelines (8-axis taxonomy walkthrough + pre-filter scoring matrix)
- competitive-report-structure: clarify dimension 9 poles are client-
specific (e.g., Memorability/Hireability) not hard-coded names
- brand-discovery: fix terminal state — set inProgressModule to null
after 90_SYNTHESIS.md is complete to prevent misleading resumption
All fixes mirrored to .agents/ copies.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds four community skills covering brand identity discovery and a
three-skill competitive benchmarking pipeline.
**brand-discovery** — Adaptive multi-session brand identity interview
spanning 8 modules (purpose, positioning, audience, personality, voice,
narrative, founder-brand tension, synthesis). Uses laddering, 5 Whys,
and projective techniques. State persisted to disk via state.json so
sessions resume across conversations without losing elicited knowledge.
Frameworks: Sinek, Dunford, Baker, Enns, Kapferer, Aaker, Neumeier,
Mark & Pearson, Lencioni. Includes 8 module output templates in
references/.
**competitive-platform-analysis** — Scopes and tiers a competitor set
before benchmarking begins. Categorizes candidates along 8 generic
creative-industry axes (positioning stance, specialization, size/model,
engagement format, distinctiveness posture, evidence model, brand
strength, market/reach) into Direct / Adjacent / Aspirational tiers.
Includes a pre-filter scoring matrix. First step in the pipeline.
**benchmark-methodology** — Scores each competitor across 9 weighted
dimensions (positioning 18%, brand voice 15%, visual craft 15%, offer
packaging 12%, evidence 12%, enterprise-readiness 10%, thought
leadership 8%, pricing 5%, client's strategic tension 5%) with explicit
1–5 rubrics and bias controls. Produces one profile card per competitor.
**competitive-report-structure** — Assembles scored cards into a
decision-grade report: executive summary, landscape map, competitor
tiers, heatmap matrix, deep dives, white-space and threats, strategic
recommendations, sources appendix.
brand-discovery complements brand-voice (ECC): brand-voice extracts a
style profile from existing source material; brand-discovery elicits
identity from scratch through structured interviews when no prior
material exists.
A competitive set scoped without the client's positioning brief is
noise, not intelligence — each skill enforces this by requiring the
brief before proceeding. The 9-dimension scoring framework deliberately
reports the client's strategic tension as two separate poles (never
averaged) because the gap between them is the strategic finding.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(hooks): fail open on oversized stdin instead of echoing truncated JSON (#2222)
run-with-flags.js capped stdin at 1MB but every fallthrough path still
echoed the truncated string to stdout. The harness parses hook stdout as
JSON, got a document cut mid-stream, and blocked the tool call — so any
Edit/Write with a >1MB hook payload was permanently blocked by every
registered pre-write hook, before ECC_HOOK_PROFILE / ECC_DISABLED_HOOKS
gating could run.
- Exit 0 with empty stdout (no opinion) when the stdin cap trips, before
any echo or gating logic.
- Flush stdout via write callback before process.exit: exiting right
after stdout.write() dropped everything past the ~64KB pipe buffer,
cutting even sub-cap pass-through payloads mid-JSON.
Regression tests cover the enabled, disabled, and missing-arg paths for
oversized payloads plus full echo of sub-cap >64KB payloads.
* fix(codex): stop emitting invalid exa url entry, align merge with connector policy (#2224)
The Codex MCP merge declared exa with a url key, but Codex's
[mcp_servers.*] TOML schema is stdio-only — the url key makes the
entire config.toml fail to load, bricking both the codex CLI and the
desktop app. Every install/update re-injected the line because the
urlEntry branch treated the broken entry as present.
- ECC_SERVERS now emits only the current default set per
docs/MCP-CONNECTOR-POLICY.md: chrome-devtools (stdio, command/args).
Retired servers (supabase, playwright, context7, exa, github, memory,
sequential-thinking) are never re-emitted; existing user-managed
entries are untouched.
- The merge now repairs the exact ECC-emitted broken form (url-only
exa entry) on every run so re-running the installer fixes broken
configs instead of preserving them. User stdio exa entries
(command + mcp-remote) are left alone.
- check-codex-global-state.sh requires chrome-devtools instead of the
retired set, and flags url-only exa entries with a repair hint.
Tests cover repair, re-run idempotence, stdio-entry preservation, and
no-retired-server emission in add, update, dry-run, and disabled modes.
* fix(hooks): never echo truncated stdin from Stop hooks (#2090)
Stop hooks follow the ECC pass-through convention (echo stdin on
stdout), but every echoing Stop hook capped stdin and echoed the capped
string. The Stop payload carries last_assistant_message, so a long
final assistant message produced a JSON document cut mid-stream on
stdout, which the harness reports as 'Stop hook error: JSON validation
failed' across the whole Stop chain.
Reproduced: a Stop payload with a >64KB last_assistant_message run
through run-with-flags + cost-tracker emitted exactly 65536 bytes of
invalid JSON (cost-tracker capped stdin at 64KB — far below realistic
Stop payloads).
- cost-tracker: raise the cap to 1MB (matching all other hooks) and
suppress the pass-through echo when stdin was truncated.
- check-console-log, stop-format-typecheck, desktop-notify: suppress
the echo when stdin was truncated; flush stdout before process.exit
so sub-cap payloads are not cut at the ~64KB pipe buffer.
- All hooks keep exiting 0 (fail-open); diagnostics go to stderr.
New stop-hooks-stdout test asserts the contract for every registered
Stop hook: stdout is empty or valid JSON, exit code 0 — for realistic
100KB payloads and oversized >1MB payloads, via the production runner
and via direct invocation. Updated the old hooks.test.js case that
codified the truncated-echo behavior.
* fix(hooks): dampen GateGuard fact-force repetition in long sessions (#2142)
In long autonomous sessions the fact-force gate produced 10+
near-identical 'state facts -> blocked -> restate -> retry' blocks in
one context window, which measurably raises the odds of the model
collapsing into a degenerate single-token repetition loop.
- Track a per-session fact_force_denials counter in GateGuard state
(merged max across concurrent writers, reset with the session, robust
to malformed on-disk values).
- The first GATEGUARD_FACT_FORCE_FULL_DENIALS denials (default 3) keep
the full four-fact block; later denials emit a condensed single-line
message that carries the denial ordinal, so consecutive denials are
structurally different and never textually identical.
- True retries of the same target remain allowed without re-prompting
(unchanged). Destructive-Bash and routine-Bash gates are unchanged,
as are the ECC_GATEGUARD=off / ECC_DISABLED_HOOKS escape hatches.
Eight new tests cover budget counting, condensed format, ordinal
advancement, retry pass-through, env tuning, malformed state, MultiEdit
dampening, and destructive-gate exemption.
* fix(hooks): keep security hooks able to block on oversized stdin (#2222)
Refine the truncation fail-open: instead of skipping the hook entirely,
the runner now suppresses only its own raw-echo when stdin was
truncated. The hook still executes and receives the truncated flag
(run() context / ECC_HOOK_INPUT_TRUNCATED), so config-protection keeps
blocking truncated protected-config payloads (its test requires exit 2)
while pass-through hooks fail open with empty stdout as before.
* style: apply repo formatter to touched hook files