Cycle #46 follow-up to cycle #45's #251 implementation. Closes#250's
implementation urgency by aligning docs with reality.
SCHEMAS.md Updates:
For each of the 4 session-management verbs, added:
1. Status marker (Implemented or Stub only)
2. Actual binary envelope (shape produced by the #251-fixed binary)
3. Aspirational (future) shape (original SCHEMAS.md content, preserved as target)
4. Gap notes where the two diverge
Per-verb status:
- list-sessions: Implemented, nested field layout
- load-session: Implemented, nested session object with local session_not_found error
- delete-session: Stub, emits not_yet_implemented (local error, not auth)
- flush-transcript: Stub, emits not_yet_implemented (local error, not auth)
ROADMAP.md Updates:
- #251 marked CLOSED: Full status with commit ref, test counts.
- #250 marked SCOPE-REDUCED: Option A resolved by #251, Option C moot,
only Option B (doc alignment) remains as future cleanup.
Why this matters:
Every code change should close its documentation loop. #251 landed on
the branch, but SCHEMAS.md still described aspirational shapes without
marking which were implemented. Claws reading SCHEMAS.md would have
assumed full conformance and hit surprises. Now the document tells the
truth about which verbs work, which are stubs, and why.
Related:
- #251 implementation on feat/jobdori-251-session-dispatch branch
- #250 scope-reduced to Option B (field-name harmonization)
- #145/#146 parser fall-through fix precedent
Cycle #40: gaebal-gajae conceived #251 in their 00:00 Discord cycle
status but hadn't committed to ROADMAP yet. Jobdori verified their
diagnosis with code trace and formalized into ROADMAP with the proper
framing relationship to #250.
## What This Pinpoint Says
Same observable as #250 (session-management verbs emit missing_credentials
instead of SCHEMAS.md envelope) but reframed at the dispatch-order layer:
- #250 says: surface missing on canonical binary vs SCHEMAS.md promise
- #251 says: top-level parser fall-through happens BEFORE dispatcher
could intercept, so credential resolution runs before the verb is
classified as a purely-local operation
#251's framing is sharper because it identifies WHY the fall-through
produces auth errors, not just that it does.
## Verified Code Trace
- main.rs:1017-1027 is the _other => Prompt catchall
- joins all rest[] tokens into joined, constructs CliAction::Prompt
- downstream resolves credentials -> emits missing_credentials
- No credential call would be needed had the verb been intercepted
Same pattern has been fixed before for other purely-local verbs:
- #145: plugins (main.rs:888-906, explicit match arm)
- #146: config and diff (main.rs:911-935, same shape)
#251 extends this to the 4 session-management verbs.
## Recommended Sequence
1. #251 fix (4 match arms mirroring #145/#146) — principled solution
2. #250's Option B (docs scope note) — guard against future drift
3. #250's Option C (reject with redirect) — unnecessary if #251 lands
## Discipline
Per cycle #24 calibration:
- Red-state bug? Borderline (silent misroute to auth error class)
- Real friction? ✓ (4 documented surfaces emit wrong error class)
- Evidence-backed? ✓ (code trace + prior-fix precedent #145/#146)
- Same-cycle fix? ✗ (filed + document, boundary discipline #36)
- Implementation cost? ~40 lines Rust + tests, bounded
## Credit
Conception: gaebal-gajae (Discord msg 1496526112254328902, 00:00 KST)
Formalization: Jobdori cycle #40 (code trace + precedent linking)
This is the right kind of collaboration: gaebal-gajae saw the dispatch
pattern I had missed in #250 (I framed as surface parity; they framed
as dispatch order). I verified their diagnosis and committed the
ROADMAP entry. Two framings make the pinpoint sharper than either
alone.
Cycle #39 dogfood re-verification of #130 (filed 2026-04-20). All 5
filesystem failure modes reproduce identically on main HEAD 186d42f,
2 days after original filing. Gap is unchanged.
## What's Added
1. **[STILL OPEN — re-verified 2026-04-22 cycle #39]** marker on the
entry so readers can see immediately that the pinpoint hasn't been
accidentally closed.
2. Full 5-mode repro output preserved verbatim for the current HEAD,
so future re-verifications have a concrete baseline to diff against.
3. **New evidence not in original filing**: the classifier actively
chose `kind: "unknown"` rather than just omitting the field. This
means classify_error_kind() has NO substring match for "Is a
directory", "No such file", "Operation not permitted", or "File
exists". The typed-error contract is thus twice-broken on this path.
4. **Pairing with #247/#248/#249 classifier sweep**: the classifier-level
part of #130 could land in the same sweep (add substring branches
for io::ErrorKind strings). The context-preservation part (fix
run_export's bare `?`) is a separate, larger change.
## Why Re-Verification Not Re-Filing
Per cycle #24 discipline: speculative re-filings add noise, real
confirmations add truth. #130 was already filed with exact repros, code
trace, and fix shape. My dogfood hit the same gap on fresh HEAD — the
right output is confirming the gap is still there (not filing #251 for
the same bug).
This is the same pattern as cycle #32's "mark #127 CLOSED" reality-sync:
documentation-drift prevention through explicit status markers.
## New Pattern
"Reality-sync via re-verification" — re-running a filed pinpoint's
repro on fresh HEAD and adding the timestamp + output proves the gap
is still real without inventing new filings. Cycle #24 calibration
keeps ROADMAP entries honest.
Per cycle #24 calibration:
- Red-state bug? ⚠️ borderline (errors surfaced, but kind=unknown is
demonstrably wrong on a path where the system knows the errno)
- Real friction? ✓ (re-verified on fresh HEAD)
- Evidence-backed? ✓ (5-mode repro + classifier trace)
- Same-cycle fix? ✗ (classifier-level part could join #247/#248/#249
sweep; context-preservation part is larger refactor)
- Implementation cost? Classifier part ~10 lines; full context fix ~60 lines
Source: Jobdori cycle #39 proactive dogfood in response to Clawhip
pinpoint nudge. Probed export filesystem errors; discovered this was
#130 reconfirmation, not new bug. Applied reality-sync pattern from
cycle #32.
Cycle #38 dogfood finding. Probed session management via the top-level
subcommand path documented in SCHEMAS.md; discovered the Rust binary
doesn't implement these as top-level subcommands. The literal token
'list-sessions' falls through the _other => Prompt arm and returns
'missing Anthropic credentials' instead of the documented envelope.
## The Gap
SCHEMAS.md documents 14 CLAWABLE top-level subcommands. Python audit
harness (src/main.py) implements all 14. Rust binary implements ~8 of
them as top-level, routing session management through /session slash
commands via --resume instead.
Repro:
$ env -i PATH=$PATH HOME=$HOME claw list-sessions --output-format json
{"error":"missing Anthropic credentials; ...","kind":"missing_credentials"}
$ claw --resume latest /session list --output-format json
{"active":"...","kind":"session_list","sessions":[...]}
$ python3 -m src.main list-sessions --output-format json
{"command":"list-sessions","sessions":[...],"exit_code":0}
Same operation, three different CLI shapes across implementations.
## Classification
This is BOTH:
- a parser-level trust gap (6th in #108/#117/#119/#122/#127 family; same
_other => Prompt fall-through), AND
- a cross-implementation parity gap (SCHEMAS.md at repo root doesn't
match Rust binary's top-level surface)
Unlike prior fall-throughs where the input was malformed, the input
here IS a documented surface. The fall-through is wrong for a different
reason: the surface exists in the protocol but not in this implementation.
## Three Fix Options
Option A: Implement surfaces on Rust binary (highest cost, full parity)
Option B: Scope SCHEMAS.md to Python harness (docs-only)
Option C: Reject at parse time with redirect hint (cheapest, #127 pattern)
Recommended: C first (prevents cred misdirection), then B for docs
hygiene, then A if demand justifies.
## Discipline
Per cycle #24 calibration:
- Red-state bug? ⚠️ borderline — silent misroute to cred error on a
documented surface. Not a crash but a real wrong-contract response.
- Real friction? ✓ (claws reading SCHEMAS.md hit wrong error on canonical binary)
- Evidence-backed? ✓ (dogfood probe + SCHEMAS.md cross-reference + code trace)
- Implementation cost? Option C: ~30 lines (bounded). Option A: larger.
- Same-cycle fix? ✗ (file + document, defer implementation per #36 boundary discipline)
## Family Position
Natural bundle: **#127 + #250** — parser-level fall-through pair with
class distinction. #127 fixed suffix-arg-on-valid-verb case. #250 extends
to 'entire Python-harness verb treated as prompt.' Same fall-through arm,
different entry class.
Source: Jobdori cycle #38 proactive dogfood in response to Clawhip
pinpoint nudge at msg 1496518474019639408. Probed session management CLI
after gaebal-gajae's status sync confirmed no red-state regressions this
cycle; found this cross-implementation surface parity gap by comparing
SCHEMAS.md claims against actual Rust binary behavior.
Cycle #37 dogfood finding post-#247 merge. Two Err arms in the resumed-session
JSON path at main.rs:2747 and main.rs:2783 emit error envelopes WITHOUT the
`kind` field required by the §4.44 typed-envelope contract.
## The Pinpoint
Probed resumed-session slash command JSON path:
$ claw --output-format json --resume latest /session
{"command":"/session","error":"unsupported resumed slash command","type":"error"}
# no kind field
$ claw --output-format json --resume latest /xyz-unknown
{"command":"/xyz-unknown","error":"Unknown slash command: /xyz-unknown\n Help /help lists available slash commands","type":"error"}
# no kind field AND multi-line error without split hint
Compare to happy path which DOES include kind:
$ claw --output-format json --resume latest /session list
{"active":"...","kind":"session_list",...}
Contract awareness exists. It's just not applied in the Err arms.
## Scope
Two atomic fixes in main.rs:
- Line 2747: SlashCommand::parse() Err → add kind via classify_error_kind()
- Line 2783: run_resume_command() Err → add kind + call split_error_hint()
~15 lines Rust total. Bounded.
## Family Classification
§4.44 typed-envelope contract sweep:
- #179 (parse-error real message quality) — closed
- #181 (envelope exit_code matches process exit) — closed
- #247 (classify_error_kind misses prompt-patterns) — closed
- #248 (verb-qualified unknown option errors) — in-flight (another agent)
- **#249 (resumed-session slash error envelopes omit kind) — filed**
Natural bundle #247+#248+#249: classifier/envelope completeness across all
three CLI paths (top-level parse, subcommand options, resumed-session slash).
## Discipline
Per cycle #24 calibration:
- Red-state bug? ✗ (errors surfaced, exit codes correct)
- Real friction? ✓ (typed-error contract violation; claws dispatching on
error.kind get undefined for all resumed slash-command errors)
- Evidence-backed? ✓ (dogfood probe + code trace identified both Err arms)
- Implementation cost? ~15 lines (bounded)
- Same-cycle fix? ✗ (Rust change, deferred per file-not-fix discipline)
## Not Implementing This Cycle
Per the boundary discipline established in cycle #36: I don't touch another
agent's in-flight work, and I don't implement a Rust fix same-cycle when
the pattern is "file + document + let owner/maintainer decide."
Filing with concrete fix shape is the correct output. If demand or red-state
symptoms arrive, implementation can follow the same path as #247: file →
fix in branch → review → merge.
Source: Jobdori cycle #37 proactive dogfood in response to Clawhip pinpoint
nudge at msg 1496518474019639408.
Cycle #34 dogfood follow-through on Jobdori cycle #33 pinpoint (#247 filed
at fbcbe9d). Closes the two typed-error contract drifts surfaced in that
pinpoint against the Rust `claw` binary.
## What was wrong
1. `classify_error_kind()` (main.rs:~251) used substring matching but did
NOT match two common prompt-related parse errors:
- "prompt subcommand requires a prompt string"
- "empty prompt: provide a subcommand..."
Both fell through to `"unknown"`. §4.44 typed-error contract specifies
`parse | usage | unknown` as distinct classes, so claws dispatching on
`error.kind == "cli_parse"` missed those paths entirely.
2. JSON mode dropped the `Run `claw --help` for usage.` hint. Text mode
appends it at stderr-print time (main.rs:~234) AFTER split_error_hint()
has already serialized the envelope, so JSON consumers never saw it.
Text-mode humans got an actionable pointer; machine consumers did not.
## Fix
Two small, targeted edits:
1. `classify_error_kind()`: add explicit branches for "prompt subcommand
requires" and "empty prompt:" (the latter anchored with `starts_with`
so it never hijacks unrelated error messages containing the word).
Both route to `cli_parse`.
2. JSON error render path in `main()`: after calling split_error_hint(),
if the message carried no embedded hint AND kind is `cli_parse` AND
the short-reason does not already embed a `claw --help` pointer,
synthesize the same `Run `claw --help` for usage.` trailer that
text-mode stderr appends. The embedded-pointer check prevents
duplication on the `empty prompt: ... (run `claw --help`)` message
which already carries inline guidance.
## Verification
Direct repro on the compiled binary:
$ claw --output-format json prompt
{"error":"prompt subcommand requires a prompt string",
"hint":"Run `claw --help` for usage.",
"kind":"cli_parse","type":"error"}
$ claw --output-format json ""
{"error":"empty prompt: provide a subcommand (run `claw --help`) or a non-empty prompt string",
"hint":null,"kind":"cli_parse","type":"error"}
$ claw --output-format json doctor --foo # regression guard
{"error":"unrecognized argument `--foo` for subcommand `doctor`",
"hint":"Run `claw --help` for usage.",
"kind":"cli_parse","type":"error"}
Text mode unchanged in shape; `[error-kind: ...]` prefix now reads
`cli_parse` for the two previously-misclassified paths.
## Regression coverage
- Unit test `classify_error_kind_covers_prompt_parse_errors_247`: locks
both patterns route to `cli_parse` AND that generic "prompt"-containing
messages still fall through to `unknown`.
- Integration tests in `tests/output_format_contract.rs`:
* prompt_subcommand_without_arg_emits_cli_parse_envelope_with_hint_247
* empty_positional_arg_emits_cli_parse_envelope_247
* whitespace_only_positional_arg_emits_cli_parse_envelope_247
* unrecognized_argument_still_classifies_as_cli_parse_247_regression_guard
- Full rusty-claude-cli test suite: 218 tests pass (180 bin unit + 15
output_format_contract + 12 resume_slash + 7 compact + 3 mock + 1 cli).
## Family / related
Joins §4.44 typed-envelope contract gap family closure: #130, #179, #181,
and now **#247**. All four quartet items now have real fixes landed on
the canonical binary surface rather than only the Python harness.
ROADMAP.md: #247 marked CLOSED with before/after evidence preserved.
Cycle #33 dogfood finding from direct probe of Rust claw binary:
## The Pinpoint
Two related contract drifts in the typed-error envelope:
### 1. Error-kind misclassification
`classify_error_kind()` at main.rs:246-280 uses substring matching but
does NOT match two common parse error messages:
- "prompt subcommand requires a prompt string" → classified as 'unknown'
- "empty prompt: provide a subcommand..." → classified as 'unknown'
The §4.44 typed-error contract specifies 'parse | usage | unknown' as
DISTINCT classes. Known parse errors should be 'cli_parse', not 'unknown'.
### 2. Hint lost in JSON mode
Text mode appends 'Run `claw --help` for usage.' to parse errors.
JSON mode emits 'hint: null'. The trailer is added at the stderr-print
stage AFTER split_error_hint() has already serialized the envelope, so
JSON consumers never see it.
## Repro
Dogfooded on main HEAD dd0993c (cycle #33):
$ claw --output-format json prompt
{"error":"prompt subcommand requires a prompt string","hint":null,"kind":"unknown","type":"error"}
Expected: kind="cli_parse" + hint="Run \\`claw --help\\` for usage."
## Impact
- Claws dispatching on typed error.kind fall back to substring matching
- JSON consumers lose actionable hint that text-mode users see
- Joins JSON envelope field-quality family (#90, #91, #92, #110, #115,
#116, #130, #179, #181, #247)
## Fix Shape
1. Add prompt-pattern clauses to classify_error_kind() (~4 lines)
2. Move hint plumbing to BEFORE JSON envelope serialization (~15 lines)
3. Add golden-fixture regression tests per cycle #30 pattern
Not a red-state bug (error IS surfaced, exit code IS correct), but real
contract drift. Deferred for implementation; filed per Clawhip nudge
to 'add one concrete follow-up to ROADMAP.md'.
Per cycle #24 calibration:
- Red-state bug? ✗ (errors exit 1 correctly)
- Real friction? ✓ (typed-error contract drift)
- Evidence-backed? ✓ (dogfood probe + code trace identified both leaks)
- Implementation cost? ~20 lines Rust (bounded)
- Demand signal needed? Medium — any claw doing error.kind dispatch on
prompt-path errors is affected
Source: Jobdori cycle #33 direct dogfood 2026-04-22 22:30 KST in response
to Clawhip pinpoint nudge at msg 1496503374621970583.
Cycle #32 dogfood finding: #127 was fixed on main via `a3270db` + `79352a2`
(2026-04-20), but the ROADMAP.md entry still lacked a [CLOSED] marker.
The in-flight branches `feat/jobdori-127-clean` and
`feat/jobdori-127-verb-suffix-flags` were superseded and are now obsolete.
## What This Fixes
**Documentation drift:** Pinpoint #127 was complete in code but unmarked
in ROADMAP. New contributors checking the roadmap would see it as open
work, potentially duplicating effort.
**Stale branches:** Two branches (`feat/jobdori-127-clean`,
`feat/jobdori-127-verb-suffix-flags`) contain the fix attempt bundled
with an unrelated large-scope refactor (5365 lines removed from
ROADMAP.md, root-level governance docs deleted, command infra refactored).
Their fix was superseded; branches are functionally obsolete.
## Verification
Re-verified all 4 #127 scenarios pass on main HEAD `b903e16`:
$ claw doctor --json → rejected with "did you mean" hint
$ claw doctor garbage → rejected
$ claw doctor --unknown-flag → rejected
$ claw doctor --output-format json → works (canonical form)
All behavior matches #127 acceptance criteria.
## Cluster Impact
Post-closure: **parser-level trust gap quintet (#108 + #117 + #119 + #122
+ #127) is 5/5 closed**. The `_other => Prompt` fall-through audit is
complete.
## Discipline Check
Per cycle #24 calibration:
- Red-state bug? ✗ (behavior is correct on main)
- Real friction? ✓ (ROADMAP drift; obsolete branches adrift)
- Evidence-backed? ✓ (dogfood probe confirmed closure; git log confirmed
supersession; branch diff confirmed scope contamination)
## Relationship to Gaebal-gajae's Option A Guidance
Cycle #32 started by proposing separating the #127 fix from the attached
refactor. On deeper probe, discovered the fix was already superseded on
main via different commits. Option A (separate the fix) is retroactively
satisfied: the fix landed cleanly, the refactor never did.
The remaining action is governance hygiene: mark closure, document
supersession, flag obsolete branches for deletion.
## Next Actions (not in this commit)
- Delete `feat/jobdori-127-clean` locally and on fork (after confirmation)
- Delete `feat/jobdori-127-verb-suffix-flags` locally and on fork
- Monitor whether any attached refactor content should be re-proposed in
its own scoped PR
Source: Jobdori cycle #32 dogfood in response to Clawhip 10-min nudge.
Proposed Option A (separate fix from refactor); probe revealed the fix
already landed via a different commit path, rendering the refactor-only
branch obsolete.
Cycle #30 dogfood found a testing gap: OPT_OUT surfaces were classified
in code but their REJECTION behavior was never regression-tested.
## The Gap
OPT_OUT_AUDIT.md declares 12 surfaces as intentionally exempt from
--output-format. The test suite had:
- ✅ test_clawable_surface_has_output_format (CLAWABLE must accept)
- ✅ test_every_registered_command_is_classified (no orphans)
- ❌ Nothing verifying OPT_OUT surfaces REJECT --output-format
If a developer accidentally added --output-format to 'summary' (one of
the 12 OPT_OUT surfaces), no test would catch the silent promotion.
The classification was governed, but the rejection behavior was NOT.
## What Changed
Added TestOptOutSurfaceRejection to test_cli_parity_audit.py with 14 tests:
1. **12 parametrized tests** — one per OPT_OUT surface, verifying each
rejects --output-format with an argparse error.
2. **test_opt_out_set_matches_audit_document** — verifies OPT_OUT_SURFACES
constant matches the declared 12 surfaces in OPT_OUT_AUDIT.md.
3. **test_opt_out_count_matches_declared** — sanity check that the count
stays at 12 as documented.
## Symmetry Achieved
Before: only CLAWABLE acceptance tested
CLAWABLE accepts --output-format ✅
OPT_OUT behavior: untested
After: full parity coverage
CLAWABLE accepts --output-format ✅
OPT_OUT rejects --output-format ✅
Audit doc ↔ constant kept in sync ✅
This completes the parity enforcement loop: every new surface is
explicitly IN or OUT, and BOTH directions are regression-locked.
## Promotion Path Preserved
When a real OPT_OUT surface gains genuine demand (per OPT_OUT_DEMAND_LOG.md):
1. Move from OPT_OUT_SURFACES to CLAWABLE_SURFACES
2. Update OPT_OUT_AUDIT.md with promotion rationale
3. Remove from this test's expected rejections
4. Tests pass (rejection test no longer runs; acceptance test now required)
Graceful promotion; no accidental drift.
## Test Count
- 222 → 236 passing (+14, zero regressions)
- 12 parametrized + 2 metadata = 14 new tests
## Discipline Check
Per cycle #24 calibration:
- Red-state bug? ✗ (no broken behavior)
- Real friction? ✓ (testing gap discovered by dogfood)
- Evidence-backed? ✓ (systematic probe revealed missing coverage)
This is the cycle #27 taxonomy (structural / quality / cross-channel /
text-vs-JSON divergence) extending into classification: not just 'is the
envelope right?' but 'is the OPPOSITE-OF-envelope right?'
Future cycles can apply the same principle to other classifications:
every governed non-goal deserves regression tests that lock its
non-goal-ness.
Classification:
- Real friction: ✓ (cycle #30 dogfood)
- Evidence-backed: ✓ (gap discovered by systematic surface audit)
- Same-cycle fix: ✓ (maintainership discipline)
Source: Jobdori cycle #30 proactive dogfood — probed all 26 subcommands
with --output-format json and noticed OPT_OUT rejection pattern was
unverified by any dedicated test.
Cycle #29 dogfood found a real pinpoint: cross-mode exit code divergence.
## The Pinpoint
Dogfooding the CLI revealed that unknown subcommand errors return different
exit codes depending on output mode:
$ python3 -m src.main nonexistent-cmd # exit 2
$ python3 -m src.main nonexistent-cmd --output-format json # exit 1
ERROR_HANDLING.md documented the exit-code contract (1=parse, 2=timeout)
but did NOT explicitly state the contract applies only to JSON mode. Text
mode follows argparse defaults (exit 2 for any parse error), which
violates the documented contract when interpreted generally.
A claw using text mode with 'claw nonexistent' would see exit 2 and
misclassify as timeout per the docs. Real protocol contract gap, not
implementation bug.
## Classification
This is a DOCUMENTATION gap, not a behavior bug:
- Text mode follows argparse convention (reasonable for humans)
- JSON mode normalizes to documented contract (reasonable for claws)
- The divergence is intentional; only the docs were silent about it
Fix = document the divergence explicitly + lock it with tests.
NOT fix = change text mode exit code to 1 (would break argparse
conventions and confuse human users).
## Documentation Changes
ERROR_HANDLING.md:
1. Added IMPORTANT callout in Quick Reference section:
'The exit code contract applies ONLY when --output-format json is
explicitly set. Text mode follows argparse conventions.'
2. New 'Text mode vs JSON mode exit codes' table showing exact divergence:
- Unknown subcommand: text=2, json=1
- Missing required arg: text=2, json=1
- Session not found: text=1, json=1 (app-level, identical)
- Success: text=0, json=0 (identical)
- Timeout: text=2, json=2 (identical, #161)
3. Practical rule: 'always pass --output-format json'
## Tests Added (5)
TestTextVsJsonModeDivergence in test_cross_channel_consistency.py:
1. test_unknown_command_text_mode_exits_2 — text mode argparse default
2. test_unknown_command_json_mode_exits_1 — JSON mode contract normalized
3. test_missing_required_arg_text_mode_exits_2 — same for missing args
4. test_missing_required_arg_json_mode_exits_1 — same normalization
5. test_success_path_identical_in_both_modes — success exit identical
These tests LOCK the expected divergence so:
- Documentation stays aligned with implementation
- Future changes (either direction) are caught as intentional
- Claws trust the docs
## Test Status
- 217 → 222 tests passing (+5)
- Zero regressions
## Discipline
This cycle follows the cycle #28 template exactly:
- Dogfood probe revealed real friction (test said exit=2, docs said exit=1)
- Minimal fix shape (documentation clarification, not code change)
- Regression guard via tests
- Evidence-backed, not speculative
Relationship to #181:
- #181 fixed env.exit_code != process exit (WITHIN JSON mode)
- #29 clarifies exit code contract scope (ONLY JSON mode)
- Both establish: exit codes are deterministic, but only when --output-format json
---
Classification (per cycle #24 calibration):
- Red-state bug? ✗ (behavior was reasonable, docs were incomplete)
- Real friction? ✓ (docs/code divergence revealed by dogfood)
- Evidence-backed? ✓ (test suite probed both modes, found the gap)
Source: Jobdori cycle #29 proactive dogfood — in response to Clawhip nudge
for pinpoint hunting. Found that text-mode errors return exit 2 but
ERROR_HANDLING.md implied exit 1 was the parse-error contract universally.
Cycle #28 closes the low-hanging metadata protocol gap identified in #180.
## The Gap
Pinpoint #180 (filed cycle #24) documented a metadata protocol gap:
- `--help` works (argparse default)
- `--version` does NOT exist
The ROADMAP entry deferred implementation pending demand. Cycle #28 dogfood
probe found this during routine invariant audit (attempt to call `--version`
as part of comprehensive CLI surface coverage). This is concrete evidence of
real friction, not speculative gap-filling.
## Implementation
Added `--version` flag to argparse in `build_parser()`:
```python
parser.add_argument('--version', action='version', version='claw-code 1.0.0 (Python harness)')
```
Simple one-liner. Follows Python argparse conventions (built-in action='version').
## Tests Added (3)
TestMetadataFlags in test_exec_route_bootstrap_output_format.py:
1. test_version_flag_returns_version_text — `claw --version` prints version
2. test_help_flag_returns_help_text — `claw --help` still works
3. test_help_still_works_after_version_added — Both -h and --help work
Regression guard on the original help surface.
## Test Status
- 214 → 217 tests passing (+3)
- Zero regressions
- Full suite green
## Discipline
This cycle exemplifies the cycle #24 calibration:
- #180 was filed as 'deferred pending demand'
- Cycle #28 dogfood found actual friction (proactive test coverage gap)
- Evidence = concrete ('--version not found during invariant audit')
- Action = minimal implementation + regression tests
- No speculation, no feature creep, no implementation before evidence
Not 'we imagined someone might want this.' Instead: 'we tried to call it
during routine maintenance, got ENOENT, fixed it.'
## Related
- #180 (cycle #24): Metadata protocol gap filed
- Cycle #27: Cross-channel consistency audit established framework
- Cycle #28 invariant audit: Discovered actual friction, triggered fix
---
Classification (per cycle #24 calibration):
- Red-state bug? ✗ (not a malfunction, just an absence)
- Real friction? ✓ (audit probe could not call the flag, had to special-case)
- Evidence-backed? ✓ (proactive test coverage revealed the gap)
Source: Jobdori cycle #28 dogfood — invariant audit attempting comprehensive
CLI surface coverage found that --version was unsupported.
Cycle #27 ships a new test class systematizing the three-layer protocol
invariant framework.
## Context
After cycles #20–#26, the protocol has three distinct invariant classes:
1. **Structural compliance** (#178): Does the envelope exist?
2. **Quality compliance** (#179): Is stderr silent + error message truthful?
3. **Cross-channel consistency** (#181 + NEW): Do multiple channels agree?
#181 revealed a critical gap: the second test class was incomplete.
Envelopes could be structurally valid, quality-compliant, but still
lie about their own state (envelope.exit_code != actual exit).
## New Test Class
TestCrossChannelConsistency in test_cross_channel_consistency.py captures
the third invariant layer with 5 dedicated tests:
1. envelope.command ↔ dispatched subcommand
2. envelope.output_format ↔ --output-format flag
3. envelope.timestamp ↔ actual wall clock (recent, <5s)
4. envelope.exit_code ↔ process exit code (cycle #26/#181 regression guard)
5. envelope boolean fields (found/handled/deleted) ↔ error block presence
Each test specifically targets cross-channel truth, not structure or quality.
## Why Separate Test Classes Matter
A command can fail all three ways independently:
| Failure mode | Exit/Crash | Test class | Example |
|---|---|---|---|
| Structural | stderr noise | TestParseErrorEnvelope | argparse leaks to stderr |
| Quality | correct shape, wrong message | TestParseErrorStderrHygiene | error instead of real message |
| Cross-channel | truthy field, lie about state | TestCrossChannelConsistency | exit_code: 0 but exit 1 |
#181 was invisible to the first two classes. A claw passing all structure/
quality tests could still be misled. The third class catches that.
## Audit Results (Cycle #27)
All 5 tests pass — no drift detected in any channel pair:
- ✅ Envelope command always matches dispatch
- ✅ Envelope output_format always matches flag
- ✅ Envelope timestamp always recent (<5s)
- ✅ Envelope exit_code always matches process exit (post-#181 guard)
- ✅ Boolean fields consistent with error block presence
The systematic audit proved the fix from #181 holds, and identified
no new cross-channel gaps.
## Test Impact
- 209 → 214 tests passing (+5)
- Zero regressions
- New invariant class now has dedicated test suite
- Future cross-channel bugs will be caught by this class
## Related
- #178 (#20): Parser-front-door structural contract
- #179 (#20): Stderr hygiene + real error message quality
- #181 (#26): Envelope exit_code must match process exit
- #182-N: Future cross-channel contract violations will be caught
by TestCrossChannelConsistency
This test class is evergreen — as new fields/channels are added to the
protocol, invariants for those channels should be added here, not mixed
with other test classes. Keeping invariant classes separate makes
regression attribution instant (e.g., 'TestCrossChannelConsistency failed'
= 'some truth channel disagreed').
Classification (per cycle #24 calibration):
- Red-state bug: ✗ (audit is green)
- Real friction: ✓ (structured audit of documented invariants)
- Proof of equilibrium: ✓ (systematic verification, no gaps found)
Source: Jobdori cycle #27 proactive invariant audit — following gaebal
guidance to probe documented invariants, not speculative gaps.
Cycle #26 dogfood found a real red-state bug in the JSON envelope contract.
## The Bug
exec-command and exec-tool not-found cases return exit code 1 from the
process, but the envelope reports exit_code: 0 (the default from
wrap_json_envelope). This is a protocol violation.
Repro (before fix):
$ claw exec-command unknown-cmd test --output-format json > out.json
$ echo $?
1
$ jq '.exit_code' out.json
0 # WRONG — envelope lies about exit code
Claws reading the envelope's exit_code field get misinformation. A claw
implementing the canonical ERROR_HANDLING.md pattern (check exit_code,
then classify by error.kind) would incorrectly treat failures as
successes when dispatching on the envelope alone.
## Root Cause
main.py lines 687–739 (exec-command + exec-tool handlers):
- Return statement: 'return 0 if result.handled else 1' (correct)
- Envelope wrap: 'wrap_json_envelope(envelope, args.command)'
(uses default exit_code=0, IGNORES the return value)
The envelope wrap was called BEFORE the return value was computed, so
the exit_code field was never synchronized with the actual exit code.
## The Fix
Compute exit_code ONCE at the top:
exit_code = 0 if result.handled else 1
Pass it explicitly to wrap_json_envelope:
wrap_json_envelope(envelope, args.command, exit_code=exit_code)
Return the same value:
return exit_code
This ensures the envelope's exit_code field is always truth — the SAME
value the process returns.
## Tests Added (3)
TestEnvelopeExitCodeMatchesProcessExit in test_exec_route_bootstrap_output_format.py:
1. test_exec_command_not_found_envelope_exit_matches:
Verifies exec-command unknown-cmd returns exit 1 in both envelope
and process.
2. test_exec_tool_not_found_envelope_exit_matches:
Same for exec-tool.
3. test_all_commands_exit_code_invariant:
Audit across 4 known non-zero cases (show-command, show-tool,
exec-command, exec-tool not-found). Guards against the same bug
in other surfaces.
## Impact
- 206 → 209 passing tests (+3)
- Zero regressions
- Protocol contract now truthful: envelope.exit_code == process exit
- Claws using the one-handler pattern from ERROR_HANDLING.md now get
correct information
## Related
- ERROR_HANDLING.md (cycle #22): Documented exit_code as machine-readable
contract field
- #178/#179 (cycles #19/#20): Closed parser-front-door contract
- This closes a gap in the WORK PROTOCOL contract — envelope values must
match reality, not just be structurally present.
Classification (per cycle #24 calibration):
- Red-state bug: ✓ (contract violation, claws get misinformation)
- Real friction: ✓ (discovered via dogfood, not speculative)
- Fix ships same-cycle: ✓ (discipline per maintainership mode)
Source: Jobdori cycle #26 dogfood — ran multiple edge-case probes, noticed
exec-command envelope showed exit_code: 0 while process exited 1.
Investigated wrap_json_envelope default behavior, confirmed bug, fixed
and tested in same cycle.
Cycle #25 ships navigation improvements connecting USAGE (setup/interactive)
to ERROR_HANDLING.md (subprocess/orchestration patterns).
Before: USAGE.md had JSON scripting mention but no link to error-handling guide.
New users reading USAGE would see JSON is available, but wouldn't discover
the error-handling pattern without accidentally finding ERROR_HANDLING.md.
After: Two strategic cross-links:
1. Top-level tip box: "Building orchestration code? See ERROR_HANDLING.md"
2. JSON scripting section expanded with examples + link to unified pattern
Changes to USAGE.md:
- Added TIP callout near top linking to ERROR_HANDLING.md
- Expanded "JSON output for scripting" section:
- Explains what the envelope contains (exit_code, command, timestamp, fields)
- Added 3 command examples (prompt, load-session, turn-loop)
- Added callout for dispatchers/orchestrators pointing to ERROR_HANDLING pattern
Impact: Operators reading USAGE for "how do I call claw from scripts?" now
immediately see the canonical answer (ERROR_HANDLING.md) instead of having
to reverse-engineer it from code examples.
No code changes. Pure navigation/documentation.
Continues the documentation-governance pattern: the work protocol (14 clawable
commands) has a consumption guide (ERROR_HANDLING.md), and that guide is now
reachable from the main entry point (USAGE.md + README.md top nav).
Cycle #24 dogfood discovery.
Running proactive edge-case dogfood on the JSON contract, hit a real pinpoint:
--help and --version are outside the parser-front-door contract.
The gap:
1. "claw --help --output-format json" returns text (not envelope)
2. "claw bootstrap --help --output-format json" returns text (not envelope)
3. "claw --version" doesn't exist at all
Why it matters:
- Claws can't programmatically discover the CLI surface
- Version checking requires side-effectful commands
- Natural follow-up gap to #178/#179 parser-front-door work
Discoverability scenarios:
- Orchestrator checking whether a new command (e.g., turn-loop) is available
- Version compat check before dispatching work
- Enumerating available commands for routing decisions
Filed as Pinpoint #180 in ROADMAP.md with:
- Gap description + 3-case repro
- Impact analysis (version compat, surface enumeration, governance)
- Root cause (argparse default HelpAction prints text + exits)
- Fix shape (3 stages, ~40 lines total)
- Stage A: --version + JSON envelope version metadata
- Stage B: --help JSON routing via custom HelpAction
- Stage C: optional 'schema-info' command for pre-dispatch discovery
- Acceptance criteria (4 cases, including backward compat)
- Priority: Medium (not red-state, but real discoverability gap)
Status: **Filed, implementation deferred.**
Following maintainership equilibrium: pinpoints stay documented but don't
force code changes. If external demand arrives (claw author building a
dispatcher, orchestrator doing version checks), the fix can ship in one
cycle using the shape already documented.
No code changes this cycle. Pure ROADMAP filing.
Continues the maintainership pattern: find friction, document it, defer
until evidence-backed demand arrives.
Source: Jobdori proactive dogfood at 2026-04-22 20:58 KST.
Cycle #23 ships a documentation discoverability fix.
After #22 shipping ERROR_HANDLING.md, the next natural step is making it
discoverable from the project's entry point (README.md).
Before: README top navigation linked to USAGE, PARITY, ROADMAP, Rust workspace.
ERROR_HANDLING.md was buried in CLAUDE.md references.
After: ERROR_HANDLING.md is now in the top navigation (right after USAGE,
before Rust workspace). Also added SCHEMAS.md mention in repository shape.
This signals that:
1. Error handling is a first-class concern (not an afterthought)
2. The Python harness documentation (SCHEMAS.md, ERROR_HANDLING.md, CLAUDE.md)
is part of the official docs, not just dogfood artifacts
3. New users/claws can discover the error-handling pattern at entry point
Impact: Operators building orchestration code will immediately see
'Error Handling' link in navigation, shortening the path to understanding
how to consume the protocol reliably.
No code changes. No test changes. Pure navigation/discoverability.
Cycle #22 ships documentation that operationalizes cycles #178–#179.
Problem context:
After #178 (parse-error envelope) and #179 (stderr hygiene + real error message),
claws can now build a unified error handler for all 14 clawable commands.
But there was no guide on how to actually do that. Operators had the pieces;
they didn't have the pattern.
This file changes that.
New file: ERROR_HANDLING.md
- Quick reference: exit codes + envelope shapes (0=success, 1=error, 2=timeout)
- One-handler pattern: ~80 lines of Python showing how to parse error.kind,
check retryable, and decide recovery strategy
- Four practical recovery patterns:
- Retry on transient errors (filesystem, timeout)
- Reuse session after timeout (if cancel_observed=true)
- Validate command syntax before dispatch (dry-run --help)
- Log errors for observability
- Error kinds enumeration (parse, session_not_found, filesystem, runtime, timeout)
- Common mistakes to avoid (6 patterns with BAD vs GOOD examples)
- Testing your error handler (unit test examples)
Operational impact:
Orchestration code now has a canonical pattern. Claws can:
- Copy-paste the run_claw_command() function (works for all commands)
- Classify errors uniformly (no special cases per command)
- Decide recovery deterministically (error.kind + retryable + cancel_observed)
- Log/monitor/escalate with confidence
Related cycles:
- #178: Parse-error envelope (commands now emit structured JSON on invalid argv)
- #179: Stderr hygiene + real message (JSON mode silences argparse, carries actual error)
- #164 Stage B: cancel_observed field (callers know if session is safe for reuse)
Updated CLAUDE.md:
- Added ERROR_HANDLING.md to 'Related docs' section
- Now documents the one-handler pattern as a guideline
No code changes. No test changes. Pure documentation.
This completes the documentation trail from protocol (SCHEMAS.md) →
governance (OPT_OUT_AUDIT.md, OPT_OUT_DEMAND_LOG.md) → practice (ERROR_HANDLING.md).
Cycle #21 ships governance infrastructure, not implementation. Maintainership
mode means sometimes the right deliverable is a decision framework, not code.
Problem context:
OPT_OUT_AUDIT.md (cycle #18 bonus) established 'demand-backed audit' as the
next step. But without a structured way to record demand signals, 'demand-backed'
was just a slogan — the next audit cycle would have no evidence to work from.
This commit creates the evidentiary base:
New file: OPT_OUT_DEMAND_LOG.md
- Per-surface entries for all 12 OPT_OUT commands (Groups A/B/C)
- Current state: 0 signals across all surfaces (consistent with audit prediction)
- Signal entry template with required fields:
- Source (who/what)
- Use case (concrete orchestration problem)
- Markdown-alternative-checked (why existing output insufficient)
- Date
- Promotion thresholds:
- 2+ independent signals for same surface → file promotion pinpoint
- 1 signal + existing stable schema → file pinpoint for discussion
- 0 signals → stays OPT_OUT (rationale preserved)
Decision framework for cycle #22 (audit close):
- If 0 signals total: move to PERMANENTLY_OPT_OUT, close audit
- If 1-2 signals: file individual promotion pinpoints with evidence
- If 3+ signals: reopen audit, question classification itself
Updated files:
- OPT_OUT_AUDIT.md: Added demand log reference in Related section
- CLAUDE.md: Added prerequisites for promotions (must have logged signals),
added 'File a demand signal' workflow section
Philosophy:
'Prevent speculative expansion' — schema bloat protection discipline.
Every new CLAWABLE surface is a maintenance tax. Evidence requirement keeps
the protocol lean. OPT_OUT surfaces are intentionally not-clawable until
proven otherwise by external demand.
Operational impact:
Next cycles can now:
1. Watch for real claws hitting OPT_OUT surface limits
2. Log signals in structured format (no ad-hoc filing)
3. Run audit at cycle #22 with actual data, not speculation
No code changes. No test changes. Pure governance infrastructure.
Related: #18 cycle (OPT_OUT_AUDIT.md), maintainership phase transition.
Dogfood discovered #178 had two residual gaps:
1. Stderr pollution: argparse usage + error text still leaked to stderr even in
JSON mode (envelope was correct on stdout, but stderr noise broke the
'machine-first protocol' contract — claws capturing both streams got dual output)
2. Generic error message: envelope carried 'invalid command or argument (argparse
rejection)' instead of argparse's actual text like 'the following arguments
are required: session_id' or 'invalid choice: typo (choose from ...)'
Before #179:
$ claw load-session --output-format json
[stdout] {"error": {"message": "invalid command or argument (argparse rejection)"}}
[stderr] usage: main.py load-session [-h] ...
main.py load-session: error: the following arguments are required: session_id
[exit 1]
After #179:
$ claw load-session --output-format json
[stdout] {"error": {"message": "the following arguments are required: session_id"}}
[stderr] (empty)
[exit 1]
Implementation:
- New _ArgparseError exception class captures argparse's real message
- main() monkey-patches parser.error (+ all subparser.error) in JSON mode to raise
_ArgparseError instead of print-to-stderr + sys.exit(2)
- _emit_parse_error_envelope() now receives the real message verbatim
- Text mode path unchanged: still uses original argparse print+exit behavior
Contract:
- JSON mode: stdout carries envelope with argparse's actual error; stderr silent
- Text mode: unchanged — argparse usage to stderr, exit 2
- Parse errors still error.kind='parse', retryable=false
Test additions (5 new, 14 total in test_parse_error_envelope.py):
- TestParseErrorStderrHygiene (5):
- test_json_mode_stderr_is_silent_on_unknown_command
- test_json_mode_stderr_is_silent_on_missing_arg
- test_json_mode_envelope_carries_real_argparse_message
- test_json_mode_envelope_carries_invalid_choice_details (verifies valid-choices list)
- test_text_mode_stderr_preserved_on_unknown_command (backward compat)
Operational impact:
Claws capturing both stdout and stderr no longer get garbled output. The envelope
message now carries discoverability info (valid command list, missing-arg name)
that claws can use for retry/recovery without probing the CLI a second time.
Test results: 201 → 206 passing, 3 skipped unchanged, zero regression.
Pinpoint discovered via dogfood at 2026-04-22 20:30 KST (cycle #20).
Filed explicit decision criteria for the 12 OPT_OUT surfaces (commands that do
not support --output-format json) documented in test_cli_parity_audit.py.
Categorized by rationale:
- Group A (4): Rich-Markdown reports (summary, manifest, parity-audit, setup-report)
Markdown-as-output is intentional; JSON would be information loss.
Unlikely promotions (remain OPT_OUT long-term).
- Group B (3): List filters with --query/--limit (subsystems, commands, tools)
Query layer already exists; users have escape hatch.
Remain OPT_OUT (promotion effort >> value).
- Group C (5): Simulation/debug surfaces (remote-mode, ssh-mode, teleport-mode,
direct-connect-mode, deep-link-mode)
Intentionally non-production; JSON output doesn't add value.
Remain OPT_OUT (simulation tools, not orchestration endpoints).
Audit workflow documented:
1. Survey: Check if external claws actually request JSON versions
2. Cost estimate: Schema + tests for each surface
3. Value estimate: Real demand vs hypothetical
4. Decision: CLAWABLE, remain OPT_OUT, or new pinpoint
Promotion criteria locked (only if clear use case + schema simple + demand exists).
Outcome prediction: All 12 likely remain OPT_OUT (documented rationale per group).
Timeline: Survey period (cycles #19–#21), final decision (cycle #22).
Related pinpoints: #175 (summary/manifest JSON parallel?), #176 (--query-json?),
#177 (mode simulators ever CLAWABLE?).
This closes the documentation loop from cycles #173–#174 (protocol closure →
field evolution → reframe). Now governance rules are explicit for future work.
#164 Stage B requires exposing whether cancellation was observed at the
turn-result level. This commit adds the infrastructure field:
Changes:
- TurnResult.cancel_observed: bool = False (query_engine.py)
- _build_timeout_result() accepts cancel_observed parameter (runtime.py)
- Two timeout paths now pass cancel_event.is_set() to signal observation (runtime.py)
- bootstrap command includes cancel_observed in turn JSON (main.py)
- SCHEMAS.md documents Turn Result Fields with cancel_observed contract
Usage:
When a turn timeout occurs, cancel_observed=true indicates that the
engine observed the cancellation event being set. This allows callers
to distinguish:
- timeout with no cancel → infrastructure/network stall
- timeout with cancel observed → cooperative cancellation was triggered
Backward compat:
- Existing TurnResult construction without cancel_observed defaults to False
- bootstrap JSON output still validates per SCHEMAS.md (new field is always present)
Test results: 182 passing, 3 skipped, zero regression.
Related: #161 (wall-clock timeout), #164 (cancellation observability protocol)
ROADMAP continues #164 with Stage C (test coverage for cancellation + turn envelope).