Cycle #27 ships a new test class systematizing the three-layer protocol
invariant framework.
## Context
After cycles #20–#26, the protocol has three distinct invariant classes:
1. **Structural compliance** (#178): Does the envelope exist?
2. **Quality compliance** (#179): Is stderr silent + error message truthful?
3. **Cross-channel consistency** (#181 + NEW): Do multiple channels agree?
#181 revealed a critical gap: the second test class was incomplete.
Envelopes could be structurally valid, quality-compliant, but still
lie about their own state (envelope.exit_code != actual exit).
## New Test Class
TestCrossChannelConsistency in test_cross_channel_consistency.py captures
the third invariant layer with 5 dedicated tests:
1. envelope.command ↔ dispatched subcommand
2. envelope.output_format ↔ --output-format flag
3. envelope.timestamp ↔ actual wall clock (recent, <5s)
4. envelope.exit_code ↔ process exit code (cycle #26/#181 regression guard)
5. envelope boolean fields (found/handled/deleted) ↔ error block presence
Each test specifically targets cross-channel truth, not structure or quality.
## Why Separate Test Classes Matter
A command can fail all three ways independently:
| Failure mode | Exit/Crash | Test class | Example |
|---|---|---|---|
| Structural | stderr noise | TestParseErrorEnvelope | argparse leaks to stderr |
| Quality | correct shape, wrong message | TestParseErrorStderrHygiene | error instead of real message |
| Cross-channel | truthy field, lie about state | TestCrossChannelConsistency | exit_code: 0 but exit 1 |
#181 was invisible to the first two classes. A claw passing all structure/
quality tests could still be misled. The third class catches that.
## Audit Results (Cycle #27)
All 5 tests pass — no drift detected in any channel pair:
- ✅ Envelope command always matches dispatch
- ✅ Envelope output_format always matches flag
- ✅ Envelope timestamp always recent (<5s)
- ✅ Envelope exit_code always matches process exit (post-#181 guard)
- ✅ Boolean fields consistent with error block presence
The systematic audit proved the fix from #181 holds, and identified
no new cross-channel gaps.
## Test Impact
- 209 → 214 tests passing (+5)
- Zero regressions
- New invariant class now has dedicated test suite
- Future cross-channel bugs will be caught by this class
## Related
- #178 (#20): Parser-front-door structural contract
- #179 (#20): Stderr hygiene + real error message quality
- #181 (#26): Envelope exit_code must match process exit
- #182-N: Future cross-channel contract violations will be caught
by TestCrossChannelConsistency
This test class is evergreen — as new fields/channels are added to the
protocol, invariants for those channels should be added here, not mixed
with other test classes. Keeping invariant classes separate makes
regression attribution instant (e.g., 'TestCrossChannelConsistency failed'
= 'some truth channel disagreed').
Classification (per cycle #24 calibration):
- Red-state bug: ✗ (audit is green)
- Real friction: ✓ (structured audit of documented invariants)
- Proof of equilibrium: ✓ (systematic verification, no gaps found)
Source: Jobdori cycle #27 proactive invariant audit — following gaebal
guidance to probe documented invariants, not speculative gaps.
Cycle #26 dogfood found a real red-state bug in the JSON envelope contract.
## The Bug
exec-command and exec-tool not-found cases return exit code 1 from the
process, but the envelope reports exit_code: 0 (the default from
wrap_json_envelope). This is a protocol violation.
Repro (before fix):
$ claw exec-command unknown-cmd test --output-format json > out.json
$ echo $?
1
$ jq '.exit_code' out.json
0 # WRONG — envelope lies about exit code
Claws reading the envelope's exit_code field get misinformation. A claw
implementing the canonical ERROR_HANDLING.md pattern (check exit_code,
then classify by error.kind) would incorrectly treat failures as
successes when dispatching on the envelope alone.
## Root Cause
main.py lines 687–739 (exec-command + exec-tool handlers):
- Return statement: 'return 0 if result.handled else 1' (correct)
- Envelope wrap: 'wrap_json_envelope(envelope, args.command)'
(uses default exit_code=0, IGNORES the return value)
The envelope wrap was called BEFORE the return value was computed, so
the exit_code field was never synchronized with the actual exit code.
## The Fix
Compute exit_code ONCE at the top:
exit_code = 0 if result.handled else 1
Pass it explicitly to wrap_json_envelope:
wrap_json_envelope(envelope, args.command, exit_code=exit_code)
Return the same value:
return exit_code
This ensures the envelope's exit_code field is always truth — the SAME
value the process returns.
## Tests Added (3)
TestEnvelopeExitCodeMatchesProcessExit in test_exec_route_bootstrap_output_format.py:
1. test_exec_command_not_found_envelope_exit_matches:
Verifies exec-command unknown-cmd returns exit 1 in both envelope
and process.
2. test_exec_tool_not_found_envelope_exit_matches:
Same for exec-tool.
3. test_all_commands_exit_code_invariant:
Audit across 4 known non-zero cases (show-command, show-tool,
exec-command, exec-tool not-found). Guards against the same bug
in other surfaces.
## Impact
- 206 → 209 passing tests (+3)
- Zero regressions
- Protocol contract now truthful: envelope.exit_code == process exit
- Claws using the one-handler pattern from ERROR_HANDLING.md now get
correct information
## Related
- ERROR_HANDLING.md (cycle #22): Documented exit_code as machine-readable
contract field
- #178/#179 (cycles #19/#20): Closed parser-front-door contract
- This closes a gap in the WORK PROTOCOL contract — envelope values must
match reality, not just be structurally present.
Classification (per cycle #24 calibration):
- Red-state bug: ✓ (contract violation, claws get misinformation)
- Real friction: ✓ (discovered via dogfood, not speculative)
- Fix ships same-cycle: ✓ (discipline per maintainership mode)
Source: Jobdori cycle #26 dogfood — ran multiple edge-case probes, noticed
exec-command envelope showed exit_code: 0 while process exited 1.
Investigated wrap_json_envelope default behavior, confirmed bug, fixed
and tested in same cycle.
Cycle #25 ships navigation improvements connecting USAGE (setup/interactive)
to ERROR_HANDLING.md (subprocess/orchestration patterns).
Before: USAGE.md had JSON scripting mention but no link to error-handling guide.
New users reading USAGE would see JSON is available, but wouldn't discover
the error-handling pattern without accidentally finding ERROR_HANDLING.md.
After: Two strategic cross-links:
1. Top-level tip box: "Building orchestration code? See ERROR_HANDLING.md"
2. JSON scripting section expanded with examples + link to unified pattern
Changes to USAGE.md:
- Added TIP callout near top linking to ERROR_HANDLING.md
- Expanded "JSON output for scripting" section:
- Explains what the envelope contains (exit_code, command, timestamp, fields)
- Added 3 command examples (prompt, load-session, turn-loop)
- Added callout for dispatchers/orchestrators pointing to ERROR_HANDLING pattern
Impact: Operators reading USAGE for "how do I call claw from scripts?" now
immediately see the canonical answer (ERROR_HANDLING.md) instead of having
to reverse-engineer it from code examples.
No code changes. Pure navigation/documentation.
Continues the documentation-governance pattern: the work protocol (14 clawable
commands) has a consumption guide (ERROR_HANDLING.md), and that guide is now
reachable from the main entry point (USAGE.md + README.md top nav).
Cycle #24 dogfood discovery.
Running proactive edge-case dogfood on the JSON contract, hit a real pinpoint:
--help and --version are outside the parser-front-door contract.
The gap:
1. "claw --help --output-format json" returns text (not envelope)
2. "claw bootstrap --help --output-format json" returns text (not envelope)
3. "claw --version" doesn't exist at all
Why it matters:
- Claws can't programmatically discover the CLI surface
- Version checking requires side-effectful commands
- Natural follow-up gap to #178/#179 parser-front-door work
Discoverability scenarios:
- Orchestrator checking whether a new command (e.g., turn-loop) is available
- Version compat check before dispatching work
- Enumerating available commands for routing decisions
Filed as Pinpoint #180 in ROADMAP.md with:
- Gap description + 3-case repro
- Impact analysis (version compat, surface enumeration, governance)
- Root cause (argparse default HelpAction prints text + exits)
- Fix shape (3 stages, ~40 lines total)
- Stage A: --version + JSON envelope version metadata
- Stage B: --help JSON routing via custom HelpAction
- Stage C: optional 'schema-info' command for pre-dispatch discovery
- Acceptance criteria (4 cases, including backward compat)
- Priority: Medium (not red-state, but real discoverability gap)
Status: **Filed, implementation deferred.**
Following maintainership equilibrium: pinpoints stay documented but don't
force code changes. If external demand arrives (claw author building a
dispatcher, orchestrator doing version checks), the fix can ship in one
cycle using the shape already documented.
No code changes this cycle. Pure ROADMAP filing.
Continues the maintainership pattern: find friction, document it, defer
until evidence-backed demand arrives.
Source: Jobdori proactive dogfood at 2026-04-22 20:58 KST.
Cycle #23 ships a documentation discoverability fix.
After #22 shipping ERROR_HANDLING.md, the next natural step is making it
discoverable from the project's entry point (README.md).
Before: README top navigation linked to USAGE, PARITY, ROADMAP, Rust workspace.
ERROR_HANDLING.md was buried in CLAUDE.md references.
After: ERROR_HANDLING.md is now in the top navigation (right after USAGE,
before Rust workspace). Also added SCHEMAS.md mention in repository shape.
This signals that:
1. Error handling is a first-class concern (not an afterthought)
2. The Python harness documentation (SCHEMAS.md, ERROR_HANDLING.md, CLAUDE.md)
is part of the official docs, not just dogfood artifacts
3. New users/claws can discover the error-handling pattern at entry point
Impact: Operators building orchestration code will immediately see
'Error Handling' link in navigation, shortening the path to understanding
how to consume the protocol reliably.
No code changes. No test changes. Pure navigation/discoverability.
Cycle #22 ships documentation that operationalizes cycles #178–#179.
Problem context:
After #178 (parse-error envelope) and #179 (stderr hygiene + real error message),
claws can now build a unified error handler for all 14 clawable commands.
But there was no guide on how to actually do that. Operators had the pieces;
they didn't have the pattern.
This file changes that.
New file: ERROR_HANDLING.md
- Quick reference: exit codes + envelope shapes (0=success, 1=error, 2=timeout)
- One-handler pattern: ~80 lines of Python showing how to parse error.kind,
check retryable, and decide recovery strategy
- Four practical recovery patterns:
- Retry on transient errors (filesystem, timeout)
- Reuse session after timeout (if cancel_observed=true)
- Validate command syntax before dispatch (dry-run --help)
- Log errors for observability
- Error kinds enumeration (parse, session_not_found, filesystem, runtime, timeout)
- Common mistakes to avoid (6 patterns with BAD vs GOOD examples)
- Testing your error handler (unit test examples)
Operational impact:
Orchestration code now has a canonical pattern. Claws can:
- Copy-paste the run_claw_command() function (works for all commands)
- Classify errors uniformly (no special cases per command)
- Decide recovery deterministically (error.kind + retryable + cancel_observed)
- Log/monitor/escalate with confidence
Related cycles:
- #178: Parse-error envelope (commands now emit structured JSON on invalid argv)
- #179: Stderr hygiene + real message (JSON mode silences argparse, carries actual error)
- #164 Stage B: cancel_observed field (callers know if session is safe for reuse)
Updated CLAUDE.md:
- Added ERROR_HANDLING.md to 'Related docs' section
- Now documents the one-handler pattern as a guideline
No code changes. No test changes. Pure documentation.
This completes the documentation trail from protocol (SCHEMAS.md) →
governance (OPT_OUT_AUDIT.md, OPT_OUT_DEMAND_LOG.md) → practice (ERROR_HANDLING.md).
Cycle #21 ships governance infrastructure, not implementation. Maintainership
mode means sometimes the right deliverable is a decision framework, not code.
Problem context:
OPT_OUT_AUDIT.md (cycle #18 bonus) established 'demand-backed audit' as the
next step. But without a structured way to record demand signals, 'demand-backed'
was just a slogan — the next audit cycle would have no evidence to work from.
This commit creates the evidentiary base:
New file: OPT_OUT_DEMAND_LOG.md
- Per-surface entries for all 12 OPT_OUT commands (Groups A/B/C)
- Current state: 0 signals across all surfaces (consistent with audit prediction)
- Signal entry template with required fields:
- Source (who/what)
- Use case (concrete orchestration problem)
- Markdown-alternative-checked (why existing output insufficient)
- Date
- Promotion thresholds:
- 2+ independent signals for same surface → file promotion pinpoint
- 1 signal + existing stable schema → file pinpoint for discussion
- 0 signals → stays OPT_OUT (rationale preserved)
Decision framework for cycle #22 (audit close):
- If 0 signals total: move to PERMANENTLY_OPT_OUT, close audit
- If 1-2 signals: file individual promotion pinpoints with evidence
- If 3+ signals: reopen audit, question classification itself
Updated files:
- OPT_OUT_AUDIT.md: Added demand log reference in Related section
- CLAUDE.md: Added prerequisites for promotions (must have logged signals),
added 'File a demand signal' workflow section
Philosophy:
'Prevent speculative expansion' — schema bloat protection discipline.
Every new CLAWABLE surface is a maintenance tax. Evidence requirement keeps
the protocol lean. OPT_OUT surfaces are intentionally not-clawable until
proven otherwise by external demand.
Operational impact:
Next cycles can now:
1. Watch for real claws hitting OPT_OUT surface limits
2. Log signals in structured format (no ad-hoc filing)
3. Run audit at cycle #22 with actual data, not speculation
No code changes. No test changes. Pure governance infrastructure.
Related: #18 cycle (OPT_OUT_AUDIT.md), maintainership phase transition.
Dogfood discovered #178 had two residual gaps:
1. Stderr pollution: argparse usage + error text still leaked to stderr even in
JSON mode (envelope was correct on stdout, but stderr noise broke the
'machine-first protocol' contract — claws capturing both streams got dual output)
2. Generic error message: envelope carried 'invalid command or argument (argparse
rejection)' instead of argparse's actual text like 'the following arguments
are required: session_id' or 'invalid choice: typo (choose from ...)'
Before #179:
$ claw load-session --output-format json
[stdout] {"error": {"message": "invalid command or argument (argparse rejection)"}}
[stderr] usage: main.py load-session [-h] ...
main.py load-session: error: the following arguments are required: session_id
[exit 1]
After #179:
$ claw load-session --output-format json
[stdout] {"error": {"message": "the following arguments are required: session_id"}}
[stderr] (empty)
[exit 1]
Implementation:
- New _ArgparseError exception class captures argparse's real message
- main() monkey-patches parser.error (+ all subparser.error) in JSON mode to raise
_ArgparseError instead of print-to-stderr + sys.exit(2)
- _emit_parse_error_envelope() now receives the real message verbatim
- Text mode path unchanged: still uses original argparse print+exit behavior
Contract:
- JSON mode: stdout carries envelope with argparse's actual error; stderr silent
- Text mode: unchanged — argparse usage to stderr, exit 2
- Parse errors still error.kind='parse', retryable=false
Test additions (5 new, 14 total in test_parse_error_envelope.py):
- TestParseErrorStderrHygiene (5):
- test_json_mode_stderr_is_silent_on_unknown_command
- test_json_mode_stderr_is_silent_on_missing_arg
- test_json_mode_envelope_carries_real_argparse_message
- test_json_mode_envelope_carries_invalid_choice_details (verifies valid-choices list)
- test_text_mode_stderr_preserved_on_unknown_command (backward compat)
Operational impact:
Claws capturing both stdout and stderr no longer get garbled output. The envelope
message now carries discoverability info (valid command list, missing-arg name)
that claws can use for retry/recovery without probing the CLI a second time.
Test results: 201 → 206 passing, 3 skipped unchanged, zero regression.
Pinpoint discovered via dogfood at 2026-04-22 20:30 KST (cycle #20).
Filed explicit decision criteria for the 12 OPT_OUT surfaces (commands that do
not support --output-format json) documented in test_cli_parity_audit.py.
Categorized by rationale:
- Group A (4): Rich-Markdown reports (summary, manifest, parity-audit, setup-report)
Markdown-as-output is intentional; JSON would be information loss.
Unlikely promotions (remain OPT_OUT long-term).
- Group B (3): List filters with --query/--limit (subsystems, commands, tools)
Query layer already exists; users have escape hatch.
Remain OPT_OUT (promotion effort >> value).
- Group C (5): Simulation/debug surfaces (remote-mode, ssh-mode, teleport-mode,
direct-connect-mode, deep-link-mode)
Intentionally non-production; JSON output doesn't add value.
Remain OPT_OUT (simulation tools, not orchestration endpoints).
Audit workflow documented:
1. Survey: Check if external claws actually request JSON versions
2. Cost estimate: Schema + tests for each surface
3. Value estimate: Real demand vs hypothetical
4. Decision: CLAWABLE, remain OPT_OUT, or new pinpoint
Promotion criteria locked (only if clear use case + schema simple + demand exists).
Outcome prediction: All 12 likely remain OPT_OUT (documented rationale per group).
Timeline: Survey period (cycles #19–#21), final decision (cycle #22).
Related pinpoints: #175 (summary/manifest JSON parallel?), #176 (--query-json?),
#177 (mode simulators ever CLAWABLE?).
This closes the documentation loop from cycles #173–#174 (protocol closure →
field evolution → reframe). Now governance rules are explicit for future work.
#164 Stage B requires exposing whether cancellation was observed at the
turn-result level. This commit adds the infrastructure field:
Changes:
- TurnResult.cancel_observed: bool = False (query_engine.py)
- _build_timeout_result() accepts cancel_observed parameter (runtime.py)
- Two timeout paths now pass cancel_event.is_set() to signal observation (runtime.py)
- bootstrap command includes cancel_observed in turn JSON (main.py)
- SCHEMAS.md documents Turn Result Fields with cancel_observed contract
Usage:
When a turn timeout occurs, cancel_observed=true indicates that the
engine observed the cancellation event being set. This allows callers
to distinguish:
- timeout with no cancel → infrastructure/network stall
- timeout with cancel observed → cooperative cancellation was triggered
Backward compat:
- Existing TurnResult construction without cancel_observed defaults to False
- bootstrap JSON output still validates per SCHEMAS.md (new field is always present)
Test results: 182 passing, 3 skipped, zero regression.
Related: #161 (wall-clock timeout), #164 (cancellation observability protocol)
ROADMAP continues #164 with Stage C (test coverage for cancellation + turn envelope).
Adds parametrised test suite validating that clawable-surface commands'
JSON output matches their declared envelope contracts per SCHEMAS.md.
Two phases:
Phase 1 (this commit): Consistency baseline.
- Collect ENVELOPE_CONTRACTS registry mapping each command to its
required and optional fields
- TestJsonEnvelopeConsistency: parametrised test iterates over 13
commands, invokes with --output-format json, validates that
actual JSON envelope contains all required fields
- test_envelope_field_value_types: spot-check types (int, str, list)
for consistency
Phase 2 (future #173): Common field wrapping.
- Once wrap_json_envelope() is applied, all commands will emit
timestamp, command, exit_code, output_format, schema_version
- Currently skipped via @pytest.mark.skip, these tests will activate
automatically when wrapping is implemented:
TestJsonEnvelopeCommonFieldPrep::test_all_envelopes_include_timestamp
TestJsonEnvelopeCommonFieldPrep::test_all_envelopes_include_command
TestJsonEnvelopeCommonFieldPrep::test_all_envelopes_include_exit_code_and_schema_version
Why this matters:
- #172 documented the JSON contract; this test validates it
- Currently detects when actual output diverges from SCHEMAS.md
(e.g. list-sessions emits 'count', not 'sessions_count')
- As #173 wraps commands, test suite auto-validates new common fields
- Prevents regression: accidental field removal breaks the test suite
Current status: 11 passed (consistency), 6 skipped (awaiting #173)
Full suite: 168 → 179 passing, zero regression.
Closes ROADMAP #173 prep (framework for common field validation).
Actual field wrapping remains for next cycle.
Stops manual parity inspection from being a human-noticed concern. When
a developer adds a new subcommand to the claw-code CLI, this test suite
enforces explicit classification:
- CLAWABLE_SURFACES: MUST accept --output-format {text,json}
- OPT_OUT_SURFACES: explicitly exempt with documented rationale
A new command that forgets to opt into one of these two sets FAILS
loudly with TestCommandClassificationCoverage::test_every_registered_
command_is_classified. No silent drift possible.
Technique: argparse introspection at test time walks the _actions tree,
discovers every registered subcommand, and compares against the declared
classification sets. Contract is enforced machine-first instead of
depending on human review.
Three test classes covering three invariants:
TestClawableSurfaceParity (14 tests):
- test_all_clawable_surfaces_accept_output_format: every member of
CLAWABLE_SURFACES has --output-format flag registered
- test_clawable_surface_output_format_choices (parametrised over 13
commands): each must accept exactly {text, json} and default to 'text'
for backward compat
TestCommandClassificationCoverage (3 tests):
- test_every_registered_command_is_classified: any new subcommand
must be explicitly added to CLAWABLE_SURFACES or OPT_OUT_SURFACES
- test_no_command_in_both_sets: sanity check for classification conflicts
- test_all_classified_commands_actually_exist: no phantom commands
(catches stale entries after a command is removed)
TestJsonOutputContractEndToEnd (10 tests):
- test_command_emits_parseable_json (parametrised over 10 clawable
commands): actual subprocess invocation with --output-format json
produces valid parseable JSON on stdout
Classification:
CLAWABLE_SURFACES (13):
Session lifecycle: list-sessions, delete-session, load-session,
flush-transcript
Inspect: show-command, show-tool
Execution: exec-command, exec-tool, route, bootstrap
Diagnostic inventory: command-graph, tool-pool, bootstrap-graph
OPT_OUT_SURFACES (12):
Rich-Markdown reports (future JSON schema): summary, manifest,
parity-audit, setup-report
List filter commands: subsystems, commands, tools
Turn-loop: structured_output is future work
Simulation/debug: remote-mode, ssh-mode, teleport-mode,
direct-connect-mode, deep-link-mode
Full suite: 141 → 168 passing (+27), zero regression.
Closes ROADMAP #171.
Why this matters:
Before: parity was human-monitored; every new command was a drift
risk. The CLUSTER 3 sweep required manually auditing every
subcommand and landing fixes as separate pinpoints.
After: parity is machine-enforced. If a future developer adds a new
command without --output-format, the test suite blocks it
immediately with a concrete error message pointing at the
missing flag.
This is the first step in Gaebal-gajae's identified upper-level work:
operationalised parity instead of aspirational parity.
Related clusters:
- Clawability principle: machine-first protocol enforcement
- Test-first regression guard: extends TestTripletParityConsistency
(#160/#165) and TestFullFamilyParity (#166) from per-cluster
parity to cross-surface parity
Final diagnostic surface in the JSON parity sweep: bootstrap-graph
(the runtime bootstrap/prefetch visualization) now supports --output-format.
Concrete addition:
- bootstrap-graph: --output-format {text,json}
JSON envelope:
{stages: [str], note: 'bootstrap-graph is markdown-only in this version'}
Envelope explanation: bootstrap-graph's Markdown output is rich and
textual; raw JSON embedding maintains the markdown format (split into
lines array) rather than attempting lossy structural extraction that
would lose information. This is an honest limitation in this cycle;
full JSON schema can be added in a future audit if claws require
structured bootstrap data (dependency graphs, prefetch timing, etc.).
Backward compatibility:
- Default is 'text' (Markdown unchanged)
Closes ROADMAP #170.
Related: #167, #168, #169. Diagnostic/inventory surface family is now
uniformly JSON-capable. Summary, manifest, parity-audit, setup-report,
command-graph, tool-pool, bootstrap-graph all accept --output-format.
Extends the diagnostic surface audit with the two inventory-structure
commands: command-graph (command family segmentation) and tool-pool
(assembled tool inventory). Both now expose their underlying rich
datastructures via JSON envelope.
Concrete additions:
- command-graph: --output-format {text,json}
- tool-pool: --output-format {text,json}
JSON envelope shapes:
command-graph:
{builtins_count, plugin_like_count, skill_like_count, total_count,
builtins: [{name, source_hint}],
plugin_like: [{name, source_hint}],
skill_like: [{name, source_hint}]}
tool-pool:
{simple_mode, include_mcp, tool_count,
tools: [{name, source_hint}]}
Backward compatibility:
- Default is 'text' (Markdown unchanged)
- Text output byte-identical to pre-#169
Tests (4 new, test_command_graph_tool_pool_output_format.py):
- TestCommandGraphOutputFormat (2): JSON structure + text compat
- TestToolPoolOutputFormat (2): JSON structure + text compat
Full suite: 137 → 141 passing, zero regression.
Closes ROADMAP #169.
Why this matters:
Claws auditing the codebase can now ask 'what commands exist' and
'what tools exist' and get structured, parseable answers instead of
regex-parsing Markdown headers and counting list items.
Related clusters:
- Diagnostic surfaces (#169 adds to #167/#168 work-verb parity)
- Inventory introspection (command-graph + tool-pool are the two
foundational 'what do we have?' queries)
Closes the inspect-capability parity gap: show-command and show-tool were
the only discovery/inspection CLI commands lacking --output-format support,
making them outliers in the ecosystem that already had unified JSON
contracts across list-sessions, load-session, delete-session, and
flush-transcript (#160/#165/#166).
Concrete additions:
- show-command: --output-format {text,json}
- show-tool: --output-format {text,json}
JSON envelope shape (found case):
{name, found: true, source_hint, responsibility}
JSON envelope shape (not-found case):
{name, found: false, error: {kind:'command_not_found'|'tool_not_found',
message, retryable: false}}
Exit codes:
0 = success
1 = not found
Backward compatibility:
- Default (no --output-format) is 'text' (unchanged)
- Text output byte-identical to pre-#167 (three newline-separated lines)
Tests (10 new, test_show_command_tool_output_format.py):
- TestShowCommandOutputFormat (5): found + not-found in JSON; text mode
backward compat; text is default
- TestShowToolOutputFormat (3): found + not-found in JSON; text mode
backward compat
- TestShowCommandToolFormatParity (2): both accept same flag choices;
consistent JSON envelope shape
Full suite: 114 → 124 passing, zero regression.
Closes ROADMAP #167.
Why this matters:
Before: Claws calling show-command/show-tool had to parse human-readable
prose output via regex, with no structured error signal.
After: Same envelope contract as load-session and friends: JSON-first,
typed errors, machine-parseable.
Related clusters:
- Session-lifecycle CLI parity family (#160, #165, #166, #167)
- Machine-readable error contracts (same vein as #162 atomicity + #164
cancellation state-safety: structured boundaries for orchestration)
Closes the #161 follow-up gap identified in review: wall-clock timeout
bounded caller-facing wait but did not cancel the underlying provider
thread, which could silently mutate mutable_messages / transcript_store /
permission_denials / total_usage after the caller had already observed
stop_reason='timeout'. A ghost turn committed post-deadline would poison
any session that got persisted afterwards.
Stage A scope (this commit): runtime + engine layer cooperative cancel.
Engine layer (src/query_engine.py):
- submit_message now accepts cancel_event: threading.Event | None = None
- Two safe checkpoints:
1. Entry (before max_turns / budget projection) — earliest possible return
2. Post-budget (after output synthesis, before mutation) — catches cancel
that arrives while output was being computed
- Both checkpoints return stop_reason='cancelled' with state UNCHANGED
(mutable_messages, transcript_store, permission_denials, total_usage
all preserved exactly as on entry)
- cancel_event=None preserves legacy behaviour with zero overhead (no
checkpoint checks at all)
Runtime layer (src/runtime.py):
- run_turn_loop creates one cancel_event per invocation when a deadline
is in play (and None otherwise, preserving legacy fast path)
- Passes the same event to every submit_message call across turns, so a
late cancel on turn N-1 affects turn N
- On timeout (either pre-call or mid-call), runtime explicitly calls
cancel_event.set() before future.cancel() + synthesizing the timeout
TurnResult. This upgrades #161's best-effort future.cancel() (which
only cancels not-yet-started futures) to cooperative mid-flight cancel.
Stop reason taxonomy after Stage A:
'completed' — turn committed, state mutated exactly once
'max_budget_reached' — overflow, state unchanged (#162)
'max_turns_reached' — capacity exceeded, state unchanged
'cancelled' — cancel_event observed, state unchanged (#164 Stage A)
'timeout' — synthesised by runtime, not engine (#161)
The 'cancelled' vs 'timeout' split matters:
- 'timeout' is the runtime's best-effort signal to the caller: deadline hit
- 'cancelled' is the engine's confirmation: cancel was observed + honoured
If the provider call wedges entirely (never reaches a checkpoint), the
caller still sees 'timeout' and the thread is leaked — but any NEXT
submit_message call on the same engine observes the event at entry and
returns 'cancelled' immediately, preventing ghost-turn accumulation.
This is the honest cooperative limit in Python threading land; true
preemption requires async-native provider IO (future work, not Stage A).
Tests (29 new tests, tests/test_submit_message_cancellation.py + tests/
test_run_turn_loop_cancellation.py):
Engine-layer (12 tests):
- TestCancellationBeforeCall (5): pre-set event returns 'cancelled' immediately;
mutable_messages, transcript_store, usage, permission_denials all preserved
- TestCancellationAfterBudgetCheck (1): cancel set mid-call (after projection,
before commit) still honoured; output synthesised but state untouched
- TestCancellationAfterCommit (2): post-commit cancel not observable (honest
limit) BUT next call on same engine observes it + returns 'cancelled'
- TestLegacyCallersUnchanged (3): cancel_event=None preserves #162 atomicity
+ max_turns contract with zero behaviour change
- TestCancellationVsOtherStopReasons (2): cancel precedes max_turns check;
cancel does not retroactively override a completed turn
Runtime-layer (5 tests):
- TestTimeoutPropagatesCancelEvent (3): submit_message receives a real Event
object when deadline is set; None in legacy mode; timeout actually calls
event.set() so in-flight threads observe at their next checkpoint
- TestCancelEventSharedAcrossTurns (1): same event object passed to every
turn (object identity check) — late cancel on turn N-1 must affect turn N
Regression: 3 existing timeout test mocks updated to accept cancel_event
kwarg (mocks that previously had signature (prompt, commands, tools, denials)
now have (prompt, commands, tools, denials, cancel_event=None) since runtime
passes cancel_event positionally on the timeout path).
Full suite: 97 → 114 passing, zero regression.
Closes ROADMAP #164 Stage A.
What's explicitly NOT in Stage A:
- Preemptive cancellation of wedged provider IO (requires asyncio-native
provider path; larger refactor)
- Timeout on the legacy unbounded run_turn_loop path (by design: legacy
callers opt out of cancellation entirely)
- CLI exposure of 'cancelled' as a distinct exit code (currently 'cancelled'
maps to the same stop_reason != 'completed' break condition as others;
CLI surface for cancel is a separate pinpoint if warranted)
Every 'claw flush-transcript' call without --directory writes to
.port_sessions/<uuid>.json in CWD. Without a gitignore entry, every
dogfood run leaves dozens of untracked files in the repo, masking real
changes in 'git status' output.
Now that #160/#166 ship structured session lifecycle commands and
deterministic --session-id, this directory is purely transient by
default — belongs in .gitignore.
#159: multi-turn sessions had a silent security asymmetry: denied_tools
were always empty in run_turn_loop, even though bootstrap_session inferred
them from the routed matches. Result: any tool gated as 'destructive'
(bash-family commands, rm, etc) would silently appear unblocked across all
turns in multi-turn mode, giving a false 'clean' permission picture to any
claw consuming TurnResult.permission_denials.
Fix: compute denied_tools once at loop start via _infer_permission_denials,
then pass the same denials to every submit_message call (both timeout and
legacy unbounded paths). This mirrors the existing bootstrap_session pattern.
Acceptance: run_turn_loop('run bash ls').permission_denials now matches
what bootstrap_session returns — both infer the same denials from the
routed matches. Multi-turn security posture is symmetric.
Tests (tests/test_run_turn_loop_permissions.py, 2 tests):
- test_turn_loop_surfaces_permission_denials_like_bootstrap: Symmetry
check confirming both paths infer identical denials for destructive tools
- test_turn_loop_with_continuation_preserves_denials: Denials inferred at
loop start are passed consistently to all turns; captured via mock and
verified non-empty
Full suite: 82/82 passing, zero regression.
Closes ROADMAP #159.
The #160 session-lifecycle CLI triplet was asymmetric: list-sessions and
delete-session accepted --directory + --output-format and emitted typed
JSON error envelopes, but load-session had neither flag and dumped a raw
Python traceback (including the SessionNotFoundError class name) on a
missing session.
Three concrete impacts this fix closes:
1. Alternate session-store locations (e.g. /tmp/claw-run-XXX/.port_sessions)
were unreachable via load-session; claws had to chdir or monkeypatch
DEFAULT_SESSION_DIR to work around it.
2. Not-found emitted a multi-line Python stack, not a parseable envelope.
Claws deciding retry/escalate/give-up had only exit code 1 to work with.
3. The traceback leaked 'src.session_store.SessionNotFoundError' verbatim,
coupling version-pinned claws to our internal exception class name.
Now all three triplet commands accept the same flag pair and emit the
same JSON error shape:
Success (json mode):
{"session_id": "alpha", "loaded": true, "messages_count": 3,
"input_tokens": 42, "output_tokens": 99}
Not-found:
{"session_id": "missing", "loaded": false,
"error": {"kind": "session_not_found",
"message": "session 'missing' not found in /path",
"directory": "/path", "retryable": false}}
Corrupted file:
{"session_id": "broken", "loaded": false,
"error": {"kind": "session_load_failed",
"message": "...", "directory": "/path",
"retryable": true}}
Exit code contract:
- 0 on successful load
- 1 on not-found (preserves existing $?)
- 1 on OSError/JSONDecodeError (distinct 'kind' in JSON)
Backward compat: legacy 'claw load-session ID' text output unchanged
byte-for-byte. Only new behaviour is the flags and structured error path.
Tests (tests/test_load_session_cli.py, 13 tests):
- TestDirectoryFlagParity (2): --directory works + fallback to CWD/.port_sessions
- TestOutputFormatFlagParity (2): json schema + text-mode backward compat
- TestNotFoundTypedError (2): JSON envelope on not-found; no traceback in
either mode; no internal class name leak
- TestLoadFailedDistinctFromNotFound (1): corrupted file = session_load_failed
with retryable=true, distinct from session_not_found
- TestTripletParityConsistency (6): parametrised over [list, delete, load] *
[--directory, --output-format] — explicit parity guard for future regressions
Full suite: 80/80 passing, zero regression.
Discovered via Jobdori dogfood sweep 2026-04-22 17:44 KST — ran
'claw load-session nonexistent' expecting a clean error, got a Python
traceback. Filed #165 + fixed in same commit.
Closes ROADMAP #165.
#163: run_turn_loop no longer injects f'{prompt} [turn N]' into follow-up
prompts. The suffix was never defined or interpreted anywhere — not by the
engine, not by the system prompt, not by any LLM. It looked like a real
user-typed annotation in the transcript and made replay/analysis fragile.
New behaviour:
- turn 0 submits the original prompt (unchanged)
- turn > 0 submits caller-supplied continuation_prompt if provided, else
the loop stops cleanly — no fabricated user turn
- added continuation_prompt: str | None = None parameter to run_turn_loop
- added --continuation-prompt CLI flag for claws scripting multi-turn loops
- zero '[turn' strings ever appear in mutable_messages or stdout now
Behaviour change for existing callers:
- Before: run_turn_loop(prompt, max_turns=3) submitted 3 turns
('prompt', 'prompt [turn 2]', 'prompt [turn 3]')
- After: run_turn_loop(prompt, max_turns=3) submits 1 turn ('prompt')
- To preserve old multi-turn behaviour, pass continuation_prompt='Continue.'
or any structured follow-up text
One existing timeout test (test_budget_is_cumulative_across_turns) updated
to pass continuation_prompt so the cumulative-budget contract is actually
exercised across turns instead of trivially satisfied by a one-turn loop.
#164 filed: addresses reviewer feedback on #161. The wall-clock timeout
bounds the caller-facing wait, but the underlying submit_message worker
thread keeps running and can mutate engine state after the timeout
TurnResult is returned. A cooperative cancel_event pattern is sketched in
the pinpoint; real asyncio.Task.cancel() support will come once provider
IO is async-native (larger refactor).
Tests (tests/test_run_turn_loop_continuation.py, 8 tests):
- TestNoTurnSuffixInjection (2): zero '[turn' strings in any submitted
prompt, both default and explicit-continuation paths
- TestContinuationDefaultStopsAfterTurnZero (2): default loops run exactly
one turn; engine.submit_message called exactly once despite max_turns=10
- TestExplicitContinuationBehaviour (2): turn 0 = original, turn N = continuation
verbatim; max_turns still respected
- TestCLIContinuationFlag (2): CLI default emits only '## Turn 1';
--continuation-prompt wires through to multi-turn behaviour
Full suite: 67/67 passing.
Closes ROADMAP #163. Files #164.
Previously, QueryEnginePort.submit_message() checked the token budget AFTER
appending the prompt to mutable_messages, transcript_store, and permission_denials,
and AFTER calling compact_messages_if_needed(). On overflow it set
stop_reason='max_budget_reached' but the overflow turn was already committed.
Any caller that persisted the session afterwards wrote the rejected prompt to
disk — the session was silently poisoned even though the TurnResult said the
turn never completed.
Fix:
- Restructure submit_message so the budget check early-returns BEFORE any
mutation of mutable_messages, transcript_store, permission_denials, or
total_usage.
- The returned TurnResult.usage reflects pre-call state (overflow never
advanced the usage counter).
- Normal (in-budget) path unchanged: mutation happens exactly once, at the
end, only on 'completed' results.
This closes the atomicity gap: submit_message is now either 'turn committed'
(stop_reason='completed') or 'turn rejected, state untouched'
(stop_reason in {'max_budget_reached', 'max_turns_reached'}). Callers can
safely retry with a fresh budget or a smaller prompt without worrying about
phantom committed turns from prior rejections.
Tests (tests/test_submit_message_budget.py, 10 tests):
- TestBudgetOverflowDoesNotMutate (5): mutable_messages / transcript /
permission_denials / total_usage / TurnResult.usage all pre-mutation after overflow
- TestOverflowPersistence (2): first-turn overflow persists empty session;
successful-turn-then-overflow persists only the successful turn
- TestEngineUsableAfterOverflow (2): subsequent in-budget call still works
with no residue; repeated overflows don't accumulate hidden state
- TestNormalPathStillCommits (1): regression guard — non-overflow path still
commits mutable_messages/transcript/usage as expected
Full suite: 59/59 passing, zero regression.
Blocker: none. Closes ROADMAP #162.
Previously, run_turn_loop was bounded only by max_turns (turn count). If
engine.submit_message stalled — slow provider, hung network, infinite
stream — the loop blocked indefinitely with no cancellation path. Claws
calling run_turn_loop in CI or orchestration had no reliable way to
enforce a deadline; the loop would hang until OS kill or human intervention.
Fix:
- Add timeout_seconds parameter to run_turn_loop (default None = legacy unbounded).
- When set, each submit_message call runs inside a ThreadPoolExecutor and is
bounded by the remaining wall-clock budget (total across all turns, not per-turn).
- On timeout, synthesize a TurnResult with stop_reason='timeout' carrying the
turn's prompt and routed matches so transcripts preserve orchestration context.
- Exhausted/negative budget short-circuits before calling submit_message.
- Legacy path (timeout_seconds=None) bypasses the executor entirely — zero
overhead for callers that don't opt in.
CLI:
- Added --timeout-seconds flag to 'turn-loop' command.
- Exit code 2 when the loop terminated on timeout (vs 0 for completed),
so shell scripts can distinguish 'done' from 'budget exhausted'.
Tests (tests/test_run_turn_loop_timeout.py, 6 tests):
- Legacy unbounded path unchanged (timeout_seconds=None never emits 'timeout')
- Hung submit_message aborted within budget (0.3s budget, 5s mock hang → exit <1.5s)
- Budget is cumulative across turns (0.6s budget, 0.4s per turn, not per-turn)
- timeout_seconds=0 short-circuits first turn without calling submit_message
- Negative timeout treated as exhausted (guard against caller bugs)
- Timeout TurnResult carries correct prompt, matches, UsageSummary shape
Full suite: 49/49 passing, zero regression.
Blocker: none. Closes ROADMAP #161.
- list_sessions(directory=None) -> list[str]: enumerate stored session IDs
- session_exists(session_id, directory=None) -> bool: check existence without FileNotFoundError
- delete_session(session_id, directory=None) -> bool: unlink a session file
- load_session now raises typed SessionNotFoundError (subclass of KeyError) instead of FileNotFoundError
- Claws can now manage session lifecycle without reaching past the module to glob filesystem
Closes ROADMAP #160. Acceptance: claw can call list_sessions(), session_exists(id), delete_session(id) without importing Path or knowing .port_sessions/<id>.json layout.
## Gap
#77 Phase 1 added machine-readable error kind discriminants and #156 extended
them to text-mode output. However, the hint field is still prose derived from
splitting existing error text — not a stable registry-backed remediation
contract.
Downstream claws inspecting the hint field still need to parse human wording
to decide whether to retry, escalate, or terminate.
## Fix Shape
1. Remediation registry: remediation_for(kind, operation) -> Remediation struct
with action (retry/escalate/terminate/configure), target, and stable message
2. Stable hint outputs per error class (no more prose splitting)
3. Golden fixture tests replacing split_error_hint() string hacks
## Source
gaebal-gajae dogfood sweep 2026-04-22 05:30 KST
## Problem
#77 Phase 1 added machine-readable error `kind` discriminants to JSON error
payloads. Text-mode (stderr) errors still emit prose-only output with no
structured classification.
Observability tools (log aggregators, CI error parsers) parsing stderr can't
distinguish error classes without regex-scraping the prose.
## Fix
Added `[error-kind: <class>]` prefix line to all text-mode error output.
The prefix appears before the error prose, making it immediately parseable by
line-based log tools without any substring matching.
**Examples:**
## Impact
- Stderr observers (log aggregators, CI systems) can now parse error class
from the first line without regex or substring scraping
- Same classifier function used for JSON (#77 P1) and text modes
- Text-mode output remains human-readable (error prose unchanged)
- Prefix format follows syslog/structured-logging conventions
## Tests
All 179 rusty-claude-cli tests pass. Verified on 3 different error classes.
Closes ROADMAP #156.
## Problem
All JSON error payloads had the same three-field envelope:
```json
{"type": "error", "error": "<prose with hint baked in>"}
```
Five distinct error classes were indistinguishable at the schema level:
- missing_credentials (no API key)
- missing_worker_state (no state file)
- session_not_found / session_load_failed
- cli_parse (unrecognized args)
- invalid_model_syntax
Downstream claws had to regex-scrape the prose to route failures.
## Fix
1. **Added `classify_error_kind()`** — prefix/keyword classifier that returns a
snake_case discriminant token for 12 known error classes:
`missing_credentials`, `missing_manifests`, `missing_worker_state`,
`session_not_found`, `session_load_failed`, `no_managed_sessions`,
`cli_parse`, `invalid_model_syntax`, `unsupported_command`,
`unsupported_resumed_command`, `confirmation_required`, `api_http_error`,
plus `unknown` fallback.
2. **Added `split_error_hint()`** — splits multi-line error messages into
(short_reason, optional_hint) so the runbook prose stops being stuffed
into the `error` field.
3. **Extended JSON envelope** at 4 emit sites:
- Main error sink (line ~213)
- Session load failure in resume_session
- Stub command (unsupported_command)
- Unknown resumed command (unsupported_resumed_command)
## New JSON shape
```json
{
"type": "error",
"error": "short reason (first line)",
"kind": "missing_credentials",
"hint": "Hint: export ANTHROPIC_API_KEY..."
}
```
`kind` is always present. `hint` is null when no runbook follows.
`error` now carries only the short reason, not the full multi-line prose.
## Tests
Added 2 new regression tests:
- `classify_error_kind_returns_correct_discriminants` — all 9 known classes + fallback
- `split_error_hint_separates_reason_from_runbook` — with and without hints
All 179 rusty-claude-cli tests pass. Full workspace green.
Closes ROADMAP #77 Phase 1.
## Problem
Two session error messages advertised `.claw/sessions/` as the managed-session
location, but the actual on-disk layout is `.claw/sessions/<workspace_fingerprint>/`
where the fingerprint is a 16-char FNV-1a hash of the CWD path.
Users see error messages like:
```
no managed sessions found in .claw/sessions/
```
But the real directory is:
```
.claw/sessions/8497f4bcf995fc19/
```
The error copy was a direct lie — it made workspace-fingerprint partitioning
invisible and left users confused about whether sessions were lost or just in
a different partition.
## Fix
Updated two error formatters to accept the resolved `sessions_root` path
and extract the actual workspace-fingerprint directory:
1. **format_missing_session_reference**: now shows the actual fingerprint dir
and explains that it's a workspace-specific partition
2. **format_no_managed_sessions**: now shows the actual fingerprint dir and
includes a note that sessions from other CWDs are intentionally invisible
Updated all three call sites to pass `&self.sessions_root` to the formatters.
## Examples
**Before:**
```
no managed sessions found in .claw/sessions/
```
**After:**
```
no managed sessions found in .claw/sessions/8497f4bcf995fc19/
Start `claw` to create a session, then rerun with `--resume latest`.
Note: claw partitions sessions per workspace fingerprint; sessions from other CWDs are invisible.
```
```
session not found: nonexistent-id
Hint: managed sessions live in .claw/sessions/8497f4bcf995fc19/ (workspace-specific partition).
Try `latest` for the most recent session or `/session list` in the REPL.
```
## Impact
- Users can now tell from the error message that they're looking in the right
directory (the one their current CWD maps to)
- The workspace-fingerprint partitioning stops being invisible
- Operators understand why sessions from adjacent CWDs don't appear
- Error copy matches the actual on-disk structure
## Tests
All 466 runtime tests pass. Verified on two real workspaces with actual
workspace-fingerprint directories.
Closes ROADMAP #80.
## Problem
Three interactive slash commands are documented in `claw --help` but have no
corresponding section in USAGE.md:
- `/ultraplan [task]` — Run a deep planning prompt with multi-step reasoning
- `/teleport <symbol-or-path>` — Jump to a file or symbol by searching the workspace
- `/bughunter [scope]` — Inspect the codebase for likely bugs
New users see these commands in the help output but don't know:
- What each command does
- How to use it
- When to use it vs. other commands
- What kind of results to expect
## Fix
Added new section "Advanced slash commands (Interactive REPL only)" to USAGE.md
with documentation for all three commands:
1. **`/ultraplan`** — multi-step reasoning for complex tasks
- Example: `/ultraplan refactor the auth module to use async/await`
- Output: structured plan with numbered steps and reasoning
2. **`/teleport`** — navigate to a file or symbol
- Example: `/teleport UserService`, `/teleport src/auth.rs`
- Output: file content with the requested symbol highlighted
3. **`/bughunter`** — scan for likely bugs
- Example: `/bughunter src/handlers`, `/bughunter` (all)
- Output: list of suspicious patterns with explanations
## Impact
Users can now discover these commands and understand when to use them without
having to guess or search external sources. Bridges the gap between `--help`
output and full documentation.
Also filed ROADMAP #155 documenting the gap.
Closes ROADMAP #155.
## Problem
When a user types `claw --model gpt-4` or `--model qwen-plus`, they get:
```
error: invalid model syntax: 'gpt-4'. Expected provider/model (e.g., anthropic/claude-opus-4-6) or known alias
```
USAGE.md documents that "The error message now includes a hint that names the detected env var" — but this hint does not actually exist. The user has to re-read USAGE.md or guess the correct prefix.
## Fix
Enhance `validate_model_syntax` to detect when a model name looks like it belongs to a different provider:
1. **OpenAI models** (starts with `gpt-` or `gpt_`):
```
Did you mean `openai/gpt-4`? (Requires OPENAI_API_KEY env var)
```
2. **Qwen/DashScope models** (starts with `qwen`):
```
Did you mean `qwen/qwen-plus`? (Requires DASHSCOPE_API_KEY env var)
```
3. **Grok/xAI models** (starts with `grok`):
```
Did you mean `xai/grok-3`? (Requires XAI_API_KEY env var)
```
Unrelated invalid models (e.g., `asdfgh`) do not get a spurious hint.
## Verification
- `claw --model gpt-4` → hints `openai/gpt-4` + `OPENAI_API_KEY`
- `claw --model qwen-plus` → hints `qwen/qwen-plus` + `DASHSCOPE_API_KEY`
- `claw --model grok-3` → hints `xai/grok-3` + `XAI_API_KEY`
- `claw --model asdfgh` → generic error (no hint)
## Tests
Added 3 new assertions in `parses_multiple_diagnostic_subcommands`:
- GPT model error hints openai/ prefix and OPENAI_API_KEY
- Qwen model error hints qwen/ prefix and DASHSCOPE_API_KEY
- Unrelated models don't get a spurious hint
All 177 rusty-claude-cli tests pass.
Closes ROADMAP #154.
## Problem
Users frequently ask after building:
- "Where is the claw binary?"
- "Did the build actually work?"
- "Why can't I run \`claw\` from anywhere?"
This happens because \`cargo build\` puts the binary in \`rust/target/debug/claw\`
(or \`rust/target/release/claw\`), and new users don't know:
1. Where to find it
2. How to test it
3. How to add it to PATH (optional but common follow-up)
## Fix
Added new section "Post-build: locate the binary and verify" to README covering:
1. **Binary location table:** debug vs. release, macOS/Linux vs. Windows paths
2. **Verification commands:** Test the binary with \`--help\` and \`doctor\`
3. **Three ways to add to PATH:**
- Symlink (macOS/Linux): \`ln -s ... /usr/local/bin/claw\`
- cargo install: \`cargo install --path . --force\`
- Shell profile update: add rust/target/debug to \$PATH
4. **Troubleshooting:** Common errors ("command not found", "permission denied",
debug vs. release build speed)
## Impact
New users can now:
- Find the binary immediately after build
- Run it and verify with \`claw doctor\`
- Know their options for system-wide access
Also filed ROADMAP #153 documenting the gap.
Closes ROADMAP #153.
## Problem
Users commonly type `claw doctor --json`, `claw status --json`, or
`claw system-prompt --json` expecting JSON output. These fail with
`unrecognized argument \`--json\` for subcommand` with no hint that
`--output-format json` is the correct flag.
## Discovery
Filed as #152 during 21:17 dogfood nudge. The #127 worktree contained
a more comprehensive patch but conflicted with #141 (unified --help).
On re-investigation of main, Bugs 1 and 3 from #127 are already closed
(positional arg rejection works, no double "error:" prefix). Only
Bug 2 (the `--json` hint) remained.
## Fix
Two call sites add the hint:
1. `parse_single_word_command_alias`'s diagnostic-verb suffix path:
when rest[1] == "--json", append "Did you mean \`--output-format json\`?"
2. `parse_system_prompt_options` unknown-option path: same hint when
the option is exactly `--json`.
## Verification
Before:
$ claw doctor --json
error: unrecognized argument `--json` for subcommand `doctor`
Run `claw --help` for usage.
After:
$ claw doctor --json
error: unrecognized argument `--json` for subcommand `doctor`
Did you mean `--output-format json`?
Run `claw --help` for usage.
Covers: `doctor --json`, `status --json`, `sandbox --json`,
`system-prompt --json`, and any other diagnostic verb that routes
through `parse_single_word_command_alias`.
Other unrecognized args (`claw doctor garbage`) correctly don't
trigger the hint.
## Tests
- 2 new assertions in `parses_multiple_diagnostic_subcommands`:
- `claw doctor --json` produces hint
- `claw doctor garbage` does NOT produce hint
- 177 rusty-claude-cli tests pass
- Workspace tests green
Closes ROADMAP #152.
Filed from nudge directive at 21:17 KST. Implementation exists on worktree
`jobdori-127-verb-suffix` but needs rebase due to merge with #141.
Ready for Phase 1 implementation once conflicts resolved.
## Problem
`workspace_fingerprint(path)` hashes the raw path string without
canonicalization. Two equivalent paths (e.g. `/tmp/foo` vs
`/private/tmp/foo` on macOS) produce different fingerprints and
therefore different session stores. #150 fixed the test-side symptom;
this fixes the underlying product contract.
## Discovery path
#150 fix (canonicalize in test) was a workaround. Q's ack on #150
surfaced the deeper gap: the function itself is still fragile for
any caller passing a non-canonical path:
1. Embedded callers with a raw `--data-dir` path
2. Programmatic `SessionStore::from_cwd(user_path)` calls
3. NixOS store paths, Docker bind mounts, case-insensitive normalization
The REPL's default flow happens to work because `env::current_dir()`
returns canonical paths on macOS. But any caller passing a raw path
risks silent session-store divergence.
## Fix
Canonicalize inside `SessionStore::from_cwd()` and `from_data_dir()`
before computing the fingerprint. Kept `workspace_fingerprint()` itself
as a pure function for determinism — canonicalization is the entry
point's responsibility.
```rust
let canonical_cwd = fs::canonicalize(cwd).unwrap_or_else(|_| cwd.to_path_buf());
let sessions_root = canonical_cwd.join(".claw").join("sessions").join(workspace_fingerprint(&canonical_cwd));
```
Falls back to the raw path if canonicalize fails (directory doesn't
exist yet).
## Test-side updates
Three legacy-session tests expected the non-canonical base path to
match the store's workspace_root. Updated them to canonicalize
`base` after creation — same defensive pattern as #150, now
explicit across all three tests.
## Regression test
Added `session_store_from_cwd_canonicalizes_equivalent_paths` that
creates two stores from equivalent paths (raw vs canonical) and
asserts they resolve to the same sessions_dir.
## Verification
- `cargo test -p runtime session_store_` — 9/9 pass
- `cargo test --workspace` — all green, no FAILED markers
- No behavior change for existing users (REPL default flow already
used canonical paths)
## Backward compatibility
Users on macOS who always went through `env::current_dir()`:
no hash change, sessions resume identically.
Users who ever called with a non-canonical path: hash would change,
but those sessions were already broken (couldn't be resumed from a
canonical-path cwd). Net improvement.
Closes ROADMAP #151.
## #150 Fix: resume_latest test flake
**Problem:** `resume_latest_restores_the_most_recent_managed_session` intermittently
fails when run in the workspace suite or multiple times in sequence, but passes in
isolation.
**Root cause:** `workspace_fingerprint(path)` hashes the path string without
canonicalization. On macOS, `/tmp` is a symlink to `/private/tmp`. The test
creates a temp dir via `std::env::temp_dir().join(...)` which returns
`/var/folders/...` (non-canonical). When the subprocess spawns,
`env::current_dir()` returns the canonical path `/private/var/folders/...`.
The two fingerprints differ, so the subprocess looks in
`.claw/sessions/<hash1>` while files are in `.claw/sessions/<hash2>`.
Session discovery fails.
**Fix:** Call `fs::canonicalize(&project_dir)` after creating the directory
to ensure test and subprocess use identical path representations.
**Verification:** 5 consecutive runs of the full test suite — all pass.
Previously: 5/5 failed when run in sequence.
## #246 Filing: Reminder cron outcome ambiguity (control-loop blocker)
The `clawcode-dogfood-cycle-reminder` cron times out repeatedly with no
structured feedback on whether the nudge was delivered, skipped, or died in-flight.
**Phase 1 outcome schema** — add explicit field to cron result:
- `delivered` — nudge posted to Discord
- `timed_out_before_send` — died before posting
- `timed_out_after_send` — posted but cleanup timed out
- `skipped_due_to_active_cycle` — previous cycle active
- `aborted_gateway_draining` — daemon shutdown
Assigned to gaebal-gajae (cron/orchestration domain). Unblocks trustworthy
dogfood cycle observability.
Closes ROADMAP #150. Filed ROADMAP #246.