Cycle #46 follow-up to cycle #45's #251 implementation. Closes#250's
implementation urgency by aligning docs with reality.
SCHEMAS.md Updates:
For each of the 4 session-management verbs, added:
1. Status marker (Implemented or Stub only)
2. Actual binary envelope (shape produced by the #251-fixed binary)
3. Aspirational (future) shape (original SCHEMAS.md content, preserved as target)
4. Gap notes where the two diverge
Per-verb status:
- list-sessions: Implemented, nested field layout
- load-session: Implemented, nested session object with local session_not_found error
- delete-session: Stub, emits not_yet_implemented (local error, not auth)
- flush-transcript: Stub, emits not_yet_implemented (local error, not auth)
ROADMAP.md Updates:
- #251 marked CLOSED: Full status with commit ref, test counts.
- #250 marked SCOPE-REDUCED: Option A resolved by #251, Option C moot,
only Option B (doc alignment) remains as future cleanup.
Why this matters:
Every code change should close its documentation loop. #251 landed on
the branch, but SCHEMAS.md still described aspirational shapes without
marking which were implemented. Claws reading SCHEMAS.md would have
assumed full conformance and hit surprises. Now the document tells the
truth about which verbs work, which are stubs, and why.
Related:
- #251 implementation on feat/jobdori-251-session-dispatch branch
- #250 scope-reduced to Option B (field-name harmonization)
- #145/#146 parser fall-through fix precedent
Cycle #40: gaebal-gajae conceived #251 in their 00:00 Discord cycle
status but hadn't committed to ROADMAP yet. Jobdori verified their
diagnosis with code trace and formalized into ROADMAP with the proper
framing relationship to #250.
## What This Pinpoint Says
Same observable as #250 (session-management verbs emit missing_credentials
instead of SCHEMAS.md envelope) but reframed at the dispatch-order layer:
- #250 says: surface missing on canonical binary vs SCHEMAS.md promise
- #251 says: top-level parser fall-through happens BEFORE dispatcher
could intercept, so credential resolution runs before the verb is
classified as a purely-local operation
#251's framing is sharper because it identifies WHY the fall-through
produces auth errors, not just that it does.
## Verified Code Trace
- main.rs:1017-1027 is the _other => Prompt catchall
- joins all rest[] tokens into joined, constructs CliAction::Prompt
- downstream resolves credentials -> emits missing_credentials
- No credential call would be needed had the verb been intercepted
Same pattern has been fixed before for other purely-local verbs:
- #145: plugins (main.rs:888-906, explicit match arm)
- #146: config and diff (main.rs:911-935, same shape)
#251 extends this to the 4 session-management verbs.
## Recommended Sequence
1. #251 fix (4 match arms mirroring #145/#146) — principled solution
2. #250's Option B (docs scope note) — guard against future drift
3. #250's Option C (reject with redirect) — unnecessary if #251 lands
## Discipline
Per cycle #24 calibration:
- Red-state bug? Borderline (silent misroute to auth error class)
- Real friction? ✓ (4 documented surfaces emit wrong error class)
- Evidence-backed? ✓ (code trace + prior-fix precedent #145/#146)
- Same-cycle fix? ✗ (filed + document, boundary discipline #36)
- Implementation cost? ~40 lines Rust + tests, bounded
## Credit
Conception: gaebal-gajae (Discord msg 1496526112254328902, 00:00 KST)
Formalization: Jobdori cycle #40 (code trace + precedent linking)
This is the right kind of collaboration: gaebal-gajae saw the dispatch
pattern I had missed in #250 (I framed as surface parity; they framed
as dispatch order). I verified their diagnosis and committed the
ROADMAP entry. Two framings make the pinpoint sharper than either
alone.
Cycle #39 dogfood re-verification of #130 (filed 2026-04-20). All 5
filesystem failure modes reproduce identically on main HEAD 186d42f,
2 days after original filing. Gap is unchanged.
## What's Added
1. **[STILL OPEN — re-verified 2026-04-22 cycle #39]** marker on the
entry so readers can see immediately that the pinpoint hasn't been
accidentally closed.
2. Full 5-mode repro output preserved verbatim for the current HEAD,
so future re-verifications have a concrete baseline to diff against.
3. **New evidence not in original filing**: the classifier actively
chose `kind: "unknown"` rather than just omitting the field. This
means classify_error_kind() has NO substring match for "Is a
directory", "No such file", "Operation not permitted", or "File
exists". The typed-error contract is thus twice-broken on this path.
4. **Pairing with #247/#248/#249 classifier sweep**: the classifier-level
part of #130 could land in the same sweep (add substring branches
for io::ErrorKind strings). The context-preservation part (fix
run_export's bare `?`) is a separate, larger change.
## Why Re-Verification Not Re-Filing
Per cycle #24 discipline: speculative re-filings add noise, real
confirmations add truth. #130 was already filed with exact repros, code
trace, and fix shape. My dogfood hit the same gap on fresh HEAD — the
right output is confirming the gap is still there (not filing #251 for
the same bug).
This is the same pattern as cycle #32's "mark #127 CLOSED" reality-sync:
documentation-drift prevention through explicit status markers.
## New Pattern
"Reality-sync via re-verification" — re-running a filed pinpoint's
repro on fresh HEAD and adding the timestamp + output proves the gap
is still real without inventing new filings. Cycle #24 calibration
keeps ROADMAP entries honest.
Per cycle #24 calibration:
- Red-state bug? ⚠️ borderline (errors surfaced, but kind=unknown is
demonstrably wrong on a path where the system knows the errno)
- Real friction? ✓ (re-verified on fresh HEAD)
- Evidence-backed? ✓ (5-mode repro + classifier trace)
- Same-cycle fix? ✗ (classifier-level part could join #247/#248/#249
sweep; context-preservation part is larger refactor)
- Implementation cost? Classifier part ~10 lines; full context fix ~60 lines
Source: Jobdori cycle #39 proactive dogfood in response to Clawhip
pinpoint nudge. Probed export filesystem errors; discovered this was
#130 reconfirmation, not new bug. Applied reality-sync pattern from
cycle #32.
Cycle #38 dogfood finding. Probed session management via the top-level
subcommand path documented in SCHEMAS.md; discovered the Rust binary
doesn't implement these as top-level subcommands. The literal token
'list-sessions' falls through the _other => Prompt arm and returns
'missing Anthropic credentials' instead of the documented envelope.
## The Gap
SCHEMAS.md documents 14 CLAWABLE top-level subcommands. Python audit
harness (src/main.py) implements all 14. Rust binary implements ~8 of
them as top-level, routing session management through /session slash
commands via --resume instead.
Repro:
$ env -i PATH=$PATH HOME=$HOME claw list-sessions --output-format json
{"error":"missing Anthropic credentials; ...","kind":"missing_credentials"}
$ claw --resume latest /session list --output-format json
{"active":"...","kind":"session_list","sessions":[...]}
$ python3 -m src.main list-sessions --output-format json
{"command":"list-sessions","sessions":[...],"exit_code":0}
Same operation, three different CLI shapes across implementations.
## Classification
This is BOTH:
- a parser-level trust gap (6th in #108/#117/#119/#122/#127 family; same
_other => Prompt fall-through), AND
- a cross-implementation parity gap (SCHEMAS.md at repo root doesn't
match Rust binary's top-level surface)
Unlike prior fall-throughs where the input was malformed, the input
here IS a documented surface. The fall-through is wrong for a different
reason: the surface exists in the protocol but not in this implementation.
## Three Fix Options
Option A: Implement surfaces on Rust binary (highest cost, full parity)
Option B: Scope SCHEMAS.md to Python harness (docs-only)
Option C: Reject at parse time with redirect hint (cheapest, #127 pattern)
Recommended: C first (prevents cred misdirection), then B for docs
hygiene, then A if demand justifies.
## Discipline
Per cycle #24 calibration:
- Red-state bug? ⚠️ borderline — silent misroute to cred error on a
documented surface. Not a crash but a real wrong-contract response.
- Real friction? ✓ (claws reading SCHEMAS.md hit wrong error on canonical binary)
- Evidence-backed? ✓ (dogfood probe + SCHEMAS.md cross-reference + code trace)
- Implementation cost? Option C: ~30 lines (bounded). Option A: larger.
- Same-cycle fix? ✗ (file + document, defer implementation per #36 boundary discipline)
## Family Position
Natural bundle: **#127 + #250** — parser-level fall-through pair with
class distinction. #127 fixed suffix-arg-on-valid-verb case. #250 extends
to 'entire Python-harness verb treated as prompt.' Same fall-through arm,
different entry class.
Source: Jobdori cycle #38 proactive dogfood in response to Clawhip
pinpoint nudge at msg 1496518474019639408. Probed session management CLI
after gaebal-gajae's status sync confirmed no red-state regressions this
cycle; found this cross-implementation surface parity gap by comparing
SCHEMAS.md claims against actual Rust binary behavior.
Cycle #37 dogfood finding post-#247 merge. Two Err arms in the resumed-session
JSON path at main.rs:2747 and main.rs:2783 emit error envelopes WITHOUT the
`kind` field required by the §4.44 typed-envelope contract.
## The Pinpoint
Probed resumed-session slash command JSON path:
$ claw --output-format json --resume latest /session
{"command":"/session","error":"unsupported resumed slash command","type":"error"}
# no kind field
$ claw --output-format json --resume latest /xyz-unknown
{"command":"/xyz-unknown","error":"Unknown slash command: /xyz-unknown\n Help /help lists available slash commands","type":"error"}
# no kind field AND multi-line error without split hint
Compare to happy path which DOES include kind:
$ claw --output-format json --resume latest /session list
{"active":"...","kind":"session_list",...}
Contract awareness exists. It's just not applied in the Err arms.
## Scope
Two atomic fixes in main.rs:
- Line 2747: SlashCommand::parse() Err → add kind via classify_error_kind()
- Line 2783: run_resume_command() Err → add kind + call split_error_hint()
~15 lines Rust total. Bounded.
## Family Classification
§4.44 typed-envelope contract sweep:
- #179 (parse-error real message quality) — closed
- #181 (envelope exit_code matches process exit) — closed
- #247 (classify_error_kind misses prompt-patterns) — closed
- #248 (verb-qualified unknown option errors) — in-flight (another agent)
- **#249 (resumed-session slash error envelopes omit kind) — filed**
Natural bundle #247+#248+#249: classifier/envelope completeness across all
three CLI paths (top-level parse, subcommand options, resumed-session slash).
## Discipline
Per cycle #24 calibration:
- Red-state bug? ✗ (errors surfaced, exit codes correct)
- Real friction? ✓ (typed-error contract violation; claws dispatching on
error.kind get undefined for all resumed slash-command errors)
- Evidence-backed? ✓ (dogfood probe + code trace identified both Err arms)
- Implementation cost? ~15 lines (bounded)
- Same-cycle fix? ✗ (Rust change, deferred per file-not-fix discipline)
## Not Implementing This Cycle
Per the boundary discipline established in cycle #36: I don't touch another
agent's in-flight work, and I don't implement a Rust fix same-cycle when
the pattern is "file + document + let owner/maintainer decide."
Filing with concrete fix shape is the correct output. If demand or red-state
symptoms arrive, implementation can follow the same path as #247: file →
fix in branch → review → merge.
Source: Jobdori cycle #37 proactive dogfood in response to Clawhip pinpoint
nudge at msg 1496518474019639408.
Cycle #34 dogfood follow-through on Jobdori cycle #33 pinpoint (#247 filed
at fbcbe9d). Closes the two typed-error contract drifts surfaced in that
pinpoint against the Rust `claw` binary.
## What was wrong
1. `classify_error_kind()` (main.rs:~251) used substring matching but did
NOT match two common prompt-related parse errors:
- "prompt subcommand requires a prompt string"
- "empty prompt: provide a subcommand..."
Both fell through to `"unknown"`. §4.44 typed-error contract specifies
`parse | usage | unknown` as distinct classes, so claws dispatching on
`error.kind == "cli_parse"` missed those paths entirely.
2. JSON mode dropped the `Run `claw --help` for usage.` hint. Text mode
appends it at stderr-print time (main.rs:~234) AFTER split_error_hint()
has already serialized the envelope, so JSON consumers never saw it.
Text-mode humans got an actionable pointer; machine consumers did not.
## Fix
Two small, targeted edits:
1. `classify_error_kind()`: add explicit branches for "prompt subcommand
requires" and "empty prompt:" (the latter anchored with `starts_with`
so it never hijacks unrelated error messages containing the word).
Both route to `cli_parse`.
2. JSON error render path in `main()`: after calling split_error_hint(),
if the message carried no embedded hint AND kind is `cli_parse` AND
the short-reason does not already embed a `claw --help` pointer,
synthesize the same `Run `claw --help` for usage.` trailer that
text-mode stderr appends. The embedded-pointer check prevents
duplication on the `empty prompt: ... (run `claw --help`)` message
which already carries inline guidance.
## Verification
Direct repro on the compiled binary:
$ claw --output-format json prompt
{"error":"prompt subcommand requires a prompt string",
"hint":"Run `claw --help` for usage.",
"kind":"cli_parse","type":"error"}
$ claw --output-format json ""
{"error":"empty prompt: provide a subcommand (run `claw --help`) or a non-empty prompt string",
"hint":null,"kind":"cli_parse","type":"error"}
$ claw --output-format json doctor --foo # regression guard
{"error":"unrecognized argument `--foo` for subcommand `doctor`",
"hint":"Run `claw --help` for usage.",
"kind":"cli_parse","type":"error"}
Text mode unchanged in shape; `[error-kind: ...]` prefix now reads
`cli_parse` for the two previously-misclassified paths.
## Regression coverage
- Unit test `classify_error_kind_covers_prompt_parse_errors_247`: locks
both patterns route to `cli_parse` AND that generic "prompt"-containing
messages still fall through to `unknown`.
- Integration tests in `tests/output_format_contract.rs`:
* prompt_subcommand_without_arg_emits_cli_parse_envelope_with_hint_247
* empty_positional_arg_emits_cli_parse_envelope_247
* whitespace_only_positional_arg_emits_cli_parse_envelope_247
* unrecognized_argument_still_classifies_as_cli_parse_247_regression_guard
- Full rusty-claude-cli test suite: 218 tests pass (180 bin unit + 15
output_format_contract + 12 resume_slash + 7 compact + 3 mock + 1 cli).
## Family / related
Joins §4.44 typed-envelope contract gap family closure: #130, #179, #181,
and now **#247**. All four quartet items now have real fixes landed on
the canonical binary surface rather than only the Python harness.
ROADMAP.md: #247 marked CLOSED with before/after evidence preserved.
Cycle #33 dogfood finding from direct probe of Rust claw binary:
## The Pinpoint
Two related contract drifts in the typed-error envelope:
### 1. Error-kind misclassification
`classify_error_kind()` at main.rs:246-280 uses substring matching but
does NOT match two common parse error messages:
- "prompt subcommand requires a prompt string" → classified as 'unknown'
- "empty prompt: provide a subcommand..." → classified as 'unknown'
The §4.44 typed-error contract specifies 'parse | usage | unknown' as
DISTINCT classes. Known parse errors should be 'cli_parse', not 'unknown'.
### 2. Hint lost in JSON mode
Text mode appends 'Run `claw --help` for usage.' to parse errors.
JSON mode emits 'hint: null'. The trailer is added at the stderr-print
stage AFTER split_error_hint() has already serialized the envelope, so
JSON consumers never see it.
## Repro
Dogfooded on main HEAD dd0993c (cycle #33):
$ claw --output-format json prompt
{"error":"prompt subcommand requires a prompt string","hint":null,"kind":"unknown","type":"error"}
Expected: kind="cli_parse" + hint="Run \\`claw --help\\` for usage."
## Impact
- Claws dispatching on typed error.kind fall back to substring matching
- JSON consumers lose actionable hint that text-mode users see
- Joins JSON envelope field-quality family (#90, #91, #92, #110, #115,
#116, #130, #179, #181, #247)
## Fix Shape
1. Add prompt-pattern clauses to classify_error_kind() (~4 lines)
2. Move hint plumbing to BEFORE JSON envelope serialization (~15 lines)
3. Add golden-fixture regression tests per cycle #30 pattern
Not a red-state bug (error IS surfaced, exit code IS correct), but real
contract drift. Deferred for implementation; filed per Clawhip nudge
to 'add one concrete follow-up to ROADMAP.md'.
Per cycle #24 calibration:
- Red-state bug? ✗ (errors exit 1 correctly)
- Real friction? ✓ (typed-error contract drift)
- Evidence-backed? ✓ (dogfood probe + code trace identified both leaks)
- Implementation cost? ~20 lines Rust (bounded)
- Demand signal needed? Medium — any claw doing error.kind dispatch on
prompt-path errors is affected
Source: Jobdori cycle #33 direct dogfood 2026-04-22 22:30 KST in response
to Clawhip pinpoint nudge at msg 1496503374621970583.
Cycle #32 dogfood finding: #127 was fixed on main via `a3270db` + `79352a2`
(2026-04-20), but the ROADMAP.md entry still lacked a [CLOSED] marker.
The in-flight branches `feat/jobdori-127-clean` and
`feat/jobdori-127-verb-suffix-flags` were superseded and are now obsolete.
## What This Fixes
**Documentation drift:** Pinpoint #127 was complete in code but unmarked
in ROADMAP. New contributors checking the roadmap would see it as open
work, potentially duplicating effort.
**Stale branches:** Two branches (`feat/jobdori-127-clean`,
`feat/jobdori-127-verb-suffix-flags`) contain the fix attempt bundled
with an unrelated large-scope refactor (5365 lines removed from
ROADMAP.md, root-level governance docs deleted, command infra refactored).
Their fix was superseded; branches are functionally obsolete.
## Verification
Re-verified all 4 #127 scenarios pass on main HEAD `b903e16`:
$ claw doctor --json → rejected with "did you mean" hint
$ claw doctor garbage → rejected
$ claw doctor --unknown-flag → rejected
$ claw doctor --output-format json → works (canonical form)
All behavior matches #127 acceptance criteria.
## Cluster Impact
Post-closure: **parser-level trust gap quintet (#108 + #117 + #119 + #122
+ #127) is 5/5 closed**. The `_other => Prompt` fall-through audit is
complete.
## Discipline Check
Per cycle #24 calibration:
- Red-state bug? ✗ (behavior is correct on main)
- Real friction? ✓ (ROADMAP drift; obsolete branches adrift)
- Evidence-backed? ✓ (dogfood probe confirmed closure; git log confirmed
supersession; branch diff confirmed scope contamination)
## Relationship to Gaebal-gajae's Option A Guidance
Cycle #32 started by proposing separating the #127 fix from the attached
refactor. On deeper probe, discovered the fix was already superseded on
main via different commits. Option A (separate the fix) is retroactively
satisfied: the fix landed cleanly, the refactor never did.
The remaining action is governance hygiene: mark closure, document
supersession, flag obsolete branches for deletion.
## Next Actions (not in this commit)
- Delete `feat/jobdori-127-clean` locally and on fork (after confirmation)
- Delete `feat/jobdori-127-verb-suffix-flags` locally and on fork
- Monitor whether any attached refactor content should be re-proposed in
its own scoped PR
Source: Jobdori cycle #32 dogfood in response to Clawhip 10-min nudge.
Proposed Option A (separate fix from refactor); probe revealed the fix
already landed via a different commit path, rendering the refactor-only
branch obsolete.
Cycle #24 dogfood discovery.
Running proactive edge-case dogfood on the JSON contract, hit a real pinpoint:
--help and --version are outside the parser-front-door contract.
The gap:
1. "claw --help --output-format json" returns text (not envelope)
2. "claw bootstrap --help --output-format json" returns text (not envelope)
3. "claw --version" doesn't exist at all
Why it matters:
- Claws can't programmatically discover the CLI surface
- Version checking requires side-effectful commands
- Natural follow-up gap to #178/#179 parser-front-door work
Discoverability scenarios:
- Orchestrator checking whether a new command (e.g., turn-loop) is available
- Version compat check before dispatching work
- Enumerating available commands for routing decisions
Filed as Pinpoint #180 in ROADMAP.md with:
- Gap description + 3-case repro
- Impact analysis (version compat, surface enumeration, governance)
- Root cause (argparse default HelpAction prints text + exits)
- Fix shape (3 stages, ~40 lines total)
- Stage A: --version + JSON envelope version metadata
- Stage B: --help JSON routing via custom HelpAction
- Stage C: optional 'schema-info' command for pre-dispatch discovery
- Acceptance criteria (4 cases, including backward compat)
- Priority: Medium (not red-state, but real discoverability gap)
Status: **Filed, implementation deferred.**
Following maintainership equilibrium: pinpoints stay documented but don't
force code changes. If external demand arrives (claw author building a
dispatcher, orchestrator doing version checks), the fix can ship in one
cycle using the shape already documented.
No code changes this cycle. Pure ROADMAP filing.
Continues the maintainership pattern: find friction, document it, defer
until evidence-backed demand arrives.
Source: Jobdori proactive dogfood at 2026-04-22 20:58 KST.
The #160 session-lifecycle CLI triplet was asymmetric: list-sessions and
delete-session accepted --directory + --output-format and emitted typed
JSON error envelopes, but load-session had neither flag and dumped a raw
Python traceback (including the SessionNotFoundError class name) on a
missing session.
Three concrete impacts this fix closes:
1. Alternate session-store locations (e.g. /tmp/claw-run-XXX/.port_sessions)
were unreachable via load-session; claws had to chdir or monkeypatch
DEFAULT_SESSION_DIR to work around it.
2. Not-found emitted a multi-line Python stack, not a parseable envelope.
Claws deciding retry/escalate/give-up had only exit code 1 to work with.
3. The traceback leaked 'src.session_store.SessionNotFoundError' verbatim,
coupling version-pinned claws to our internal exception class name.
Now all three triplet commands accept the same flag pair and emit the
same JSON error shape:
Success (json mode):
{"session_id": "alpha", "loaded": true, "messages_count": 3,
"input_tokens": 42, "output_tokens": 99}
Not-found:
{"session_id": "missing", "loaded": false,
"error": {"kind": "session_not_found",
"message": "session 'missing' not found in /path",
"directory": "/path", "retryable": false}}
Corrupted file:
{"session_id": "broken", "loaded": false,
"error": {"kind": "session_load_failed",
"message": "...", "directory": "/path",
"retryable": true}}
Exit code contract:
- 0 on successful load
- 1 on not-found (preserves existing $?)
- 1 on OSError/JSONDecodeError (distinct 'kind' in JSON)
Backward compat: legacy 'claw load-session ID' text output unchanged
byte-for-byte. Only new behaviour is the flags and structured error path.
Tests (tests/test_load_session_cli.py, 13 tests):
- TestDirectoryFlagParity (2): --directory works + fallback to CWD/.port_sessions
- TestOutputFormatFlagParity (2): json schema + text-mode backward compat
- TestNotFoundTypedError (2): JSON envelope on not-found; no traceback in
either mode; no internal class name leak
- TestLoadFailedDistinctFromNotFound (1): corrupted file = session_load_failed
with retryable=true, distinct from session_not_found
- TestTripletParityConsistency (6): parametrised over [list, delete, load] *
[--directory, --output-format] — explicit parity guard for future regressions
Full suite: 80/80 passing, zero regression.
Discovered via Jobdori dogfood sweep 2026-04-22 17:44 KST — ran
'claw load-session nonexistent' expecting a clean error, got a Python
traceback. Filed #165 + fixed in same commit.
Closes ROADMAP #165.
#163: run_turn_loop no longer injects f'{prompt} [turn N]' into follow-up
prompts. The suffix was never defined or interpreted anywhere — not by the
engine, not by the system prompt, not by any LLM. It looked like a real
user-typed annotation in the transcript and made replay/analysis fragile.
New behaviour:
- turn 0 submits the original prompt (unchanged)
- turn > 0 submits caller-supplied continuation_prompt if provided, else
the loop stops cleanly — no fabricated user turn
- added continuation_prompt: str | None = None parameter to run_turn_loop
- added --continuation-prompt CLI flag for claws scripting multi-turn loops
- zero '[turn' strings ever appear in mutable_messages or stdout now
Behaviour change for existing callers:
- Before: run_turn_loop(prompt, max_turns=3) submitted 3 turns
('prompt', 'prompt [turn 2]', 'prompt [turn 3]')
- After: run_turn_loop(prompt, max_turns=3) submits 1 turn ('prompt')
- To preserve old multi-turn behaviour, pass continuation_prompt='Continue.'
or any structured follow-up text
One existing timeout test (test_budget_is_cumulative_across_turns) updated
to pass continuation_prompt so the cumulative-budget contract is actually
exercised across turns instead of trivially satisfied by a one-turn loop.
#164 filed: addresses reviewer feedback on #161. The wall-clock timeout
bounds the caller-facing wait, but the underlying submit_message worker
thread keeps running and can mutate engine state after the timeout
TurnResult is returned. A cooperative cancel_event pattern is sketched in
the pinpoint; real asyncio.Task.cancel() support will come once provider
IO is async-native (larger refactor).
Tests (tests/test_run_turn_loop_continuation.py, 8 tests):
- TestNoTurnSuffixInjection (2): zero '[turn' strings in any submitted
prompt, both default and explicit-continuation paths
- TestContinuationDefaultStopsAfterTurnZero (2): default loops run exactly
one turn; engine.submit_message called exactly once despite max_turns=10
- TestExplicitContinuationBehaviour (2): turn 0 = original, turn N = continuation
verbatim; max_turns still respected
- TestCLIContinuationFlag (2): CLI default emits only '## Turn 1';
--continuation-prompt wires through to multi-turn behaviour
Full suite: 67/67 passing.
Closes ROADMAP #163. Files #164.
## Gap
#77 Phase 1 added machine-readable error kind discriminants and #156 extended
them to text-mode output. However, the hint field is still prose derived from
splitting existing error text — not a stable registry-backed remediation
contract.
Downstream claws inspecting the hint field still need to parse human wording
to decide whether to retry, escalate, or terminate.
## Fix Shape
1. Remediation registry: remediation_for(kind, operation) -> Remediation struct
with action (retry/escalate/terminate/configure), target, and stable message
2. Stable hint outputs per error class (no more prose splitting)
3. Golden fixture tests replacing split_error_hint() string hacks
## Source
gaebal-gajae dogfood sweep 2026-04-22 05:30 KST
## Problem
Three interactive slash commands are documented in `claw --help` but have no
corresponding section in USAGE.md:
- `/ultraplan [task]` — Run a deep planning prompt with multi-step reasoning
- `/teleport <symbol-or-path>` — Jump to a file or symbol by searching the workspace
- `/bughunter [scope]` — Inspect the codebase for likely bugs
New users see these commands in the help output but don't know:
- What each command does
- How to use it
- When to use it vs. other commands
- What kind of results to expect
## Fix
Added new section "Advanced slash commands (Interactive REPL only)" to USAGE.md
with documentation for all three commands:
1. **`/ultraplan`** — multi-step reasoning for complex tasks
- Example: `/ultraplan refactor the auth module to use async/await`
- Output: structured plan with numbered steps and reasoning
2. **`/teleport`** — navigate to a file or symbol
- Example: `/teleport UserService`, `/teleport src/auth.rs`
- Output: file content with the requested symbol highlighted
3. **`/bughunter`** — scan for likely bugs
- Example: `/bughunter src/handlers`, `/bughunter` (all)
- Output: list of suspicious patterns with explanations
## Impact
Users can now discover these commands and understand when to use them without
having to guess or search external sources. Bridges the gap between `--help`
output and full documentation.
Also filed ROADMAP #155 documenting the gap.
Closes ROADMAP #155.
## Problem
When a user types `claw --model gpt-4` or `--model qwen-plus`, they get:
```
error: invalid model syntax: 'gpt-4'. Expected provider/model (e.g., anthropic/claude-opus-4-6) or known alias
```
USAGE.md documents that "The error message now includes a hint that names the detected env var" — but this hint does not actually exist. The user has to re-read USAGE.md or guess the correct prefix.
## Fix
Enhance `validate_model_syntax` to detect when a model name looks like it belongs to a different provider:
1. **OpenAI models** (starts with `gpt-` or `gpt_`):
```
Did you mean `openai/gpt-4`? (Requires OPENAI_API_KEY env var)
```
2. **Qwen/DashScope models** (starts with `qwen`):
```
Did you mean `qwen/qwen-plus`? (Requires DASHSCOPE_API_KEY env var)
```
3. **Grok/xAI models** (starts with `grok`):
```
Did you mean `xai/grok-3`? (Requires XAI_API_KEY env var)
```
Unrelated invalid models (e.g., `asdfgh`) do not get a spurious hint.
## Verification
- `claw --model gpt-4` → hints `openai/gpt-4` + `OPENAI_API_KEY`
- `claw --model qwen-plus` → hints `qwen/qwen-plus` + `DASHSCOPE_API_KEY`
- `claw --model grok-3` → hints `xai/grok-3` + `XAI_API_KEY`
- `claw --model asdfgh` → generic error (no hint)
## Tests
Added 3 new assertions in `parses_multiple_diagnostic_subcommands`:
- GPT model error hints openai/ prefix and OPENAI_API_KEY
- Qwen model error hints qwen/ prefix and DASHSCOPE_API_KEY
- Unrelated models don't get a spurious hint
All 177 rusty-claude-cli tests pass.
Closes ROADMAP #154.
## Problem
Users frequently ask after building:
- "Where is the claw binary?"
- "Did the build actually work?"
- "Why can't I run \`claw\` from anywhere?"
This happens because \`cargo build\` puts the binary in \`rust/target/debug/claw\`
(or \`rust/target/release/claw\`), and new users don't know:
1. Where to find it
2. How to test it
3. How to add it to PATH (optional but common follow-up)
## Fix
Added new section "Post-build: locate the binary and verify" to README covering:
1. **Binary location table:** debug vs. release, macOS/Linux vs. Windows paths
2. **Verification commands:** Test the binary with \`--help\` and \`doctor\`
3. **Three ways to add to PATH:**
- Symlink (macOS/Linux): \`ln -s ... /usr/local/bin/claw\`
- cargo install: \`cargo install --path . --force\`
- Shell profile update: add rust/target/debug to \$PATH
4. **Troubleshooting:** Common errors ("command not found", "permission denied",
debug vs. release build speed)
## Impact
New users can now:
- Find the binary immediately after build
- Run it and verify with \`claw doctor\`
- Know their options for system-wide access
Also filed ROADMAP #153 documenting the gap.
Closes ROADMAP #153.
Filed from nudge directive at 21:17 KST. Implementation exists on worktree
`jobdori-127-verb-suffix` but needs rebase due to merge with #141.
Ready for Phase 1 implementation once conflicts resolved.
## Problem
`workspace_fingerprint(path)` hashes the raw path string without
canonicalization. Two equivalent paths (e.g. `/tmp/foo` vs
`/private/tmp/foo` on macOS) produce different fingerprints and
therefore different session stores. #150 fixed the test-side symptom;
this fixes the underlying product contract.
## Discovery path
#150 fix (canonicalize in test) was a workaround. Q's ack on #150
surfaced the deeper gap: the function itself is still fragile for
any caller passing a non-canonical path:
1. Embedded callers with a raw `--data-dir` path
2. Programmatic `SessionStore::from_cwd(user_path)` calls
3. NixOS store paths, Docker bind mounts, case-insensitive normalization
The REPL's default flow happens to work because `env::current_dir()`
returns canonical paths on macOS. But any caller passing a raw path
risks silent session-store divergence.
## Fix
Canonicalize inside `SessionStore::from_cwd()` and `from_data_dir()`
before computing the fingerprint. Kept `workspace_fingerprint()` itself
as a pure function for determinism — canonicalization is the entry
point's responsibility.
```rust
let canonical_cwd = fs::canonicalize(cwd).unwrap_or_else(|_| cwd.to_path_buf());
let sessions_root = canonical_cwd.join(".claw").join("sessions").join(workspace_fingerprint(&canonical_cwd));
```
Falls back to the raw path if canonicalize fails (directory doesn't
exist yet).
## Test-side updates
Three legacy-session tests expected the non-canonical base path to
match the store's workspace_root. Updated them to canonicalize
`base` after creation — same defensive pattern as #150, now
explicit across all three tests.
## Regression test
Added `session_store_from_cwd_canonicalizes_equivalent_paths` that
creates two stores from equivalent paths (raw vs canonical) and
asserts they resolve to the same sessions_dir.
## Verification
- `cargo test -p runtime session_store_` — 9/9 pass
- `cargo test --workspace` — all green, no FAILED markers
- No behavior change for existing users (REPL default flow already
used canonical paths)
## Backward compatibility
Users on macOS who always went through `env::current_dir()`:
no hash change, sessions resume identically.
Users who ever called with a non-canonical path: hash would change,
but those sessions were already broken (couldn't be resumed from a
canonical-path cwd). Net improvement.
Closes ROADMAP #151.
## #150 Fix: resume_latest test flake
**Problem:** `resume_latest_restores_the_most_recent_managed_session` intermittently
fails when run in the workspace suite or multiple times in sequence, but passes in
isolation.
**Root cause:** `workspace_fingerprint(path)` hashes the path string without
canonicalization. On macOS, `/tmp` is a symlink to `/private/tmp`. The test
creates a temp dir via `std::env::temp_dir().join(...)` which returns
`/var/folders/...` (non-canonical). When the subprocess spawns,
`env::current_dir()` returns the canonical path `/private/var/folders/...`.
The two fingerprints differ, so the subprocess looks in
`.claw/sessions/<hash1>` while files are in `.claw/sessions/<hash2>`.
Session discovery fails.
**Fix:** Call `fs::canonicalize(&project_dir)` after creating the directory
to ensure test and subprocess use identical path representations.
**Verification:** 5 consecutive runs of the full test suite — all pass.
Previously: 5/5 failed when run in sequence.
## #246 Filing: Reminder cron outcome ambiguity (control-loop blocker)
The `clawcode-dogfood-cycle-reminder` cron times out repeatedly with no
structured feedback on whether the nudge was delivered, skipped, or died in-flight.
**Phase 1 outcome schema** — add explicit field to cron result:
- `delivered` — nudge posted to Discord
- `timed_out_before_send` — died before posting
- `timed_out_after_send` — posted but cleanup timed out
- `skipped_due_to_active_cycle` — previous cycle active
- `aborted_gateway_draining` — daemon shutdown
Assigned to gaebal-gajae (cron/orchestration domain). Unblocks trustworthy
dogfood cycle observability.
Closes ROADMAP #150. Filed ROADMAP #246.
## Problem
`runtime::config::tests::validates_unknown_top_level_keys_with_line_and_field_name`
intermittently fails during `cargo test --workspace` (witnessed during
#147 and #148 workspace runs) but passes deterministically in isolation.
Example failure from workspace run:
test result: FAILED. 464 passed; 1 failed
## Root cause
`runtime/src/config.rs::tests::temp_dir()` used nanosecond timestamp
alone for namespace isolation:
std::env::temp_dir().join(format!("runtime-config-{nanos}"))
Under parallel test execution on fast machines with coarse clock
resolution, two tests start within the same nanosecond bucket and
collide on the same path. One test's `fs::remove_dir_all(root)` then
races another's in-flight `fs::create_dir_all()`.
Other crates already solved this pattern:
- plugins::tests::temp_dir(label) — label-parameterized
- runtime::git_context::tests::temp_dir(label) — label-parameterized
runtime/src/config.rs was missed.
## Fix
Added process id + monotonically-incrementing atomic counter to the
namespace, making every callsite provably unique regardless of clock
resolution or scheduling:
static COUNTER: AtomicU64 = AtomicU64::new(0);
let pid = std::process::id();
let seq = COUNTER.fetch_add(1, Ordering::Relaxed);
std::env::temp_dir().join(format!("runtime-config-{pid}-{nanos}-{seq}"))
Chose counter+pid over the label-parameterized pattern to avoid
touching all 20 callsites in the same commit (mechanical noise with
no added safety — counter alone is sufficient).
## Verification
Before: one failure per workspace run (config test flake).
After: 5 consecutive `cargo test --workspace` runs — zero config
test failures. Only pre-existing `resume_latest` flake remains
(orthogonal, unrelated to this change).
for i in 1 2 3 4 5; do cargo test --workspace; done
# All 5 runs: config tests green. Only resume_latest flake appears.
cargo test -p runtime
# 465 passed; 0 failed
## ROADMAP.md
Added Pinpoint #149 documenting the gap, root cause, and fix.
Closes ROADMAP #149.
## Scope
Two deltas in one commit:
### #128 closure (docs)
Re-verified on main HEAD `4cb8fa0`: malformed `--model` strings already
rejected at parse time (`validate_model_syntax` in parse_args). All
historical repro cases now produce specific errors:
claw --model '' → error: model string cannot be empty
claw --model 'bad model' → error: invalid model syntax: 'bad model' contains spaces
claw --model 'sonet' → error: invalid model syntax: 'sonet'. Expected provider/model or known alias
claw --model '@invalid' → error: invalid model syntax: '@invalid'. Expected provider/model ...
claw --model 'totally-not-real-xyz' → error: invalid model syntax: ...
claw --model sonnet → ok, resolves to claude-sonnet-4-6
claw --model anthropic/claude-opus-4-6 → ok, passes through
Marked #128 CLOSED in ROADMAP with repro block. Residual provenance gap
split off as #148.
### #148 implementation
**Problem.** After #128 closure, `claw status --output-format json`
still surfaces only the resolved model string. No way for a claw to
distinguish whether `claude-sonnet-4-6` came from `--model sonnet`
(alias resolution) vs `--model claude-sonnet-4-6` (pass-through) vs
`ANTHROPIC_MODEL` env vs `.claw.json` config vs compiled-in default.
Debug forensics had to re-read argv instead of reading a structured
field. Clawhip orchestrators sending `--model` couldn't confirm the
flag was honored vs falling back to default.
**Fix.** Added two fields to status JSON envelope:
- `model_source`: "flag" | "env" | "config" | "default"
- `model_raw`: user's input before alias resolution (null on default)
Text mode appends a `Model source` line under `Model`, showing the
source and raw input (e.g. `Model source flag (raw: sonnet)`).
**Resolution order** (mirrors resolve_repl_model but with source
attribution):
1. If `--model` / `--model=` flag supplied → source: flag, raw: flag value
2. Else if ANTHROPIC_MODEL set → source: env, raw: env value
3. Else if `.claw.json` model key set → source: config, raw: config value
4. Else → source: default, raw: null
## Changes
### rust/crates/rusty-claude-cli/src/main.rs
- Added `ModelSource` enum (Flag/Env/Config/Default) with `as_str()`.
- Added `ModelProvenance` struct (resolved, raw, source) with
three constructors: `default_fallback()`, `from_flag(raw)`, and
`from_env_or_config_or_default(cli_model)`.
- Added `model_flag_raw: Option<String>` field to `CliAction::Status`.
- Parse loop captures raw input in `--model` and `--model=` arms.
- Extended `parse_single_word_command_alias` to thread
`model_flag_raw: Option<&str>` through.
- Extended `print_status_snapshot` signature to accept
`model_flag_raw: Option<&str>`. Resolves provenance at dispatch time
(flag provenance from arg; else probe env/config/default).
- Extended `status_json_value` signature with
`provenance: Option<&ModelProvenance>`. On Some, adds `model_source`
and `model_raw` fields; on None (legacy resume paths), omits them
for backward compat.
- Extended `format_status_report` signature with optional provenance.
On Some, renders `Model source` line after `Model`.
- Updated all existing callers (REPL /status, resume /status, tests)
to pass None (legacy paths don't carry flag provenance).
- Added 2 regression assertions in parse_args test covering both
`--model sonnet` and `--model=...` forms.
### ROADMAP.md
- Marked #128 CLOSED with re-verification block.
- Filed #148 documenting the provenance gap split, fix shape, and
acceptance criteria.
## Live verification
$ claw --model sonnet --output-format json status | jq '{model,model_source,model_raw}'
{"model": "claude-sonnet-4-6", "model_source": "flag", "model_raw": "sonnet"}
$ claw --output-format json status | jq '{model,model_source,model_raw}'
{"model": "claude-opus-4-6", "model_source": "default", "model_raw": null}
$ ANTHROPIC_MODEL=haiku claw --output-format json status | jq '{model,model_source,model_raw}'
{"model": "claude-haiku-4-5-20251213", "model_source": "env", "model_raw": "haiku"}
$ echo '{"model":"claude-opus-4-7"}' > .claw.json && claw --output-format json status | jq '{model,model_source,model_raw}'
{"model": "claude-opus-4-7", "model_source": "config", "model_raw": "claude-opus-4-7"}
$ claw --model sonnet status
Status
Model claude-sonnet-4-6
Model source flag (raw: sonnet)
Permission mode danger-full-access
...
## Tests
- rusty-claude-cli bin: 177 tests pass (2 new assertions for #148)
- Full workspace green except pre-existing resume_latest flake (unrelated)
Closes ROADMAP #128, #148.