claw-code

mirror of https://github.com/ultraworkers/claw-code.git synced 2026-04-24 13:08:11 +08:00

Author	SHA1	Message	Date
YeonGyu-Kim	36883ba4c2	roadmap: cluster update — #161 elevated to diagnostic-strictness family (per gaebal-gajae reframe)	2026-04-23 03:35:03 +09:00
YeonGyu-Kim	f18f45c0cf	roadmap: #161 filed — claw version stale SHA in worktrees (build.rs rerun-if-changed misses worktree HEAD)	2026-04-23 03:31:40 +09:00
YeonGyu-Kim	946e43e0c7	roadmap: doctrine extension — integration support artifacts as first-class deliverable at scale (from #64 framing)	2026-04-23 03:27:19 +09:00
YeonGyu-Kim	ad1cf92620	roadmap: canonical worked example of doctrine loop (#61–#63 sequence preserved for future claws)	2026-04-23 03:17:41 +09:00
YeonGyu-Kim	6a3913e278	roadmap: #160 SHIPPED — reserved-verb classification landed (cycle #63 )	2026-04-23 03:16:34 +09:00
YeonGyu-Kim	b54eacaa6e	roadmap: cycle-pattern doctrine — how violation → reframe → protocol loops create self-enforcing doctrine (from #61–#62)	2026-04-23 03:09:13 +09:00
YeonGyu-Kim	51cee23a27	roadmap: principle — integration bandwidth as constraint when queue is saturated (from #62 framing)	2026-04-23 03:06:22 +09:00
YeonGyu-Kim	35fee5ecde	roadmap: #160 investigation update — verb classification table needed for clean fix	2026-04-23 03:04:50 +09:00
YeonGyu-Kim	f034b01733	roadmap: #160 filed — resume with positional args falls through to Prompt dispatch (#251 family)	2026-04-23 03:02:30 +09:00
YeonGyu-Kim	c4054d2fa3	roadmap: principle — backlog truthfulness is execution speed (from #60 framing)	2026-04-23 02:56:23 +09:00
YeonGyu-Kim	7bd91096a8	roadmap: #136 + #153b marked CLOSED — compact+json already correct, PATH docs comprehensive	2026-04-23 02:55:22 +09:00
YeonGyu-Kim	196fe6b493	roadmap: #136 marked CLOSED — compact+json dispatch already correct	2026-04-23 02:54:41 +09:00
YeonGyu-Kim	dc8b275c9f	roadmap: principle — cycle cadence (hygiene cycles are first-class, from #59 framing)	2026-04-23 02:45:43 +09:00
YeonGyu-Kim	8f4f215e27	roadmap: diagnostic-strictness audit checklist (from cycles #57-#58)	2026-04-23 02:38:06 +09:00
YeonGyu-Kim	86b98d07e9	roadmap: principle — diagnostic surfaces must be at least as strict as runtime (from #122 framing)	2026-04-23 02:25:45 +09:00
YeonGyu-Kim	cb8839e050	roadmap: cluster closure + defer #155/#156 design questions (config section validation, mcp/agents soft-warning)	2026-04-23 02:18:46 +09:00
YeonGyu-Kim	41b0006eea	roadmap: cluster closure note — help-parity family complete (#130c, #130d, #130e)	2026-04-23 02:10:07 +09:00
YeonGyu-Kim	762e9bb212	roadmap: file #130e — help-parity sweep reveals 5 additional anomalies (3 dispatch-order, 2 surface)	2026-04-23 02:00:59 +09:00
YeonGyu-Kim	5e29430d4f	roadmap: file #130d — config command silently ignores --help, displays config dump instead	2026-04-23 01:53:31 +09:00
YeonGyu-Kim	0d8adceb67	roadmap: file #130c — pure-local commands reject --help as extra argument (diff, config, status)	2026-04-23 01:44:11 +09:00
YeonGyu-Kim	9eba71da81	roadmap: file #153b — PATH setup guide follow-up to #153	2026-04-23 01:35:24 +09:00
YeonGyu-Kim	ef5aae3ddd	roadmap: file #130b — filesystem errors lose context, emit generic errno strings (export command case)	2026-04-23 01:33:25 +09:00
YeonGyu-Kim	f05bc037de	docs(#250 , #251 ): Align SCHEMAS.md with actual binary, downgrade #250 to scope-reduced Cycle #46 follow-up to cycle #45's #251 implementation. Closes #250's implementation urgency by aligning docs with reality. SCHEMAS.md Updates: For each of the 4 session-management verbs, added: 1. Status marker (Implemented or Stub only) 2. Actual binary envelope (shape produced by the #251-fixed binary) 3. Aspirational (future) shape (original SCHEMAS.md content, preserved as target) 4. Gap notes where the two diverge Per-verb status: - list-sessions: Implemented, nested field layout - load-session: Implemented, nested session object with local session_not_found error - delete-session: Stub, emits not_yet_implemented (local error, not auth) - flush-transcript: Stub, emits not_yet_implemented (local error, not auth) ROADMAP.md Updates: - #251 marked CLOSED: Full status with commit ref, test counts. - #250 marked SCOPE-REDUCED: Option A resolved by #251, Option C moot, only Option B (doc alignment) remains as future cleanup. Why this matters: Every code change should close its documentation loop. #251 landed on the branch, but SCHEMAS.md still described aspirational shapes without marking which were implemented. Claws reading SCHEMAS.md would have assumed full conformance and hit surprises. Now the document tells the truth about which verbs work, which are stubs, and why. Related: - #251 implementation on feat/jobdori-251-session-dispatch branch - #250 scope-reduced to Option B (field-name harmonization) - #145/#146 parser fall-through fix precedent	2026-04-23 01:28:33 +09:00
YeonGyu-Kim	2fcb85ce4e	ROADMAP #251 : dispatch-order bug — session-management verbs fall through to Prompt before credential check (filed by gaebal-gajae; formalized by Jobdori cycle #40 ) Cycle #40: gaebal-gajae conceived #251 in their 00:00 Discord cycle status but hadn't committed to ROADMAP yet. Jobdori verified their diagnosis with code trace and formalized into ROADMAP with the proper framing relationship to #250. ## What This Pinpoint Says Same observable as #250 (session-management verbs emit missing_credentials instead of SCHEMAS.md envelope) but reframed at the dispatch-order layer: - #250 says: surface missing on canonical binary vs SCHEMAS.md promise - #251 says: top-level parser fall-through happens BEFORE dispatcher could intercept, so credential resolution runs before the verb is classified as a purely-local operation #251's framing is sharper because it identifies WHY the fall-through produces auth errors, not just that it does. ## Verified Code Trace - main.rs:1017-1027 is the _other => Prompt catchall - joins all rest[] tokens into joined, constructs CliAction::Prompt - downstream resolves credentials -> emits missing_credentials - No credential call would be needed had the verb been intercepted Same pattern has been fixed before for other purely-local verbs: - #145: plugins (main.rs:888-906, explicit match arm) - #146: config and diff (main.rs:911-935, same shape) #251 extends this to the 4 session-management verbs. ## Recommended Sequence 1. #251 fix (4 match arms mirroring #145/#146) — principled solution 2. #250's Option B (docs scope note) — guard against future drift 3. #250's Option C (reject with redirect) — unnecessary if #251 lands ## Discipline Per cycle #24 calibration: - Red-state bug? Borderline (silent misroute to auth error class) - Real friction? ✓ (4 documented surfaces emit wrong error class) - Evidence-backed? ✓ (code trace + prior-fix precedent #145/#146) - Same-cycle fix? ✗ (filed + document, boundary discipline #36) - Implementation cost? ~40 lines Rust + tests, bounded ## Credit Conception: gaebal-gajae (Discord msg 1496526112254328902, 00:00 KST) Formalization: Jobdori cycle #40 (code trace + precedent linking) This is the right kind of collaboration: gaebal-gajae saw the dispatch pattern I had missed in #250 (I framed as surface parity; they framed as dispatch order). I verified their diagnosis and committed the ROADMAP entry. Two framings make the pinpoint sharper than either alone.	2026-04-23 00:06:46 +09:00
YeonGyu-Kim	f1103332d0	ROADMAP #130 : re-verify still-open on main HEAD 186d42f; add classifier-cluster pairing note Cycle #39 dogfood re-verification of #130 (filed 2026-04-20). All 5 filesystem failure modes reproduce identically on main HEAD 186d42f, 2 days after original filing. Gap is unchanged. ## What's Added 1. [STILL OPEN — re-verified 2026-04-22 cycle #39] marker on the entry so readers can see immediately that the pinpoint hasn't been accidentally closed. 2. Full 5-mode repro output preserved verbatim for the current HEAD, so future re-verifications have a concrete baseline to diff against. 3. New evidence not in original filing: the classifier actively chose `kind: "unknown"` rather than just omitting the field. This means classify_error_kind() has NO substring match for "Is a directory", "No such file", "Operation not permitted", or "File exists". The typed-error contract is thus twice-broken on this path. 4. Pairing with #247/#248/#249 classifier sweep: the classifier-level part of #130 could land in the same sweep (add substring branches for io::ErrorKind strings). The context-preservation part (fix run_export's bare `?`) is a separate, larger change. ## Why Re-Verification Not Re-Filing Per cycle #24 discipline: speculative re-filings add noise, real confirmations add truth. #130 was already filed with exact repros, code trace, and fix shape. My dogfood hit the same gap on fresh HEAD — the right output is confirming the gap is still there (not filing #251 for the same bug). This is the same pattern as cycle #32's "mark #127 CLOSED" reality-sync: documentation-drift prevention through explicit status markers. ## New Pattern "Reality-sync via re-verification" — re-running a filed pinpoint's repro on fresh HEAD and adding the timestamp + output proves the gap is still real without inventing new filings. Cycle #24 calibration keeps ROADMAP entries honest. Per cycle #24 calibration: - Red-state bug? ⚠️ borderline (errors surfaced, but kind=unknown is demonstrably wrong on a path where the system knows the errno) - Real friction? ✓ (re-verified on fresh HEAD) - Evidence-backed? ✓ (5-mode repro + classifier trace) - Same-cycle fix? ✗ (classifier-level part could join #247/#248/#249 sweep; context-preservation part is larger refactor) - Implementation cost? Classifier part ~10 lines; full context fix ~60 lines Source: Jobdori cycle #39 proactive dogfood in response to Clawhip pinpoint nudge. Probed export filesystem errors; discovered this was #130 reconfirmation, not new bug. Applied reality-sync pattern from cycle #32.	2026-04-23 00:02:58 +09:00
YeonGyu-Kim	186d42f979	ROADMAP #250 : CLI surface parity gap — SCHEMAS.md's list-sessions/delete-session/etc. are Python-only; Rust binary falls through to Prompt with cred error Cycle #38 dogfood finding. Probed session management via the top-level subcommand path documented in SCHEMAS.md; discovered the Rust binary doesn't implement these as top-level subcommands. The literal token 'list-sessions' falls through the _other => Prompt arm and returns 'missing Anthropic credentials' instead of the documented envelope. ## The Gap SCHEMAS.md documents 14 CLAWABLE top-level subcommands. Python audit harness (src/main.py) implements all 14. Rust binary implements ~8 of them as top-level, routing session management through /session slash commands via --resume instead. Repro: $ env -i PATH=$PATH HOME=$HOME claw list-sessions --output-format json {"error":"missing Anthropic credentials; ...","kind":"missing_credentials"} $ claw --resume latest /session list --output-format json {"active":"...","kind":"session_list","sessions":[...]} $ python3 -m src.main list-sessions --output-format json {"command":"list-sessions","sessions":[...],"exit_code":0} Same operation, three different CLI shapes across implementations. ## Classification This is BOTH: - a parser-level trust gap (6th in #108/#117/#119/#122/#127 family; same _other => Prompt fall-through), AND - a cross-implementation parity gap (SCHEMAS.md at repo root doesn't match Rust binary's top-level surface) Unlike prior fall-throughs where the input was malformed, the input here IS a documented surface. The fall-through is wrong for a different reason: the surface exists in the protocol but not in this implementation. ## Three Fix Options Option A: Implement surfaces on Rust binary (highest cost, full parity) Option B: Scope SCHEMAS.md to Python harness (docs-only) Option C: Reject at parse time with redirect hint (cheapest, #127 pattern) Recommended: C first (prevents cred misdirection), then B for docs hygiene, then A if demand justifies. ## Discipline Per cycle #24 calibration: - Red-state bug? ⚠️ borderline — silent misroute to cred error on a documented surface. Not a crash but a real wrong-contract response. - Real friction? ✓ (claws reading SCHEMAS.md hit wrong error on canonical binary) - Evidence-backed? ✓ (dogfood probe + SCHEMAS.md cross-reference + code trace) - Implementation cost? Option C: ~30 lines (bounded). Option A: larger. - Same-cycle fix? ✗ (file + document, defer implementation per #36 boundary discipline) ## Family Position Natural bundle: #127 + #250 — parser-level fall-through pair with class distinction. #127 fixed suffix-arg-on-valid-verb case. #250 extends to 'entire Python-harness verb treated as prompt.' Same fall-through arm, different entry class. Source: Jobdori cycle #38 proactive dogfood in response to Clawhip pinpoint nudge at msg 1496518474019639408. Probed session management CLI after gaebal-gajae's status sync confirmed no red-state regressions this cycle; found this cross-implementation surface parity gap by comparing SCHEMAS.md claims against actual Rust binary behavior.	2026-04-22 23:37:45 +09:00
YeonGyu-Kim	5f8d1b92a6	ROADMAP #249 : resumed-session slash command error envelopes omit `kind` field Cycle #37 dogfood finding post-#247 merge. Two Err arms in the resumed-session JSON path at main.rs:2747 and main.rs:2783 emit error envelopes WITHOUT the `kind` field required by the §4.44 typed-envelope contract. ## The Pinpoint Probed resumed-session slash command JSON path: $ claw --output-format json --resume latest /session {"command":"/session","error":"unsupported resumed slash command","type":"error"} # no kind field $ claw --output-format json --resume latest /xyz-unknown {"command":"/xyz-unknown","error":"Unknown slash command: /xyz-unknown\n Help /help lists available slash commands","type":"error"} # no kind field AND multi-line error without split hint Compare to happy path which DOES include kind: $ claw --output-format json --resume latest /session list {"active":"...","kind":"session_list",...} Contract awareness exists. It's just not applied in the Err arms. ## Scope Two atomic fixes in main.rs: - Line 2747: SlashCommand::parse() Err → add kind via classify_error_kind() - Line 2783: run_resume_command() Err → add kind + call split_error_hint() ~15 lines Rust total. Bounded. ## Family Classification §4.44 typed-envelope contract sweep: - #179 (parse-error real message quality) — closed - #181 (envelope exit_code matches process exit) — closed - #247 (classify_error_kind misses prompt-patterns) — closed - #248 (verb-qualified unknown option errors) — in-flight (another agent) - #249 (resumed-session slash error envelopes omit kind) — filed Natural bundle #247+#248+#249: classifier/envelope completeness across all three CLI paths (top-level parse, subcommand options, resumed-session slash). ## Discipline Per cycle #24 calibration: - Red-state bug? ✗ (errors surfaced, exit codes correct) - Real friction? ✓ (typed-error contract violation; claws dispatching on error.kind get undefined for all resumed slash-command errors) - Evidence-backed? ✓ (dogfood probe + code trace identified both Err arms) - Implementation cost? ~15 lines (bounded) - Same-cycle fix? ✗ (Rust change, deferred per file-not-fix discipline) ## Not Implementing This Cycle Per the boundary discipline established in cycle #36: I don't touch another agent's in-flight work, and I don't implement a Rust fix same-cycle when the pattern is "file + document + let owner/maintainer decide." Filing with concrete fix shape is the correct output. If demand or red-state symptoms arrive, implementation can follow the same path as #247: file → fix in branch → review → merge. Source: Jobdori cycle #37 proactive dogfood in response to Clawhip pinpoint nudge at msg 1496518474019639408.	2026-04-22 23:33:50 +09:00
YeonGyu-Kim	84466bbb6c	fix: #247 classify prompt-related parse errors + unify JSON hint plumbing Cycle #34 dogfood follow-through on Jobdori cycle #33 pinpoint (#247 filed at fbcbe9d). Closes the two typed-error contract drifts surfaced in that pinpoint against the Rust `claw` binary. ## What was wrong 1. `classify_error_kind()` (main.rs:~251) used substring matching but did NOT match two common prompt-related parse errors: - "prompt subcommand requires a prompt string" - "empty prompt: provide a subcommand..." Both fell through to `"unknown"`. §4.44 typed-error contract specifies `parse \| usage \| unknown` as distinct classes, so claws dispatching on `error.kind == "cli_parse"` missed those paths entirely. 2. JSON mode dropped the `Run `claw --help` for usage.` hint. Text mode appends it at stderr-print time (main.rs:~234) AFTER split_error_hint() has already serialized the envelope, so JSON consumers never saw it. Text-mode humans got an actionable pointer; machine consumers did not. ## Fix Two small, targeted edits: 1. `classify_error_kind()`: add explicit branches for "prompt subcommand requires" and "empty prompt:" (the latter anchored with `starts_with` so it never hijacks unrelated error messages containing the word). Both route to `cli_parse`. 2. JSON error render path in `main()`: after calling split_error_hint(), if the message carried no embedded hint AND kind is `cli_parse` AND the short-reason does not already embed a `claw --help` pointer, synthesize the same `Run `claw --help` for usage.` trailer that text-mode stderr appends. The embedded-pointer check prevents duplication on the `empty prompt: ... (run `claw --help`)` message which already carries inline guidance. ## Verification Direct repro on the compiled binary: $ claw --output-format json prompt {"error":"prompt subcommand requires a prompt string", "hint":"Run `claw --help` for usage.", "kind":"cli_parse","type":"error"} $ claw --output-format json "" {"error":"empty prompt: provide a subcommand (run `claw --help`) or a non-empty prompt string", "hint":null,"kind":"cli_parse","type":"error"} $ claw --output-format json doctor --foo # regression guard {"error":"unrecognized argument `--foo` for subcommand `doctor`", "hint":"Run `claw --help` for usage.", "kind":"cli_parse","type":"error"} Text mode unchanged in shape; `[error-kind: ...]` prefix now reads `cli_parse` for the two previously-misclassified paths. ## Regression coverage - Unit test `classify_error_kind_covers_prompt_parse_errors_247`: locks both patterns route to `cli_parse` AND that generic "prompt"-containing messages still fall through to `unknown`. - Integration tests in `tests/output_format_contract.rs`: * prompt_subcommand_without_arg_emits_cli_parse_envelope_with_hint_247 * empty_positional_arg_emits_cli_parse_envelope_247 * whitespace_only_positional_arg_emits_cli_parse_envelope_247 * unrecognized_argument_still_classifies_as_cli_parse_247_regression_guard - Full rusty-claude-cli test suite: 218 tests pass (180 bin unit + 15 output_format_contract + 12 resume_slash + 7 compact + 3 mock + 1 cli). ## Family / related Joins §4.44 typed-envelope contract gap family closure: #130, #179, #181, and now #247. All four quartet items now have real fixes landed on the canonical binary surface rather than only the Python harness. ROADMAP.md: #247 marked CLOSED with before/after evidence preserved.	2026-04-22 22:43:14 +09:00
YeonGyu-Kim	fbcbe9d8d5	ROADMAP #247 : classify_error_kind() misses prompt-related parse errors; hint dropped in JSON envelope Cycle #33 dogfood finding from direct probe of Rust claw binary: ## The Pinpoint Two related contract drifts in the typed-error envelope: ### 1. Error-kind misclassification `classify_error_kind()` at main.rs:246-280 uses substring matching but does NOT match two common parse error messages: - "prompt subcommand requires a prompt string" → classified as 'unknown' - "empty prompt: provide a subcommand..." → classified as 'unknown' The §4.44 typed-error contract specifies 'parse \| usage \| unknown' as DISTINCT classes. Known parse errors should be 'cli_parse', not 'unknown'. ### 2. Hint lost in JSON mode Text mode appends 'Run `claw --help` for usage.' to parse errors. JSON mode emits 'hint: null'. The trailer is added at the stderr-print stage AFTER split_error_hint() has already serialized the envelope, so JSON consumers never see it. ## Repro Dogfooded on main HEAD dd0993c (cycle #33): $ claw --output-format json prompt {"error":"prompt subcommand requires a prompt string","hint":null,"kind":"unknown","type":"error"} Expected: kind="cli_parse" + hint="Run \\`claw --help\\` for usage." ## Impact - Claws dispatching on typed error.kind fall back to substring matching - JSON consumers lose actionable hint that text-mode users see - Joins JSON envelope field-quality family (#90, #91, #92, #110, #115, #116, #130, #179, #181, #247) ## Fix Shape 1. Add prompt-pattern clauses to classify_error_kind() (~4 lines) 2. Move hint plumbing to BEFORE JSON envelope serialization (~15 lines) 3. Add golden-fixture regression tests per cycle #30 pattern Not a red-state bug (error IS surfaced, exit code IS correct), but real contract drift. Deferred for implementation; filed per Clawhip nudge to 'add one concrete follow-up to ROADMAP.md'. Per cycle #24 calibration: - Red-state bug? ✗ (errors exit 1 correctly) - Real friction? ✓ (typed-error contract drift) - Evidence-backed? ✓ (dogfood probe + code trace identified both leaks) - Implementation cost? ~20 lines Rust (bounded) - Demand signal needed? Medium — any claw doing error.kind dispatch on prompt-path errors is affected Source: Jobdori cycle #33 direct dogfood 2026-04-22 22:30 KST in response to Clawhip pinpoint nudge at msg 1496503374621970583.	2026-04-22 22:34:35 +09:00
YeonGyu-Kim	dd0993c157	docs: cycle #32 — mark #127 CLOSED; document in-flight branch obsolescence Cycle #32 dogfood finding: #127 was fixed on main via `a3270db` + `79352a2` (2026-04-20), but the ROADMAP.md entry still lacked a [CLOSED] marker. The in-flight branches `feat/jobdori-127-clean` and `feat/jobdori-127-verb-suffix-flags` were superseded and are now obsolete. ## What This Fixes Documentation drift: Pinpoint #127 was complete in code but unmarked in ROADMAP. New contributors checking the roadmap would see it as open work, potentially duplicating effort. Stale branches: Two branches (`feat/jobdori-127-clean`, `feat/jobdori-127-verb-suffix-flags`) contain the fix attempt bundled with an unrelated large-scope refactor (5365 lines removed from ROADMAP.md, root-level governance docs deleted, command infra refactored). Their fix was superseded; branches are functionally obsolete. ## Verification Re-verified all 4 #127 scenarios pass on main HEAD `b903e16`: $ claw doctor --json → rejected with "did you mean" hint $ claw doctor garbage → rejected $ claw doctor --unknown-flag → rejected $ claw doctor --output-format json → works (canonical form) All behavior matches #127 acceptance criteria. ## Cluster Impact Post-closure: parser-level trust gap quintet (#108 + #117 + #119 + #122 + #127) is 5/5 closed. The `_other => Prompt` fall-through audit is complete. ## Discipline Check Per cycle #24 calibration: - Red-state bug? ✗ (behavior is correct on main) - Real friction? ✓ (ROADMAP drift; obsolete branches adrift) - Evidence-backed? ✓ (dogfood probe confirmed closure; git log confirmed supersession; branch diff confirmed scope contamination) ## Relationship to Gaebal-gajae's Option A Guidance Cycle #32 started by proposing separating the #127 fix from the attached refactor. On deeper probe, discovered the fix was already superseded on main via different commits. Option A (separate the fix) is retroactively satisfied: the fix landed cleanly, the refactor never did. The remaining action is governance hygiene: mark closure, document supersession, flag obsolete branches for deletion. ## Next Actions (not in this commit) - Delete `feat/jobdori-127-clean` locally and on fork (after confirmation) - Delete `feat/jobdori-127-verb-suffix-flags` locally and on fork - Monitor whether any attached refactor content should be re-proposed in its own scoped PR Source: Jobdori cycle #32 dogfood in response to Clawhip 10-min nudge. Proposed Option A (separate fix from refactor); probe revealed the fix already landed via a different commit path, rendering the refactor-only branch obsolete.	2026-04-22 22:28:22 +09:00
YeonGyu-Kim	1d155e4304	docs: ROADMAP.md — file #180 (discoverability gap: --help/--version outside JSON contract) Cycle #24 dogfood discovery. Running proactive edge-case dogfood on the JSON contract, hit a real pinpoint: --help and --version are outside the parser-front-door contract. The gap: 1. "claw --help --output-format json" returns text (not envelope) 2. "claw bootstrap --help --output-format json" returns text (not envelope) 3. "claw --version" doesn't exist at all Why it matters: - Claws can't programmatically discover the CLI surface - Version checking requires side-effectful commands - Natural follow-up gap to #178/#179 parser-front-door work Discoverability scenarios: - Orchestrator checking whether a new command (e.g., turn-loop) is available - Version compat check before dispatching work - Enumerating available commands for routing decisions Filed as Pinpoint #180 in ROADMAP.md with: - Gap description + 3-case repro - Impact analysis (version compat, surface enumeration, governance) - Root cause (argparse default HelpAction prints text + exits) - Fix shape (3 stages, ~40 lines total) - Stage A: --version + JSON envelope version metadata - Stage B: --help JSON routing via custom HelpAction - Stage C: optional 'schema-info' command for pre-dispatch discovery - Acceptance criteria (4 cases, including backward compat) - Priority: Medium (not red-state, but real discoverability gap) Status: Filed, implementation deferred. Following maintainership equilibrium: pinpoints stay documented but don't force code changes. If external demand arrives (claw author building a dispatcher, orchestrator doing version checks), the fix can ship in one cycle using the shape already documented. No code changes this cycle. Pure ROADMAP filing. Continues the maintainership pattern: find friction, document it, defer until evidence-backed demand arrives. Source: Jobdori proactive dogfood at 2026-04-22 20:58 KST.	2026-04-22 21:01:40 +09:00
YeonGyu-Kim	85de7f9814	fix: #166 — flush-transcript now accepts --directory / --output-format / --session-id; session-creation command parity with #160/#165 lifecycle triplet	2026-04-22 18:04:25 +09:00
YeonGyu-Kim	d453eedae6	fix: #165 — load-session CLI now parity-matches list/delete (--directory, --output-format, typed JSON errors) The #160 session-lifecycle CLI triplet was asymmetric: list-sessions and delete-session accepted --directory + --output-format and emitted typed JSON error envelopes, but load-session had neither flag and dumped a raw Python traceback (including the SessionNotFoundError class name) on a missing session. Three concrete impacts this fix closes: 1. Alternate session-store locations (e.g. /tmp/claw-run-XXX/.port_sessions) were unreachable via load-session; claws had to chdir or monkeypatch DEFAULT_SESSION_DIR to work around it. 2. Not-found emitted a multi-line Python stack, not a parseable envelope. Claws deciding retry/escalate/give-up had only exit code 1 to work with. 3. The traceback leaked 'src.session_store.SessionNotFoundError' verbatim, coupling version-pinned claws to our internal exception class name. Now all three triplet commands accept the same flag pair and emit the same JSON error shape: Success (json mode): {"session_id": "alpha", "loaded": true, "messages_count": 3, "input_tokens": 42, "output_tokens": 99} Not-found: {"session_id": "missing", "loaded": false, "error": {"kind": "session_not_found", "message": "session 'missing' not found in /path", "directory": "/path", "retryable": false}} Corrupted file: {"session_id": "broken", "loaded": false, "error": {"kind": "session_load_failed", "message": "...", "directory": "/path", "retryable": true}} Exit code contract: - 0 on successful load - 1 on not-found (preserves existing $?) - 1 on OSError/JSONDecodeError (distinct 'kind' in JSON) Backward compat: legacy 'claw load-session ID' text output unchanged byte-for-byte. Only new behaviour is the flags and structured error path. Tests (tests/test_load_session_cli.py, 13 tests): - TestDirectoryFlagParity (2): --directory works + fallback to CWD/.port_sessions - TestOutputFormatFlagParity (2): json schema + text-mode backward compat - TestNotFoundTypedError (2): JSON envelope on not-found; no traceback in either mode; no internal class name leak - TestLoadFailedDistinctFromNotFound (1): corrupted file = session_load_failed with retryable=true, distinct from session_not_found - TestTripletParityConsistency (6): parametrised over [list, delete, load] * [--directory, --output-format] — explicit parity guard for future regressions Full suite: 80/80 passing, zero regression. Discovered via Jobdori dogfood sweep 2026-04-22 17:44 KST — ran 'claw load-session nonexistent' expecting a clean error, got a Python traceback. Filed #165 + fixed in same commit. Closes ROADMAP #165.	2026-04-22 17:44:48 +09:00
YeonGyu-Kim	79a9f0e6f6	fix: #163 — remove [turn N] suffix pollution from run_turn_loop; file #164 timeout-cancellation followup #163: run_turn_loop no longer injects f'{prompt} [turn N]' into follow-up prompts. The suffix was never defined or interpreted anywhere — not by the engine, not by the system prompt, not by any LLM. It looked like a real user-typed annotation in the transcript and made replay/analysis fragile. New behaviour: - turn 0 submits the original prompt (unchanged) - turn > 0 submits caller-supplied continuation_prompt if provided, else the loop stops cleanly — no fabricated user turn - added continuation_prompt: str \| None = None parameter to run_turn_loop - added --continuation-prompt CLI flag for claws scripting multi-turn loops - zero '[turn' strings ever appear in mutable_messages or stdout now Behaviour change for existing callers: - Before: run_turn_loop(prompt, max_turns=3) submitted 3 turns ('prompt', 'prompt [turn 2]', 'prompt [turn 3]') - After: run_turn_loop(prompt, max_turns=3) submits 1 turn ('prompt') - To preserve old multi-turn behaviour, pass continuation_prompt='Continue.' or any structured follow-up text One existing timeout test (test_budget_is_cumulative_across_turns) updated to pass continuation_prompt so the cumulative-budget contract is actually exercised across turns instead of trivially satisfied by a one-turn loop. #164 filed: addresses reviewer feedback on #161. The wall-clock timeout bounds the caller-facing wait, but the underlying submit_message worker thread keeps running and can mutate engine state after the timeout TurnResult is returned. A cooperative cancel_event pattern is sketched in the pinpoint; real asyncio.Task.cancel() support will come once provider IO is async-native (larger refactor). Tests (tests/test_run_turn_loop_continuation.py, 8 tests): - TestNoTurnSuffixInjection (2): zero '[turn' strings in any submitted prompt, both default and explicit-continuation paths - TestContinuationDefaultStopsAfterTurnZero (2): default loops run exactly one turn; engine.submit_message called exactly once despite max_turns=10 - TestExplicitContinuationBehaviour (2): turn 0 = original, turn N = continuation verbatim; max_turns still respected - TestCLIContinuationFlag (2): CLI default emits only '## Turn 1'; --continuation-prompt wires through to multi-turn behaviour Full suite: 67/67 passing. Closes ROADMAP #163. Files #164.	2026-04-22 17:37:22 +09:00
YeonGyu-Kim	41a6091355	file: #163 — run_turn_loop injects [turn N] suffix into follow-up prompts; multi-turn sessions semantically broken	2026-04-22 10:07:35 +09:00
YeonGyu-Kim	bc94870a54	file: #162 — submit_message appends budget-exceeded turn before returning max_budget_reached; session state corrupted on overflow	2026-04-22 09:38:00 +09:00
YeonGyu-Kim	ee3aa29a5e	file: #161 — run_turn_loop has no wall-clock timeout, stalled turn blocks indefinitely	2026-04-22 08:57:38 +09:00
YeonGyu-Kim	a389f8dff1	file: #160 — session_store missing list_sessions, delete_session, session_exists — claw cannot enumerate or clean up sessions without filesystem hacks	2026-04-22 08:47:52 +09:00
YeonGyu-Kim	7a014170ba	file: #159 — run_turn_loop hardcodes empty denied_tools, permission denials absent from multi-turn sessions	2026-04-22 06:48:03 +09:00
YeonGyu-Kim	986f8e89fd	file: #158 — compact_messages_if_needed drops turns silently, no structured compaction event	2026-04-22 06:37:54 +09:00
YeonGyu-Kim	ef1cfa1777	file: #157 — structured remediation registry for error hints (Phase 3 of #77 ) ## Gap #77 Phase 1 added machine-readable error kind discriminants and #156 extended them to text-mode output. However, the hint field is still prose derived from splitting existing error text — not a stable registry-backed remediation contract. Downstream claws inspecting the hint field still need to parse human wording to decide whether to retry, escalate, or terminate. ## Fix Shape 1. Remediation registry: remediation_for(kind, operation) -> Remediation struct with action (retry/escalate/terminate/configure), target, and stable message 2. Stable hint outputs per error class (no more prose splitting) 3. Golden fixture tests replacing split_error_hint() string hacks ## Source gaebal-gajae dogfood sweep 2026-04-22 05:30 KST	2026-04-22 05:31:00 +09:00
YeonGyu-Kim	14c5ef1808	file: #156 — error classification for text-mode output (Phase 2 of #77 ) ROADMAP entry for natural Phase 2 follow-up to #77 Phase 1 (JSON error kind classification). Text-mode errors currently prose-only with no structured class; observability tools parsing stderr need the kind token. Two implementation options: - Prefix line before error prose: [error-kind: missing_credentials] - Suffix comment: # error_class=missing_credentials Scope: ~20 lines. Non-breaking (adds classification, doesn't change error text). Source: Cycle 11 dogfood probe at 23:18 KST — product surface clean after today's batch, identified natural next step for error-classification symmetry.	2026-04-21 23:19:58 +09:00
YeonGyu-Kim	4b53b97e36	docs: #155 — add USAGE.md documentation for /ultraplan, /teleport, /bughunter commands ## Problem Three interactive slash commands are documented in `claw --help` but have no corresponding section in USAGE.md: - `/ultraplan [task]` — Run a deep planning prompt with multi-step reasoning - `/teleport <symbol-or-path>` — Jump to a file or symbol by searching the workspace - `/bughunter [scope]` — Inspect the codebase for likely bugs New users see these commands in the help output but don't know: - What each command does - How to use it - When to use it vs. other commands - What kind of results to expect ## Fix Added new section "Advanced slash commands (Interactive REPL only)" to USAGE.md with documentation for all three commands: 1. `/ultraplan` — multi-step reasoning for complex tasks - Example: `/ultraplan refactor the auth module to use async/await` - Output: structured plan with numbered steps and reasoning 2. `/teleport` — navigate to a file or symbol - Example: `/teleport UserService`, `/teleport src/auth.rs` - Output: file content with the requested symbol highlighted 3. `/bughunter` — scan for likely bugs - Example: `/bughunter src/handlers`, `/bughunter` (all) - Output: list of suspicious patterns with explanations ## Impact Users can now discover these commands and understand when to use them without having to guess or search external sources. Bridges the gap between `--help` output and full documentation. Also filed ROADMAP #155 documenting the gap. Closes ROADMAP #155.	2026-04-21 21:49:04 +09:00
YeonGyu-Kim	3cfe6e2b14	feat: #154 — hint provider prefix and env var when model name looks like different provider ## Problem When a user types `claw --model gpt-4` or `--model qwen-plus`, they get: ``` error: invalid model syntax: 'gpt-4'. Expected provider/model (e.g., anthropic/claude-opus-4-6) or known alias ``` USAGE.md documents that "The error message now includes a hint that names the detected env var" — but this hint does not actually exist. The user has to re-read USAGE.md or guess the correct prefix. ## Fix Enhance `validate_model_syntax` to detect when a model name looks like it belongs to a different provider: 1. OpenAI models (starts with `gpt-` or `gpt_`): ``` Did you mean `openai/gpt-4`? (Requires OPENAI_API_KEY env var) ``` 2. Qwen/DashScope models (starts with `qwen`): ``` Did you mean `qwen/qwen-plus`? (Requires DASHSCOPE_API_KEY env var) ``` 3. Grok/xAI models (starts with `grok`): ``` Did you mean `xai/grok-3`? (Requires XAI_API_KEY env var) ``` Unrelated invalid models (e.g., `asdfgh`) do not get a spurious hint. ## Verification - `claw --model gpt-4` → hints `openai/gpt-4` + `OPENAI_API_KEY` - `claw --model qwen-plus` → hints `qwen/qwen-plus` + `DASHSCOPE_API_KEY` - `claw --model grok-3` → hints `xai/grok-3` + `XAI_API_KEY` - `claw --model asdfgh` → generic error (no hint) ## Tests Added 3 new assertions in `parses_multiple_diagnostic_subcommands`: - GPT model error hints openai/ prefix and OPENAI_API_KEY - Qwen model error hints qwen/ prefix and DASHSCOPE_API_KEY - Unrelated models don't get a spurious hint All 177 rusty-claude-cli tests pass. Closes ROADMAP #154.	2026-04-21 21:40:48 +09:00
YeonGyu-Kim	71f5f83adb	feat: #153 — add post-build binary location and verification guide to README ## Problem Users frequently ask after building: - "Where is the claw binary?" - "Did the build actually work?" - "Why can't I run \`claw\` from anywhere?" This happens because \`cargo build\` puts the binary in \`rust/target/debug/claw\` (or \`rust/target/release/claw\`), and new users don't know: 1. Where to find it 2. How to test it 3. How to add it to PATH (optional but common follow-up) ## Fix Added new section "Post-build: locate the binary and verify" to README covering: 1. Binary location table: debug vs. release, macOS/Linux vs. Windows paths 2. Verification commands: Test the binary with \`--help\` and \`doctor\` 3. Three ways to add to PATH: - Symlink (macOS/Linux): \`ln -s ... /usr/local/bin/claw\` - cargo install: \`cargo install --path . --force\` - Shell profile update: add rust/target/debug to \$PATH 4. Troubleshooting: Common errors ("command not found", "permission denied", debug vs. release build speed) ## Impact New users can now: - Find the binary immediately after build - Run it and verify with \`claw doctor\` - Know their options for system-wide access Also filed ROADMAP #153 documenting the gap. Closes ROADMAP #153.	2026-04-21 21:29:59 +09:00
YeonGyu-Kim	dddbd78dbd	file: #152 — diagnostic verb suffixes allow arbitrary positional args, double error prefix Filed from nudge directive at 21:17 KST. Implementation exists on worktree `jobdori-127-verb-suffix` but needs rebase due to merge with #141. Ready for Phase 1 implementation once conflicts resolved.	2026-04-21 21:19:51 +09:00
YeonGyu-Kim	7bc66e86e8	feat: #151 — canonicalize workspace path in SessionStore::from_cwd/data_dir ## Problem `workspace_fingerprint(path)` hashes the raw path string without canonicalization. Two equivalent paths (e.g. `/tmp/foo` vs `/private/tmp/foo` on macOS) produce different fingerprints and therefore different session stores. #150 fixed the test-side symptom; this fixes the underlying product contract. ## Discovery path #150 fix (canonicalize in test) was a workaround. Q's ack on #150 surfaced the deeper gap: the function itself is still fragile for any caller passing a non-canonical path: 1. Embedded callers with a raw `--data-dir` path 2. Programmatic `SessionStore::from_cwd(user_path)` calls 3. NixOS store paths, Docker bind mounts, case-insensitive normalization The REPL's default flow happens to work because `env::current_dir()` returns canonical paths on macOS. But any caller passing a raw path risks silent session-store divergence. ## Fix Canonicalize inside `SessionStore::from_cwd()` and `from_data_dir()` before computing the fingerprint. Kept `workspace_fingerprint()` itself as a pure function for determinism — canonicalization is the entry point's responsibility. ```rust let canonical_cwd = fs::canonicalize(cwd).unwrap_or_else(\|_\| cwd.to_path_buf()); let sessions_root = canonical_cwd.join(".claw").join("sessions").join(workspace_fingerprint(&canonical_cwd)); ``` Falls back to the raw path if canonicalize fails (directory doesn't exist yet). ## Test-side updates Three legacy-session tests expected the non-canonical base path to match the store's workspace_root. Updated them to canonicalize `base` after creation — same defensive pattern as #150, now explicit across all three tests. ## Regression test Added `session_store_from_cwd_canonicalizes_equivalent_paths` that creates two stores from equivalent paths (raw vs canonical) and asserts they resolve to the same sessions_dir. ## Verification - `cargo test -p runtime session_store_` — 9/9 pass - `cargo test --workspace` — all green, no FAILED markers - No behavior change for existing users (REPL default flow already used canonical paths) ## Backward compatibility Users on macOS who always went through `env::current_dir()`: no hash change, sessions resume identically. Users who ever called with a non-canonical path: hash would change, but those sessions were already broken (couldn't be resumed from a canonical-path cwd). Net improvement. Closes ROADMAP #151.	2026-04-21 21:06:09 +09:00
YeonGyu-Kim	eaa077bf91	fix: #150 — eliminate symlink canonicalization flake in resume_latest test + file #246 (reminder outcome ambiguity) ## #150 Fix: resume_latest test flake Problem: `resume_latest_restores_the_most_recent_managed_session` intermittently fails when run in the workspace suite or multiple times in sequence, but passes in isolation. Root cause: `workspace_fingerprint(path)` hashes the path string without canonicalization. On macOS, `/tmp` is a symlink to `/private/tmp`. The test creates a temp dir via `std::env::temp_dir().join(...)` which returns `/var/folders/...` (non-canonical). When the subprocess spawns, `env::current_dir()` returns the canonical path `/private/var/folders/...`. The two fingerprints differ, so the subprocess looks in `.claw/sessions/<hash1>` while files are in `.claw/sessions/<hash2>`. Session discovery fails. Fix: Call `fs::canonicalize(&project_dir)` after creating the directory to ensure test and subprocess use identical path representations. Verification: 5 consecutive runs of the full test suite — all pass. Previously: 5/5 failed when run in sequence. ## #246 Filing: Reminder cron outcome ambiguity (control-loop blocker) The `clawcode-dogfood-cycle-reminder` cron times out repeatedly with no structured feedback on whether the nudge was delivered, skipped, or died in-flight. Phase 1 outcome schema — add explicit field to cron result: - `delivered` — nudge posted to Discord - `timed_out_before_send` — died before posting - `timed_out_after_send` — posted but cleanup timed out - `skipped_due_to_active_cycle` — previous cycle active - `aborted_gateway_draining` — daemon shutdown Assigned to gaebal-gajae (cron/orchestration domain). Unblocks trustworthy dogfood cycle observability. Closes ROADMAP #150. Filed ROADMAP #246.	2026-04-21 21:01:09 +09:00
YeonGyu-Kim	bc259ec6f9	fix: #149 — eliminate parallel-test flake in runtime::config tests ## Problem `runtime::config::tests::validates_unknown_top_level_keys_with_line_and_field_name` intermittently fails during `cargo test --workspace` (witnessed during #147 and #148 workspace runs) but passes deterministically in isolation. Example failure from workspace run: test result: FAILED. 464 passed; 1 failed ## Root cause `runtime/src/config.rs::tests::temp_dir()` used nanosecond timestamp alone for namespace isolation: std::env::temp_dir().join(format!("runtime-config-{nanos}")) Under parallel test execution on fast machines with coarse clock resolution, two tests start within the same nanosecond bucket and collide on the same path. One test's `fs::remove_dir_all(root)` then races another's in-flight `fs::create_dir_all()`. Other crates already solved this pattern: - plugins::tests::temp_dir(label) — label-parameterized - runtime::git_context::tests::temp_dir(label) — label-parameterized runtime/src/config.rs was missed. ## Fix Added process id + monotonically-incrementing atomic counter to the namespace, making every callsite provably unique regardless of clock resolution or scheduling: static COUNTER: AtomicU64 = AtomicU64::new(0); let pid = std::process::id(); let seq = COUNTER.fetch_add(1, Ordering::Relaxed); std::env::temp_dir().join(format!("runtime-config-{pid}-{nanos}-{seq}")) Chose counter+pid over the label-parameterized pattern to avoid touching all 20 callsites in the same commit (mechanical noise with no added safety — counter alone is sufficient). ## Verification Before: one failure per workspace run (config test flake). After: 5 consecutive `cargo test --workspace` runs — zero config test failures. Only pre-existing `resume_latest` flake remains (orthogonal, unrelated to this change). for i in 1 2 3 4 5; do cargo test --workspace; done # All 5 runs: config tests green. Only resume_latest flake appears. cargo test -p runtime # 465 passed; 0 failed ## ROADMAP.md Added Pinpoint #149 documenting the gap, root cause, and fix. Closes ROADMAP #149.	2026-04-21 20:54:12 +09:00
YeonGyu-Kim	f84c7c4ed5	feat: #148 + #128 closure — model provenance in claw status JSON/text ## Scope Two deltas in one commit: ### #128 closure (docs) Re-verified on main HEAD `4cb8fa0`: malformed `--model` strings already rejected at parse time (`validate_model_syntax` in parse_args). All historical repro cases now produce specific errors: claw --model '' → error: model string cannot be empty claw --model 'bad model' → error: invalid model syntax: 'bad model' contains spaces claw --model 'sonet' → error: invalid model syntax: 'sonet'. Expected provider/model or known alias claw --model '@invalid' → error: invalid model syntax: '@invalid'. Expected provider/model ... claw --model 'totally-not-real-xyz' → error: invalid model syntax: ... claw --model sonnet → ok, resolves to claude-sonnet-4-6 claw --model anthropic/claude-opus-4-6 → ok, passes through Marked #128 CLOSED in ROADMAP with repro block. Residual provenance gap split off as #148. ### #148 implementation Problem. After #128 closure, `claw status --output-format json` still surfaces only the resolved model string. No way for a claw to distinguish whether `claude-sonnet-4-6` came from `--model sonnet` (alias resolution) vs `--model claude-sonnet-4-6` (pass-through) vs `ANTHROPIC_MODEL` env vs `.claw.json` config vs compiled-in default. Debug forensics had to re-read argv instead of reading a structured field. Clawhip orchestrators sending `--model` couldn't confirm the flag was honored vs falling back to default. Fix. Added two fields to status JSON envelope: - `model_source`: "flag" \| "env" \| "config" \| "default" - `model_raw`: user's input before alias resolution (null on default) Text mode appends a `Model source` line under `Model`, showing the source and raw input (e.g. `Model source flag (raw: sonnet)`). Resolution order (mirrors resolve_repl_model but with source attribution): 1. If `--model` / `--model=` flag supplied → source: flag, raw: flag value 2. Else if ANTHROPIC_MODEL set → source: env, raw: env value 3. Else if `.claw.json` model key set → source: config, raw: config value 4. Else → source: default, raw: null ## Changes ### rust/crates/rusty-claude-cli/src/main.rs - Added `ModelSource` enum (Flag/Env/Config/Default) with `as_str()`. - Added `ModelProvenance` struct (resolved, raw, source) with three constructors: `default_fallback()`, `from_flag(raw)`, and `from_env_or_config_or_default(cli_model)`. - Added `model_flag_raw: Option<String>` field to `CliAction::Status`. - Parse loop captures raw input in `--model` and `--model=` arms. - Extended `parse_single_word_command_alias` to thread `model_flag_raw: Option<&str>` through. - Extended `print_status_snapshot` signature to accept `model_flag_raw: Option<&str>`. Resolves provenance at dispatch time (flag provenance from arg; else probe env/config/default). - Extended `status_json_value` signature with `provenance: Option<&ModelProvenance>`. On Some, adds `model_source` and `model_raw` fields; on None (legacy resume paths), omits them for backward compat. - Extended `format_status_report` signature with optional provenance. On Some, renders `Model source` line after `Model`. - Updated all existing callers (REPL /status, resume /status, tests) to pass None (legacy paths don't carry flag provenance). - Added 2 regression assertions in parse_args test covering both `--model sonnet` and `--model=...` forms. ### ROADMAP.md - Marked #128 CLOSED with re-verification block. - Filed #148 documenting the provenance gap split, fix shape, and acceptance criteria. ## Live verification $ claw --model sonnet --output-format json status \| jq '{model,model_source,model_raw}' {"model": "claude-sonnet-4-6", "model_source": "flag", "model_raw": "sonnet"} $ claw --output-format json status \| jq '{model,model_source,model_raw}' {"model": "claude-opus-4-6", "model_source": "default", "model_raw": null} $ ANTHROPIC_MODEL=haiku claw --output-format json status \| jq '{model,model_source,model_raw}' {"model": "claude-haiku-4-5-20251213", "model_source": "env", "model_raw": "haiku"} $ echo '{"model":"claude-opus-4-7"}' > .claw.json && claw --output-format json status \| jq '{model,model_source,model_raw}' {"model": "claude-opus-4-7", "model_source": "config", "model_raw": "claude-opus-4-7"} $ claw --model sonnet status Status Model claude-sonnet-4-6 Model source flag (raw: sonnet) Permission mode danger-full-access ... ## Tests - rusty-claude-cli bin: 177 tests pass (2 new assertions for #148) - Full workspace green except pre-existing resume_latest flake (unrelated) Closes ROADMAP #128, #148.	2026-04-21 20:48:46 +09:00

1 2 3 4 5

248 Commits