claw-code

mirror of https://github.com/ultraworkers/claw-code.git synced 2026-04-29 00:02:01 +08:00

Author	SHA1	Message	Date
YeonGyu-Kim	1754b4600a	docs: ERROR_HANDLING.md — unified error handler pattern for orchestration code Cycle #22 ships documentation that operationalizes cycles #178–#179. Problem context: After #178 (parse-error envelope) and #179 (stderr hygiene + real error message), claws can now build a unified error handler for all 14 clawable commands. But there was no guide on how to actually do that. Operators had the pieces; they didn't have the pattern. This file changes that. New file: ERROR_HANDLING.md - Quick reference: exit codes + envelope shapes (0=success, 1=error, 2=timeout) - One-handler pattern: ~80 lines of Python showing how to parse error.kind, check retryable, and decide recovery strategy - Four practical recovery patterns: - Retry on transient errors (filesystem, timeout) - Reuse session after timeout (if cancel_observed=true) - Validate command syntax before dispatch (dry-run --help) - Log errors for observability - Error kinds enumeration (parse, session_not_found, filesystem, runtime, timeout) - Common mistakes to avoid (6 patterns with BAD vs GOOD examples) - Testing your error handler (unit test examples) Operational impact: Orchestration code now has a canonical pattern. Claws can: - Copy-paste the run_claw_command() function (works for all commands) - Classify errors uniformly (no special cases per command) - Decide recovery deterministically (error.kind + retryable + cancel_observed) - Log/monitor/escalate with confidence Related cycles: - #178: Parse-error envelope (commands now emit structured JSON on invalid argv) - #179: Stderr hygiene + real message (JSON mode silences argparse, carries actual error) - #164 Stage B: cancel_observed field (callers know if session is safe for reuse) Updated CLAUDE.md: - Added ERROR_HANDLING.md to 'Related docs' section - Now documents the one-handler pattern as a guideline No code changes. No test changes. Pure documentation. 
This completes the documentation trail from protocol (SCHEMAS.md) → governance (OPT_OUT_AUDIT.md, OPT_OUT_DEMAND_LOG.md) → practice (ERROR_HANDLING.md).	2026-04-28 14:46:12 +09:00
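A minimal sketch of the decision step the one-handler pattern describes, assuming the caller has already parsed the JSON envelope from stdout. The function name `decide_recovery`, the returned action strings, and the assumption that a timeout envelope exposes a top-level `final_cancel_observed` are illustrative only; this is not the code from ERROR_HANDLING.md.

```python
from typing import Any


def decide_recovery(exit_code: int, envelope: dict[str, Any]) -> str:
    """Map an exit code plus a parsed JSON envelope to a recovery action.

    The action names returned here are hypothetical labels, not protocol values.
    """
    if exit_code == 0:
        return "ok"
    if exit_code == 2:
        # Timeout: cancel_observed separates cooperative cancellation
        # (session safe to reuse) from an infra wedge (escalate).
        if envelope.get("final_cancel_observed") is True:
            return "retry_same_session"
        return "escalate"
    error = envelope.get("error") or {}
    if error.get("kind") == "parse":
        # Parse errors are never retryable: the command line itself must change.
        return "fix_command"
    if error.get("retryable"):
        return "retry"
    return "fail"
```

Keeping the decision a pure function of `exit_code` and the envelope makes it easy to unit test without spawning the CLI.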
YeonGyu-Kim	fddac8b9bc	docs: OPT_OUT_DEMAND_LOG.md — evidentiary base for governance decisions Cycle #21 ships governance infrastructure, not implementation. Maintainership mode means sometimes the right deliverable is a decision framework, not code. Problem context: OPT_OUT_AUDIT.md (cycle #18 bonus) established 'demand-backed audit' as the next step. But without a structured way to record demand signals, 'demand-backed' was just a slogan — the next audit cycle would have no evidence to work from. This commit creates the evidentiary base: New file: OPT_OUT_DEMAND_LOG.md - Per-surface entries for all 12 OPT_OUT commands (Groups A/B/C) - Current state: 0 signals across all surfaces (consistent with audit prediction) - Signal entry template with required fields: - Source (who/what) - Use case (concrete orchestration problem) - Markdown-alternative-checked (why existing output insufficient) - Date - Promotion thresholds: - 2+ independent signals for same surface → file promotion pinpoint - 1 signal + existing stable schema → file pinpoint for discussion - 0 signals → stays OPT_OUT (rationale preserved) Decision framework for cycle #22 (audit close): - If 0 signals total: move to PERMANENTLY_OPT_OUT, close audit - If 1-2 signals: file individual promotion pinpoints with evidence - If 3+ signals: reopen audit, question classification itself Updated files: - OPT_OUT_AUDIT.md: Added demand log reference in Related section - CLAUDE.md: Added prerequisites for promotions (must have logged signals), added 'File a demand signal' workflow section Philosophy: 'Prevent speculative expansion' — schema bloat protection discipline. Every new CLAWABLE surface is a maintenance tax. Evidence requirement keeps the protocol lean. OPT_OUT surfaces are intentionally not-clawable until proven otherwise by external demand. Operational impact: Next cycles can now: 1. Watch for real claws hitting OPT_OUT surface limits 2. Log signals in structured format (no ad-hoc filing) 3. 
Run audit at cycle #22 with actual data, not speculation No code changes. No test changes. Pure governance infrastructure. Related: #18 cycle (OPT_OUT_AUDIT.md), maintainership phase transition.	2026-04-28 14:46:12 +09:00
YeonGyu-Kim	73b7f55be2	fix: #179 - JSON mode now fully suppresses argparse stderr + preserves real error message

Dogfood discovered #178 had two residual gaps:

1. Stderr pollution: argparse usage + error text still leaked to stderr even in JSON mode (envelope was correct on stdout, but stderr noise broke the 'machine-first protocol' contract; claws capturing both streams got dual output)
2. Generic error message: envelope carried 'invalid command or argument (argparse rejection)' instead of argparse's actual text like 'the following arguments are required: session_id' or 'invalid choice: typo (choose from ...)'

Before #179:
  $ claw load-session --output-format json
  [stdout] {"error": {"message": "invalid command or argument (argparse rejection)"}}
  [stderr] usage: main.py load-session [-h] ...
           main.py load-session: error: the following arguments are required: session_id
  [exit 1]

After #179:
  $ claw load-session --output-format json
  [stdout] {"error": {"message": "the following arguments are required: session_id"}}
  [stderr] (empty)
  [exit 1]

Implementation:
- New _ArgparseError exception class captures argparse's real message
- main() monkey-patches parser.error (+ all subparser.error) in JSON mode to raise _ArgparseError instead of print-to-stderr + sys.exit(2)
- _emit_parse_error_envelope() now receives the real message verbatim
- Text mode path unchanged: still uses original argparse print+exit behavior

Contract:
- JSON mode: stdout carries envelope with argparse's actual error; stderr silent
- Text mode: unchanged (argparse usage to stderr, exit 2)
- Parse errors still error.kind='parse', retryable=false

Test additions (5 new, 14 total in test_parse_error_envelope.py):
- TestParseErrorStderrHygiene (5):
  - test_json_mode_stderr_is_silent_on_unknown_command
  - test_json_mode_stderr_is_silent_on_missing_arg
  - test_json_mode_envelope_carries_real_argparse_message
  - test_json_mode_envelope_carries_invalid_choice_details (verifies valid-choices list)
  - test_text_mode_stderr_preserved_on_unknown_command (backward compat)

Operational impact: Claws capturing both stdout and stderr no longer get garbled output. The envelope message now carries discoverability info (valid command list, missing-arg name) that claws can use for retry/recovery without probing the CLI a second time.

Test results: 201 → 206 passing, 3 skipped unchanged, zero regression. Pinpoint discovered via dogfood at 2026-04-22 20:30 KST (cycle #20).	2026-04-28 14:46:12 +09:00
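The parser.error override described in #179 can be sketched roughly as follows. Only the name `_ArgparseError` comes from the commit message; the two-command demo parser and the helpers `silence_parser_errors` and `parse_quietly` are hypothetical stand-ins, and the real src/main.py may be organised differently. The sketch relies on argparse's private `_actions` and `_SubParsersAction` to reach the subparsers.

```python
import argparse


class _ArgparseError(Exception):
    """Carries argparse's own error text instead of printing it to stderr."""


def _raise_instead_of_exit(message):
    # Replacement for ArgumentParser.error: no usage dump, no sys.exit(2).
    raise _ArgparseError(message)


def build_parser():
    # Hypothetical two-command parser, for illustration only.
    parser = argparse.ArgumentParser(prog="claw")
    sub = parser.add_subparsers(dest="command", required=True)
    load = sub.add_parser("load-session")
    load.add_argument("session_id")
    sub.add_parser("list-sessions")
    return parser


def silence_parser_errors(parser):
    """Patch error() on the parser and, recursively, on every subparser."""
    parser.error = _raise_instead_of_exit
    for action in parser._actions:
        if isinstance(action, argparse._SubParsersAction):
            for child in action.choices.values():
                silence_parser_errors(child)


def parse_quietly(argv):
    """Return argparse's error message for argv, or None if argv parses."""
    parser = build_parser()
    silence_parser_errors(parser)
    try:
        parser.parse_args(argv)
    except _ArgparseError as exc:
        return str(exc)
    return None
```

Patching the subparsers matters because a missing positional such as `session_id` is reported by the subcommand's parser, not the top-level one.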
YeonGyu-Kim	6006912ee8	feat: #178 - argparse errors emit JSON envelope when --output-format json requested

Dogfood pinpoint: running 'claw nonexistent-command --output-format json' bypasses the JSON envelope contract. argparse dumps human-readable usage to stderr with exit 2, breaking the SCHEMAS.md guarantee that JSON mode returns structured output.

Problem:
  $ claw nonexistent --output-format json
  usage: main.py [-h] {summary,manifest,...} ...
  main.py: error: argument command: invalid choice: 'nonexistent' (choose from ...)
  [exit 2: no envelope, claws must parse argparse usage messages]

Fix:
  $ claw nonexistent --output-format json
  {
    "timestamp": "2026-04-22T11:00:29Z",
    "command": "nonexistent-command",
    "exit_code": 1,
    "output_format": "json",
    "schema_version": "1.0",
    "error": {
      "kind": "parse",
      "operation": "argparse",
      "target": "nonexistent-command",
      "retryable": false,
      "message": "invalid command or argument (argparse rejection)",
      "hint": "run with no arguments to see available subcommands"
    }
  }
  [exit 1, clean JSON envelope on stdout per SCHEMAS.md]

Changes:
- src/main.py:
  - _wants_json_output(argv): pre-scan for --output-format json before parsing
  - _emit_parse_error_envelope(argv, message): emit wrapped envelope on stdout
  - main(): catch SystemExit from argparse; if JSON requested, emit envelope instead of letting argparse's help dump go through
- tests/test_parse_error_envelope.py (new, 9 tests):
  - TestParseErrorJsonEnvelope (7): unknown command, =syntax, text mode unchanged, invalid flag, missing command, valid command unaffected, common fields
  - TestParseErrorSchemaCompliance (2): error.kind='parse', retryable=false

Contract:
- text mode (default): unchanged (argparse dumps help to stderr, exits 2)
- JSON mode: envelope per SCHEMAS.md, error.kind='parse', exit 1
- Parse errors always retryable=false (typo won't self-fix)
- error.kind='parse' already enumerated in SCHEMAS.md (no schema changes)

This closes a real gap: claws invoking unknown commands in JSON mode can now route via exit code + envelope.kind='parse' instead of scraping argparse output.

Test results: 192 → 201 passing, 3 skipped unchanged, zero regression. Pinpoint discovered via dogfood at 2026-04-22 19:59 KST (cycle #19).	2026-04-28 14:46:12 +09:00
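A rough sketch of the pre-scan and envelope steps named in #178, assuming argv excludes the program name. The functions below mirror the roles of `_wants_json_output` and `_emit_parse_error_envelope` but are written from the commit description alone; in particular, treating `argv[0]` as the command name is a simplifying assumption, and the sketch returns the envelope rather than printing it.

```python
from datetime import datetime, timezone


def wants_json_output(argv):
    """Detect '--output-format json' or '--output-format=json' before parsing."""
    for i, arg in enumerate(argv):
        if arg == "--output-format=json":
            return True
        if arg == "--output-format" and i + 1 < len(argv) and argv[i + 1] == "json":
            return True
    return False


def parse_error_envelope(argv, message):
    """Build a parse-error envelope with the common fields listed in the commit."""
    # Assumption: the first argv entry is the attempted command name.
    command = argv[0] if argv else ""
    return {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "command": command,
        "exit_code": 1,
        "output_format": "json",
        "schema_version": "1.0",
        "error": {
            "kind": "parse",
            "operation": "argparse",
            "target": command,
            "retryable": False,  # a typo will not fix itself on retry
            "message": message,
            "hint": "run with no arguments to see available subcommands",
        },
    }
```

The pre-scan has to run on raw argv because argparse has already failed by the time its own `--output-format` value would be available.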
YeonGyu-Kim	85e01eb167	docs: OPT_OUT_AUDIT.md — decision table for 12 exempt surfaces (#175–#177 prep) Filed explicit decision criteria for the 12 OPT_OUT surfaces (commands that do not support --output-format json) documented in test_cli_parity_audit.py. Categorized by rationale: - Group A (4): Rich-Markdown reports (summary, manifest, parity-audit, setup-report) Markdown-as-output is intentional; JSON would be information loss. Unlikely promotions (remain OPT_OUT long-term). - Group B (3): List filters with --query/--limit (subsystems, commands, tools) Query layer already exists; users have escape hatch. Remain OPT_OUT (promotion effort >> value). - Group C (5): Simulation/debug surfaces (remote-mode, ssh-mode, teleport-mode, direct-connect-mode, deep-link-mode) Intentionally non-production; JSON output doesn't add value. Remain OPT_OUT (simulation tools, not orchestration endpoints). Audit workflow documented: 1. Survey: Check if external claws actually request JSON versions 2. Cost estimate: Schema + tests for each surface 3. Value estimate: Real demand vs hypothetical 4. Decision: CLAWABLE, remain OPT_OUT, or new pinpoint Promotion criteria locked (only if clear use case + schema simple + demand exists). Outcome prediction: All 12 likely remain OPT_OUT (documented rationale per group). Timeline: Survey period (cycles #19–#21), final decision (cycle #22). Related pinpoints: #175 (summary/manifest JSON parallel?), #176 (--query-json?), #177 (mode simulators ever CLAWABLE?). This closes the documentation loop from cycles #173–#174 (protocol closure → field evolution → reframe). Now governance rules are explicit for future work.	2026-04-28 14:46:12 +09:00
YeonGyu-Kim	4942a4bd01	docs: CLAUDE.md reframe — market Python harness as machine-first protocol validation layer Rewrote CLAUDE.md to accurately describe the Python reference implementation: - Shifted framing from outdated Rust-focused guidance to protocol-validation focus - Clarified that src/tests/ is a dogfood surface proving SCHEMAS.md contract - Added machine-first marketing: deterministic, self-describing, clawable - Documented all 14 clawable commands (post-#164 Stage B promotion) - Added OPT_OUT surfaces audit queue (12 commands, future work) - Included protocol layers: Coverage → Enforcement → Documentation → Alignment - Added quick-start workflow for Python harness - Documented common workflows (add command, modify fields, promote OPT_OUT→CLAWABLE) - Emphasized protocol governance: SCHEMAS.md as source of truth - Exit codes documented as signals (0=success, 1=error, 2=timeout) Result: Developers can now understand the Python harness purpose without reading ROADMAP.md or inferring from test names. Protocol-first mental model is explicit. Related: #173 (protocol closure), #164 Stage B (field evolution), #174 (this cycle).	2026-04-28 14:46:12 +09:00
YeonGyu-Kim	e30c9736c6	feat: #164 Stage B CLOSURE — turn-loop JSON + cancel_observed coverage + CLAWABLE promotion Closes all three gaebal-gajae-identified closure criteria for #164 Stage B: 1. turn-loop runtime surface exposes cancel_observed consistently 2. cancellation path tests validate safe-to-reuse semantics 3. turn-loop promoted from OPT_OUT to CLAWABLE surface Changes: src/main.py: - turn-loop accepts --output-format {text,json} - JSON envelope includes per-turn cancel_observed + final_cancel_observed - All turn fields exposed: prompt, output, stop_reason, cancel_observed, matched_commands, matched_tools - Exit code 2 on final timeout preserved tests/test_cli_parity_audit.py: - CLAWABLE_SURFACES now contains 14 commands (was 13) - Removed 'turn-loop' from OPT_OUT_SURFACES - Parametrized --output-format test auto-validates turn-loop JSON tests/test_cancel_observed_field.py (new, 9 tests): - TestCancelObservedField (5 tests): field contract - default False - explicit True preserved - normal completion → False - bootstrap JSON exposes field - turn-loop JSON exposes per-turn field - TestCancelObservedSafeReuseSemantics (2 tests): reuse contract - timeout result has cancel_observed=True when signaled - engine.mutable_messages not corrupted after cancelled turn - engine accepts fresh message after cancellation - TestCancelObservedSchemaCompliance (2 tests): SCHEMAS.md contract - cancel_observed is always bool - final_cancel_observed convenience field present Closure criteria validated: - ✅ Field exposed in bootstrap JSON - ✅ Field exposed per-turn in turn-loop JSON - ✅ Field is always bool, never null - ✅ Safe-to-reuse: engine can accept fresh messages after cancellation - ✅ mutable_messages not corrupted by cancelled turn - ✅ turn-loop promoted from OPT_OUT (14 clawable commands now) Protocol now distinguishes at runtime: timeout + cancel_observed=false → infra/wedge (escalate) timeout + cancel_observed=true → cooperative cancellation (safe to retry) Test 
results: 182 → 192 passing, +10 tests, zero regression, 3 skipped unchanged. Closes #164 Stage B. Stage C (async-native preemption) remains future work.	2026-04-28 14:46:12 +09:00
YeonGyu-Kim	f3891a73ee	feat: #164 Stage B prep — add cancel_observed field to TurnResult #164 Stage B requires exposing whether cancellation was observed at the turn-result level. This commit adds the infrastructure field: Changes: - TurnResult.cancel_observed: bool = False (query_engine.py) - _build_timeout_result() accepts cancel_observed parameter (runtime.py) - Two timeout paths now pass cancel_event.is_set() to signal observation (runtime.py) - bootstrap command includes cancel_observed in turn JSON (main.py) - SCHEMAS.md documents Turn Result Fields with cancel_observed contract Usage: When a turn timeout occurs, cancel_observed=true indicates that the engine observed the cancellation event being set. This allows callers to distinguish: - timeout with no cancel → infrastructure/network stall - timeout with cancel observed → cooperative cancellation was triggered Backward compat: - Existing TurnResult construction without cancel_observed defaults to False - bootstrap JSON output still validates per SCHEMAS.md (new field is always present) Test results: 182 passing, 3 skipped, zero regression. Related: #161 (wall-clock timeout), #164 (cancellation observability protocol) ROADMAP continues #164 with Stage C (test coverage for cancellation + turn envelope).	2026-04-28 14:46:12 +09:00
YeonGyu-Kim	07ec9fa5cd	feat: #173 — wrap_json_envelope() applied to all 13 clawable commands (LOOP CLOSED) Completes the coverage → enforcement → documentation → alignment cycle. Every clawable command now emits the canonical JSON envelope per SCHEMAS.md: Common fields (now real in output): - timestamp (ISO 8601 UTC) - command (argv[1]) - exit_code (0/1/2) - output_format ('json') - schema_version ('1.0') 13 commands wrapped: - list-sessions, delete-session, load-session, flush-transcript - show-command, show-tool - exec-command, exec-tool, route, bootstrap - command-graph, tool-pool, bootstrap-graph Implementation: - Added wrap_json_envelope() helper in src/main.py - Wrapped all 18 JSON output paths (13 success + 5 error paths) - Applied exit_code=1 to error/not-found envelopes - Kept text mode byte-identical (backward compat preserved) Test updates: - 3 skipped common-field tests now pass automatically - 3 existing tests updated to verify common envelope fields while preserving command-specific field checks - test_list_sessions_cli_runs, test_delete_session_cli_idempotent, test_load_session_cli::test_json_mode_on_success Full suite: 179 → 182 passing (+3 activated from skipped), zero regression. Loop completion: Coverage (#167-#170) ✅ All 13 commands accept --output-format Enforcement (#171) ✅ CI blocks new commands without --output-format Documentation (#172) ✅ SCHEMAS.md defines envelope contract Alignment (#173 this) ✅ Actual output matches SCHEMAS.md contract Example output now: $ claw list-sessions --output-format json { "timestamp": "2026-04-22T10:34:12Z", "command": "list-sessions", "exit_code": 0, "output_format": "json", "schema_version": "1.0", "sessions": ["alpha", "bravo"], "count": 2 } Closes ROADMAP #173. Protocol is now documented AND real. Claws can build ONE error handler, ONE timestamp parser, ONE version check instead of 13 special cases.	2026-04-28 14:46:12 +09:00
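For illustration, a helper in the spirit of wrap_json_envelope() might look like the sketch below. The commit names the helper and the five common fields, but the signature and the rule that common fields win over clashing payload keys are assumptions made here.

```python
from datetime import datetime, timezone


def wrap_json_envelope(payload, command, exit_code=0):
    """Return payload merged under the five common envelope fields.

    Hypothetical signature: the real helper in src/main.py may differ.
    """
    envelope = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "command": command,
        "exit_code": exit_code,
        "output_format": "json",
        "schema_version": "1.0",
    }
    for key, value in payload.items():
        # Common fields take precedence; command-specific keys follow them.
        envelope.setdefault(key, value)
    return envelope
```

Called as `wrap_json_envelope({"sessions": ["alpha", "bravo"], "count": 2}, "list-sessions")`, this yields the same key order as the list-sessions example in the commit message, with a current timestamp.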
YeonGyu-Kim	5027fa0aad	test: #173 prep — JSON envelope field consistency validation Adds parametrised test suite validating that clawable-surface commands' JSON output matches their declared envelope contracts per SCHEMAS.md. Two phases: Phase 1 (this commit): Consistency baseline. - Collect ENVELOPE_CONTRACTS registry mapping each command to its required and optional fields - TestJsonEnvelopeConsistency: parametrised test iterates over 13 commands, invokes with --output-format json, validates that actual JSON envelope contains all required fields - test_envelope_field_value_types: spot-check types (int, str, list) for consistency Phase 2 (future #173): Common field wrapping. - Once wrap_json_envelope() is applied, all commands will emit timestamp, command, exit_code, output_format, schema_version - Currently skipped via @pytest.mark.skip, these tests will activate automatically when wrapping is implemented: TestJsonEnvelopeCommonFieldPrep::test_all_envelopes_include_timestamp TestJsonEnvelopeCommonFieldPrep::test_all_envelopes_include_command TestJsonEnvelopeCommonFieldPrep::test_all_envelopes_include_exit_code_and_schema_version Why this matters: - #172 documented the JSON contract; this test validates it - Currently detects when actual output diverges from SCHEMAS.md (e.g. list-sessions emits 'count', not 'sessions_count') - As #173 wraps commands, test suite auto-validates new common fields - Prevents regression: accidental field removal breaks the test suite Current status: 11 passed (consistency), 6 skipped (awaiting #173) Full suite: 168 → 179 passing, zero regression. Closes ROADMAP #173 prep (framework for common field validation). Actual field wrapping remains for next cycle.	2026-04-28 14:46:12 +09:00
YeonGyu-Kim	bf6dfb11fa	docs: add SCHEMAS.md — field-level JSON contract for clawable CLI surfaces Documents the unified JSON envelope contract across all 13 clawable-surface commands. Extends the parity work (#171) to the field level: every command that accepts --output-format json must emit predictable field names, types, and optionality. Common fields (all envelopes): - timestamp (ISO 8601 UTC) - command (argv[1]) - exit_code (0/1/2) - output_format ('json') - schema_version ('1.0') Error envelope (exit 1, failure): - error.kind (enum: filesystem\|auth\|session\|parse\|runtime\|mcp\|delivery\|usage\|policy\|unknown) - error.operation (syscall/method name) - error.target (resource path/name) - error.retryable (bool) - error.message (platform error text) - error.hint (optional: actionable next step) Not-found envelope (exit 1, not a failure): - found: false - error.kind (enum: command_not_found\|tool_not_found\|session_not_found) - error.message, error.retryable Per-command success schemas documented for 13 commands: list-sessions, delete-session, load-session, flush-transcript, show-command, show-tool, exec-command, exec-tool, route, bootstrap, command-graph, tool-pool, bootstrap-graph Why this matters: - #171 enforced that commands have --output-format; #172 enforces that the JSON fields are PREDICTABLE - Downstream claws can build ONE error handler + per-command jq query, not special-casing logic per command family - Field consistency enables generic automation patterns (error dedupe, failure aggregation, cross-command monitoring) Related: - ROADMAP #172 (field-level contract stabilization, Gaebal-gajae priority #1) - ROADMAP #171 (parity audit CI automation — already landed) - #164 Stage B (cancellation observability — adds cancel_observed field) - #164 Stage A (already done — adds stop_reason field to TurnResult) Fixture/regression testing: - Golden JSON snapshots: tests/fixtures/json/<command>.json (future) - Consistency test: 
test_json_envelope_field_consistency.py (future) - Versioning: schema_version='1.0' for current; bump to 2.0 for breaking changes	2026-04-28 14:46:12 +09:00
YeonGyu-Kim	02c4cd96c4	fix: #171 - automate cross-surface CLI parity audit via argparse introspection

Stops manual parity inspection from being a human-noticed concern. When a developer adds a new subcommand to the claw-code CLI, this test suite enforces explicit classification:

- CLAWABLE_SURFACES: MUST accept --output-format {text,json}
- OPT_OUT_SURFACES: explicitly exempt with documented rationale

A new command that forgets to opt into one of these two sets FAILS loudly with TestCommandClassificationCoverage::test_every_registered_command_is_classified. No silent drift possible.

Technique: argparse introspection at test time walks the _actions tree, discovers every registered subcommand, and compares against the declared classification sets. Contract is enforced machine-first instead of depending on human review.

Three test classes covering three invariants:

TestClawableSurfaceParity (14 tests):
- test_all_clawable_surfaces_accept_output_format: every member of CLAWABLE_SURFACES has --output-format flag registered
- test_clawable_surface_output_format_choices (parametrised over 13 commands): each must accept exactly {text, json} and default to 'text' for backward compat

TestCommandClassificationCoverage (3 tests):
- test_every_registered_command_is_classified: any new subcommand must be explicitly added to CLAWABLE_SURFACES or OPT_OUT_SURFACES
- test_no_command_in_both_sets: sanity check for classification conflicts
- test_all_classified_commands_actually_exist: no phantom commands (catches stale entries after a command is removed)

TestJsonOutputContractEndToEnd (10 tests):
- test_command_emits_parseable_json (parametrised over 10 clawable commands): actual subprocess invocation with --output-format json produces valid parseable JSON on stdout

Classification:

CLAWABLE_SURFACES (13):
  Session lifecycle: list-sessions, delete-session, load-session, flush-transcript
  Inspect: show-command, show-tool
  Execution: exec-command, exec-tool, route, bootstrap
  Diagnostic inventory: command-graph, tool-pool, bootstrap-graph

OPT_OUT_SURFACES (12):
  Rich-Markdown reports (future JSON schema): summary, manifest, parity-audit, setup-report
  List filter commands: subsystems, commands, tools
  Turn-loop: structured_output is future work
  Simulation/debug: remote-mode, ssh-mode, teleport-mode, direct-connect-mode, deep-link-mode

Full suite: 141 → 168 passing (+27), zero regression. Closes ROADMAP #171.

Why this matters:
Before: parity was human-monitored; every new command was a drift risk. The CLUSTER 3 sweep required manually auditing every subcommand and landing fixes as separate pinpoints.
After: parity is machine-enforced. If a future developer adds a new command without --output-format, the test suite blocks it immediately with a concrete error message pointing at the missing flag.

This is the first step in Gaebal-gajae's identified upper-level work: operationalised parity instead of aspirational parity.

Related clusters:
- Clawability principle: machine-first protocol enforcement
- Test-first regression guard: extends TestTripletParityConsistency (#160/#165) and TestFullFamilyParity (#166) from per-cluster parity to cross-surface parity	2026-04-28 14:46:12 +09:00
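The introspection technique described in #171 can be sketched as below. It leans on argparse's private `_actions` list and `_SubParsersAction` class, which the commit says the real test walks; the demo parser and the helper names are hypothetical and far smaller than the actual CLAWABLE_SURFACES and OPT_OUT_SURFACES sets.

```python
import argparse


def build_demo_parser():
    # Hypothetical three-command CLI, for illustration only.
    parser = argparse.ArgumentParser(prog="claw")
    sub = parser.add_subparsers(dest="command")
    listing = sub.add_parser("list-sessions")
    listing.add_argument("--output-format", choices=["text", "json"], default="text")
    sub.add_parser("summary")
    sub.add_parser("new-thing")  # deliberately left unclassified
    return parser


def registered_subcommands(parser):
    """Collect every subcommand name registered on the parser."""
    names = set()
    for action in parser._actions:
        if isinstance(action, argparse._SubParsersAction):
            names.update(action.choices)
    return names


def unclassified(parser, clawable, opt_out):
    """Subcommands that appear in neither classification set."""
    return registered_subcommands(parser) - set(clawable) - set(opt_out)


def accepts_output_format(parser, name):
    """True if the named subcommand registers an --output-format flag."""
    for action in parser._actions:
        if isinstance(action, argparse._SubParsersAction):
            child = action.choices[name]
            return any("--output-format" in a.option_strings for a in child._actions)
    return False
```

With the demo parser, `unclassified(build_demo_parser(), {"list-sessions"}, {"summary"})` returns `{"new-thing"}`, which is the condition a classification-coverage test would fail on.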
YeonGyu-Kim	67f1e87b38	fix: #170 — bootstrap-graph now accepts --output-format; diagnostic surface parity complete Final diagnostic surface in the JSON parity sweep: bootstrap-graph (the runtime bootstrap/prefetch visualization) now supports --output-format. Concrete addition: - bootstrap-graph: --output-format {text,json} JSON envelope: {stages: [str], note: 'bootstrap-graph is markdown-only in this version'} Envelope explanation: bootstrap-graph's Markdown output is rich and textual; raw JSON embedding maintains the markdown format (split into lines array) rather than attempting lossy structural extraction that would lose information. This is an honest limitation in this cycle; full JSON schema can be added in a future audit if claws require structured bootstrap data (dependency graphs, prefetch timing, etc.). Backward compatibility: - Default is 'text' (Markdown unchanged) Closes ROADMAP #170. Related: #167, #168, #169. Diagnostic/inventory surface family is now uniformly JSON-capable. Summary, manifest, parity-audit, setup-report, command-graph, tool-pool, bootstrap-graph all accept --output-format.	2026-04-28 14:46:12 +09:00
YeonGyu-Kim	98c295770b	fix: #169 — command-graph and tool-pool now accept --output-format; diagnostic inventory JSON parity Extends the diagnostic surface audit with the two inventory-structure commands: command-graph (command family segmentation) and tool-pool (assembled tool inventory). Both now expose their underlying rich datastructures via JSON envelope. Concrete additions: - command-graph: --output-format {text,json} - tool-pool: --output-format {text,json} JSON envelope shapes: command-graph: {builtins_count, plugin_like_count, skill_like_count, total_count, builtins: [{name, source_hint}], plugin_like: [{name, source_hint}], skill_like: [{name, source_hint}]} tool-pool: {simple_mode, include_mcp, tool_count, tools: [{name, source_hint}]} Backward compatibility: - Default is 'text' (Markdown unchanged) - Text output byte-identical to pre-#169 Tests (4 new, test_command_graph_tool_pool_output_format.py): - TestCommandGraphOutputFormat (2): JSON structure + text compat - TestToolPoolOutputFormat (2): JSON structure + text compat Full suite: 137 → 141 passing, zero regression. Closes ROADMAP #169. Why this matters: Claws auditing the codebase can now ask 'what commands exist' and 'what tools exist' and get structured, parseable answers instead of regex-parsing Markdown headers and counting list items. Related clusters: - Diagnostic surfaces (#169 adds to #167/#168 work-verb parity) - Inventory introspection (command-graph + tool-pool are the two foundational 'what do we have?' queries)	2026-04-28 14:46:12 +09:00
YeonGyu-Kim	daf53fa97f	fix: #168 — exec-command / exec-tool / route / bootstrap now accept --output-format; CLI family JSON parity COMPLETE Extends the #167 inspect-surface parity fix to the four remaining CLI outliers: the commands claws actually invoke to DO work, not just inspect state. After this commit, the entire claw-code CLI family speaks a unified JSON envelope contract. Concrete additions: - exec-command: --output-format {text,json} - exec-tool: --output-format {text,json} - route: --output-format {text,json} - bootstrap: --output-format {text,json} JSON envelope shapes: exec-command (handled): {name, prompt, source_hint, handled: true, message} exec-command (not-found): {name, prompt, handled: false, error: {kind:'command_not_found', message, retryable: false}} exec-tool (handled): {name, payload, source_hint, handled: true, message} exec-tool (not-found): {name, payload, handled: false, error: {kind:'tool_not_found', message, retryable: false}} route: {prompt, limit, match_count, matches: [{kind, name, score, source_hint}]} bootstrap: {prompt, limit, setup: {python_version, implementation, platform_name, test_command}, routed_matches: [{kind, name, score, source_hint}], command_execution_messages: [str], tool_execution_messages: [str], turn: {prompt, output, stop_reason}, persisted_session_path} Exit codes (unchanged from pre-#168): 0 = success 1 = exec not-found (exec-command, exec-tool only) Backward compatibility: - Default (no --output-format) is 'text' - exec-command/exec-tool text output byte-identical - route text output: unchanged tab-separated kind/name/score/source_hint - bootstrap text output: unchanged Markdown runtime session report Tests (13 new, test_exec_route_bootstrap_output_format.py): - TestExecCommandOutputFormat (3): handled + not-found JSON; text compat - TestExecToolOutputFormat (3): handled + not-found JSON; text compat - TestRouteOutputFormat (3): JSON envelope; zero-matches case; text compat - TestBootstrapOutputFormat (2): 
JSON envelope; text-mode Markdown compat - TestFamilyWideJsonParity (2): parametrised over ALL 6 family commands (show-command, show-tool, exec-command, exec-tool, route, bootstrap) — every one accepts --output-format json and emits parseable JSON; every one defaults to text mode without a leading {. One future regression on any family member breaks this test. Full suite: 124 → 137 passing, zero regression. Closes ROADMAP #168. This completes the CLI-wide JSON parity sweep: - Session-lifecycle family: #160 (list/delete), #165 (load), #166 (flush) - Inspect family: #167 (show-command, show-tool) - Work-verb family: #168 (exec-command, exec-tool, route, bootstrap) ENTIRE CLI SURFACE is now machine-readable via --output-format json with typed errors, deterministic exit codes, and consistent envelope shape. Claws no longer need to regex-parse any CLI output. Related clusters: - Clawability principle: 'machine-readable in state and failure modes' (ROADMAP top-level). 9 pinpoints in this cluster; all now landed. - Typed-error envelope consistency: command_not_found / tool_not_found / session_not_found / session_load_failed all share {kind, message, retryable} shape. - Work-verb semantics: exec-* surfaces expose 'handled' boolean (not 'found') because 'not handled' is the operational signal — claws dispatch on whether the work was performed, not whether the entry exists in the inventory.	2026-04-28 14:46:12 +09:00
YeonGyu-Kim	6d4f78b8a0	fix: #167 — show-command and show-tool now accept --output-format flag; CLI parity with session-lifecycle family Closes the inspect-capability parity gap: show-command and show-tool were the only discovery/inspection CLI commands lacking --output-format support, making them outliers in the ecosystem that already had unified JSON contracts across list-sessions, load-session, delete-session, and flush-transcript (#160/#165/#166). Concrete additions: - show-command: --output-format {text,json} - show-tool: --output-format {text,json} JSON envelope shape (found case): {name, found: true, source_hint, responsibility} JSON envelope shape (not-found case): {name, found: false, error: {kind:'command_not_found'\|'tool_not_found', message, retryable: false}} Exit codes: 0 = success 1 = not found Backward compatibility: - Default (no --output-format) is 'text' (unchanged) - Text output byte-identical to pre-#167 (three newline-separated lines) Tests (10 new, test_show_command_tool_output_format.py): - TestShowCommandOutputFormat (5): found + not-found in JSON; text mode backward compat; text is default - TestShowToolOutputFormat (3): found + not-found in JSON; text mode backward compat - TestShowCommandToolFormatParity (2): both accept same flag choices; consistent JSON envelope shape Full suite: 114 → 124 passing, zero regression. Closes ROADMAP #167. Why this matters: Before: Claws calling show-command/show-tool had to parse human-readable prose output via regex, with no structured error signal. After: Same envelope contract as load-session and friends: JSON-first, typed errors, machine-parseable. Related clusters: - Session-lifecycle CLI parity family (#160, #165, #166, #167) - Machine-readable error contracts (same vein as #162 atomicity + #164 cancellation state-safety: structured boundaries for orchestration)	2026-04-28 14:46:12 +09:00
YeonGyu-Kim	f047347294	fix: #164 Stage A — cooperative cancellation via cancel_event in submit_message Closes the #161 follow-up gap identified in review: wall-clock timeout bounded caller-facing wait but did not cancel the underlying provider thread, which could silently mutate mutable_messages / transcript_store / permission_denials / total_usage after the caller had already observed stop_reason='timeout'. A ghost turn committed post-deadline would poison any session that got persisted afterwards. Stage A scope (this commit): runtime + engine layer cooperative cancel. Engine layer (src/query_engine.py): - submit_message now accepts cancel_event: threading.Event | None = None - Two safe checkpoints: 1. Entry (before max_turns / budget projection) — earliest possible return 2. Post-budget (after output synthesis, before mutation) — catches cancel that arrives while output was being computed - Both checkpoints return stop_reason='cancelled' with state UNCHANGED (mutable_messages, transcript_store, permission_denials, total_usage all preserved exactly as on entry) - cancel_event=None preserves legacy behaviour with zero overhead (no checkpoint checks at all) Runtime layer (src/runtime.py): - run_turn_loop creates one cancel_event per invocation when a deadline is in play (and None otherwise, preserving legacy fast path) - Passes the same event to every submit_message call across turns, so a late cancel on turn N-1 affects turn N - On timeout (either pre-call or mid-call), runtime explicitly calls cancel_event.set() before future.cancel() + synthesizing the timeout TurnResult. This upgrades #161's best-effort future.cancel() (which only cancels not-yet-started futures) to cooperative mid-flight cancel. 
Stop reason taxonomy after Stage A: 'completed' — turn committed, state mutated exactly once 'max_budget_reached' — overflow, state unchanged (#162) 'max_turns_reached' — capacity exceeded, state unchanged 'cancelled' — cancel_event observed, state unchanged (#164 Stage A) 'timeout' — synthesised by runtime, not engine (#161) The 'cancelled' vs 'timeout' split matters: - 'timeout' is the runtime's best-effort signal to the caller: deadline hit - 'cancelled' is the engine's confirmation: cancel was observed + honoured If the provider call wedges entirely (never reaches a checkpoint), the caller still sees 'timeout' and the thread is leaked — but any NEXT submit_message call on the same engine observes the event at entry and returns 'cancelled' immediately, preventing ghost-turn accumulation. This is the honest cooperative limit in Python threading land; true preemption requires async-native provider IO (future work, not Stage A). Tests (29 new tests, tests/test_submit_message_cancellation.py + tests/ test_run_turn_loop_cancellation.py): Engine-layer (12 tests): - TestCancellationBeforeCall (5): pre-set event returns 'cancelled' immediately; mutable_messages, transcript_store, usage, permission_denials all preserved - TestCancellationAfterBudgetCheck (1): cancel set mid-call (after projection, before commit) still honoured; output synthesised but state untouched - TestCancellationAfterCommit (2): post-commit cancel not observable (honest limit) BUT next call on same engine observes it + returns 'cancelled' - TestLegacyCallersUnchanged (3): cancel_event=None preserves #162 atomicity + max_turns contract with zero behaviour change - TestCancellationVsOtherStopReasons (2): cancel precedes max_turns check; cancel does not retroactively override a completed turn Runtime-layer (5 tests): - TestTimeoutPropagatesCancelEvent (3): submit_message receives a real Event object when deadline is set; None in legacy mode; timeout actually calls event.set() so in-flight threads 
observe at their next checkpoint - TestCancelEventSharedAcrossTurns (1): same event object passed to every turn (object identity check) — late cancel on turn N-1 must affect turn N Regression: 3 existing timeout test mocks updated to accept cancel_event kwarg (mocks that previously had signature (prompt, commands, tools, denials) now have (prompt, commands, tools, denials, cancel_event=None) since runtime passes cancel_event positionally on the timeout path). Full suite: 97 → 114 passing, zero regression. Closes ROADMAP #164 Stage A. What's explicitly NOT in Stage A: - Preemptive cancellation of wedged provider IO (requires asyncio-native provider path; larger refactor) - Timeout on the legacy unbounded run_turn_loop path (by design: legacy callers opt out of cancellation entirely) - CLI exposure of 'cancelled' as a distinct exit code (currently 'cancelled' maps to the same stop_reason != 'completed' break condition as others; CLI surface for cancel is a separate pinpoint if warranted)	2026-04-28 14:46:12 +09:00
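A minimal sketch of the two-checkpoint cooperative cancel described in #164 Stage A. `SketchEngine` is a hypothetical stand-in, not the real engine in src/query_engine.py, and the string return values stand in for a full TurnResult; only the checkpoint placement and the single commit point follow the commit message.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class SketchEngine:
    """Hypothetical stand-in for the engine; not the real QueryEnginePort."""

    mutable_messages: list = field(default_factory=list)

    def submit_message(self, prompt, cancel_event=None):
        # Checkpoint 1: entry, before any work is done.
        if cancel_event is not None and cancel_event.is_set():
            return "cancelled"
        output = prompt.upper()  # placeholder for the provider call
        # Checkpoint 2: after output synthesis, before mutation.
        if cancel_event is not None and cancel_event.is_set():
            return "cancelled"
        self.mutable_messages.append((prompt, output))  # single commit point
        return "completed"


engine = SketchEngine()
event = threading.Event()
first = engine.submit_message("hello", cancel_event=event)
event.set()  # what the runtime does when the deadline is hit
second = engine.submit_message("late turn", cancel_event=event)
```

Because the event is shared across turns, the second call returns at the entry checkpoint and leaves `mutable_messages` with only the first turn.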
YeonGyu-Kim	2c47609b8e	chore: gitignore .port_sessions/ to prevent dogfood-run pollution Every 'claw flush-transcript' call without --directory writes to .port_sessions/<uuid>.json in CWD. Without a gitignore entry, every dogfood run leaves dozens of untracked files in the repo, masking real changes in 'git status' output. Now that #160/#166 ship structured session lifecycle commands and deterministic --session-id, this directory is purely transient by default — belongs in .gitignore.	2026-04-28 14:46:12 +09:00
YeonGyu-Kim	5b973f1fed	fix: #166 — flush-transcript now accepts --directory / --output-format / --session-id; session-creation command parity with #160/#165 lifecycle triplet	2026-04-28 14:46:12 +09:00
YeonGyu-Kim	e81e6c70a8	fix: #159 — run_turn_loop no longer hardcodes empty denied_tools; permission denials now parity-match bootstrap_session #159: multi-turn sessions had a silent security asymmetry: denied_tools were always empty in run_turn_loop, even though bootstrap_session inferred them from the routed matches. Result: any tool gated as 'destructive' (bash-family commands, rm, etc) would silently appear unblocked across all turns in multi-turn mode, giving a false 'clean' permission picture to any claw consuming TurnResult.permission_denials. Fix: compute denied_tools once at loop start via _infer_permission_denials, then pass the same denials to every submit_message call (both timeout and legacy unbounded paths). This mirrors the existing bootstrap_session pattern. Acceptance: run_turn_loop('run bash ls').permission_denials now matches what bootstrap_session returns — both infer the same denials from the routed matches. Multi-turn security posture is symmetric. Tests (tests/test_run_turn_loop_permissions.py, 2 tests): - test_turn_loop_surfaces_permission_denials_like_bootstrap: Symmetry check confirming both paths infer identical denials for destructive tools - test_turn_loop_with_continuation_preserves_denials: Denials inferred at loop start are passed consistently to all turns; captured via mock and verified non-empty Full suite: 82/82 passing, zero regression. Closes ROADMAP #159.	2026-04-28 14:46:12 +09:00
YeonGyu-Kim	118cdc8030	fix: #165 — load-session CLI now parity-matches list/delete (--directory, --output-format, typed JSON errors) The #160 session-lifecycle CLI triplet was asymmetric: list-sessions and delete-session accepted --directory + --output-format and emitted typed JSON error envelopes, but load-session had neither flag and dumped a raw Python traceback (including the SessionNotFoundError class name) on a missing session. Three concrete impacts this fix closes: 1. Alternate session-store locations (e.g. /tmp/claw-run-XXX/.port_sessions) were unreachable via load-session; claws had to chdir or monkeypatch DEFAULT_SESSION_DIR to work around it. 2. Not-found emitted a multi-line Python stack, not a parseable envelope. Claws deciding retry/escalate/give-up had only exit code 1 to work with. 3. The traceback leaked 'src.session_store.SessionNotFoundError' verbatim, coupling version-pinned claws to our internal exception class name. Now all three triplet commands accept the same flag pair and emit the same JSON error shape: Success (json mode): {"session_id": "alpha", "loaded": true, "messages_count": 3, "input_tokens": 42, "output_tokens": 99} Not-found: {"session_id": "missing", "loaded": false, "error": {"kind": "session_not_found", "message": "session 'missing' not found in /path", "directory": "/path", "retryable": false}} Corrupted file: {"session_id": "broken", "loaded": false, "error": {"kind": "session_load_failed", "message": "...", "directory": "/path", "retryable": true}} Exit code contract: - 0 on successful load - 1 on not-found (preserves existing $?) - 1 on OSError/JSONDecodeError (distinct 'kind' in JSON) Backward compat: legacy 'claw load-session ID' text output unchanged byte-for-byte. Only new behaviour is the flags and structured error path. 
Tests (tests/test_load_session_cli.py, 13 tests): - TestDirectoryFlagParity (2): --directory works + fallback to CWD/.port_sessions - TestOutputFormatFlagParity (2): json schema + text-mode backward compat - TestNotFoundTypedError (2): JSON envelope on not-found; no traceback in either mode; no internal class name leak - TestLoadFailedDistinctFromNotFound (1): corrupted file = session_load_failed with retryable=true, distinct from session_not_found - TestTripletParityConsistency (6): parametrised over [list, delete, load] * [--directory, --output-format] — explicit parity guard for future regressions Full suite: 80/80 passing, zero regression. Discovered via Jobdori dogfood sweep 2026-04-22 17:44 KST — ran 'claw load-session nonexistent' expecting a clean error, got a Python traceback. Filed #165 + fixed in same commit. Closes ROADMAP #165.	2026-04-28 14:46:12 +09:00
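A caller-side sketch of how a claw might act on the #165 envelope. The function name `decide` and its three return labels are assumptions for illustration; only the `loaded`, `error.kind`, and `error.retryable` fields come from the commit message.

```python
import json


def decide(raw_stdout):
    """Map a load-session JSON envelope to 'ok', 'retry', or 'give_up'."""
    envelope = json.loads(raw_stdout)
    if envelope.get("loaded"):
        return "ok"
    error = envelope.get("error", {})
    # retryable=true (e.g. session_load_failed) means a retry may succeed;
    # retryable=false (e.g. session_not_found) means it will not.
    return "retry" if error.get("retryable") else "give_up"
```

The decision uses only the typed fields, so the caller never has to inspect the prose in `message`.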
YeonGyu-Kim	f953c73262	fix: #163 — remove [turn N] suffix pollution from run_turn_loop; file #164 timeout-cancellation followup #163: run_turn_loop no longer injects f'{prompt} [turn N]' into follow-up prompts. The suffix was never defined or interpreted anywhere — not by the engine, not by the system prompt, not by any LLM. It looked like a real user-typed annotation in the transcript and made replay/analysis fragile. New behaviour: - turn 0 submits the original prompt (unchanged) - turn > 0 submits caller-supplied continuation_prompt if provided, else the loop stops cleanly — no fabricated user turn - added continuation_prompt: str | None = None parameter to run_turn_loop - added --continuation-prompt CLI flag for claws scripting multi-turn loops - zero '[turn' strings ever appear in mutable_messages or stdout now Behaviour change for existing callers: - Before: run_turn_loop(prompt, max_turns=3) submitted 3 turns ('prompt', 'prompt [turn 2]', 'prompt [turn 3]') - After: run_turn_loop(prompt, max_turns=3) submits 1 turn ('prompt') - To preserve old multi-turn behaviour, pass continuation_prompt='Continue.' or any structured follow-up text One existing timeout test (test_budget_is_cumulative_across_turns) updated to pass continuation_prompt so the cumulative-budget contract is actually exercised across turns instead of trivially satisfied by a one-turn loop. #164 filed: addresses reviewer feedback on #161. The wall-clock timeout bounds the caller-facing wait, but the underlying submit_message worker thread keeps running and can mutate engine state after the timeout TurnResult is returned. A cooperative cancel_event pattern is sketched in the pinpoint; real asyncio.Task.cancel() support will come once provider IO is async-native (larger refactor). 
Tests (tests/test_run_turn_loop_continuation.py, 8 tests): - TestNoTurnSuffixInjection (2): zero '[turn' strings in any submitted prompt, both default and explicit-continuation paths - TestContinuationDefaultStopsAfterTurnZero (2): default loops run exactly one turn; engine.submit_message called exactly once despite max_turns=10 - TestExplicitContinuationBehaviour (2): turn 0 = original, turn N = continuation verbatim; max_turns still respected - TestCLIContinuationFlag (2): CLI default emits only '## Turn 1'; --continuation-prompt wires through to multi-turn behaviour Full suite: 67/67 passing. Closes ROADMAP #163. Files #164.	2026-04-28 14:46:12 +09:00
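The post-#163 prompt sequence can be sketched as a pure function. `plan_prompts` is a hypothetical helper, not part of run_turn_loop, and it ignores early stops on non-completed turns; it only shows which prompts would be submitted.

```python
def plan_prompts(prompt, max_turns, continuation_prompt=None):
    """Return the prompts a loop would submit, in order (simplified)."""
    prompts = [prompt] if max_turns > 0 else []
    if continuation_prompt is not None:
        # Turns after the first reuse the caller-supplied text verbatim.
        prompts += [continuation_prompt] * (max_turns - 1)
    return prompts
```

With no continuation prompt, the plan is a single turn regardless of `max_turns`, which is the behaviour change the commit describes.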
YeonGyu-Kim	9e40edae7d	fix: #162 — budget-overflow no longer corrupts session state in submit_message Previously, QueryEnginePort.submit_message() checked the token budget AFTER appending the prompt to mutable_messages, transcript_store, and permission_denials, and AFTER calling compact_messages_if_needed(). On overflow it set stop_reason='max_budget_reached' but the overflow turn was already committed. Any caller that persisted the session afterwards wrote the rejected prompt to disk — the session was silently poisoned even though the TurnResult said the turn never completed. Fix: - Restructure submit_message so the budget check early-returns BEFORE any mutation of mutable_messages, transcript_store, permission_denials, or total_usage. - The returned TurnResult.usage reflects pre-call state (overflow never advanced the usage counter). - Normal (in-budget) path unchanged: mutation happens exactly once, at the end, only on 'completed' results. This closes the atomicity gap: submit_message is now either 'turn committed' (stop_reason='completed') or 'turn rejected, state untouched' (stop_reason in {'max_budget_reached', 'max_turns_reached'}). Callers can safely retry with a fresh budget or a smaller prompt without worrying about phantom committed turns from prior rejections. Tests (tests/test_submit_message_budget.py, 10 tests): - TestBudgetOverflowDoesNotMutate (5): mutable_messages / transcript / permission_denials / total_usage / TurnResult.usage all pre-mutation after overflow - TestOverflowPersistence (2): first-turn overflow persists empty session; successful-turn-then-overflow persists only the successful turn - TestEngineUsableAfterOverflow (2): subsequent in-budget call still works with no residue; repeated overflows don't accumulate hidden state - TestNormalPathStillCommits (1): regression guard — non-overflow path still commits mutable_messages/transcript/usage as expected Full suite: 59/59 passing, zero regression. Blocker: none. Closes ROADMAP #162.	2026-04-28 14:46:12 +09:00
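The check-before-mutate ordering from #162 can be illustrated with a toy engine. `BudgetedEngine`, its whitespace-based token count, and the string results are assumptions for illustration; the real QueryEnginePort tracks more state.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetedEngine:
    """Hypothetical engine showing check-then-commit ordering."""

    max_budget_tokens: int
    total_usage: int = 0
    mutable_messages: list = field(default_factory=list)

    def submit_message(self, prompt):
        projected = self.total_usage + len(prompt.split())  # toy token count
        if projected > self.max_budget_tokens:
            return "max_budget_reached"  # early return, nothing mutated
        # Commit happens exactly once, only on the in-budget path.
        self.mutable_messages.append(prompt)
        self.total_usage = projected
        return "completed"
```

A rejected turn leaves both `mutable_messages` and `total_usage` untouched, so a later in-budget call works with no residue.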
YeonGyu-Kim	c8da8e6b0a	fix: #161 — wall-clock timeout for run_turn_loop; stalled turns now abort with stop_reason='timeout' Previously, run_turn_loop was bounded only by max_turns (turn count). If engine.submit_message stalled — slow provider, hung network, infinite stream — the loop blocked indefinitely with no cancellation path. Claws calling run_turn_loop in CI or orchestration had no reliable way to enforce a deadline; the loop would hang until OS kill or human intervention. Fix: - Add timeout_seconds parameter to run_turn_loop (default None = legacy unbounded). - When set, each submit_message call runs inside a ThreadPoolExecutor and is bounded by the remaining wall-clock budget (total across all turns, not per-turn). - On timeout, synthesize a TurnResult with stop_reason='timeout' carrying the turn's prompt and routed matches so transcripts preserve orchestration context. - Exhausted/negative budget short-circuits before calling submit_message. - Legacy path (timeout_seconds=None) bypasses the executor entirely — zero overhead for callers that don't opt in. CLI: - Added --timeout-seconds flag to 'turn-loop' command. - Exit code 2 when the loop terminated on timeout (vs 0 for completed), so shell scripts can distinguish 'done' from 'budget exhausted'. Tests (tests/test_run_turn_loop_timeout.py, 6 tests): - Legacy unbounded path unchanged (timeout_seconds=None never emits 'timeout') - Hung submit_message aborted within budget (0.3s budget, 5s mock hang → exit <1.5s) - Budget is cumulative across turns (0.6s budget, 0.4s per turn, not per-turn) - timeout_seconds=0 short-circuits first turn without calling submit_message - Negative timeout treated as exhausted (guard against caller bugs) - Timeout TurnResult carries correct prompt, matches, UsageSummary shape Full suite: 49/49 passing, zero regression. Blocker: none. Closes ROADMAP #161.	2026-04-28 14:46:12 +09:00
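A sketch of the cumulative wall-clock budget from #161, assuming each turn is a zero-argument callable. `run_with_deadline` is a hypothetical name and returns plain strings rather than TurnResult objects; as the later #164 commit notes, a timed-out worker thread keeps running in the background.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def run_with_deadline(turn_fns, timeout_seconds):
    """Run callables in order under one shared wall-clock budget."""
    deadline = time.monotonic() + timeout_seconds
    results = []
    executor = ThreadPoolExecutor(max_workers=1)
    try:
        for fn in turn_fns:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                # Exhausted or negative budget: do not start the turn at all.
                results.append("timeout")
                break
            future = executor.submit(fn)
            try:
                results.append(future.result(timeout=remaining))
            except FutureTimeout:
                future.cancel()  # best effort; a running call is not stopped
                results.append("timeout")
                break
    finally:
        executor.shutdown(wait=False)  # do not block on a hung worker
    return results
```

The budget is shared across turns because `remaining` is recomputed from a single deadline before every call.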
YeonGyu-Kim	614df6ecc2	feat(#160): wire claw list-sessions and delete-session CLI commands Closes the last #160 gap: claws can now manage session lifecycle entirely through the CLI without filesystem hacks. New commands: - claw list-sessions [--directory DIR] [--output-format text|json] Enumerates stored session IDs. JSON mode emits {sessions, count}. Missing/empty directories return empty list (exit 0), not an error. - claw delete-session SESSION_ID [--directory DIR] [--output-format text|json] Idempotent: not-found is exit 0 with status='not_found' (no raise). Partial-failure: exit 1 with typed JSON error envelope: {session_id, deleted: false, error: {kind, message, retryable}} The 'session_delete_failed' kind is retryable=true so orchestrators know to retry vs escalate. Public API surface extended in src/__init__.py: - list_sessions, session_exists, delete_session - SessionNotFoundError, SessionDeleteError Tests added (tests/test_porting_workspace.py): - test_list_sessions_cli_runs: text + json modes against tempdir - test_delete_session_cli_idempotent: first call deleted=true, second call deleted=false (exit 0, status=not_found) - test_delete_session_cli_partial_failure_exit_1: permission error surfaces as exit 1 + typed JSON error with retryable=true All 43 tests pass. The session storage abstraction chapter is closed: - storage layer decoupled from claw code (#160 initial impl) - delete contract hardened + caller-audited (#160 hardening pass) - CLI wired with idempotency preserved at exit-code boundary (this commit)	2026-04-28 14:46:12 +09:00
YeonGyu-Kim	2e6553a8e5	fix(#160): harden delete_session contract — idempotency, race-safety, typed partial-failure Addresses review feedback on initial #160 implementation: 1. delete_session() contract now explicit: - Idempotent: delete(x); delete(x) is safe, second call returns False - Race-safe: TOCTOU between exists()/unlink() eliminated via unlink-then-catch - Partial-failure typed: permission/IO errors wrapped in SessionDeleteError (OSError subclass) so callers can distinguish 'not found' (return False) from 'could not delete' (raise) 2. New SessionDeleteError class for partial-failure surfacing. Distinct from SessionNotFoundError (KeyError subclass for missing loads). 3. Caller audit confirmed: no code outside session_store globs .port_sessions or imports DEFAULT_SESSION_DIR. Storage layout is fully encapsulated. 4. Added tests/test_session_store.py — 18 tests covering: - list_sessions: empty/missing/sorted/non-json filter - session_exists: true/false/missing-dir - load_session: SessionNotFoundError typing (KeyError subclass, not FileNotFoundError) - delete_session idempotency: first/second/never-existed calls - delete_session partial-failure: SessionDeleteError wraps OSError - delete_session race-safety: concurrent deletion returns False, not raise - Full save->list->exists->load->delete roundtrip All 18 tests pass. Merge-ready: contract documented, caller-audited, race-safe.	2026-04-28 14:46:12 +09:00
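A sketch of the unlink-then-catch contract from the #160 hardening pass. This is an illustration under assumptions, not the code in session_store: the `directory` argument is required here for simplicity, and the error message wording is invented.

```python
import tempfile
from pathlib import Path


class SessionDeleteError(OSError):
    """Raised when a session file exists but could not be removed."""


def delete_session(session_id, directory):
    """Delete <directory>/<session_id>.json; return False if already absent."""
    path = Path(directory) / f"{session_id}.json"
    try:
        path.unlink()  # no exists() pre-check, so no TOCTOU window
    except FileNotFoundError:
        return False  # already gone: idempotent, not an error
    except OSError as exc:
        # Permission or IO failure: surface as a typed partial failure.
        raise SessionDeleteError(f"could not delete {session_id}: {exc}") from exc
    return True
```

`FileNotFoundError` is itself an `OSError` subclass, so the order of the two `except` clauses is what separates 'not found' from 'could not delete'.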
YeonGyu-Kim	12cb0ed79f	fix: #160 — add list_sessions, session_exists, delete_session to session_store - list_sessions(directory=None) -> list[str]: enumerate stored session IDs - session_exists(session_id, directory=None) -> bool: check existence without FileNotFoundError - delete_session(session_id, directory=None) -> bool: unlink a session file - load_session now raises typed SessionNotFoundError (subclass of KeyError) instead of FileNotFoundError - Claws can now manage session lifecycle without reaching past the module to glob filesystem Closes ROADMAP #160. Acceptance: claw can call list_sessions(), session_exists(id), delete_session(id) without importing Path or knowing .port_sessions/<id>.json layout.	2026-04-28 14:46:12 +09:00
YeonGyu-Kim	7c63f9765c	file: #163 — run_turn_loop injects [turn N] suffix into follow-up prompts; multi-turn sessions semantically broken	2026-04-28 14:46:12 +09:00
YeonGyu-Kim	d30e47c8ad	file: #162 — submit_message appends budget-exceeded turn before returning max_budget_reached; session state corrupted on overflow	2026-04-28 14:46:12 +09:00
YeonGyu-Kim	38073c403e	file: #161 — run_turn_loop has no wall-clock timeout, stalled turn blocks indefinitely	2026-04-28 14:46:12 +09:00
Yeachan-Heo	77afde768c	Clarify allowed tool status handling Reject empty --allowedTools inputs instead of treating them as an empty restriction, and surface status JSON metadata that distinguishes default unrestricted tools from flag-provided allow lists. Confidence: high Scope-risk: narrow Tested: cargo test -p rusty-claude-cli rejects_empty_allowed_tools_flag -- --nocapture Tested: cargo test -p tools allowed_tools_rejects_empty_token_lists -- --nocapture Tested: cargo check -p rusty-claude-cli -p tools Tested: cargo test -p rusty-claude-cli -p tools Not-tested: full workspace cargo fmt --check is blocked by pre-existing unrelated formatting drift	2026-04-28 05:44:14 +00:00
Yeachan-Heo	6db68a2baa	Expose tool permission gates as structured worker blockers Worker boot could previously stall on an interactive MCP/tool permission prompt while readiness and startup-timeout surfaces only had generic idle/no-evidence shapes. This adds a first-class blocked lifecycle state, structured event payload, startup evidence fields, and regression coverage so callers can report the exact server/tool gate instead of pane-scraping. Constraint: ROADMAP #200 requires tool/server identity, prompt age, and session-only versus always-allow capability in status/evidence surfaces Rejected: Treat MCP/tool prompts as trust gates | conflates distinct prompts and loses tool identity Rejected: Leave allow-scope as pane text only | clawhip still could not classify the blocker without scraping Confidence: high Scope-risk: moderate Directive: Keep tool_permission_required distinct from trust_required; downstream claws rely on server/tool payload plus allow-scope metadata Tested: cargo test -p runtime tool_permission Tested: cargo fmt -p runtime -- --check && cargo clippy -p runtime --all-targets -- -D warnings && cargo test -p runtime Tested: cargo test --workspace Not-tested: live interactive MCP permission prompt in tmux	2026-04-27 09:28:09 +00:00
Yeachan-Heo	5b910356a2	Preserve trust boundaries during pulled follow-up The pull brought the branch current with origin/main while replaying local follow-up work. Conflict resolution kept the roadmap/progress additions and integrated the runtime event/trust changes with upstream's newer surfaces. The trust allowlist now treats worktree_pattern as an additional required predicate, including the missing-worktree case, so auto-trust cannot fall back to cwd-only matching when a worktree constraint was declared. The runtime formatting cleanup keeps clippy/fmt green after the merge. Constraint: Local branch was 109 commits behind origin/main with dirty tracked follow-up work. Rejected: Drop the autostash after conflict resolution | keeping it preserves a reversible safety backup for unrelated recovery. Confidence: high Scope-risk: moderate Directive: Do not relax worktree_pattern matching without preserving the missing-worktree regression. Tested: git diff --cached --check; cargo fmt -p runtime -- --check; cargo clippy -p runtime --all-targets -- -D warnings; cargo test -p runtime; cargo test --workspace; architect verification approved Not-tested: Live tmux/worker auto-trust behavior outside unit/integration tests	2026-04-27 09:05:50 +00:00
YeonGyu-Kim	a389f8dff1	file: #160 — session_store missing list_sessions, delete_session, session_exists — claw cannot enumerate or clean up sessions without filesystem hacks	2026-04-22 08:47:52 +09:00
YeonGyu-Kim	7a014170ba	file: #159 — run_turn_loop hardcodes empty denied_tools, permission denials absent from multi-turn sessions	2026-04-22 06:48:03 +09:00
YeonGyu-Kim	986f8e89fd	file: #158 — compact_messages_if_needed drops turns silently, no structured compaction event	2026-04-22 06:37:54 +09:00
YeonGyu-Kim	ef1cfa1777	file: #157 — structured remediation registry for error hints (Phase 3 of #77) ## Gap #77 Phase 1 added machine-readable error kind discriminants and #156 extended them to text-mode output. However, the hint field is still prose derived from splitting existing error text — not a stable registry-backed remediation contract. Downstream claws inspecting the hint field still need to parse human wording to decide whether to retry, escalate, or terminate. ## Fix Shape 1. Remediation registry: remediation_for(kind, operation) -> Remediation struct with action (retry/escalate/terminate/configure), target, and stable message 2. Stable hint outputs per error class (no more prose splitting) 3. Golden fixture tests replacing split_error_hint() string hacks ## Source gaebal-gajae dogfood sweep 2026-04-22 05:30 KST	2026-04-22 05:31:00 +09:00
YeonGyu-Kim	f1e4ad7574	feat: #156 — error classification for text-mode output (Phase 2 of #77) ## Problem #77 Phase 1 added machine-readable error `kind` discriminants to JSON error payloads. Text-mode (stderr) errors still emit prose-only output with no structured classification. Observability tools (log aggregators, CI error parsers) parsing stderr can't distinguish error classes without regex-scraping the prose. ## Fix Added `[error-kind: <class>]` prefix line to all text-mode error output. The prefix appears before the error prose, making it immediately parseable by line-based log tools without any substring matching. Examples: ## Impact - Stderr observers (log aggregators, CI systems) can now parse error class from the first line without regex or substring scraping - Same classifier function used for JSON (#77 P1) and text modes - Text-mode output remains human-readable (error prose unchanged) - Prefix format follows syslog/structured-logging conventions ## Tests All 179 rusty-claude-cli tests pass. Verified on 3 different error classes. Closes ROADMAP #156.	2026-04-22 00:21:32 +09:00
YeonGyu-Kim	14c5ef1808	file: #156 — error classification for text-mode output (Phase 2 of #77) ROADMAP entry for natural Phase 2 follow-up to #77 Phase 1 (JSON error kind classification). Text-mode errors currently prose-only with no structured class; observability tools parsing stderr need the kind token. Two implementation options: - Prefix line before error prose: [error-kind: missing_credentials] - Suffix comment: # error_class=missing_credentials Scope: ~20 lines. Non-breaking (adds classification, doesn't change error text). Source: Cycle 11 dogfood probe at 23:18 KST — product surface clean after today's batch, identified natural next step for error-classification symmetry.	2026-04-21 23:19:58 +09:00
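A consumer-side sketch for the `[error-kind: <class>]` prefix line from #156. `parse_error_kind` is a hypothetical helper for a log tool written in Python; it assumes the prefix is the first line of stderr and uses plain string checks rather than a regex.

```python
PREFIX = "[error-kind: "


def parse_error_kind(stderr_text):
    """Return the kind token from the first stderr line, or None if absent."""
    lines = stderr_text.splitlines()
    first = lines[0].strip() if lines else ""
    if first.startswith(PREFIX) and first.endswith("]"):
        return first[len(PREFIX):-1]
    return None
```

Output without the prefix yields `None`, so the same helper tolerates older builds that emit prose only.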
YeonGyu-Kim	9362900b1b	feat: #77 Phase 1 — machine-readable error classification in JSON error payloads ## Problem All JSON error payloads had the same three-field envelope: ```json {"type": "error", "error": "<prose with hint baked in>"} ``` Five distinct error classes were indistinguishable at the schema level: - missing_credentials (no API key) - missing_worker_state (no state file) - session_not_found / session_load_failed - cli_parse (unrecognized args) - invalid_model_syntax Downstream claws had to regex-scrape the prose to route failures. ## Fix 1. Added `classify_error_kind()` — prefix/keyword classifier that returns a snake_case discriminant token for 12 known error classes: `missing_credentials`, `missing_manifests`, `missing_worker_state`, `session_not_found`, `session_load_failed`, `no_managed_sessions`, `cli_parse`, `invalid_model_syntax`, `unsupported_command`, `unsupported_resumed_command`, `confirmation_required`, `api_http_error`, plus `unknown` fallback. 2. Added `split_error_hint()` — splits multi-line error messages into (short_reason, optional_hint) so the runbook prose stops being stuffed into the `error` field. 3. Extended JSON envelope at 4 emit sites: - Main error sink (line ~213) - Session load failure in resume_session - Stub command (unsupported_command) - Unknown resumed command (unsupported_resumed_command) ## New JSON shape ```json { "type": "error", "error": "short reason (first line)", "kind": "missing_credentials", "hint": "Hint: export ANTHROPIC_API_KEY..." } ``` `kind` is always present. `hint` is null when no runbook follows. `error` now carries only the short reason, not the full multi-line prose. ## Tests Added 2 new regression tests: - `classify_error_kind_returns_correct_discriminants` — all 9 known classes + fallback - `split_error_hint_separates_reason_from_runbook` — with and without hints All 179 rusty-claude-cli tests pass. Full workspace green. Closes ROADMAP #77 Phase 1.	2026-04-21 22:38:13 +09:00
YeonGyu-Kim	ff45e971aa	fix: #80 — session-lookup error messages now show actual workspace-fingerprint directory ## Problem Two session error messages advertised `.claw/sessions/` as the managed-session location, but the actual on-disk layout is `.claw/sessions/<workspace_fingerprint>/` where the fingerprint is a 16-char FNV-1a hash of the CWD path. Users see error messages like: ``` no managed sessions found in .claw/sessions/ ``` But the real directory is: ``` .claw/sessions/8497f4bcf995fc19/ ``` The error copy was a direct lie — it made workspace-fingerprint partitioning invisible and left users confused about whether sessions were lost or just in a different partition. ## Fix Updated two error formatters to accept the resolved `sessions_root` path and extract the actual workspace-fingerprint directory: 1. format_missing_session_reference: now shows the actual fingerprint dir and explains that it's a workspace-specific partition 2. format_no_managed_sessions: now shows the actual fingerprint dir and includes a note that sessions from other CWDs are intentionally invisible Updated all three call sites to pass `&self.sessions_root` to the formatters. ## Examples Before: ``` no managed sessions found in .claw/sessions/ ``` After: ``` no managed sessions found in .claw/sessions/8497f4bcf995fc19/ Start `claw` to create a session, then rerun with `--resume latest`. Note: claw partitions sessions per workspace fingerprint; sessions from other CWDs are invisible. ``` ``` session not found: nonexistent-id Hint: managed sessions live in .claw/sessions/8497f4bcf995fc19/ (workspace-specific partition). Try `latest` for the most recent session or `/session list` in the REPL. 
``` ## Impact - Users can now tell from the error message that they're looking in the right directory (the one their current CWD maps to) - The workspace-fingerprint partitioning stops being invisible - Operators understand why sessions from adjacent CWDs don't appear - Error copy matches the actual on-disk structure ## Tests All 466 runtime tests pass. Verified on two real workspaces with actual workspace-fingerprint directories. Closes ROADMAP #80.	2026-04-21 22:18:12 +09:00
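For reference, a 64-bit FNV-1a hash rendered as 16 hex characters can be sketched as follows. This uses the standard FNV-1a constants; how claw normalises or encodes the CWD path before hashing is not stated in the commit, so this sketch is not guaranteed to reproduce a directory name such as the one shown above.

```python
FNV_OFFSET_BASIS_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3


def fnv1a_64_hex(text):
    """Return the 64-bit FNV-1a hash of text (UTF-8) as 16 hex characters."""
    value = FNV_OFFSET_BASIS_64
    for byte in text.encode("utf-8"):
        value ^= byte  # FNV-1a: XOR first, then multiply
        value = (value * FNV_PRIME_64) & 0xFFFFFFFFFFFFFFFF  # wrap to 64 bits
    return f"{value:016x}"
```

The output is always 16 lowercase hex characters, which matches the fingerprint length the commit message describes.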
YeonGyu-Kim	4b53b97e36	docs: #155 — add USAGE.md documentation for /ultraplan, /teleport, /bughunter commands ## Problem Three interactive slash commands are documented in `claw --help` but have no corresponding section in USAGE.md: - `/ultraplan [task]` — Run a deep planning prompt with multi-step reasoning - `/teleport <symbol-or-path>` — Jump to a file or symbol by searching the workspace - `/bughunter [scope]` — Inspect the codebase for likely bugs New users see these commands in the help output but don't know: - What each command does - How to use it - When to use it vs. other commands - What kind of results to expect ## Fix Added new section "Advanced slash commands (Interactive REPL only)" to USAGE.md with documentation for all three commands: 1. `/ultraplan` — multi-step reasoning for complex tasks - Example: `/ultraplan refactor the auth module to use async/await` - Output: structured plan with numbered steps and reasoning 2. `/teleport` — navigate to a file or symbol - Example: `/teleport UserService`, `/teleport src/auth.rs` - Output: file content with the requested symbol highlighted 3. `/bughunter` — scan for likely bugs - Example: `/bughunter src/handlers`, `/bughunter` (all) - Output: list of suspicious patterns with explanations ## Impact Users can now discover these commands and understand when to use them without having to guess or search external sources. Bridges the gap between `--help` output and full documentation. Also filed ROADMAP #155 documenting the gap. Closes ROADMAP #155.	2026-04-21 21:49:04 +09:00
YeonGyu-Kim	3cfe6e2b14	feat: #154 — hint provider prefix and env var when model name looks like different provider ## Problem When a user types `claw --model gpt-4` or `--model qwen-plus`, they get: ``` error: invalid model syntax: 'gpt-4'. Expected provider/model (e.g., anthropic/claude-opus-4-6) or known alias ``` USAGE.md documents that "The error message now includes a hint that names the detected env var" — but this hint does not actually exist. The user has to re-read USAGE.md or guess the correct prefix. ## Fix Enhance `validate_model_syntax` to detect when a model name looks like it belongs to a different provider: 1. OpenAI models (starts with `gpt-` or `gpt_`): ``` Did you mean `openai/gpt-4`? (Requires OPENAI_API_KEY env var) ``` 2. Qwen/DashScope models (starts with `qwen`): ``` Did you mean `qwen/qwen-plus`? (Requires DASHSCOPE_API_KEY env var) ``` 3. Grok/xAI models (starts with `grok`): ``` Did you mean `xai/grok-3`? (Requires XAI_API_KEY env var) ``` Unrelated invalid models (e.g., `asdfgh`) do not get a spurious hint. ## Verification - `claw --model gpt-4` → hints `openai/gpt-4` + `OPENAI_API_KEY` - `claw --model qwen-plus` → hints `qwen/qwen-plus` + `DASHSCOPE_API_KEY` - `claw --model grok-3` → hints `xai/grok-3` + `XAI_API_KEY` - `claw --model asdfgh` → generic error (no hint) ## Tests Added 3 new assertions in `parses_multiple_diagnostic_subcommands`: - GPT model error hints openai/ prefix and OPENAI_API_KEY - Qwen model error hints qwen/ prefix and DASHSCOPE_API_KEY - Unrelated models don't get a spurious hint All 177 rusty-claude-cli tests pass. Closes ROADMAP #154.	2026-04-21 21:40:48 +09:00
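The prefix detection described in #154 can be sketched roughly as below. This is a minimal illustration only: `provider_hint` is a hypothetical helper name, and the real logic lives inside `validate_model_syntax`, whose body is not shown in the commit message. The prefixes, provider names, and env vars are taken from the message above.

```rust
/// Hypothetical helper (not the actual claw-code function): returns a
/// "did you mean" hint when a bare model name looks like it belongs to a
/// known provider, or `None` for unrelated input.
fn provider_hint(model: &str) -> Option<String> {
    // Map a recognizable model-name prefix to (provider prefix, env var).
    let (provider, env_var) = if model.starts_with("gpt-") || model.starts_with("gpt_") {
        ("openai", "OPENAI_API_KEY")
    } else if model.starts_with("qwen") {
        ("qwen", "DASHSCOPE_API_KEY")
    } else if model.starts_with("grok") {
        ("xai", "XAI_API_KEY")
    } else {
        // Unrelated names (e.g. `asdfgh`) get no hint.
        return None;
    };
    Some(format!(
        "Did you mean `{provider}/{model}`? (Requires {env_var} env var)"
    ))
}
```

A caller would presumably append the returned string to the generic "invalid model syntax" error only when it is `Some`, which keeps unrelated typos free of spurious hints.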
YeonGyu-Kim	71f5f83adb	feat: #153 — add post-build binary location and verification guide to README ## Problem Users frequently ask after building: - "Where is the claw binary?" - "Did the build actually work?" - "Why can't I run `claw` from anywhere?" This happens because `cargo build` puts the binary in `rust/target/debug/claw` (or `rust/target/release/claw`), and new users don't know: 1. Where to find it 2. How to test it 3. How to add it to PATH (optional but common follow-up) ## Fix Added new section "Post-build: locate the binary and verify" to README covering: 1. Binary location table: debug vs. release, macOS/Linux vs. Windows paths 2. Verification commands: Test the binary with `--help` and `doctor` 3. Three ways to add to PATH: - Symlink (macOS/Linux): `ln -s ... /usr/local/bin/claw` - cargo install: `cargo install --path . --force` - Shell profile update: add rust/target/debug to $PATH 4. Troubleshooting: Common errors ("command not found", "permission denied", debug vs. release build speed) ## Impact New users can now: - Find the binary immediately after build - Run it and verify with `claw doctor` - Know their options for system-wide access Also filed ROADMAP #153 documenting the gap. Closes ROADMAP #153.	2026-04-21 21:29:59 +09:00
YeonGyu-Kim	79352a2d20	feat: #152 — hint `--output-format json` when user types `--json` on diagnostic verbs ## Problem Users commonly type `claw doctor --json`, `claw status --json`, or `claw system-prompt --json` expecting JSON output. These fail with `unrecognized argument \`--json\` for subcommand` with no hint that `--output-format json` is the correct flag. ## Discovery Filed as #152 during 21:17 dogfood nudge. The #127 worktree contained a more comprehensive patch but conflicted with #141 (unified --help). On re-investigation of main, Bugs 1 and 3 from #127 are already closed (positional arg rejection works, no double "error:" prefix). Only Bug 2 (the `--json` hint) remained. ## Fix Two call sites add the hint: 1. `parse_single_word_command_alias`'s diagnostic-verb suffix path: when rest[1] == "--json", append "Did you mean \`--output-format json\`?" 2. `parse_system_prompt_options` unknown-option path: same hint when the option is exactly `--json`. ## Verification Before: $ claw doctor --json error: unrecognized argument `--json` for subcommand `doctor` Run `claw --help` for usage. After: $ claw doctor --json error: unrecognized argument `--json` for subcommand `doctor` Did you mean `--output-format json`? Run `claw --help` for usage. Covers: `doctor --json`, `status --json`, `sandbox --json`, `system-prompt --json`, and any other diagnostic verb that routes through `parse_single_word_command_alias`. Other unrecognized args (`claw doctor garbage`) correctly don't trigger the hint. ## Tests - 2 new assertions in `parses_multiple_diagnostic_subcommands`: - `claw doctor --json` produces hint - `claw doctor garbage` does NOT produce hint - 177 rusty-claude-cli tests pass - Workspace tests green Closes ROADMAP #152.	2026-04-21 21:23:17 +09:00
YeonGyu-Kim	dddbd78dbd	file: #152 — diagnostic verb suffixes allow arbitrary positional args, double error prefix Filed from nudge directive at 21:17 KST. Implementation exists on worktree `jobdori-127-verb-suffix` but needs rebase due to merge with #141. Ready for Phase 1 implementation once conflicts resolved.	2026-04-21 21:19:51 +09:00
YeonGyu-Kim	7bc66e86e8	feat: #151 — canonicalize workspace path in SessionStore::from_cwd/data_dir ## Problem `workspace_fingerprint(path)` hashes the raw path string without canonicalization. Two equivalent paths (e.g. `/tmp/foo` vs `/private/tmp/foo` on macOS) produce different fingerprints and therefore different session stores. #150 fixed the test-side symptom; this fixes the underlying product contract. ## Discovery path #150 fix (canonicalize in test) was a workaround. Q's ack on #150 surfaced the deeper gap: the function itself is still fragile for any caller passing a non-canonical path: 1. Embedded callers with a raw `--data-dir` path 2. Programmatic `SessionStore::from_cwd(user_path)` calls 3. NixOS store paths, Docker bind mounts, case-insensitive normalization The REPL's default flow happens to work because `env::current_dir()` returns canonical paths on macOS. But any caller passing a raw path risks silent session-store divergence. ## Fix Canonicalize inside `SessionStore::from_cwd()` and `from_data_dir()` before computing the fingerprint. Kept `workspace_fingerprint()` itself as a pure function for determinism — canonicalization is the entry point's responsibility. ```rust let canonical_cwd = fs::canonicalize(cwd).unwrap_or_else(|_| cwd.to_path_buf()); let sessions_root = canonical_cwd.join(".claw").join("sessions").join(workspace_fingerprint(&canonical_cwd)); ``` Falls back to the raw path if canonicalize fails (directory doesn't exist yet). ## Test-side updates Three legacy-session tests expected the non-canonical base path to match the store's workspace_root. Updated them to canonicalize `base` after creation — same defensive pattern as #150, now explicit across all three tests. ## Regression test Added `session_store_from_cwd_canonicalizes_equivalent_paths` that creates two stores from equivalent paths (raw vs canonical) and asserts they resolve to the same sessions_dir. 
## Verification - `cargo test -p runtime session_store_` — 9/9 pass - `cargo test --workspace` — all green, no FAILED markers - No behavior change for existing users (REPL default flow already used canonical paths) ## Backward compatibility Users on macOS who always went through `env::current_dir()`: no hash change, sessions resume identically. Users who ever called with a non-canonical path: hash would change, but those sessions were already broken (couldn't be resumed from a canonical-path cwd). Net improvement. Closes ROADMAP #151.	2026-04-21 21:06:09 +09:00
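To illustrate why canonicalizing at the entry point matters for #151, here is a small self-contained sketch. It is not the project's code: `fingerprint` is a stand-in that hashes the raw path bytes with the standard library's `DefaultHasher` (the real `workspace_fingerprint` algorithm is not shown in the commit message), and `sessions_root` is a hypothetical free function mirroring the snippet in the Fix section.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

/// Stand-in for `workspace_fingerprint`: a pure function of the raw path
/// string, so two spellings of the same directory hash differently.
fn fingerprint(path: &Path) -> String {
    let mut hasher = DefaultHasher::new();
    path.as_os_str().hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Hypothetical entry point: canonicalize first, then fingerprint.
/// Falls back to the raw path if the directory does not exist yet.
fn sessions_root(cwd: &Path) -> PathBuf {
    let canonical = fs::canonicalize(cwd).unwrap_or_else(|_| cwd.to_path_buf());
    canonical
        .join(".claw")
        .join("sessions")
        .join(fingerprint(&canonical))
}
```

With this shape, a path such as `base/sub/..` and `base` itself yield different raw fingerprints but the same `sessions_root`, which is the property the regression test in the commit asserts.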
YeonGyu-Kim	eaa077bf91	fix: #150 — eliminate symlink canonicalization flake in resume_latest test + file #246 (reminder outcome ambiguity) ## #150 Fix: resume_latest test flake Problem: `resume_latest_restores_the_most_recent_managed_session` intermittently fails when run in the workspace suite or multiple times in sequence, but passes in isolation. Root cause: `workspace_fingerprint(path)` hashes the path string without canonicalization. On macOS, `/tmp` is a symlink to `/private/tmp`. The test creates a temp dir via `std::env::temp_dir().join(...)` which returns `/var/folders/...` (non-canonical). When the subprocess spawns, `env::current_dir()` returns the canonical path `/private/var/folders/...`. The two fingerprints differ, so the subprocess looks in `.claw/sessions/<hash1>` while files are in `.claw/sessions/<hash2>`. Session discovery fails. Fix: Call `fs::canonicalize(&project_dir)` after creating the directory to ensure test and subprocess use identical path representations. Verification: 5 consecutive runs of the full test suite — all pass. Previously: 5/5 failed when run in sequence. ## #246 Filing: Reminder cron outcome ambiguity (control-loop blocker) The `clawcode-dogfood-cycle-reminder` cron times out repeatedly with no structured feedback on whether the nudge was delivered, skipped, or died in-flight. Phase 1 outcome schema — add explicit field to cron result: - `delivered` — nudge posted to Discord - `timed_out_before_send` — died before posting - `timed_out_after_send` — posted but cleanup timed out - `skipped_due_to_active_cycle` — previous cycle active - `aborted_gateway_draining` — daemon shutdown Assigned to gaebal-gajae (cron/orchestration domain). Unblocks trustworthy dogfood cycle observability. Closes ROADMAP #150. Filed ROADMAP #246.	2026-04-21 21:01:09 +09:00
YeonGyu-Kim	bc259ec6f9	fix: #149 — eliminate parallel-test flake in runtime::config tests ## Problem `runtime::config::tests::validates_unknown_top_level_keys_with_line_and_field_name` intermittently fails during `cargo test --workspace` (witnessed during #147 and #148 workspace runs) but passes deterministically in isolation. Example failure from workspace run: test result: FAILED. 464 passed; 1 failed ## Root cause `runtime/src/config.rs::tests::temp_dir()` used nanosecond timestamp alone for namespace isolation: std::env::temp_dir().join(format!("runtime-config-{nanos}")) Under parallel test execution on fast machines with coarse clock resolution, two tests start within the same nanosecond bucket and collide on the same path. One test's `fs::remove_dir_all(root)` then races another's in-flight `fs::create_dir_all()`. Other crates already solved this pattern: - plugins::tests::temp_dir(label) — label-parameterized - runtime::git_context::tests::temp_dir(label) — label-parameterized runtime/src/config.rs was missed. ## Fix Added process id + monotonically-incrementing atomic counter to the namespace, making every callsite provably unique regardless of clock resolution or scheduling: static COUNTER: AtomicU64 = AtomicU64::new(0); let pid = std::process::id(); let seq = COUNTER.fetch_add(1, Ordering::Relaxed); std::env::temp_dir().join(format!("runtime-config-{pid}-{nanos}-{seq}")) Chose counter+pid over the label-parameterized pattern to avoid touching all 20 callsites in the same commit (mechanical noise with no added safety — counter alone is sufficient). ## Verification Before: one failure per workspace run (config test flake). After: 5 consecutive `cargo test --workspace` runs — zero config test failures. Only pre-existing `resume_latest` flake remains (orthogonal, unrelated to this change). for i in 1 2 3 4 5; do cargo test --workspace; done # All 5 runs: config tests green. Only resume_latest flake appears. 
cargo test -p runtime # 465 passed; 0 failed ## ROADMAP.md Added Pinpoint #149 documenting the gap, root cause, and fix. Closes ROADMAP #149.	2026-04-21 20:54:12 +09:00
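The pid + timestamp + atomic counter scheme from #149 can be written out as a complete helper along these lines. The function name `unique_temp_dir` is an assumption for illustration (the commit refers to the helper as `temp_dir()` inside the test module); the format string and the `COUNTER` static follow the fragments quoted in the message.

```rust
use std::path::PathBuf;
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

// Process-wide sequence number; each call observes a distinct value.
static COUNTER: AtomicU64 = AtomicU64::new(0);

/// Hypothetical name for the test helper: builds a temp path that stays
/// unique even when two tests read the same clock value.
fn unique_temp_dir() -> PathBuf {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let pid = std::process::id();
    // Relaxed ordering is enough here: only uniqueness of the returned
    // value matters, not ordering relative to other memory operations.
    let seq = COUNTER.fetch_add(1, Ordering::Relaxed);
    std::env::temp_dir().join(format!("runtime-config-{pid}-{nanos}-{seq}"))
}
```

Within one process the counter alone guarantees distinct names; the pid separates concurrently running test binaries, and the timestamp mainly helps a human tell runs apart.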
YeonGyu-Kim	f84c7c4ed5	feat: #148 + #128 closure — model provenance in claw status JSON/text ## Scope Two deltas in one commit: ### #128 closure (docs) Re-verified on main HEAD `4cb8fa0`: malformed `--model` strings already rejected at parse time (`validate_model_syntax` in parse_args). All historical repro cases now produce specific errors: claw --model '' → error: model string cannot be empty claw --model 'bad model' → error: invalid model syntax: 'bad model' contains spaces claw --model 'sonet' → error: invalid model syntax: 'sonet'. Expected provider/model or known alias claw --model '@invalid' → error: invalid model syntax: '@invalid'. Expected provider/model ... claw --model 'totally-not-real-xyz' → error: invalid model syntax: ... claw --model sonnet → ok, resolves to claude-sonnet-4-6 claw --model anthropic/claude-opus-4-6 → ok, passes through Marked #128 CLOSED in ROADMAP with repro block. Residual provenance gap split off as #148. ### #148 implementation Problem. After #128 closure, `claw status --output-format json` still surfaces only the resolved model string. No way for a claw to distinguish whether `claude-sonnet-4-6` came from `--model sonnet` (alias resolution) vs `--model claude-sonnet-4-6` (pass-through) vs `ANTHROPIC_MODEL` env vs `.claw.json` config vs compiled-in default. Debug forensics had to re-read argv instead of reading a structured field. Clawhip orchestrators sending `--model` couldn't confirm the flag was honored vs falling back to default. Fix. Added two fields to status JSON envelope: - `model_source`: "flag" | "env" | "config" | "default" - `model_raw`: user's input before alias resolution (null on default) Text mode appends a `Model source` line under `Model`, showing the source and raw input (e.g. `Model source flag (raw: sonnet)`). Resolution order (mirrors resolve_repl_model but with source attribution): 1. If `--model` / `--model=` flag supplied → source: flag, raw: flag value 2. 
Else if ANTHROPIC_MODEL set → source: env, raw: env value 3. Else if `.claw.json` model key set → source: config, raw: config value 4. Else → source: default, raw: null ## Changes ### rust/crates/rusty-claude-cli/src/main.rs - Added `ModelSource` enum (Flag/Env/Config/Default) with `as_str()`. - Added `ModelProvenance` struct (resolved, raw, source) with three constructors: `default_fallback()`, `from_flag(raw)`, and `from_env_or_config_or_default(cli_model)`. - Added `model_flag_raw: Option<String>` field to `CliAction::Status`. - Parse loop captures raw input in `--model` and `--model=` arms. - Extended `parse_single_word_command_alias` to thread `model_flag_raw: Option<&str>` through. - Extended `print_status_snapshot` signature to accept `model_flag_raw: Option<&str>`. Resolves provenance at dispatch time (flag provenance from arg; else probe env/config/default). - Extended `status_json_value` signature with `provenance: Option<&ModelProvenance>`. On Some, adds `model_source` and `model_raw` fields; on None (legacy resume paths), omits them for backward compat. - Extended `format_status_report` signature with optional provenance. On Some, renders `Model source` line after `Model`. - Updated all existing callers (REPL /status, resume /status, tests) to pass None (legacy paths don't carry flag provenance). - Added 2 regression assertions in parse_args test covering both `--model sonnet` and `--model=...` forms. ### ROADMAP.md - Marked #128 CLOSED with re-verification block. - Filed #148 documenting the provenance gap split, fix shape, and acceptance criteria. 
## Live verification $ claw --model sonnet --output-format json status | jq '{model,model_source,model_raw}' {"model": "claude-sonnet-4-6", "model_source": "flag", "model_raw": "sonnet"} $ claw --output-format json status | jq '{model,model_source,model_raw}' {"model": "claude-opus-4-6", "model_source": "default", "model_raw": null} $ ANTHROPIC_MODEL=haiku claw --output-format json status | jq '{model,model_source,model_raw}' {"model": "claude-haiku-4-5-20251213", "model_source": "env", "model_raw": "haiku"} $ echo '{"model":"claude-opus-4-7"}' > .claw.json && claw --output-format json status | jq '{model,model_source,model_raw}' {"model": "claude-opus-4-7", "model_source": "config", "model_raw": "claude-opus-4-7"} $ claw --model sonnet status Status Model claude-sonnet-4-6 Model source flag (raw: sonnet) Permission mode danger-full-access ... ## Tests - rusty-claude-cli bin: 177 tests pass (2 new assertions for #148) - Full workspace green except pre-existing resume_latest flake (unrelated) Closes ROADMAP #128, #148.	2026-04-21 20:48:46 +09:00
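The flag → env → config → default resolution order from #148 can be sketched as a pure function. The `ModelSource` and `ModelProvenance` names and fields come from the commit message; everything else here is an assumption for illustration: `resolve_provenance` and `resolve_alias` are hypothetical names (the commit lists `default_fallback()`, `from_flag(raw)`, and `from_env_or_config_or_default(cli_model)` as the real constructors), the inputs are passed in rather than read from argv, the environment, or `.claw.json`, and the alias table only contains the two aliases visible in the live verification output.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ModelSource {
    Flag,
    Env,
    Config,
    Default,
}

impl ModelSource {
    fn as_str(self) -> &'static str {
        match self {
            ModelSource::Flag => "flag",
            ModelSource::Env => "env",
            ModelSource::Config => "config",
            ModelSource::Default => "default",
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
struct ModelProvenance {
    resolved: String,
    raw: Option<String>,
    source: ModelSource,
}

/// Assumed, partial alias table; unknown names pass through unchanged.
fn resolve_alias(raw: &str) -> String {
    match raw {
        "sonnet" => "claude-sonnet-4-6".to_string(),
        "haiku" => "claude-haiku-4-5-20251213".to_string(),
        other => other.to_string(),
    }
}

/// Hypothetical resolver: the first supplied input wins, in the order
/// flag, env, config; otherwise fall back to the default model.
fn resolve_provenance(
    flag: Option<&str>,
    env: Option<&str>,
    config: Option<&str>,
) -> ModelProvenance {
    let picked = flag
        .map(|raw| (raw, ModelSource::Flag))
        .or_else(|| env.map(|raw| (raw, ModelSource::Env)))
        .or_else(|| config.map(|raw| (raw, ModelSource::Config)));
    match picked {
        Some((raw, source)) => ModelProvenance {
            resolved: resolve_alias(raw),
            raw: Some(raw.to_string()),
            source,
        },
        None => ModelProvenance {
            resolved: "claude-opus-4-6".to_string(),
            raw: None,
            source: ModelSource::Default,
        },
    }
}
```

Keeping the resolver free of I/O makes the precedence easy to test in isolation; the status JSON fields `model_source` and `model_raw` would then map onto `source.as_str()` and `raw`.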


947 Commits