claw-code

mirror of https://github.com/ultraworkers/claw-code.git synced 2026-06-16 06:36:50 +08:00

Author	SHA1	Message	Date
YeonGyu-Kim	445a059eee	test: #173 prep — JSON envelope field consistency validation Adds parametrised test suite validating that clawable-surface commands' JSON output matches their declared envelope contracts per SCHEMAS.md. Two phases: Phase 1 (this commit): Consistency baseline. - Collect ENVELOPE_CONTRACTS registry mapping each command to its required and optional fields - TestJsonEnvelopeConsistency: parametrised test iterates over 13 commands, invokes with --output-format json, validates that actual JSON envelope contains all required fields - test_envelope_field_value_types: spot-check types (int, str, list) for consistency Phase 2 (future #173): Common field wrapping. - Once wrap_json_envelope() is applied, all commands will emit timestamp, command, exit_code, output_format, schema_version - Currently skipped via @pytest.mark.skip, these tests will activate automatically when wrapping is implemented: TestJsonEnvelopeCommonFieldPrep::test_all_envelopes_include_timestamp TestJsonEnvelopeCommonFieldPrep::test_all_envelopes_include_command TestJsonEnvelopeCommonFieldPrep::test_all_envelopes_include_exit_code_and_schema_version Why this matters: - #172 documented the JSON contract; this test validates it - Currently detects when actual output diverges from SCHEMAS.md (e.g. list-sessions emits 'count', not 'sessions_count') - As #173 wraps commands, test suite auto-validates new common fields - Prevents regression: accidental field removal breaks the test suite Current status: 11 passed (consistency), 6 skipped (awaiting #173) Full suite: 168 → 179 passing, zero regression. Closes ROADMAP #173 prep (framework for common field validation). Actual field wrapping remains for next cycle.	2026-04-30 01:06:57 +09:00
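The consistency check described in commit 445a059eee can be sketched roughly as follows. This is a minimal illustration, not the repository's test code: the registry shape and the helper `missing_required_fields` are assumptions, and only the field names (`sessions`, `count`, `session_id`, `loaded`) come from the commit messages in this log.

```python
import json

# Hypothetical registry: command name -> required top-level envelope fields.
# The real ENVELOPE_CONTRACTS layout is not shown in the log; this is a guess.
ENVELOPE_CONTRACTS = {
    "list-sessions": {"required": {"sessions", "count"}},
    "load-session": {"required": {"session_id", "loaded"}},
}


def missing_required_fields(command: str, raw_stdout: str) -> set:
    """Return the required fields that are absent from a JSON envelope."""
    envelope = json.loads(raw_stdout)
    return ENVELOPE_CONTRACTS[command]["required"] - set(envelope)


if __name__ == "__main__":
    # An envelope that drifted to 'sessions_count' is reported as missing 'count'.
    print(missing_required_fields("list-sessions", '{"sessions": [], "sessions_count": 0}'))
```

A parametrised test could call this helper once per command and assert that the returned set is empty.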
YeonGyu-Kim	f10b036a09	docs: add SCHEMAS.md — field-level JSON contract for clawable CLI surfaces Documents the unified JSON envelope contract across all 13 clawable-surface commands. Extends the parity work (#171) to the field level: every command that accepts --output-format json must emit predictable field names, types, and optionality. Common fields (all envelopes): - timestamp (ISO 8601 UTC) - command (argv[1]) - exit_code (0/1/2) - output_format ('json') - schema_version ('1.0') Error envelope (exit 1, failure): - error.kind (enum: filesystem|auth|session|parse|runtime|mcp|delivery|usage|policy|unknown) - error.operation (syscall/method name) - error.target (resource path/name) - error.retryable (bool) - error.message (platform error text) - error.hint (optional: actionable next step) Not-found envelope (exit 1, not a failure): - found: false - error.kind (enum: command_not_found|tool_not_found|session_not_found) - error.message, error.retryable Per-command success schemas documented for 13 commands: list-sessions, delete-session, load-session, flush-transcript, show-command, show-tool, exec-command, exec-tool, route, bootstrap, command-graph, tool-pool, bootstrap-graph Why this matters: - #171 enforced that commands have --output-format; #172 enforces that the JSON fields are PREDICTABLE - Downstream claws can build ONE error handler + per-command jq query, not special-casing logic per command family - Field consistency enables generic automation patterns (error dedupe, failure aggregation, cross-command monitoring) Related: - ROADMAP #172 (field-level contract stabilization, Gaebal-gajae priority #1) - ROADMAP #171 (parity audit CI automation — already landed) - #164 Stage B (cancellation observability — adds cancel_observed field) - #164 Stage A (already done — adds stop_reason field to TurnResult) Fixture/regression testing: - Golden JSON snapshots: tests/fixtures/json/<command>.json (future) - Consistency test: test_json_envelope_field_consistency.py (future) - Versioning: schema_version='1.0' for current; bump to 2.0 for breaking changes	2026-04-30 01:06:57 +09:00
YeonGyu-Kim	778623226d	fix: #171 — automate cross-surface CLI parity audit via argparse introspection Stops manual parity inspection from being a human-noticed concern. When a developer adds a new subcommand to the claw-code CLI, this test suite enforces explicit classification: - CLAWABLE_SURFACES: MUST accept --output-format {text,json} - OPT_OUT_SURFACES: explicitly exempt with documented rationale A new command that forgets to opt into one of these two sets FAILS loudly with TestCommandClassificationCoverage::test_every_registered_command_is_classified. No silent drift possible. Technique: argparse introspection at test time walks the _actions tree, discovers every registered subcommand, and compares against the declared classification sets. Contract is enforced machine-first instead of depending on human review. Three test classes covering three invariants: TestClawableSurfaceParity (14 tests): - test_all_clawable_surfaces_accept_output_format: every member of CLAWABLE_SURFACES has --output-format flag registered - test_clawable_surface_output_format_choices (parametrised over 13 commands): each must accept exactly {text, json} and default to 'text' for backward compat TestCommandClassificationCoverage (3 tests): - test_every_registered_command_is_classified: any new subcommand must be explicitly added to CLAWABLE_SURFACES or OPT_OUT_SURFACES - test_no_command_in_both_sets: sanity check for classification conflicts - test_all_classified_commands_actually_exist: no phantom commands (catches stale entries after a command is removed) TestJsonOutputContractEndToEnd (10 tests): - test_command_emits_parseable_json (parametrised over 10 clawable commands): actual subprocess invocation with --output-format json produces valid parseable JSON on stdout Classification: CLAWABLE_SURFACES (13): Session lifecycle: list-sessions, delete-session, load-session, flush-transcript Inspect: show-command, show-tool Execution: exec-command, exec-tool, route, bootstrap Diagnostic inventory: command-graph, tool-pool, bootstrap-graph OPT_OUT_SURFACES (12): Rich-Markdown reports (future JSON schema): summary, manifest, parity-audit, setup-report List filter commands: subsystems, commands, tools Turn-loop: structured_output is future work Simulation/debug: remote-mode, ssh-mode, teleport-mode, direct-connect-mode, deep-link-mode Full suite: 141 → 168 passing (+27), zero regression. Closes ROADMAP #171. Why this matters: Before: parity was human-monitored; every new command was a drift risk. The CLUSTER 3 sweep required manually auditing every subcommand and landing fixes as separate pinpoints. After: parity is machine-enforced. If a future developer adds a new command without --output-format, the test suite blocks it immediately with a concrete error message pointing at the missing flag. This is the first step in Gaebal-gajae's identified upper-level work: operationalised parity instead of aspirational parity. Related clusters: - Clawability principle: machine-first protocol enforcement - Test-first regression guard: extends TestTripletParityConsistency (#160/#165) and TestFullFamilyParity (#166) from per-cluster parity to cross-surface parity	2026-04-30 01:06:57 +09:00
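The argparse introspection technique named in commit 778623226d might look something like the sketch below. It builds a toy parser instead of importing the real claw-code CLI; `build_demo_parser`, `registered_subcommands`, and `unclassified` are hypothetical names, and `_actions` and `_SubParsersAction` are private argparse attributes that could change between Python versions.

```python
import argparse


def build_demo_parser() -> argparse.ArgumentParser:
    """Toy stand-in for the real CLI parser (assumption, not the repo's code)."""
    parser = argparse.ArgumentParser(prog="claw")
    sub = parser.add_subparsers(dest="command")
    route = sub.add_parser("route")
    route.add_argument("--output-format", choices=["text", "json"], default="text")
    sub.add_parser("summary")
    sub.add_parser("new-command")  # deliberately left unclassified below
    return parser


def registered_subcommands(parser: argparse.ArgumentParser) -> set:
    """Collect subcommand names by walking the parser's registered actions."""
    names = set()
    for action in parser._actions:  # private attribute, as the commit describes
        if isinstance(action, argparse._SubParsersAction):
            names.update(action.choices)  # choices maps name -> sub-parser
    return names


def unclassified(parser: argparse.ArgumentParser, clawable: set, opt_out: set) -> set:
    """Subcommands that appear in neither classification set."""
    return registered_subcommands(parser) - (clawable | opt_out)


if __name__ == "__main__":
    print(unclassified(build_demo_parser(), {"route"}, {"summary"}))
```

A coverage test would assert that `unclassified(...)` is empty, so a newly added subcommand fails until someone classifies it.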
YeonGyu-Kim	c29b8f7e1b	fix: #170 — bootstrap-graph now accepts --output-format; diagnostic surface parity complete Final diagnostic surface in the JSON parity sweep: bootstrap-graph (the runtime bootstrap/prefetch visualization) now supports --output-format. Concrete addition: - bootstrap-graph: --output-format {text,json} JSON envelope: {stages: [str], note: 'bootstrap-graph is markdown-only in this version'} Envelope explanation: bootstrap-graph's Markdown output is rich and textual; raw JSON embedding maintains the markdown format (split into lines array) rather than attempting lossy structural extraction that would lose information. This is an honest limitation in this cycle; full JSON schema can be added in a future audit if claws require structured bootstrap data (dependency graphs, prefetch timing, etc.). Backward compatibility: - Default is 'text' (Markdown unchanged) Closes ROADMAP #170. Related: #167, #168, #169. Diagnostic/inventory surface family is now uniformly JSON-capable. Summary, manifest, parity-audit, setup-report, command-graph, tool-pool, bootstrap-graph all accept --output-format.	2026-04-30 01:06:57 +09:00
YeonGyu-Kim	8e901220cc	fix: #169 — command-graph and tool-pool now accept --output-format; diagnostic inventory JSON parity Extends the diagnostic surface audit with the two inventory-structure commands: command-graph (command family segmentation) and tool-pool (assembled tool inventory). Both now expose their underlying rich datastructures via JSON envelope. Concrete additions: - command-graph: --output-format {text,json} - tool-pool: --output-format {text,json} JSON envelope shapes: command-graph: {builtins_count, plugin_like_count, skill_like_count, total_count, builtins: [{name, source_hint}], plugin_like: [{name, source_hint}], skill_like: [{name, source_hint}]} tool-pool: {simple_mode, include_mcp, tool_count, tools: [{name, source_hint}]} Backward compatibility: - Default is 'text' (Markdown unchanged) - Text output byte-identical to pre-#169 Tests (4 new, test_command_graph_tool_pool_output_format.py): - TestCommandGraphOutputFormat (2): JSON structure + text compat - TestToolPoolOutputFormat (2): JSON structure + text compat Full suite: 137 → 141 passing, zero regression. Closes ROADMAP #169. Why this matters: Claws auditing the codebase can now ask 'what commands exist' and 'what tools exist' and get structured, parseable answers instead of regex-parsing Markdown headers and counting list items. Related clusters: - Diagnostic surfaces (#169 adds to #167/#168 work-verb parity) - Inventory introspection (command-graph + tool-pool are the two foundational 'what do we have?' queries)	2026-04-30 01:06:57 +09:00
YeonGyu-Kim	4ae59f27e6	fix: #168 — exec-command / exec-tool / route / bootstrap now accept --output-format; CLI family JSON parity COMPLETE Extends the #167 inspect-surface parity fix to the four remaining CLI outliers: the commands claws actually invoke to DO work, not just inspect state. After this commit, the entire claw-code CLI family speaks a unified JSON envelope contract. Concrete additions: - exec-command: --output-format {text,json} - exec-tool: --output-format {text,json} - route: --output-format {text,json} - bootstrap: --output-format {text,json} JSON envelope shapes: exec-command (handled): {name, prompt, source_hint, handled: true, message} exec-command (not-found): {name, prompt, handled: false, error: {kind:'command_not_found', message, retryable: false}} exec-tool (handled): {name, payload, source_hint, handled: true, message} exec-tool (not-found): {name, payload, handled: false, error: {kind:'tool_not_found', message, retryable: false}} route: {prompt, limit, match_count, matches: [{kind, name, score, source_hint}]} bootstrap: {prompt, limit, setup: {python_version, implementation, platform_name, test_command}, routed_matches: [{kind, name, score, source_hint}], command_execution_messages: [str], tool_execution_messages: [str], turn: {prompt, output, stop_reason}, persisted_session_path} Exit codes (unchanged from pre-#168): 0 = success 1 = exec not-found (exec-command, exec-tool only) Backward compatibility: - Default (no --output-format) is 'text' - exec-command/exec-tool text output byte-identical - route text output: unchanged tab-separated kind/name/score/source_hint - bootstrap text output: unchanged Markdown runtime session report Tests (13 new, test_exec_route_bootstrap_output_format.py): - TestExecCommandOutputFormat (3): handled + not-found JSON; text compat - TestExecToolOutputFormat (3): handled + not-found JSON; text compat - TestRouteOutputFormat (3): JSON envelope; zero-matches case; text compat - TestBootstrapOutputFormat (2): JSON envelope; text-mode Markdown compat - TestFamilyWideJsonParity (2): parametrised over ALL 6 family commands (show-command, show-tool, exec-command, exec-tool, route, bootstrap) — every one accepts --output-format json and emits parseable JSON; every one defaults to text mode without a leading {. One future regression on any family member breaks this test. Full suite: 124 → 137 passing, zero regression. Closes ROADMAP #168. This completes the CLI-wide JSON parity sweep: - Session-lifecycle family: #160 (list/delete), #165 (load), #166 (flush) - Inspect family: #167 (show-command, show-tool) - Work-verb family: #168 (exec-command, exec-tool, route, bootstrap) ENTIRE CLI SURFACE is now machine-readable via --output-format json with typed errors, deterministic exit codes, and consistent envelope shape. Claws no longer need to regex-parse any CLI output. Related clusters: - Clawability principle: 'machine-readable in state and failure modes' (ROADMAP top-level). 9 pinpoints in this cluster; all now landed. - Typed-error envelope consistency: command_not_found / tool_not_found / session_not_found / session_load_failed all share {kind, message, retryable} shape. - Work-verb semantics: exec-* surfaces expose 'handled' boolean (not 'found') because 'not handled' is the operational signal — claws dispatch on whether the work was performed, not whether the entry exists in the inventory.	2026-04-30 01:06:57 +09:00
YeonGyu-Kim	de97541ebd	fix: #167 — show-command and show-tool now accept --output-format flag; CLI parity with session-lifecycle family Closes the inspect-capability parity gap: show-command and show-tool were the only discovery/inspection CLI commands lacking --output-format support, making them outliers in the ecosystem that already had unified JSON contracts across list-sessions, load-session, delete-session, and flush-transcript (#160/#165/#166). Concrete additions: - show-command: --output-format {text,json} - show-tool: --output-format {text,json} JSON envelope shape (found case): {name, found: true, source_hint, responsibility} JSON envelope shape (not-found case): {name, found: false, error: {kind:'command_not_found'|'tool_not_found', message, retryable: false}} Exit codes: 0 = success 1 = not found Backward compatibility: - Default (no --output-format) is 'text' (unchanged) - Text output byte-identical to pre-#167 (three newline-separated lines) Tests (10 new, test_show_command_tool_output_format.py): - TestShowCommandOutputFormat (5): found + not-found in JSON; text mode backward compat; text is default - TestShowToolOutputFormat (3): found + not-found in JSON; text mode backward compat - TestShowCommandToolFormatParity (2): both accept same flag choices; consistent JSON envelope shape Full suite: 114 → 124 passing, zero regression. Closes ROADMAP #167. Why this matters: Before: Claws calling show-command/show-tool had to parse human-readable prose output via regex, with no structured error signal. After: Same envelope contract as load-session and friends: JSON-first, typed errors, machine-parseable. Related clusters: - Session-lifecycle CLI parity family (#160, #165, #166, #167) - Machine-readable error contracts (same vein as #162 atomicity + #164 cancellation state-safety: structured boundaries for orchestration)	2026-04-30 01:06:57 +09:00
YeonGyu-Kim	3ec635207e	fix: #164 Stage A — cooperative cancellation via cancel_event in submit_message Closes the #161 follow-up gap identified in review: wall-clock timeout bounded caller-facing wait but did not cancel the underlying provider thread, which could silently mutate mutable_messages / transcript_store / permission_denials / total_usage after the caller had already observed stop_reason='timeout'. A ghost turn committed post-deadline would poison any session that got persisted afterwards. Stage A scope (this commit): runtime + engine layer cooperative cancel. Engine layer (src/query_engine.py): - submit_message now accepts cancel_event: threading.Event | None = None - Two safe checkpoints: 1. Entry (before max_turns / budget projection) — earliest possible return 2. Post-budget (after output synthesis, before mutation) — catches cancel that arrives while output was being computed - Both checkpoints return stop_reason='cancelled' with state UNCHANGED (mutable_messages, transcript_store, permission_denials, total_usage all preserved exactly as on entry) - cancel_event=None preserves legacy behaviour with zero overhead (no checkpoint checks at all) Runtime layer (src/runtime.py): - run_turn_loop creates one cancel_event per invocation when a deadline is in play (and None otherwise, preserving legacy fast path) - Passes the same event to every submit_message call across turns, so a late cancel on turn N-1 affects turn N - On timeout (either pre-call or mid-call), runtime explicitly calls cancel_event.set() before future.cancel() + synthesizing the timeout TurnResult. This upgrades #161's best-effort future.cancel() (which only cancels not-yet-started futures) to cooperative mid-flight cancel. Stop reason taxonomy after Stage A: 'completed' — turn committed, state mutated exactly once 'max_budget_reached' — overflow, state unchanged (#162) 'max_turns_reached' — capacity exceeded, state unchanged 'cancelled' — cancel_event observed, state unchanged (#164 Stage A) 'timeout' — synthesised by runtime, not engine (#161) The 'cancelled' vs 'timeout' split matters: - 'timeout' is the runtime's best-effort signal to the caller: deadline hit - 'cancelled' is the engine's confirmation: cancel was observed + honoured If the provider call wedges entirely (never reaches a checkpoint), the caller still sees 'timeout' and the thread is leaked — but any NEXT submit_message call on the same engine observes the event at entry and returns 'cancelled' immediately, preventing ghost-turn accumulation. This is the honest cooperative limit in Python threading land; true preemption requires async-native provider IO (future work, not Stage A). Tests (29 new tests, tests/test_submit_message_cancellation.py + tests/test_run_turn_loop_cancellation.py): Engine-layer (12 tests): - TestCancellationBeforeCall (5): pre-set event returns 'cancelled' immediately; mutable_messages, transcript_store, usage, permission_denials all preserved - TestCancellationAfterBudgetCheck (1): cancel set mid-call (after projection, before commit) still honoured; output synthesised but state untouched - TestCancellationAfterCommit (2): post-commit cancel not observable (honest limit) BUT next call on same engine observes it + returns 'cancelled' - TestLegacyCallersUnchanged (3): cancel_event=None preserves #162 atomicity + max_turns contract with zero behaviour change - TestCancellationVsOtherStopReasons (2): cancel precedes max_turns check; cancel does not retroactively override a completed turn Runtime-layer (5 tests): - TestTimeoutPropagatesCancelEvent (3): submit_message receives a real Event object when deadline is set; None in legacy mode; timeout actually calls event.set() so in-flight threads observe at their next checkpoint - TestCancelEventSharedAcrossTurns (1): same event object passed to every turn (object identity check) — late cancel on turn N-1 must affect turn N Regression: 3 existing timeout test mocks updated to accept cancel_event kwarg (mocks that previously had signature (prompt, commands, tools, denials) now have (prompt, commands, tools, denials, cancel_event=None) since runtime passes cancel_event positionally on the timeout path). Full suite: 97 → 114 passing, zero regression. Closes ROADMAP #164 Stage A. What's explicitly NOT in Stage A: - Preemptive cancellation of wedged provider IO (requires asyncio-native provider path; larger refactor) - Timeout on the legacy unbounded run_turn_loop path (by design: legacy callers opt out of cancellation entirely) - CLI exposure of 'cancelled' as a distinct exit code (currently 'cancelled' maps to the same stop_reason != 'completed' break condition as others; CLI surface for cancel is a separate pinpoint if warranted)	2026-04-30 01:06:57 +09:00
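The two cancellation checkpoints described in commit 3ec635207e could be sketched as below. `ToyEngine` and its simplified `TurnResult` are hypothetical stand-ins, not the real classes from src/query_engine.py, and the real `submit_message` takes more arguments than shown here.

```python
import threading
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TurnResult:
    output: str
    stop_reason: str


@dataclass
class ToyEngine:
    """Hypothetical stand-in for the engine; only the checkpoint order matters."""

    mutable_messages: List[str] = field(default_factory=list)

    def submit_message(
        self, prompt: str, cancel_event: Optional[threading.Event] = None
    ) -> TurnResult:
        # Checkpoint 1: on entry, before any work is done.
        if cancel_event is not None and cancel_event.is_set():
            return TurnResult("", "cancelled")
        output = f"echo: {prompt}"  # placeholder for output synthesis
        # Checkpoint 2: after synthesis, before any state is mutated.
        if cancel_event is not None and cancel_event.is_set():
            return TurnResult(output, "cancelled")
        self.mutable_messages.append(prompt)  # single commit point
        return TurnResult(output, "completed")


if __name__ == "__main__":
    engine, event = ToyEngine(), threading.Event()
    print(engine.submit_message("first", event).stop_reason)
    event.set()  # what the runtime would do on timeout
    print(engine.submit_message("second", event).stop_reason)
    print(engine.mutable_messages)
```

Because the same event object is shared across calls, a cancel that arrives too late for one turn is still observed at the entry checkpoint of the next.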
YeonGyu-Kim	6542dded66	chore: gitignore .port_sessions/ to prevent dogfood-run pollution Every 'claw flush-transcript' call without --directory writes to .port_sessions/<uuid>.json in CWD. Without a gitignore entry, every dogfood run leaves dozens of untracked files in the repo, masking real changes in 'git status' output. Now that #160/#166 ship structured session lifecycle commands and deterministic --session-id, this directory is purely transient by default — belongs in .gitignore.	2026-04-30 01:06:57 +09:00
YeonGyu-Kim	6b9879cd1b	fix: #166 — flush-transcript now accepts --directory / --output-format / --session-id; session-creation command parity with #160/#165 lifecycle triplet	2026-04-30 01:06:57 +09:00
YeonGyu-Kim	9c2901eb21	fix: #159 — run_turn_loop no longer hardcodes empty denied_tools; permission denials now parity-match bootstrap_session #159: multi-turn sessions had a silent security asymmetry: denied_tools were always empty in run_turn_loop, even though bootstrap_session inferred them from the routed matches. Result: any tool gated as 'destructive' (bash-family commands, rm, etc) would silently appear unblocked across all turns in multi-turn mode, giving a false 'clean' permission picture to any claw consuming TurnResult.permission_denials. Fix: compute denied_tools once at loop start via _infer_permission_denials, then pass the same denials to every submit_message call (both timeout and legacy unbounded paths). This mirrors the existing bootstrap_session pattern. Acceptance: run_turn_loop('run bash ls').permission_denials now matches what bootstrap_session returns — both infer the same denials from the routed matches. Multi-turn security posture is symmetric. Tests (tests/test_run_turn_loop_permissions.py, 2 tests): - test_turn_loop_surfaces_permission_denials_like_bootstrap: Symmetry check confirming both paths infer identical denials for destructive tools - test_turn_loop_with_continuation_preserves_denials: Denials inferred at loop start are passed consistently to all turns; captured via mock and verified non-empty Full suite: 82/82 passing, zero regression. Closes ROADMAP #159.	2026-04-30 01:06:57 +09:00
YeonGyu-Kim	b2a0c5da03	fix: #165 — load-session CLI now parity-matches list/delete (--directory, --output-format, typed JSON errors) The #160 session-lifecycle CLI triplet was asymmetric: list-sessions and delete-session accepted --directory + --output-format and emitted typed JSON error envelopes, but load-session had neither flag and dumped a raw Python traceback (including the SessionNotFoundError class name) on a missing session. Three concrete impacts this fix closes: 1. Alternate session-store locations (e.g. /tmp/claw-run-XXX/.port_sessions) were unreachable via load-session; claws had to chdir or monkeypatch DEFAULT_SESSION_DIR to work around it. 2. Not-found emitted a multi-line Python stack, not a parseable envelope. Claws deciding retry/escalate/give-up had only exit code 1 to work with. 3. The traceback leaked 'src.session_store.SessionNotFoundError' verbatim, coupling version-pinned claws to our internal exception class name. Now all three triplet commands accept the same flag pair and emit the same JSON error shape: Success (json mode): {"session_id": "alpha", "loaded": true, "messages_count": 3, "input_tokens": 42, "output_tokens": 99} Not-found: {"session_id": "missing", "loaded": false, "error": {"kind": "session_not_found", "message": "session 'missing' not found in /path", "directory": "/path", "retryable": false}} Corrupted file: {"session_id": "broken", "loaded": false, "error": {"kind": "session_load_failed", "message": "...", "directory": "/path", "retryable": true}} Exit code contract: - 0 on successful load - 1 on not-found (preserves existing $?) - 1 on OSError/JSONDecodeError (distinct 'kind' in JSON) Backward compat: legacy 'claw load-session ID' text output unchanged byte-for-byte. Only new behaviour is the flags and structured error path. Tests (tests/test_load_session_cli.py, 13 tests): - TestDirectoryFlagParity (2): --directory works + fallback to CWD/.port_sessions - TestOutputFormatFlagParity (2): json schema + text-mode backward compat - TestNotFoundTypedError (2): JSON envelope on not-found; no traceback in either mode; no internal class name leak - TestLoadFailedDistinctFromNotFound (1): corrupted file = session_load_failed with retryable=true, distinct from session_not_found - TestTripletParityConsistency (6): parametrised over [list, delete, load] * [--directory, --output-format] — explicit parity guard for future regressions Full suite: 80/80 passing, zero regression. Discovered via Jobdori dogfood sweep 2026-04-22 17:44 KST — ran 'claw load-session nonexistent' expecting a clean error, got a Python traceback. Filed #165 + fixed in same commit. Closes ROADMAP #165.	2026-04-30 01:06:57 +09:00
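On the consumer side, a claw might dispatch on the load-session envelopes from commit b2a0c5da03 roughly like this. The function `next_action` and its return labels are illustrative assumptions; the envelope fields (`loaded`, `error.kind`, `error.retryable`) are the ones quoted in the commit message.

```python
import json


def next_action(raw_stdout: str) -> str:
    """Map a load-session JSON envelope to a coarse orchestration decision."""
    envelope = json.loads(raw_stdout)
    if envelope.get("loaded"):
        return "proceed"
    error = envelope.get("error", {})
    # retryable=true (e.g. session_load_failed) suggests trying again;
    # retryable=false (e.g. session_not_found) suggests giving up or escalating.
    return "retry" if error.get("retryable") else "give_up"


if __name__ == "__main__":
    not_found = (
        '{"session_id": "missing", "loaded": false, "error": '
        '{"kind": "session_not_found", "message": "...", '
        '"directory": "/path", "retryable": false}}'
    )
    print(next_action(not_found))
```

The point of the shared `{kind, message, retryable}` shape is that one handler like this can serve every command in the family.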
YeonGyu-Kim	11326905e9	fix: #163 — remove [turn N] suffix pollution from run_turn_loop; file #164 timeout-cancellation followup #163: run_turn_loop no longer injects f'{prompt} [turn N]' into follow-up prompts. The suffix was never defined or interpreted anywhere — not by the engine, not by the system prompt, not by any LLM. It looked like a real user-typed annotation in the transcript and made replay/analysis fragile. New behaviour: - turn 0 submits the original prompt (unchanged) - turn > 0 submits caller-supplied continuation_prompt if provided, else the loop stops cleanly — no fabricated user turn - added continuation_prompt: str | None = None parameter to run_turn_loop - added --continuation-prompt CLI flag for claws scripting multi-turn loops - zero '[turn' strings ever appear in mutable_messages or stdout now Behaviour change for existing callers: - Before: run_turn_loop(prompt, max_turns=3) submitted 3 turns ('prompt', 'prompt [turn 2]', 'prompt [turn 3]') - After: run_turn_loop(prompt, max_turns=3) submits 1 turn ('prompt') - To preserve old multi-turn behaviour, pass continuation_prompt='Continue.' or any structured follow-up text One existing timeout test (test_budget_is_cumulative_across_turns) updated to pass continuation_prompt so the cumulative-budget contract is actually exercised across turns instead of trivially satisfied by a one-turn loop. #164 filed: addresses reviewer feedback on #161. The wall-clock timeout bounds the caller-facing wait, but the underlying submit_message worker thread keeps running and can mutate engine state after the timeout TurnResult is returned. A cooperative cancel_event pattern is sketched in the pinpoint; real asyncio.Task.cancel() support will come once provider IO is async-native (larger refactor). Tests (tests/test_run_turn_loop_continuation.py, 8 tests): - TestNoTurnSuffixInjection (2): zero '[turn' strings in any submitted prompt, both default and explicit-continuation paths - TestContinuationDefaultStopsAfterTurnZero (2): default loops run exactly one turn; engine.submit_message called exactly once despite max_turns=10 - TestExplicitContinuationBehaviour (2): turn 0 = original, turn N = continuation verbatim; max_turns still respected - TestCLIContinuationFlag (2): CLI default emits only '## Turn 1'; --continuation-prompt wires through to multi-turn behaviour Full suite: 67/67 passing. Closes ROADMAP #163. Files #164.	2026-04-30 01:06:57 +09:00
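The prompt selection rule from commit 11326905e9 can be illustrated with a small sketch. `prompts_for_loop` is a hypothetical helper that only computes which prompts would be submitted; the real run_turn_loop also routes, executes, and collects results.

```python
from typing import List, Optional


def prompts_for_loop(
    prompt: str, max_turns: int, continuation_prompt: Optional[str] = None
) -> List[str]:
    """Return the prompts a loop would submit under the post-#163 rule."""
    submitted = []
    for turn in range(max_turns):
        if turn == 0:
            submitted.append(prompt)  # turn 0: the original prompt, unchanged
        elif continuation_prompt is None:
            break  # no fabricated follow-up turn
        else:
            submitted.append(continuation_prompt)  # verbatim, no '[turn N]' suffix
    return submitted


if __name__ == "__main__":
    print(prompts_for_loop("prompt", max_turns=3))
    print(prompts_for_loop("prompt", max_turns=3, continuation_prompt="Continue."))
```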
YeonGyu-Kim	c07089eedd	fix: #162 — budget-overflow no longer corrupts session state in submit_message Previously, QueryEnginePort.submit_message() checked the token budget AFTER appending the prompt to mutable_messages, transcript_store, and permission_denials, and AFTER calling compact_messages_if_needed(). On overflow it set stop_reason='max_budget_reached' but the overflow turn was already committed. Any caller that persisted the session afterwards wrote the rejected prompt to disk — the session was silently poisoned even though the TurnResult said the turn never completed. Fix: - Restructure submit_message so the budget check early-returns BEFORE any mutation of mutable_messages, transcript_store, permission_denials, or total_usage. - The returned TurnResult.usage reflects pre-call state (overflow never advanced the usage counter). - Normal (in-budget) path unchanged: mutation happens exactly once, at the end, only on 'completed' results. This closes the atomicity gap: submit_message is now either 'turn committed' (stop_reason='completed') or 'turn rejected, state untouched' (stop_reason in {'max_budget_reached', 'max_turns_reached'}). Callers can safely retry with a fresh budget or a smaller prompt without worrying about phantom committed turns from prior rejections. Tests (tests/test_submit_message_budget.py, 10 tests): - TestBudgetOverflowDoesNotMutate (5): mutable_messages / transcript / permission_denials / total_usage / TurnResult.usage all pre-mutation after overflow - TestOverflowPersistence (2): first-turn overflow persists empty session; successful-turn-then-overflow persists only the successful turn - TestEngineUsableAfterOverflow (2): subsequent in-budget call still works with no residue; repeated overflows don't accumulate hidden state - TestNormalPathStillCommits (1): regression guard — non-overflow path still commits mutable_messages/transcript/usage as expected Full suite: 59/59 passing, zero regression. Blocker: none. Closes ROADMAP #162.	2026-04-30 01:06:57 +09:00
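The check-before-mutate ordering from commit c07089eedd might be sketched as follows. `BudgetedEngine` is a hypothetical toy, and counting whitespace-separated words is a crude stand-in for whatever token projection the real engine uses.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BudgetedEngine:
    """Toy engine: either commits a turn fully or leaves state untouched."""

    max_budget_tokens: int
    used_tokens: int = 0
    mutable_messages: List[str] = field(default_factory=list)

    def submit_message(self, prompt: str) -> str:
        # Project usage first; word count is an assumption, not the real estimate.
        projected = self.used_tokens + len(prompt.split())
        if projected > self.max_budget_tokens:
            return "max_budget_reached"  # early return, nothing mutated
        # Single commit point, reached only on the in-budget path.
        self.mutable_messages.append(prompt)
        self.used_tokens = projected
        return "completed"


if __name__ == "__main__":
    engine = BudgetedEngine(max_budget_tokens=3)
    print(engine.submit_message("one two"))
    print(engine.submit_message("three four five"))
    print(engine.mutable_messages, engine.used_tokens)
```

Since a rejected turn leaves no residue, the caller can retry with a smaller prompt on the same engine.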
YeonGyu-Kim	af9723cf0a	fix: #161 — wall-clock timeout for run_turn_loop; stalled turns now abort with stop_reason='timeout' Previously, run_turn_loop was bounded only by max_turns (turn count). If engine.submit_message stalled — slow provider, hung network, infinite stream — the loop blocked indefinitely with no cancellation path. Claws calling run_turn_loop in CI or orchestration had no reliable way to enforce a deadline; the loop would hang until OS kill or human intervention. Fix: - Add timeout_seconds parameter to run_turn_loop (default None = legacy unbounded). - When set, each submit_message call runs inside a ThreadPoolExecutor and is bounded by the remaining wall-clock budget (total across all turns, not per-turn). - On timeout, synthesize a TurnResult with stop_reason='timeout' carrying the turn's prompt and routed matches so transcripts preserve orchestration context. - Exhausted/negative budget short-circuits before calling submit_message. - Legacy path (timeout_seconds=None) bypasses the executor entirely — zero overhead for callers that don't opt in. CLI: - Added --timeout-seconds flag to 'turn-loop' command. - Exit code 2 when the loop terminated on timeout (vs 0 for completed), so shell scripts can distinguish 'done' from 'budget exhausted'. Tests (tests/test_run_turn_loop_timeout.py, 6 tests): - Legacy unbounded path unchanged (timeout_seconds=None never emits 'timeout') - Hung submit_message aborted within budget (0.3s budget, 5s mock hang → exit <1.5s) - Budget is cumulative across turns (0.6s budget, 0.4s per turn, not per-turn) - timeout_seconds=0 short-circuits first turn without calling submit_message - Negative timeout treated as exhausted (guard against caller bugs) - Timeout TurnResult carries correct prompt, matches, UsageSummary shape Full suite: 49/49 passing, zero regression. Blocker: none. Closes ROADMAP #161.	2026-04-30 01:06:57 +09:00
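A cumulative wall-clock budget of the kind commit af9723cf0a describes could be sketched with the standard library as below. `run_with_deadline` is a hypothetical helper that returns plain stop-reason strings; the real run_turn_loop synthesizes a full TurnResult on timeout.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_with_deadline(turn_fns, timeout_seconds):
    """Run callables in order under one wall-clock budget shared by all turns."""
    deadline = time.monotonic() + timeout_seconds
    results = []
    executor = ThreadPoolExecutor(max_workers=1)
    try:
        for fn in turn_fns:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                # Exhausted (or zero/negative) budget: do not start the turn.
                results.append("timeout")
                break
            future = executor.submit(fn)
            try:
                results.append(future.result(timeout=remaining))
            except FutureTimeout:
                future.cancel()  # best effort: only stops not-yet-started work
                results.append("timeout")
                break
    finally:
        executor.shutdown(wait=False)  # do not block on a stalled worker
    return results


if __name__ == "__main__":
    def slow_turn():
        time.sleep(0.5)
        return "completed"

    print(run_with_deadline([lambda: "completed", slow_turn], timeout_seconds=0.1))
```

Note that the stalled worker thread keeps running after the timeout is reported, which is the gap the later #164 cancellation work addresses.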
YeonGyu-Kim	b88c899ceb	feat(#160): wire claw list-sessions and delete-session CLI commands Closes the last #160 gap: claws can now manage session lifecycle entirely through the CLI without filesystem hacks. New commands: - claw list-sessions [--directory DIR] [--output-format text|json] Enumerates stored session IDs. JSON mode emits {sessions, count}. Missing/empty directories return empty list (exit 0), not an error. - claw delete-session SESSION_ID [--directory DIR] [--output-format text|json] Idempotent: not-found is exit 0 with status='not_found' (no raise). Partial-failure: exit 1 with typed JSON error envelope: {session_id, deleted: false, error: {kind, message, retryable}} The 'session_delete_failed' kind is retryable=true so orchestrators know to retry vs escalate. Public API surface extended in src/__init__.py: - list_sessions, session_exists, delete_session - SessionNotFoundError, SessionDeleteError Tests added (tests/test_porting_workspace.py): - test_list_sessions_cli_runs: text + json modes against tempdir - test_delete_session_cli_idempotent: first call deleted=true, second call deleted=false (exit 0, status=not_found) - test_delete_session_cli_partial_failure_exit_1: permission error surfaces as exit 1 + typed JSON error with retryable=true All 43 tests pass. The session storage abstraction chapter is closed: - storage layer decoupled from claw code (#160 initial impl) - delete contract hardened + caller-audited (#160 hardening pass) - CLI wired with idempotency preserved at exit-code boundary (this commit)	2026-04-30 01:06:57 +09:00
YeonGyu-Kim	e6ea4d248d	fix(#160 ): harden delete_session contract — idempotency, race-safety, typed partial-failure Addresses review feedback on initial #160 implementation: 1. delete_session() contract now explicit: - Idempotent: delete(x); delete(x) is safe, second call returns False - Race-safe: TOCTOU between exists()/unlink() eliminated via unlink-then-catch - Partial-failure typed: permission/IO errors wrapped in SessionDeleteError (OSError subclass) so callers can distinguish 'not found' (return False) from 'could not delete' (raise) 2. New SessionDeleteError class for partial-failure surfacing. Distinct from SessionNotFoundError (KeyError subclass for missing loads). 3. Caller audit confirmed: no code outside session_store globs .port_sessions or imports DEFAULT_SESSION_DIR. Storage layout is fully encapsulated. 4. Added tests/test_session_store.py — 18 tests covering: - list_sessions: empty/missing/sorted/non-json filter - session_exists: true/false/missing-dir - load_session: SessionNotFoundError typing (KeyError subclass, not FileNotFoundError) - delete_session idempotency: first/second/never-existed calls - delete_session partial-failure: SessionDeleteError wraps OSError - delete_session race-safety: concurrent deletion returns False, not raise - Full save->list->exists->load->delete roundtrip All 18 tests pass. Merge-ready: contract documented, caller-audited, race-safe.	2026-04-30 01:06:57 +09:00
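The unlink-then-catch pattern named in commit e6ea4d248d can be illustrated with a short sketch. It assumes the `<id>.json` file layout mentioned in the log; `delete_session_sketch` and the local `SessionDeleteError` class are illustrative stand-ins, not the repository's implementation.

```python
from pathlib import Path


class SessionDeleteError(OSError):
    """Stand-in for the typed partial-failure error (an OSError subclass)."""


def delete_session_sketch(session_id, directory):
    """Return True if a file was removed, False if it was already gone."""
    path = Path(directory) / f"{session_id}.json"
    try:
        # No exists() check first: unlinking directly closes the TOCTOU window
        # in which another process could delete the file between check and use.
        path.unlink()
    except FileNotFoundError:
        # Already deleted (or never existed): the idempotent "not found" case.
        return False
    except OSError as exc:
        # Permission or I/O problems are a different outcome: could not delete.
        raise SessionDeleteError(
            f"could not delete session {session_id!r}: {exc}"
        ) from exc
    return True
```

Because `FileNotFoundError` is itself an `OSError` subclass, it has to be caught first; otherwise the not-found case would be wrapped as a delete failure.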
YeonGyu-Kim	0c600e76a7	fix: #160 — add list_sessions, session_exists, delete_session to session_store - list_sessions(directory=None) -> list[str]: enumerate stored session IDs - session_exists(session_id, directory=None) -> bool: check existence without FileNotFoundError - delete_session(session_id, directory=None) -> bool: unlink a session file - load_session now raises typed SessionNotFoundError (subclass of KeyError) instead of FileNotFoundError - Claws can now manage session lifecycle without reaching past the module to glob filesystem Closes ROADMAP #160. Acceptance: claw can call list_sessions(), session_exists(id), delete_session(id) without importing Path or knowing .port_sessions/<id>.json layout.	2026-04-30 01:06:57 +09:00
YeonGyu-Kim	424d5aff74	file: #161 — run_turn_loop has no wall-clock timeout, stalled turn blocks indefinitely	2026-04-30 01:06:57 +09:00
Bellman	f65b2b4f0e	Merge pull request #2861 from ultraworkers/docs/roadmap-341-tasks-json-dual-vocab docs(roadmap): add #341 — tasks JSON error envelope uses dual vocabulary	2026-04-30 01:06:27 +09:00
Yeachan-Heo	f4b74e89dd	Document why /tasks JSON errors need one stdout contract Constraint: ROADMAP-only dogfood follow-up for 16:00 nudge on rebuilt claw git_sha 58569131 Rejected: code change in the command dispatcher \| request was specifically to add one ROADMAP.md-only item Confidence: high Scope-risk: narrow Directive: Keep /tasks distinct from #340; this is unsupported command stub JSON, not session help Tested: git diff --check; scripts/fmt.sh --check Not-tested: runtime behavior change, because this commit only documents the gap	2026-04-29 16:02:10 +00:00
Bellman	5856913104	Merge pull request #2859 from ultraworkers/docs/roadmap-340-session-help-json-stderr docs(roadmap): add #340 — session help JSON error envelope goes to stderr	2026-04-30 00:54:42 +09:00
Yeachan-Heo	d45a0d2f5b	Document stderr-only session help JSON contract gap Capture the dogfood evidence as a roadmap item so the stdout JSON error-envelope contract can be fixed and regression-tested later. Constraint: User requested exactly one ROADMAP.md-only item #340 from current origin/main. Confidence: high Scope-risk: narrow Tested: git diff --check; scripts/fmt.sh --check Not-tested: Runtime behavior unchanged; documentation-only roadmap entry.	2026-04-29 15:31:59 +00:00
Bellman	dc47482e40	Merge pull request #2857 from ultraworkers/docs/roadmap-339-v2 docs(roadmap): add #339 — session delete not resume-safe, blocks GC automation	2026-04-30 00:26:29 +09:00
YeonGyu-Kim	9537c97231	docs(roadmap): add #339 — session delete not resume-safe, blocks GC automation	2026-04-30 00:18:28 +09:00
Bellman	f56a5afcf7	Merge pull request #2856 from ultraworkers/docs/roadmap-337-workspace-dirty-lifecycle-detail-restore docs(roadmap): restore #337 workspace dirty lifecycle detail gap	2026-04-30 00:14:48 +09:00
Yeachan-Heo	3efaf551ed	Restore roadmap GC lifecycle detail gap Constraint: ROADMAP.md-only restore of lost #337 from PR #2852 / Jobdori dogfood evidence Rejected: Renumbering adjacent items \| preserving existing #338 and surrounding roadmap entries keeps history stable Confidence: high Scope-risk: narrow Directive: Keep #337 before #338 and do not collapse the dirty-file detail requirement into the broader help/status backlog Tested: git diff --check; scripts/fmt.sh --check Not-tested: Product behavior changes; documentation-only change	2026-04-29 15:09:40 +00:00
Bellman	30c9b438ef	Merge pull request #2853 from ultraworkers/docs/roadmap-338-help-json-field-drift docs(roadmap): add #338 for help JSON field drift	2026-04-30 00:06:24 +09:00
Yeachan-Heo	587bb18572	docs(roadmap): add #338 for help JSON field drift Constraint: Respond to 14:30 dogfood nudge with one direct claw-code pinpoint. Evidence: rebuilt actual debug binary at git_sha 24ccb59b; compared top-level help --output-format json with resume-safe /help --output-format json. Finding: same help surface uses message in top-level JSON and text in slash/resume JSON. Tested: cargo run --manifest-path rust/Cargo.toml --bin claw -- version --output-format json; ./rust/target/debug/claw help --output-format json; ./rust/target/debug/claw --resume latest /help --output-format json; git diff --check; scripts/fmt.sh --check. Not-tested: full Rust suite; roadmap-only documentation change.	2026-04-29 14:34:26 +00:00
Bellman	24ccb59bd2	Merge pull request #2851 from ultraworkers/docs/roadmap-329-slash-agents-json-opacity docs(roadmap): add #329 for slash agents JSON opacity	2026-04-29 23:33:47 +09:00
Yeachan-Heo	0e8e75ef75	docs(roadmap): add #329 for slash agents JSON opacity Constraint: Respond to dogfood nudge with exactly one concrete clawability pinpoint from direct claw-code use. Evidence: rebuilt actual debug binary at git_sha 0f7578c0; compared resume-safe /agents --output-format json with top-level claw agents --output-format json. Finding: slash /agents JSON only exposes kind,text while top-level agents JSON exposes structured agents[] inventory and provenance. Tested: cargo run --manifest-path rust/Cargo.toml --bin claw -- version --output-format json; ./rust/target/debug/claw --resume latest /agents --output-format json; ./rust/target/debug/claw agents --output-format json; git diff --check; scripts/fmt.sh --check. Not-tested: full Rust suite; roadmap-only documentation change.	2026-04-29 14:01:36 +00:00
Bellman	0f7578c064	Merge pull request #2849 from ultraworkers/docs/roadmap-328-dogfood-pinpoint Add ROADMAP #328 for native-agent source provenance	2026-04-29 22:35:51 +09:00
Yeachan-Heo	213d406cbf	Record why native-agent provenance needs dogfood follow-up Constraint: Scope requested ROADMAP.md only with exactly one new #328 pinpoint from direct claw dogfood. Rejected: Implementing the agents-help fix now \| user requested roadmap-only evidence item. Confidence: high Scope-risk: narrow Directive: Keep agent help source roots derived from the same loader registry as agents list; do not hand-maintain a divergent root list. Tested: cargo run --manifest-path rust/Cargo.toml --bin claw -- version --output-format json; ./rust/target/debug/claw version --output-format json; ./rust/target/debug/claw agents help --output-format json; ./rust/target/debug/claw agents --output-format json; git diff --check; scripts/fmt.sh --check Not-tested: Full Rust test suite; roadmap-only documentation change.	2026-04-29 13:33:23 +00:00
Bellman	ee85fed6ca	Merge pull request #2847 from ultraworkers/docs/roadmap-327-dogfood-pinpoint Add ROADMAP #327 for MCP help source mismatch	2026-04-29 22:06:45 +09:00
Yeachan-Heo	3a34d83749	Record why MCP source help needs dogfood follow-up Constraint: Scope limited to ROADMAP.md and one new pinpoint #327 from actual rebuilt claw dogfood. Rejected: Code fix in this branch \| user requested roadmap-only filing. Confidence: high Scope-risk: narrow Directive: Keep mcp help source lists derived from actual config discovery, not hard-coded partial docs. Tested: ./rust/target/debug/claw version --output-format json; ./rust/target/debug/claw mcp --help; ./rust/target/debug/claw mcp help --output-format json; temp .claw.json mcp list proof; git diff --check; scripts/fmt.sh --check Not-tested: Full Rust test suite, documentation-only change.	2026-04-29 13:02:27 +00:00
Bellman	981aff7c8b	Merge pull request #2845 from ultraworkers/docs/roadmap-326-dogfood-pinpoint docs(roadmap): add #326 pane inventory opacity pinpoint	2026-04-29 21:35:26 +09:00
Yeachan-Heo	c94940effa	docs: add roadmap 326 pane inventory opacity	2026-04-29 12:33:36 +00:00
Bellman	b90875fa8e	Merge pull request #2843 from ultraworkers/docs/roadmap-325-help-json-schema docs(roadmap): add #325 help json schema opacity pinpoint	2026-04-29 21:05:12 +09:00
Yeachan-Heo	2567cbcc78	Pin help JSON schema opacity for automation Document the dogfood gap where help JSON stays parseable but hides command metadata inside a prose message, so future implementation can expose machine-readable command, slash-command, and resume-safety fields. Constraint: user requested ROADMAP.md-only pinpoint for issue #325 from origin/main d607ff36. Rejected: implementing the schema now \| requested fix shape is roadmap documentation only. Confidence: high Scope-risk: narrow Directive: keep message for humans while adding schema/versioned structured help metadata when implementing. Tested: git diff --check; scripts/fmt.sh --check Not-tested: runtime CLI behavior unchanged by docs-only change	2026-04-29 12:02:14 +00:00
Bellman	d607ff3674	Merge pull request #2840 from ultraworkers/docs/roadmap-324-stale-binary-provenance docs(roadmap): add #324 stale binary provenance pinpoint	2026-04-29 20:34:27 +09:00
Yeachan-Heo	cdf6282965	Record why stale binary provenance needs a roadmap pin Constraint: Documentation-only follow-up from current main e7074f47 after PR #2838; edit scope limited to ROADMAP.md. Rejected: Implementing provenance detection now \| user requested roadmap entry only. Confidence: high Scope-risk: narrow Directive: Future implementation should compare embedded build git_sha/build date to workspace HEAD/dirty state without leaking secrets. Tested: git diff --check; scripts/fmt.sh --check Not-tested: Runtime provenance behavior; this commit only records the roadmap requirement.	2026-04-29 11:31:19 +00:00
Bellman	e7074f47ee	Merge pull request #2838 from ultraworkers/docs/roadmap-322-323-clean docs(roadmap): add #322 #323 — json stream corruption and session identity contradiction	2026-04-29 19:40:50 +09:00
YeonGyu-Kim	9468383b67	docs(roadmap): add #322 #323 — json stream corruption and session identity contradiction	2026-04-29 19:38:00 +09:00
Bellman	1da2781816	Merge pull request #2835 from ultraworkers/docs/roadmap-249-issue-github-oauth-opacity docs(roadmap): add #249 issue GitHub OAuth opacity pinpoint	2026-04-29 19:31:50 +09:00
Yeachan-Heo	9037430d52	docs(roadmap): add #249 issue github oauth opacity pinpoint	2026-04-29 10:01:16 +00:00
Bellman	8e22f757d8	Merge pull request #2834 from ultraworkers/docs/roadmap-248-prompt-mode-silent-hang docs(roadmap): add #248 prompt-mode silent-hang pinpoint	2026-04-29 18:31:48 +09:00
Yeachan-Heo	7676b376ae	docs(roadmap): add #248 prompt-mode silent-hang pinpoint	2026-04-29 08:24:37 +00:00
Sigrid Jin (ง'̀-'́)ง oO	1011a83823	Merge pull request #2829 from ultraworkers/fix/issue-320-session-lifecycle-classification Fix session lifecycle classification for idle tmux shells	2026-04-29 16:11:58 +09:00
Yeachan-Heo	1376d92064	Filter stub commands from resume-safe help Keep claw --help's resume-safe slash command summary aligned with the interactive command list by filtering STUB_COMMANDS and adding regression coverage.	2026-04-29 03:31:34 +00:00
Yeachan-Heo	be53e04671	Classify saved sessions by live work rather than pane existence Operator status previously treated any tmux pane in a workspace as equivalent to active work. The new classifier uses tmux pane command/path metadata as a soft signal, treats plain shells as idle, and adds dirty-worktree abandoned markers to status and session-list output for clawhip consumers. Constraint: Keep issue #320 prototype minimal and additive without new dependencies Rejected: Screen-scraping pane output \| fragile and broader than needed for lifecycle classification Confidence: high Scope-risk: narrow Tested: cargo test -p rusty-claude-cli Tested: cargo check -p rusty-claude-cli Not-tested: cargo clippy -p rusty-claude-cli --all-targets -- -D warnings is blocked by pre-existing commands crate clippy::unnecessary_wraps warnings	2026-04-28 13:12:37 +00:00


971 Commits