claw-code

mirror of https://github.com/ultraworkers/claw-code.git synced 2026-06-13 13:13:33 +08:00

Author	SHA1	Message	Date
YeonGyu-Kim	af306d489e	feat: #180 implement --version flag for metadata protocol (#28 proactive demand) Cycle #28 closes the low-hanging metadata protocol gap identified in #180. ## The Gap Pinpoint #180 (filed cycle #24) documented a metadata protocol gap: - `--help` works (argparse default) - `--version` does NOT exist The ROADMAP entry deferred implementation pending demand. Cycle #28 dogfood probe found this during routine invariant audit (attempt to call `--version` as part of comprehensive CLI surface coverage). This is concrete evidence of real friction, not speculative gap-filling. ## Implementation Added `--version` flag to argparse in `build_parser()`: ```python parser.add_argument('--version', action='version', version='claw-code 1.0.0 (Python harness)') ``` Simple one-liner. Follows Python argparse conventions (built-in action='version'). ## Tests Added (3) TestMetadataFlags in test_exec_route_bootstrap_output_format.py: 1. test_version_flag_returns_version_text — `claw --version` prints version 2. test_help_flag_returns_help_text — `claw --help` still works 3. test_help_still_works_after_version_added — Both -h and --help work Regression guard on the original help surface. ## Test Status - 214 → 217 tests passing (+3) - Zero regressions - Full suite green ## Discipline This cycle exemplifies the cycle #24 calibration: - #180 was filed as 'deferred pending demand' - Cycle #28 dogfood found actual friction (proactive test coverage gap) - Evidence = concrete ('--version not found during invariant audit') - Action = minimal implementation + regression tests - No speculation, no feature creep, no implementation before evidence Not 'we imagined someone might want this.' Instead: 'we tried to call it during routine maintenance, got ENOENT, fixed it.' ## Related - #180 (cycle #24): Metadata protocol gap filed - Cycle #27: Cross-channel consistency audit established framework - Cycle #28 invariant audit: Discovered actual friction, triggered fix --- Classification (per cycle #24 calibration): - Red-state bug? ✗ (not a malfunction, just an absence) - Real friction? ✓ (audit probe could not call the flag, had to special-case) - Evidence-backed? ✓ (proactive test coverage revealed the gap) Source: Jobdori cycle #28 dogfood — invariant audit attempting comprehensive CLI surface coverage found that --version was unsupported.	2026-04-22 21:56:20 +09:00
YeonGyu-Kim	7724bf98fd	fix: #181 — envelope exit_code must match process exit code (exec-command/exec-tool) Cycle #26 dogfood found a real red-state bug in the JSON envelope contract. ## The Bug exec-command and exec-tool not-found cases return exit code 1 from the process, but the envelope reports exit_code: 0 (the default from wrap_json_envelope). This is a protocol violation. Repro (before fix): $ claw exec-command unknown-cmd test --output-format json > out.json $ echo $? 1 $ jq '.exit_code' out.json 0 # WRONG — envelope lies about exit code Claws reading the envelope's exit_code field get misinformation. A claw implementing the canonical ERROR_HANDLING.md pattern (check exit_code, then classify by error.kind) would incorrectly treat failures as successes when dispatching on the envelope alone. ## Root Cause main.py lines 687–739 (exec-command + exec-tool handlers): - Return statement: 'return 0 if result.handled else 1' (correct) - Envelope wrap: 'wrap_json_envelope(envelope, args.command)' (uses default exit_code=0, IGNORES the return value) The envelope wrap was called BEFORE the return value was computed, so the exit_code field was never synchronized with the actual exit code. ## The Fix Compute exit_code ONCE at the top: exit_code = 0 if result.handled else 1 Pass it explicitly to wrap_json_envelope: wrap_json_envelope(envelope, args.command, exit_code=exit_code) Return the same value: return exit_code This ensures the envelope's exit_code field is always truth — the SAME value the process returns. ## Tests Added (3) TestEnvelopeExitCodeMatchesProcessExit in test_exec_route_bootstrap_output_format.py: 1. test_exec_command_not_found_envelope_exit_matches: Verifies exec-command unknown-cmd returns exit 1 in both envelope and process. 2. test_exec_tool_not_found_envelope_exit_matches: Same for exec-tool. 3. test_all_commands_exit_code_invariant: Audit across 4 known non-zero cases (show-command, show-tool, exec-command, exec-tool not-found). Guards against the same bug in other surfaces. ## Impact - 206 → 209 passing tests (+3) - Zero regressions - Protocol contract now truthful: envelope.exit_code == process exit - Claws using the one-handler pattern from ERROR_HANDLING.md now get correct information ## Related - ERROR_HANDLING.md (cycle #22): Documented exit_code as machine-readable contract field - #178/#179 (cycles #19/#20): Closed parser-front-door contract - This closes a gap in the WORK PROTOCOL contract — envelope values must match reality, not just be structurally present. Classification (per cycle #24 calibration): - Red-state bug: ✓ (contract violation, claws get misinformation) - Real friction: ✓ (discovered via dogfood, not speculative) - Fix ships same-cycle: ✓ (discipline per maintainership mode) Source: Jobdori cycle #26 dogfood — ran multiple edge-case probes, noticed exec-command envelope showed exit_code: 0 while process exited 1. Investigated wrap_json_envelope default behavior, confirmed bug, fixed and tested in same cycle.	2026-04-22 21:33:57 +09:00
YeonGyu-Kim	8247d7d2eb	fix: #179 — JSON mode now fully suppresses argparse stderr + preserves real error message Dogfood discovered #178 had two residual gaps: 1. Stderr pollution: argparse usage + error text still leaked to stderr even in JSON mode (envelope was correct on stdout, but stderr noise broke the 'machine-first protocol' contract — claws capturing both streams got dual output) 2. Generic error message: envelope carried 'invalid command or argument (argparse rejection)' instead of argparse's actual text like 'the following arguments are required: session_id' or 'invalid choice: typo (choose from ...)' Before #179: $ claw load-session --output-format json [stdout] {"error": {"message": "invalid command or argument (argparse rejection)"}} [stderr] usage: main.py load-session [-h] ... main.py load-session: error: the following arguments are required: session_id [exit 1] After #179: $ claw load-session --output-format json [stdout] {"error": {"message": "the following arguments are required: session_id"}} [stderr] (empty) [exit 1] Implementation: - New _ArgparseError exception class captures argparse's real message - main() monkey-patches parser.error (+ all subparser.error) in JSON mode to raise _ArgparseError instead of print-to-stderr + sys.exit(2) - _emit_parse_error_envelope() now receives the real message verbatim - Text mode path unchanged: still uses original argparse print+exit behavior Contract: - JSON mode: stdout carries envelope with argparse's actual error; stderr silent - Text mode: unchanged — argparse usage to stderr, exit 2 - Parse errors still error.kind='parse', retryable=false Test additions (5 new, 14 total in test_parse_error_envelope.py): - TestParseErrorStderrHygiene (5): - test_json_mode_stderr_is_silent_on_unknown_command - test_json_mode_stderr_is_silent_on_missing_arg - test_json_mode_envelope_carries_real_argparse_message - test_json_mode_envelope_carries_invalid_choice_details (verifies valid-choices list) - test_text_mode_stderr_preserved_on_unknown_command (backward compat) Operational impact: Claws capturing both stdout and stderr no longer get garbled output. The envelope message now carries discoverability info (valid command list, missing-arg name) that claws can use for retry/recovery without probing the CLI a second time. Test results: 201 → 206 passing, 3 skipped unchanged, zero regression. Pinpoint discovered via dogfood at 2026-04-22 20:30 KST (cycle #20).	2026-04-22 20:32:28 +09:00
YeonGyu-Kim	517d7e224e	feat: #178 — argparse errors emit JSON envelope when --output-format json requested Dogfood pinpoint: running 'claw nonexistent-command --output-format json' bypasses the JSON envelope contract — argparse dumps human-readable usage to stderr with exit 2, breaking the SCHEMAS.md guarantee that JSON mode returns structured output. Problem: $ claw nonexistent --output-format json usage: main.py [-h] {summary,manifest,...} ... main.py: error: argument command: invalid choice: 'nonexistent' (choose from ...) [exit 2 — no envelope, claws must parse argparse usage messages] Fix: $ claw nonexistent --output-format json { "timestamp": "2026-04-22T11:00:29Z", "command": "nonexistent-command", "exit_code": 1, "output_format": "json", "schema_version": "1.0", "error": { "kind": "parse", "operation": "argparse", "target": "nonexistent-command", "retryable": false, "message": "invalid command or argument (argparse rejection)", "hint": "run with no arguments to see available subcommands" } } [exit 1, clean JSON envelope on stdout per SCHEMAS.md] Changes: - src/main.py: - _wants_json_output(argv): pre-scan for --output-format json before parsing - _emit_parse_error_envelope(argv, message): emit wrapped envelope on stdout - main(): catch SystemExit from argparse; if JSON requested, emit envelope instead of letting argparse's help dump go through - tests/test_parse_error_envelope.py (new, 9 tests): - TestParseErrorJsonEnvelope (7): unknown command, =syntax, text mode unchanged, invalid flag, missing command, valid command unaffected, common fields - TestParseErrorSchemaCompliance (2): error.kind='parse', retryable=false Contract: - text mode (default): unchanged — argparse dumps help to stderr, exits 2 - JSON mode: envelope per SCHEMAS.md, error.kind='parse', exit 1 - Parse errors always retryable=false (typo won't self-fix) - error.kind='parse' already enumerated in SCHEMAS.md (no schema changes) This closes a real gap: claws invoking unknown commands in JSON mode can now route via exit code + envelope.kind='parse' instead of scraping argparse output. Test results: 192 → 201 passing, 3 skipped unchanged, zero regression. Pinpoint discovered via dogfood at 2026-04-22 19:59 KST (cycle #19).	2026-04-22 20:02:39 +09:00
YeonGyu-Kim	11f9e8a5a2	feat: #164 Stage B CLOSURE — turn-loop JSON + cancel_observed coverage + CLAWABLE promotion Closes all three gaebal-gajae-identified closure criteria for #164 Stage B: 1. turn-loop runtime surface exposes cancel_observed consistently 2. cancellation path tests validate safe-to-reuse semantics 3. turn-loop promoted from OPT_OUT to CLAWABLE surface Changes: src/main.py: - turn-loop accepts --output-format {text,json} - JSON envelope includes per-turn cancel_observed + final_cancel_observed - All turn fields exposed: prompt, output, stop_reason, cancel_observed, matched_commands, matched_tools - Exit code 2 on final timeout preserved tests/test_cli_parity_audit.py: - CLAWABLE_SURFACES now contains 14 commands (was 13) - Removed 'turn-loop' from OPT_OUT_SURFACES - Parametrized --output-format test auto-validates turn-loop JSON tests/test_cancel_observed_field.py (new, 9 tests): - TestCancelObservedField (5 tests): field contract - default False - explicit True preserved - normal completion → False - bootstrap JSON exposes field - turn-loop JSON exposes per-turn field - TestCancelObservedSafeReuseSemantics (2 tests): reuse contract - timeout result has cancel_observed=True when signaled - engine.mutable_messages not corrupted after cancelled turn - engine accepts fresh message after cancellation - TestCancelObservedSchemaCompliance (2 tests): SCHEMAS.md contract - cancel_observed is always bool - final_cancel_observed convenience field present Closure criteria validated: - ✅ Field exposed in bootstrap JSON - ✅ Field exposed per-turn in turn-loop JSON - ✅ Field is always bool, never null - ✅ Safe-to-reuse: engine can accept fresh messages after cancellation - ✅ mutable_messages not corrupted by cancelled turn - ✅ turn-loop promoted from OPT_OUT (14 clawable commands now) Protocol now distinguishes at runtime: timeout + cancel_observed=false → infra/wedge (escalate) timeout + cancel_observed=true → cooperative cancellation (safe to retry) Test results: 182 → 192 passing, +10 tests, zero regression, 3 skipped unchanged. Closes #164 Stage B. Stage C (async-native preemption) remains future work.	2026-04-22 19:49:20 +09:00
YeonGyu-Kim	97c4b130dc	feat: #164 Stage B prep — add cancel_observed field to TurnResult #164 Stage B requires exposing whether cancellation was observed at the turn-result level. This commit adds the infrastructure field: Changes: - TurnResult.cancel_observed: bool = False (query_engine.py) - _build_timeout_result() accepts cancel_observed parameter (runtime.py) - Two timeout paths now pass cancel_event.is_set() to signal observation (runtime.py) - bootstrap command includes cancel_observed in turn JSON (main.py) - SCHEMAS.md documents Turn Result Fields with cancel_observed contract Usage: When a turn timeout occurs, cancel_observed=true indicates that the engine observed the cancellation event being set. This allows callers to distinguish: - timeout with no cancel → infrastructure/network stall - timeout with cancel observed → cooperative cancellation was triggered Backward compat: - Existing TurnResult construction without cancel_observed defaults to False - bootstrap JSON output still validates per SCHEMAS.md (new field is always present) Test results: 182 passing, 3 skipped, zero regression. Related: #161 (wall-clock timeout), #164 (cancellation observability protocol) ROADMAP continues #164 with Stage C (test coverage for cancellation + turn envelope).	2026-04-22 19:44:47 +09:00
YeonGyu-Kim	290ab7e41f	feat: #173 — wrap_json_envelope() applied to all 13 clawable commands (LOOP CLOSED) Completes the coverage → enforcement → documentation → alignment cycle. Every clawable command now emits the canonical JSON envelope per SCHEMAS.md: Common fields (now real in output): - timestamp (ISO 8601 UTC) - command (argv[1]) - exit_code (0/1/2) - output_format ('json') - schema_version ('1.0') 13 commands wrapped: - list-sessions, delete-session, load-session, flush-transcript - show-command, show-tool - exec-command, exec-tool, route, bootstrap - command-graph, tool-pool, bootstrap-graph Implementation: - Added wrap_json_envelope() helper in src/main.py - Wrapped all 18 JSON output paths (13 success + 5 error paths) - Applied exit_code=1 to error/not-found envelopes - Kept text mode byte-identical (backward compat preserved) Test updates: - 3 skipped common-field tests now pass automatically - 3 existing tests updated to verify common envelope fields while preserving command-specific field checks - test_list_sessions_cli_runs, test_delete_session_cli_idempotent, test_load_session_cli::test_json_mode_on_success Full suite: 179 → 182 passing (+3 activated from skipped), zero regression. Loop completion: Coverage (#167-#170) ✅ All 13 commands accept --output-format Enforcement (#171) ✅ CI blocks new commands without --output-format Documentation (#172) ✅ SCHEMAS.md defines envelope contract Alignment (#173 this) ✅ Actual output matches SCHEMAS.md contract Example output now: $ claw list-sessions --output-format json { "timestamp": "2026-04-22T10:34:12Z", "command": "list-sessions", "exit_code": 0, "output_format": "json", "schema_version": "1.0", "sessions": ["alpha", "bravo"], "count": 2 } Closes ROADMAP #173. Protocol is now documented AND real. Claws can build ONE error handler, ONE timestamp parser, ONE version check instead of 13 special cases.	2026-04-22 19:35:37 +09:00
YeonGyu-Kim	5a18e3aa1a	fix: #170 — bootstrap-graph now accepts --output-format; diagnostic surface parity complete Final diagnostic surface in the JSON parity sweep: bootstrap-graph (the runtime bootstrap/prefetch visualization) now supports --output-format. Concrete addition: - bootstrap-graph: --output-format {text,json} JSON envelope: {stages: [str], note: 'bootstrap-graph is markdown-only in this version'} Envelope explanation: bootstrap-graph's Markdown output is rich and textual; raw JSON embedding maintains the markdown format (split into lines array) rather than attempting lossy structural extraction that would lose information. This is an honest limitation in this cycle; full JSON schema can be added in a future audit if claws require structured bootstrap data (dependency graphs, prefetch timing, etc.). Backward compatibility: - Default is 'text' (Markdown unchanged) Closes ROADMAP #170. Related: #167, #168, #169. Diagnostic/inventory surface family is now uniformly JSON-capable. Summary, manifest, parity-audit, setup-report, command-graph, tool-pool, bootstrap-graph all accept --output-format.	2026-04-22 18:49:26 +09:00
YeonGyu-Kim	7fb95e95f6	fix: #169 — command-graph and tool-pool now accept --output-format; diagnostic inventory JSON parity Extends the diagnostic surface audit with the two inventory-structure commands: command-graph (command family segmentation) and tool-pool (assembled tool inventory). Both now expose their underlying rich datastructures via JSON envelope. Concrete additions: - command-graph: --output-format {text,json} - tool-pool: --output-format {text,json} JSON envelope shapes: command-graph: {builtins_count, plugin_like_count, skill_like_count, total_count, builtins: [{name, source_hint}], plugin_like: [{name, source_hint}], skill_like: [{name, source_hint}]} tool-pool: {simple_mode, include_mcp, tool_count, tools: [{name, source_hint}]} Backward compatibility: - Default is 'text' (Markdown unchanged) - Text output byte-identical to pre-#169 Tests (4 new, test_command_graph_tool_pool_output_format.py): - TestCommandGraphOutputFormat (2): JSON structure + text compat - TestToolPoolOutputFormat (2): JSON structure + text compat Full suite: 137 → 141 passing, zero regression. Closes ROADMAP #169. Why this matters: Claws auditing the codebase can now ask 'what commands exist' and 'what tools exist' and get structured, parseable answers instead of regex-parsing Markdown headers and counting list items. Related clusters: - Diagnostic surfaces (#169 adds to #167/#168 work-verb parity) - Inventory introspection (command-graph + tool-pool are the two foundational 'what do we have?' queries)	2026-04-22 18:47:34 +09:00
YeonGyu-Kim	60925fa9f7	fix: #168 — exec-command / exec-tool / route / bootstrap now accept --output-format; CLI family JSON parity COMPLETE Extends the #167 inspect-surface parity fix to the four remaining CLI outliers: the commands claws actually invoke to DO work, not just inspect state. After this commit, the entire claw-code CLI family speaks a unified JSON envelope contract. Concrete additions: - exec-command: --output-format {text,json} - exec-tool: --output-format {text,json} - route: --output-format {text,json} - bootstrap: --output-format {text,json} JSON envelope shapes: exec-command (handled): {name, prompt, source_hint, handled: true, message} exec-command (not-found): {name, prompt, handled: false, error: {kind:'command_not_found', message, retryable: false}} exec-tool (handled): {name, payload, source_hint, handled: true, message} exec-tool (not-found): {name, payload, handled: false, error: {kind:'tool_not_found', message, retryable: false}} route: {prompt, limit, match_count, matches: [{kind, name, score, source_hint}]} bootstrap: {prompt, limit, setup: {python_version, implementation, platform_name, test_command}, routed_matches: [{kind, name, score, source_hint}], command_execution_messages: [str], tool_execution_messages: [str], turn: {prompt, output, stop_reason}, persisted_session_path} Exit codes (unchanged from pre-#168): 0 = success 1 = exec not-found (exec-command, exec-tool only) Backward compatibility: - Default (no --output-format) is 'text' - exec-command/exec-tool text output byte-identical - route text output: unchanged tab-separated kind/name/score/source_hint - bootstrap text output: unchanged Markdown runtime session report Tests (13 new, test_exec_route_bootstrap_output_format.py): - TestExecCommandOutputFormat (3): handled + not-found JSON; text compat - TestExecToolOutputFormat (3): handled + not-found JSON; text compat - TestRouteOutputFormat (3): JSON envelope; zero-matches case; text compat - TestBootstrapOutputFormat (2): JSON envelope; text-mode Markdown compat - TestFamilyWideJsonParity (2): parametrised over ALL 6 family commands (show-command, show-tool, exec-command, exec-tool, route, bootstrap) — every one accepts --output-format json and emits parseable JSON; every one defaults to text mode without a leading {. One future regression on any family member breaks this test. Full suite: 124 → 137 passing, zero regression. Closes ROADMAP #168. This completes the CLI-wide JSON parity sweep: - Session-lifecycle family: #160 (list/delete), #165 (load), #166 (flush) - Inspect family: #167 (show-command, show-tool) - Work-verb family: #168 (exec-command, exec-tool, route, bootstrap) ENTIRE CLI SURFACE is now machine-readable via --output-format json with typed errors, deterministic exit codes, and consistent envelope shape. Claws no longer need to regex-parse any CLI output. Related clusters: - Clawability principle: 'machine-readable in state and failure modes' (ROADMAP top-level). 9 pinpoints in this cluster; all now landed. - Typed-error envelope consistency: command_not_found / tool_not_found / session_not_found / session_load_failed all share {kind, message, retryable} shape. - Work-verb semantics: exec-* surfaces expose 'handled' boolean (not 'found') because 'not handled' is the operational signal — claws dispatch on whether the work was performed, not whether the entry exists in the inventory.	2026-04-22 18:34:26 +09:00
YeonGyu-Kim	01dca90e95	fix: #167 — show-command and show-tool now accept --output-format flag; CLI parity with session-lifecycle family Closes the inspect-capability parity gap: show-command and show-tool were the only discovery/inspection CLI commands lacking --output-format support, making them outliers in the ecosystem that already had unified JSON contracts across list-sessions, load-session, delete-session, and flush-transcript (#160/#165/#166). Concrete additions: - show-command: --output-format {text,json} - show-tool: --output-format {text,json} JSON envelope shape (found case): {name, found: true, source_hint, responsibility} JSON envelope shape (not-found case): {name, found: false, error: {kind:'command_not_found'\|'tool_not_found', message, retryable: false}} Exit codes: 0 = success 1 = not found Backward compatibility: - Default (no --output-format) is 'text' (unchanged) - Text output byte-identical to pre-#167 (three newline-separated lines) Tests (10 new, test_show_command_tool_output_format.py): - TestShowCommandOutputFormat (5): found + not-found in JSON; text mode backward compat; text is default - TestShowToolOutputFormat (3): found + not-found in JSON; text mode backward compat - TestShowCommandToolFormatParity (2): both accept same flag choices; consistent JSON envelope shape Full suite: 114 → 124 passing, zero regression. Closes ROADMAP #167. Why this matters: Before: Claws calling show-command/show-tool had to parse human-readable prose output via regex, with no structured error signal. After: Same envelope contract as load-session and friends: JSON-first, typed errors, machine-parseable. Related clusters: - Session-lifecycle CLI parity family (#160, #165, #166, #167) - Machine-readable error contracts (same vein as #162 atomicity + #164 cancellation state-safety: structured boundaries for orchestration)	2026-04-22 18:21:38 +09:00
YeonGyu-Kim	524edb2b2e	fix: #164 Stage A — cooperative cancellation via cancel_event in submit_message Closes the #161 follow-up gap identified in review: wall-clock timeout bounded caller-facing wait but did not cancel the underlying provider thread, which could silently mutate mutable_messages / transcript_store / permission_denials / total_usage after the caller had already observed stop_reason='timeout'. A ghost turn committed post-deadline would poison any session that got persisted afterwards. Stage A scope (this commit): runtime + engine layer cooperative cancel. Engine layer (src/query_engine.py): - submit_message now accepts cancel_event: threading.Event \| None = None - Two safe checkpoints: 1. Entry (before max_turns / budget projection) — earliest possible return 2. Post-budget (after output synthesis, before mutation) — catches cancel that arrives while output was being computed - Both checkpoints return stop_reason='cancelled' with state UNCHANGED (mutable_messages, transcript_store, permission_denials, total_usage all preserved exactly as on entry) - cancel_event=None preserves legacy behaviour with zero overhead (no checkpoint checks at all) Runtime layer (src/runtime.py): - run_turn_loop creates one cancel_event per invocation when a deadline is in play (and None otherwise, preserving legacy fast path) - Passes the same event to every submit_message call across turns, so a late cancel on turn N-1 affects turn N - On timeout (either pre-call or mid-call), runtime explicitly calls cancel_event.set() before future.cancel() + synthesizing the timeout TurnResult. This upgrades #161's best-effort future.cancel() (which only cancels not-yet-started futures) to cooperative mid-flight cancel. Stop reason taxonomy after Stage A: 'completed' — turn committed, state mutated exactly once 'max_budget_reached' — overflow, state unchanged (#162) 'max_turns_reached' — capacity exceeded, state unchanged 'cancelled' — cancel_event observed, state unchanged (#164 Stage A) 'timeout' — synthesised by runtime, not engine (#161) The 'cancelled' vs 'timeout' split matters: - 'timeout' is the runtime's best-effort signal to the caller: deadline hit - 'cancelled' is the engine's confirmation: cancel was observed + honoured If the provider call wedges entirely (never reaches a checkpoint), the caller still sees 'timeout' and the thread is leaked — but any NEXT submit_message call on the same engine observes the event at entry and returns 'cancelled' immediately, preventing ghost-turn accumulation. This is the honest cooperative limit in Python threading land; true preemption requires async-native provider IO (future work, not Stage A). Tests (29 new tests, tests/test_submit_message_cancellation.py + tests/ test_run_turn_loop_cancellation.py): Engine-layer (12 tests): - TestCancellationBeforeCall (5): pre-set event returns 'cancelled' immediately; mutable_messages, transcript_store, usage, permission_denials all preserved - TestCancellationAfterBudgetCheck (1): cancel set mid-call (after projection, before commit) still honoured; output synthesised but state untouched - TestCancellationAfterCommit (2): post-commit cancel not observable (honest limit) BUT next call on same engine observes it + returns 'cancelled' - TestLegacyCallersUnchanged (3): cancel_event=None preserves #162 atomicity + max_turns contract with zero behaviour change - TestCancellationVsOtherStopReasons (2): cancel precedes max_turns check; cancel does not retroactively override a completed turn Runtime-layer (5 tests): - TestTimeoutPropagatesCancelEvent (3): submit_message receives a real Event object when deadline is set; None in legacy mode; timeout actually calls event.set() so in-flight threads observe at their next checkpoint - TestCancelEventSharedAcrossTurns (1): same event object passed to every turn (object identity check) — late cancel on turn N-1 must affect turn N Regression: 3 existing timeout test mocks updated to accept cancel_event kwarg (mocks that previously had signature (prompt, commands, tools, denials) now have (prompt, commands, tools, denials, cancel_event=None) since runtime passes cancel_event positionally on the timeout path). Full suite: 97 → 114 passing, zero regression. Closes ROADMAP #164 Stage A. What's explicitly NOT in Stage A: - Preemptive cancellation of wedged provider IO (requires asyncio-native provider path; larger refactor) - Timeout on the legacy unbounded run_turn_loop path (by design: legacy callers opt out of cancellation entirely) - CLI exposure of 'cancelled' as a distinct exit code (currently 'cancelled' maps to the same stop_reason != 'completed' break condition as others; CLI surface for cancel is a separate pinpoint if warranted)	2026-04-22 18:14:14 +09:00
YeonGyu-Kim	85de7f9814	fix: #166 — flush-transcript now accepts --directory / --output-format / --session-id; session-creation command parity with #160/#165 lifecycle triplet	2026-04-22 18:04:25 +09:00
YeonGyu-Kim	178c8fac28	fix: #159 — run_turn_loop no longer hardcodes empty denied_tools; permission denials now parity-match bootstrap_session #159: multi-turn sessions had a silent security asymmetry: denied_tools were always empty in run_turn_loop, even though bootstrap_session inferred them from the routed matches. Result: any tool gated as 'destructive' (bash-family commands, rm, etc) would silently appear unblocked across all turns in multi-turn mode, giving a false 'clean' permission picture to any claw consuming TurnResult.permission_denials. Fix: compute denied_tools once at loop start via _infer_permission_denials, then pass the same denials to every submit_message call (both timeout and legacy unbounded paths). This mirrors the existing bootstrap_session pattern. Acceptance: run_turn_loop('run bash ls').permission_denials now matches what bootstrap_session returns — both infer the same denials from the routed matches. Multi-turn security posture is symmetric. Tests (tests/test_run_turn_loop_permissions.py, 2 tests): - test_turn_loop_surfaces_permission_denials_like_bootstrap: Symmetry check confirming both paths infer identical denials for destructive tools - test_turn_loop_with_continuation_preserves_denials: Denials inferred at loop start are passed consistently to all turns; captured via mock and verified non-empty Full suite: 82/82 passing, zero regression. Closes ROADMAP #159.	2026-04-22 17:50:21 +09:00
YeonGyu-Kim	d453eedae6	fix: #165 — load-session CLI now parity-matches list/delete (--directory, --output-format, typed JSON errors) The #160 session-lifecycle CLI triplet was asymmetric: list-sessions and delete-session accepted --directory + --output-format and emitted typed JSON error envelopes, but load-session had neither flag and dumped a raw Python traceback (including the SessionNotFoundError class name) on a missing session. Three concrete impacts this fix closes: 1. Alternate session-store locations (e.g. /tmp/claw-run-XXX/.port_sessions) were unreachable via load-session; claws had to chdir or monkeypatch DEFAULT_SESSION_DIR to work around it. 2. Not-found emitted a multi-line Python stack, not a parseable envelope. Claws deciding retry/escalate/give-up had only exit code 1 to work with. 3. The traceback leaked 'src.session_store.SessionNotFoundError' verbatim, coupling version-pinned claws to our internal exception class name. Now all three triplet commands accept the same flag pair and emit the same JSON error shape: Success (json mode): {"session_id": "alpha", "loaded": true, "messages_count": 3, "input_tokens": 42, "output_tokens": 99} Not-found: {"session_id": "missing", "loaded": false, "error": {"kind": "session_not_found", "message": "session 'missing' not found in /path", "directory": "/path", "retryable": false}} Corrupted file: {"session_id": "broken", "loaded": false, "error": {"kind": "session_load_failed", "message": "...", "directory": "/path", "retryable": true}} Exit code contract: - 0 on successful load - 1 on not-found (preserves existing $?) - 1 on OSError/JSONDecodeError (distinct 'kind' in JSON) Backward compat: legacy 'claw load-session ID' text output unchanged byte-for-byte. Only new behaviour is the flags and structured error path. Tests (tests/test_load_session_cli.py, 13 tests): - TestDirectoryFlagParity (2): --directory works + fallback to CWD/.port_sessions - TestOutputFormatFlagParity (2): json schema + text-mode backward compat - TestNotFoundTypedError (2): JSON envelope on not-found; no traceback in either mode; no internal class name leak - TestLoadFailedDistinctFromNotFound (1): corrupted file = session_load_failed with retryable=true, distinct from session_not_found - TestTripletParityConsistency (6): parametrised over [list, delete, load] * [--directory, --output-format] — explicit parity guard for future regressions Full suite: 80/80 passing, zero regression. Discovered via Jobdori dogfood sweep 2026-04-22 17:44 KST — ran 'claw load-session nonexistent' expecting a clean error, got a Python traceback. Filed #165 + fixed in same commit. Closes ROADMAP #165.	2026-04-22 17:44:48 +09:00
YeonGyu-Kim	79a9f0e6f6	fix: #163 — remove [turn N] suffix pollution from run_turn_loop; file #164 timeout-cancellation followup #163: run_turn_loop no longer injects f'{prompt} [turn N]' into follow-up prompts. The suffix was never defined or interpreted anywhere — not by the engine, not by the system prompt, not by any LLM. It looked like a real user-typed annotation in the transcript and made replay/analysis fragile. New behaviour: - turn 0 submits the original prompt (unchanged) - turn > 0 submits caller-supplied continuation_prompt if provided, else the loop stops cleanly — no fabricated user turn - added continuation_prompt: str \| None = None parameter to run_turn_loop - added --continuation-prompt CLI flag for claws scripting multi-turn loops - zero '[turn' strings ever appear in mutable_messages or stdout now Behaviour change for existing callers: - Before: run_turn_loop(prompt, max_turns=3) submitted 3 turns ('prompt', 'prompt [turn 2]', 'prompt [turn 3]') - After: run_turn_loop(prompt, max_turns=3) submits 1 turn ('prompt') - To preserve old multi-turn behaviour, pass continuation_prompt='Continue.' or any structured follow-up text One existing timeout test (test_budget_is_cumulative_across_turns) updated to pass continuation_prompt so the cumulative-budget contract is actually exercised across turns instead of trivially satisfied by a one-turn loop. #164 filed: addresses reviewer feedback on #161. The wall-clock timeout bounds the caller-facing wait, but the underlying submit_message worker thread keeps running and can mutate engine state after the timeout TurnResult is returned. A cooperative cancel_event pattern is sketched in the pinpoint; real asyncio.Task.cancel() support will come once provider IO is async-native (larger refactor). Tests (tests/test_run_turn_loop_continuation.py, 8 tests): - TestNoTurnSuffixInjection (2): zero '[turn' strings in any submitted prompt, both default and explicit-continuation paths - TestContinuationDefaultStopsAfterTurnZero (2): default loops run exactly one turn; engine.submit_message called exactly once despite max_turns=10 - TestExplicitContinuationBehaviour (2): turn 0 = original, turn N = continuation verbatim; max_turns still respected - TestCLIContinuationFlag (2): CLI default emits only '## Turn 1'; --continuation-prompt wires through to multi-turn behaviour Full suite: 67/67 passing. Closes ROADMAP #163. Files #164.	2026-04-22 17:37:22 +09:00
YeonGyu-Kim	4813a2b351	fix: #162 — budget-overflow no longer corrupts session state in submit_message Previously, QueryEnginePort.submit_message() checked the token budget AFTER appending the prompt to mutable_messages, transcript_store, and permission_denials, and AFTER calling compact_messages_if_needed(). On overflow it set stop_reason='max_budget_reached' but the overflow turn was already committed. Any caller that persisted the session afterwards wrote the rejected prompt to disk — the session was silently poisoned even though the TurnResult said the turn never completed. Fix: - Restructure submit_message so the budget check early-returns BEFORE any mutation of mutable_messages, transcript_store, permission_denials, or total_usage. - The returned TurnResult.usage reflects pre-call state (overflow never advanced the usage counter). - Normal (in-budget) path unchanged: mutation happens exactly once, at the end, only on 'completed' results. This closes the atomicity gap: submit_message is now either 'turn committed' (stop_reason='completed') or 'turn rejected, state untouched' (stop_reason in {'max_budget_reached', 'max_turns_reached'}). Callers can safely retry with a fresh budget or a smaller prompt without worrying about phantom committed turns from prior rejections. Tests (tests/test_submit_message_budget.py, 10 tests): - TestBudgetOverflowDoesNotMutate (5): mutable_messages / transcript / permission_denials / total_usage / TurnResult.usage all pre-mutation after overflow - TestOverflowPersistence (2): first-turn overflow persists empty session; successful-turn-then-overflow persists only the successful turn - TestEngineUsableAfterOverflow (2): subsequent in-budget call still works with no residue; repeated overflows don't accumulate hidden state - TestNormalPathStillCommits (1): regression guard — non-overflow path still commits mutable_messages/transcript/usage as expected Full suite: 59/59 passing, zero regression. Blocker: none. Closes ROADMAP #162.	2026-04-22 17:29:55 +09:00
YeonGyu-Kim	3f4d46d7b4	fix: #161 — wall-clock timeout for run_turn_loop; stalled turns now abort with stop_reason='timeout' Previously, run_turn_loop was bounded only by max_turns (turn count). If engine.submit_message stalled — slow provider, hung network, infinite stream — the loop blocked indefinitely with no cancellation path. Claws calling run_turn_loop in CI or orchestration had no reliable way to enforce a deadline; the loop would hang until OS kill or human intervention. Fix: - Add timeout_seconds parameter to run_turn_loop (default None = legacy unbounded). - When set, each submit_message call runs inside a ThreadPoolExecutor and is bounded by the remaining wall-clock budget (total across all turns, not per-turn). - On timeout, synthesize a TurnResult with stop_reason='timeout' carrying the turn's prompt and routed matches so transcripts preserve orchestration context. - Exhausted/negative budget short-circuits before calling submit_message. - Legacy path (timeout_seconds=None) bypasses the executor entirely — zero overhead for callers that don't opt in. CLI: - Added --timeout-seconds flag to 'turn-loop' command. - Exit code 2 when the loop terminated on timeout (vs 0 for completed), so shell scripts can distinguish 'done' from 'budget exhausted'. Tests (tests/test_run_turn_loop_timeout.py, 6 tests): - Legacy unbounded path unchanged (timeout_seconds=None never emits 'timeout') - Hung submit_message aborted within budget (0.3s budget, 5s mock hang → exit <1.5s) - Budget is cumulative across turns (0.6s budget, 0.4s per turn, not per-turn) - timeout_seconds=0 short-circuits first turn without calling submit_message - Negative timeout treated as exhausted (guard against caller bugs) - Timeout TurnResult carries correct prompt, matches, UsageSummary shape Full suite: 49/49 passing, zero regression. Blocker: none. Closes ROADMAP #161.	2026-04-22 17:23:43 +09:00
YeonGyu-Kim	6a76cc7c08	feat(#160 ): wire claw list-sessions and delete-session CLI commands Closes the last #160 gap: claws can now manage session lifecycle entirely through the CLI without filesystem hacks. New commands: - claw list-sessions [--directory DIR] [--output-format text\|json] Enumerates stored session IDs. JSON mode emits {sessions, count}. Missing/empty directories return empty list (exit 0), not an error. - claw delete-session SESSION_ID [--directory DIR] [--output-format text\|json] Idempotent: not-found is exit 0 with status='not_found' (no raise). Partial-failure: exit 1 with typed JSON error envelope: {session_id, deleted: false, error: {kind, message, retryable}} The 'session_delete_failed' kind is retryable=true so orchestrators know to retry vs escalate. Public API surface extended in src/__init__.py: - list_sessions, session_exists, delete_session - SessionNotFoundError, SessionDeleteError Tests added (tests/test_porting_workspace.py): - test_list_sessions_cli_runs: text + json modes against tempdir - test_delete_session_cli_idempotent: first call deleted=true, second call deleted=false (exit 0, status=not_found) - test_delete_session_cli_partial_failure_exit_1: permission error surfaces as exit 1 + typed JSON error with retryable=true All 43 tests pass. The session storage abstraction chapter is closed: - storage layer decoupled from claw code (#160 initial impl) - delete contract hardened + caller-audited (#160 hardening pass) - CLI wired with idempotency preserved at exit-code boundary (this commit)	2026-04-22 17:16:53 +09:00
YeonGyu-Kim	527c0f971c	fix(#160 ): harden delete_session contract — idempotency, race-safety, typed partial-failure Addresses review feedback on initial #160 implementation: 1. delete_session() contract now explicit: - Idempotent: delete(x); delete(x) is safe, second call returns False - Race-safe: TOCTOU between exists()/unlink() eliminated via unlink-then-catch - Partial-failure typed: permission/IO errors wrapped in SessionDeleteError (OSError subclass) so callers can distinguish 'not found' (return False) from 'could not delete' (raise) 2. New SessionDeleteError class for partial-failure surfacing. Distinct from SessionNotFoundError (KeyError subclass for missing loads). 3. Caller audit confirmed: no code outside session_store globs .port_sessions or imports DEFAULT_SESSION_DIR. Storage layout is fully encapsulated. 4. Added tests/test_session_store.py — 18 tests covering: - list_sessions: empty/missing/sorted/non-json filter - session_exists: true/false/missing-dir - load_session: SessionNotFoundError typing (KeyError subclass, not FileNotFoundError) - delete_session idempotency: first/second/never-existed calls - delete_session partial-failure: SessionDeleteError wraps OSError - delete_session race-safety: concurrent deletion returns False, not raise - Full save->list->exists->load->delete roundtrip All 18 tests pass. Merge-ready: contract documented, caller-audited, race-safe.	2026-04-22 17:11:26 +09:00
YeonGyu-Kim	504d238af1	fix: #160 — add list_sessions, session_exists, delete_session to session_store - list_sessions(directory=None) -> list[str]: enumerate stored session IDs - session_exists(session_id, directory=None) -> bool: check existence without FileNotFoundError - delete_session(session_id, directory=None) -> bool: unlink a session file - load_session now raises typed SessionNotFoundError (subclass of KeyError) instead of FileNotFoundError - Claws can now manage session lifecycle without reaching past the module to glob filesystem Closes ROADMAP #160. Acceptance: claw can call list_sessions(), session_exists(id), delete_session(id) without importing Path or knowing .port_sessions/<id>.json layout.	2026-04-22 17:08:01 +09:00
Jobdori	8cc7d4c641	chore: additional AI slop cleanup and enforcer wiring from sessions 1/5 Session 1 (ses_2ad65873): with_enforcer builders + 2 regression tests Session 5 (ses_2ad67e8e): continued AI slop cleanup pass — redundant comments, unused_self suppressions, unreachable! tightening Session cleanup (ses_2ad6b26c): Python placeholder centralization Workspace tests: 363+ passed, 0 failed.	2026-04-03 18:35:27 +09:00
instructkr	01bf54ad15	Rewriting Project Claw Code - Python port with Rust on the way	2026-03-31 08:16:20 -07:00
instructkr	507c2460b9	Make the repository's primary source tree genuinely Python The old tracked TypeScript snapshot has been removed from the repository history and the root directory is now a Python porting workspace. README and tests now describe and verify the Python-first layout instead of treating the exposed snapshot as the active source tree. A local archive can still exist outside Git, but the tracked repository now presents only the Python porting surface, related essay context, and OmX workflow artifacts. Constraint: Tracked history should collapse to a single commit while excluding the archived snapshot from Git Rejected: Keep the exposed TypeScript tree in tracked history under an archive path \| user explicitly wanted only the Python porting repo state in Git Confidence: medium Scope-risk: broad Reversibility: messy Directive: Keep future tracked additions focused on the Python port itself; do not reintroduce the exposed snapshot into Git history Tested: python3 -m unittest discover -s tests -v; python3 -m src.main summary; git diff --check Not-tested: Behavioral parity with the original TypeScript system beyond the current Python workspace surface	2026-03-31 07:17:34 -07:00

24 Commits