mirror of
https://github.com/ultraworkers/claw-code.git
synced 2026-04-24 13:08:11 +08:00
13 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
60925fa9f7 |
fix: #168 — exec-command / exec-tool / route / bootstrap now accept --output-format; CLI family JSON parity COMPLETE
Extends the #167 inspect-surface parity fix to the four remaining CLI outliers: the commands claws actually invoke to DO work, not just inspect state. After this commit, the entire claw-code CLI family speaks a unified JSON envelope contract. Concrete additions: - exec-command: --output-format {text,json} - exec-tool: --output-format {text,json} - route: --output-format {text,json} - bootstrap: --output-format {text,json} JSON envelope shapes: exec-command (handled): {name, prompt, source_hint, handled: true, message} exec-command (not-found): {name, prompt, handled: false, error: {kind:'command_not_found', message, retryable: false}} exec-tool (handled): {name, payload, source_hint, handled: true, message} exec-tool (not-found): {name, payload, handled: false, error: {kind:'tool_not_found', message, retryable: false}} route: {prompt, limit, match_count, matches: [{kind, name, score, source_hint}]} bootstrap: {prompt, limit, setup: {python_version, implementation, platform_name, test_command}, routed_matches: [{kind, name, score, source_hint}], command_execution_messages: [str], tool_execution_messages: [str], turn: {prompt, output, stop_reason}, persisted_session_path} Exit codes (unchanged from pre-#168): 0 = success 1 = exec not-found (exec-command, exec-tool only) Backward compatibility: - Default (no --output-format) is 'text' - exec-command/exec-tool text output byte-identical - route text output: unchanged tab-separated kind/name/score/source_hint - bootstrap text output: unchanged Markdown runtime session report Tests (13 new, test_exec_route_bootstrap_output_format.py): - TestExecCommandOutputFormat (3): handled + not-found JSON; text compat - TestExecToolOutputFormat (3): handled + not-found JSON; text compat - TestRouteOutputFormat (3): JSON envelope; zero-matches case; text compat - TestBootstrapOutputFormat (2): JSON envelope; text-mode Markdown compat - TestFamilyWideJsonParity (2): parametrised over ALL 6 family commands (show-command, show-tool, exec-command, exec-tool, route, bootstrap) — every one accepts --output-format json and emits parseable JSON; every one defaults to text mode without a leading {. One future regression on any family member breaks this test. Full suite: 124 → 137 passing, zero regression. Closes ROADMAP #168. This completes the CLI-wide JSON parity sweep: - Session-lifecycle family: #160 (list/delete), #165 (load), #166 (flush) - Inspect family: #167 (show-command, show-tool) - Work-verb family: #168 (exec-command, exec-tool, route, bootstrap) ENTIRE CLI SURFACE is now machine-readable via --output-format json with typed errors, deterministic exit codes, and consistent envelope shape. Claws no longer need to regex-parse any CLI output. Related clusters: - Clawability principle: 'machine-readable in state and failure modes' (ROADMAP top-level). 9 pinpoints in this cluster; all now landed. - Typed-error envelope consistency: command_not_found / tool_not_found / session_not_found / session_load_failed all share {kind, message, retryable} shape. - Work-verb semantics: exec-* surfaces expose 'handled' boolean (not 'found') because 'not handled' is the operational signal — claws dispatch on whether the work was performed, not whether the entry exists in the inventory. |
||
|
|
01dca90e95 |
fix: #167 — show-command and show-tool now accept --output-format flag; CLI parity with session-lifecycle family
Closes the inspect-capability parity gap: show-command and show-tool were
the only discovery/inspection CLI commands lacking --output-format support,
making them outliers in the ecosystem that already had unified JSON
contracts across list-sessions, load-session, delete-session, and
flush-transcript (#160/#165/#166).
Concrete additions:
- show-command: --output-format {text,json}
- show-tool: --output-format {text,json}
JSON envelope shape (found case):
{name, found: true, source_hint, responsibility}
JSON envelope shape (not-found case):
{name, found: false, error: {kind:'command_not_found'|'tool_not_found',
message, retryable: false}}
Exit codes:
0 = success
1 = not found
Backward compatibility:
- Default (no --output-format) is 'text' (unchanged)
- Text output byte-identical to pre-#167 (three newline-separated lines)
Tests (10 new, test_show_command_tool_output_format.py):
- TestShowCommandOutputFormat (5): found + not-found in JSON; text mode
backward compat; text is default
- TestShowToolOutputFormat (3): found + not-found in JSON; text mode
backward compat
- TestShowCommandToolFormatParity (2): both accept same flag choices;
consistent JSON envelope shape
Full suite: 114 → 124 passing, zero regression.
Closes ROADMAP #167.
Why this matters:
Before: Claws calling show-command/show-tool had to parse human-readable
prose output via regex, with no structured error signal.
After: Same envelope contract as load-session and friends: JSON-first,
typed errors, machine-parseable.
Related clusters:
- Session-lifecycle CLI parity family (#160, #165, #166, #167)
- Machine-readable error contracts (same vein as #162 atomicity + #164
cancellation state-safety: structured boundaries for orchestration)
|
||
|
|
524edb2b2e |
fix: #164 Stage A — cooperative cancellation via cancel_event in submit_message
Closes the #161 follow-up gap identified in review: wall-clock timeout bounded caller-facing wait but did not cancel the underlying provider thread, which could silently mutate mutable_messages / transcript_store / permission_denials / total_usage after the caller had already observed stop_reason='timeout'. A ghost turn committed post-deadline would poison any session that got persisted afterwards. Stage A scope (this commit): runtime + engine layer cooperative cancel. Engine layer (src/query_engine.py): - submit_message now accepts cancel_event: threading.Event | None = None - Two safe checkpoints: 1. Entry (before max_turns / budget projection) — earliest possible return 2. Post-budget (after output synthesis, before mutation) — catches cancel that arrives while output was being computed - Both checkpoints return stop_reason='cancelled' with state UNCHANGED (mutable_messages, transcript_store, permission_denials, total_usage all preserved exactly as on entry) - cancel_event=None preserves legacy behaviour with zero overhead (no checkpoint checks at all) Runtime layer (src/runtime.py): - run_turn_loop creates one cancel_event per invocation when a deadline is in play (and None otherwise, preserving legacy fast path) - Passes the same event to every submit_message call across turns, so a late cancel on turn N-1 affects turn N - On timeout (either pre-call or mid-call), runtime explicitly calls cancel_event.set() before future.cancel() + synthesizing the timeout TurnResult. This upgrades #161's best-effort future.cancel() (which only cancels not-yet-started futures) to cooperative mid-flight cancel. Stop reason taxonomy after Stage A: 'completed' — turn committed, state mutated exactly once 'max_budget_reached' — overflow, state unchanged (#162) 'max_turns_reached' — capacity exceeded, state unchanged 'cancelled' — cancel_event observed, state unchanged (#164 Stage A) 'timeout' — synthesised by runtime, not engine (#161) The 'cancelled' vs 'timeout' split matters: - 'timeout' is the runtime's best-effort signal to the caller: deadline hit - 'cancelled' is the engine's confirmation: cancel was observed + honoured If the provider call wedges entirely (never reaches a checkpoint), the caller still sees 'timeout' and the thread is leaked — but any NEXT submit_message call on the same engine observes the event at entry and returns 'cancelled' immediately, preventing ghost-turn accumulation. This is the honest cooperative limit in Python threading land; true preemption requires async-native provider IO (future work, not Stage A). Tests (29 new tests, tests/test_submit_message_cancellation.py + tests/ test_run_turn_loop_cancellation.py): Engine-layer (12 tests): - TestCancellationBeforeCall (5): pre-set event returns 'cancelled' immediately; mutable_messages, transcript_store, usage, permission_denials all preserved - TestCancellationAfterBudgetCheck (1): cancel set mid-call (after projection, before commit) still honoured; output synthesised but state untouched - TestCancellationAfterCommit (2): post-commit cancel not observable (honest limit) BUT next call on same engine observes it + returns 'cancelled' - TestLegacyCallersUnchanged (3): cancel_event=None preserves #162 atomicity + max_turns contract with zero behaviour change - TestCancellationVsOtherStopReasons (2): cancel precedes max_turns check; cancel does not retroactively override a completed turn Runtime-layer (5 tests): - TestTimeoutPropagatesCancelEvent (3): submit_message receives a real Event object when deadline is set; None in legacy mode; timeout actually calls event.set() so in-flight threads observe at their next checkpoint - TestCancelEventSharedAcrossTurns (1): same event object passed to every turn (object identity check) — late cancel on turn N-1 must affect turn N Regression: 3 existing timeout test mocks updated to accept cancel_event kwarg (mocks that previously had signature (prompt, commands, tools, denials) now have (prompt, commands, tools, denials, cancel_event=None) since runtime passes cancel_event positionally on the timeout path). Full suite: 97 → 114 passing, zero regression. Closes ROADMAP #164 Stage A. What's explicitly NOT in Stage A: - Preemptive cancellation of wedged provider IO (requires asyncio-native provider path; larger refactor) - Timeout on the legacy unbounded run_turn_loop path (by design: legacy callers opt out of cancellation entirely) - CLI exposure of 'cancelled' as a distinct exit code (currently 'cancelled' maps to the same stop_reason != 'completed' break condition as others; CLI surface for cancel is a separate pinpoint if warranted) |
||
|
|
85de7f9814 | fix: #166 — flush-transcript now accepts --directory / --output-format / --session-id; session-creation command parity with #160/#165 lifecycle triplet | ||
|
|
178c8fac28 |
fix: #159 — run_turn_loop no longer hardcodes empty denied_tools; permission denials now parity-match bootstrap_session
#159: multi-turn sessions had a silent security asymmetry: denied_tools were always empty in run_turn_loop, even though bootstrap_session inferred them from the routed matches. Result: any tool gated as 'destructive' (bash-family commands, rm, etc) would silently appear unblocked across all turns in multi-turn mode, giving a false 'clean' permission picture to any claw consuming TurnResult.permission_denials. Fix: compute denied_tools once at loop start via _infer_permission_denials, then pass the same denials to every submit_message call (both timeout and legacy unbounded paths). This mirrors the existing bootstrap_session pattern. Acceptance: run_turn_loop('run bash ls').permission_denials now matches what bootstrap_session returns — both infer the same denials from the routed matches. Multi-turn security posture is symmetric. Tests (tests/test_run_turn_loop_permissions.py, 2 tests): - test_turn_loop_surfaces_permission_denials_like_bootstrap: Symmetry check confirming both paths infer identical denials for destructive tools - test_turn_loop_with_continuation_preserves_denials: Denials inferred at loop start are passed consistently to all turns; captured via mock and verified non-empty Full suite: 82/82 passing, zero regression. Closes ROADMAP #159. |
||
|
|
d453eedae6 |
fix: #165 — load-session CLI now parity-matches list/delete (--directory, --output-format, typed JSON errors)
The #160 session-lifecycle CLI triplet was asymmetric: list-sessions and delete-session accepted --directory + --output-format and emitted typed JSON error envelopes, but load-session had neither flag and dumped a raw Python traceback (including the SessionNotFoundError class name) on a missing session. Three concrete impacts this fix closes: 1. Alternate session-store locations (e.g. /tmp/claw-run-XXX/.port_sessions) were unreachable via load-session; claws had to chdir or monkeypatch DEFAULT_SESSION_DIR to work around it. 2. Not-found emitted a multi-line Python stack, not a parseable envelope. Claws deciding retry/escalate/give-up had only exit code 1 to work with. 3. The traceback leaked 'src.session_store.SessionNotFoundError' verbatim, coupling version-pinned claws to our internal exception class name. Now all three triplet commands accept the same flag pair and emit the same JSON error shape: Success (json mode): {"session_id": "alpha", "loaded": true, "messages_count": 3, "input_tokens": 42, "output_tokens": 99} Not-found: {"session_id": "missing", "loaded": false, "error": {"kind": "session_not_found", "message": "session 'missing' not found in /path", "directory": "/path", "retryable": false}} Corrupted file: {"session_id": "broken", "loaded": false, "error": {"kind": "session_load_failed", "message": "...", "directory": "/path", "retryable": true}} Exit code contract: - 0 on successful load - 1 on not-found (preserves existing $?) - 1 on OSError/JSONDecodeError (distinct 'kind' in JSON) Backward compat: legacy 'claw load-session ID' text output unchanged byte-for-byte. Only new behaviour is the flags and structured error path. Tests (tests/test_load_session_cli.py, 13 tests): - TestDirectoryFlagParity (2): --directory works + fallback to CWD/.port_sessions - TestOutputFormatFlagParity (2): json schema + text-mode backward compat - TestNotFoundTypedError (2): JSON envelope on not-found; no traceback in either mode; no internal class name leak - TestLoadFailedDistinctFromNotFound (1): corrupted file = session_load_failed with retryable=true, distinct from session_not_found - TestTripletParityConsistency (6): parametrised over [list, delete, load] * [--directory, --output-format] — explicit parity guard for future regressions Full suite: 80/80 passing, zero regression. Discovered via Jobdori dogfood sweep 2026-04-22 17:44 KST — ran 'claw load-session nonexistent' expecting a clean error, got a Python traceback. Filed #165 + fixed in same commit. Closes ROADMAP #165. |
||
|
|
79a9f0e6f6 |
fix: #163 — remove [turn N] suffix pollution from run_turn_loop; file #164 timeout-cancellation followup
#163: run_turn_loop no longer injects f'{prompt} [turn N]' into follow-up prompts. The suffix was never defined or interpreted anywhere — not by the engine, not by the system prompt, not by any LLM. It looked like a real user-typed annotation in the transcript and made replay/analysis fragile. New behaviour: - turn 0 submits the original prompt (unchanged) - turn > 0 submits caller-supplied continuation_prompt if provided, else the loop stops cleanly — no fabricated user turn - added continuation_prompt: str | None = None parameter to run_turn_loop - added --continuation-prompt CLI flag for claws scripting multi-turn loops - zero '[turn' strings ever appear in mutable_messages or stdout now Behaviour change for existing callers: - Before: run_turn_loop(prompt, max_turns=3) submitted 3 turns ('prompt', 'prompt [turn 2]', 'prompt [turn 3]') - After: run_turn_loop(prompt, max_turns=3) submits 1 turn ('prompt') - To preserve old multi-turn behaviour, pass continuation_prompt='Continue.' or any structured follow-up text One existing timeout test (test_budget_is_cumulative_across_turns) updated to pass continuation_prompt so the cumulative-budget contract is actually exercised across turns instead of trivially satisfied by a one-turn loop. #164 filed: addresses reviewer feedback on #161. The wall-clock timeout bounds the caller-facing wait, but the underlying submit_message worker thread keeps running and can mutate engine state after the timeout TurnResult is returned. A cooperative cancel_event pattern is sketched in the pinpoint; real asyncio.Task.cancel() support will come once provider IO is async-native (larger refactor). Tests (tests/test_run_turn_loop_continuation.py, 8 tests): - TestNoTurnSuffixInjection (2): zero '[turn' strings in any submitted prompt, both default and explicit-continuation paths - TestContinuationDefaultStopsAfterTurnZero (2): default loops run exactly one turn; engine.submit_message called exactly once despite max_turns=10 - TestExplicitContinuationBehaviour (2): turn 0 = original, turn N = continuation verbatim; max_turns still respected - TestCLIContinuationFlag (2): CLI default emits only '## Turn 1'; --continuation-prompt wires through to multi-turn behaviour Full suite: 67/67 passing. Closes ROADMAP #163. Files #164. |
||
|
|
4813a2b351 |
fix: #162 — budget-overflow no longer corrupts session state in submit_message
Previously, QueryEnginePort.submit_message() checked the token budget AFTER
appending the prompt to mutable_messages, transcript_store, and permission_denials,
and AFTER calling compact_messages_if_needed(). On overflow it set
stop_reason='max_budget_reached' but the overflow turn was already committed.
Any caller that persisted the session afterwards wrote the rejected prompt to
disk — the session was silently poisoned even though the TurnResult said the
turn never completed.
Fix:
- Restructure submit_message so the budget check early-returns BEFORE any
mutation of mutable_messages, transcript_store, permission_denials, or
total_usage.
- The returned TurnResult.usage reflects pre-call state (overflow never
advanced the usage counter).
- Normal (in-budget) path unchanged: mutation happens exactly once, at the
end, only on 'completed' results.
This closes the atomicity gap: submit_message is now either 'turn committed'
(stop_reason='completed') or 'turn rejected, state untouched'
(stop_reason in {'max_budget_reached', 'max_turns_reached'}). Callers can
safely retry with a fresh budget or a smaller prompt without worrying about
phantom committed turns from prior rejections.
Tests (tests/test_submit_message_budget.py, 10 tests):
- TestBudgetOverflowDoesNotMutate (5): mutable_messages / transcript /
permission_denials / total_usage / TurnResult.usage all pre-mutation after overflow
- TestOverflowPersistence (2): first-turn overflow persists empty session;
successful-turn-then-overflow persists only the successful turn
- TestEngineUsableAfterOverflow (2): subsequent in-budget call still works
with no residue; repeated overflows don't accumulate hidden state
- TestNormalPathStillCommits (1): regression guard — non-overflow path still
commits mutable_messages/transcript/usage as expected
Full suite: 59/59 passing, zero regression.
Blocker: none. Closes ROADMAP #162.
|
||
|
|
3f4d46d7b4 |
fix: #161 — wall-clock timeout for run_turn_loop; stalled turns now abort with stop_reason='timeout'
Previously, run_turn_loop was bounded only by max_turns (turn count). If engine.submit_message stalled — slow provider, hung network, infinite stream — the loop blocked indefinitely with no cancellation path. Claws calling run_turn_loop in CI or orchestration had no reliable way to enforce a deadline; the loop would hang until OS kill or human intervention. Fix: - Add timeout_seconds parameter to run_turn_loop (default None = legacy unbounded). - When set, each submit_message call runs inside a ThreadPoolExecutor and is bounded by the remaining wall-clock budget (total across all turns, not per-turn). - On timeout, synthesize a TurnResult with stop_reason='timeout' carrying the turn's prompt and routed matches so transcripts preserve orchestration context. - Exhausted/negative budget short-circuits before calling submit_message. - Legacy path (timeout_seconds=None) bypasses the executor entirely — zero overhead for callers that don't opt in. CLI: - Added --timeout-seconds flag to 'turn-loop' command. - Exit code 2 when the loop terminated on timeout (vs 0 for completed), so shell scripts can distinguish 'done' from 'budget exhausted'. Tests (tests/test_run_turn_loop_timeout.py, 6 tests): - Legacy unbounded path unchanged (timeout_seconds=None never emits 'timeout') - Hung submit_message aborted within budget (0.3s budget, 5s mock hang → exit <1.5s) - Budget is cumulative across turns (0.6s budget, 0.4s per turn, not per-turn) - timeout_seconds=0 short-circuits first turn without calling submit_message - Negative timeout treated as exhausted (guard against caller bugs) - Timeout TurnResult carries correct prompt, matches, UsageSummary shape Full suite: 49/49 passing, zero regression. Blocker: none. Closes ROADMAP #161. |
||
|
|
6a76cc7c08 |
feat(#160): wire claw list-sessions and delete-session CLI commands
Closes the last #160 gap: claws can now manage session lifecycle entirely through the CLI without filesystem hacks. New commands: - claw list-sessions [--directory DIR] [--output-format text|json] Enumerates stored session IDs. JSON mode emits {sessions, count}. Missing/empty directories return empty list (exit 0), not an error. - claw delete-session SESSION_ID [--directory DIR] [--output-format text|json] Idempotent: not-found is exit 0 with status='not_found' (no raise). Partial-failure: exit 1 with typed JSON error envelope: {session_id, deleted: false, error: {kind, message, retryable}} The 'session_delete_failed' kind is retryable=true so orchestrators know to retry vs escalate. Public API surface extended in src/__init__.py: - list_sessions, session_exists, delete_session - SessionNotFoundError, SessionDeleteError Tests added (tests/test_porting_workspace.py): - test_list_sessions_cli_runs: text + json modes against tempdir - test_delete_session_cli_idempotent: first call deleted=true, second call deleted=false (exit 0, status=not_found) - test_delete_session_cli_partial_failure_exit_1: permission error surfaces as exit 1 + typed JSON error with retryable=true All 43 tests pass. The session storage abstraction chapter is closed: - storage layer decoupled from claw code (#160 initial impl) - delete contract hardened + caller-audited (#160 hardening pass) - CLI wired with idempotency preserved at exit-code boundary (this commit) |
||
|
|
527c0f971c |
fix(#160): harden delete_session contract — idempotency, race-safety, typed partial-failure
Addresses review feedback on initial #160 implementation: 1. delete_session() contract now explicit: - Idempotent: delete(x); delete(x) is safe, second call returns False - Race-safe: TOCTOU between exists()/unlink() eliminated via unlink-then-catch - Partial-failure typed: permission/IO errors wrapped in SessionDeleteError (OSError subclass) so callers can distinguish 'not found' (return False) from 'could not delete' (raise) 2. New SessionDeleteError class for partial-failure surfacing. Distinct from SessionNotFoundError (KeyError subclass for missing loads). 3. Caller audit confirmed: no code outside session_store globs .port_sessions or imports DEFAULT_SESSION_DIR. Storage layout is fully encapsulated. 4. Added tests/test_session_store.py — 18 tests covering: - list_sessions: empty/missing/sorted/non-json filter - session_exists: true/false/missing-dir - load_session: SessionNotFoundError typing (KeyError subclass, not FileNotFoundError) - delete_session idempotency: first/second/never-existed calls - delete_session partial-failure: SessionDeleteError wraps OSError - delete_session race-safety: concurrent deletion returns False, not raise - Full save->list->exists->load->delete roundtrip All 18 tests pass. Merge-ready: contract documented, caller-audited, race-safe. |
||
|
|
01bf54ad15 | Rewriting Project Claw Code - Python port with Rust on the way | ||
|
|
507c2460b9 |
Make the repository's primary source tree genuinely Python
The old tracked TypeScript snapshot has been removed from the repository history and the root directory is now a Python porting workspace. README and tests now describe and verify the Python-first layout instead of treating the exposed snapshot as the active source tree. A local archive can still exist outside Git, but the tracked repository now presents only the Python porting surface, related essay context, and OmX workflow artifacts. Constraint: Tracked history should collapse to a single commit while excluding the archived snapshot from Git Rejected: Keep the exposed TypeScript tree in tracked history under an archive path | user explicitly wanted only the Python porting repo state in Git Confidence: medium Scope-risk: broad Reversibility: messy Directive: Keep future tracked additions focused on the Python port itself; do not reintroduce the exposed snapshot into Git history Tested: python3 -m unittest discover -s tests -v; python3 -m src.main summary; git diff --check Not-tested: Behavioral parity with the original TypeScript system beyond the current Python workspace surface |