claw-code

elias/claw-code

Fork 0

mirror of https://github.com/ultraworkers/claw-code.git synced 2026-04-25 13:44:06 +08:00

Commit Graph

Author	SHA1	Message	Date
YeonGyu-Kim	524edb2b2e	fix: #164 Stage A — cooperative cancellation via cancel_event in submit_message Closes the #161 follow-up gap identified in review: wall-clock timeout bounded caller-facing wait but did not cancel the underlying provider thread, which could silently mutate mutable_messages / transcript_store / permission_denials / total_usage after the caller had already observed stop_reason='timeout'. A ghost turn committed post-deadline would poison any session that got persisted afterwards. Stage A scope (this commit): runtime + engine layer cooperative cancel. Engine layer (src/query_engine.py): - submit_message now accepts cancel_event: threading.Event \| None = None - Two safe checkpoints: 1. Entry (before max_turns / budget projection) — earliest possible return 2. Post-budget (after output synthesis, before mutation) — catches cancel that arrives while output was being computed - Both checkpoints return stop_reason='cancelled' with state UNCHANGED (mutable_messages, transcript_store, permission_denials, total_usage all preserved exactly as on entry) - cancel_event=None preserves legacy behaviour with zero overhead (no checkpoint checks at all) Runtime layer (src/runtime.py): - run_turn_loop creates one cancel_event per invocation when a deadline is in play (and None otherwise, preserving legacy fast path) - Passes the same event to every submit_message call across turns, so a late cancel on turn N-1 affects turn N - On timeout (either pre-call or mid-call), runtime explicitly calls cancel_event.set() before future.cancel() + synthesizing the timeout TurnResult. This upgrades #161's best-effort future.cancel() (which only cancels not-yet-started futures) to cooperative mid-flight cancel. Stop reason taxonomy after Stage A: 'completed' — turn committed, state mutated exactly once 'max_budget_reached' — overflow, state unchanged (#162) 'max_turns_reached' — capacity exceeded, state unchanged 'cancelled' — cancel_event observed, state unchanged (#164 Stage A) 'timeout' — synthesised by runtime, not engine (#161) The 'cancelled' vs 'timeout' split matters: - 'timeout' is the runtime's best-effort signal to the caller: deadline hit - 'cancelled' is the engine's confirmation: cancel was observed + honoured If the provider call wedges entirely (never reaches a checkpoint), the caller still sees 'timeout' and the thread is leaked — but any NEXT submit_message call on the same engine observes the event at entry and returns 'cancelled' immediately, preventing ghost-turn accumulation. This is the honest cooperative limit in Python threading land; true preemption requires async-native provider IO (future work, not Stage A). Tests (29 new tests, tests/test_submit_message_cancellation.py + tests/ test_run_turn_loop_cancellation.py): Engine-layer (12 tests): - TestCancellationBeforeCall (5): pre-set event returns 'cancelled' immediately; mutable_messages, transcript_store, usage, permission_denials all preserved - TestCancellationAfterBudgetCheck (1): cancel set mid-call (after projection, before commit) still honoured; output synthesised but state untouched - TestCancellationAfterCommit (2): post-commit cancel not observable (honest limit) BUT next call on same engine observes it + returns 'cancelled' - TestLegacyCallersUnchanged (3): cancel_event=None preserves #162 atomicity + max_turns contract with zero behaviour change - TestCancellationVsOtherStopReasons (2): cancel precedes max_turns check; cancel does not retroactively override a completed turn Runtime-layer (5 tests): - TestTimeoutPropagatesCancelEvent (3): submit_message receives a real Event object when deadline is set; None in legacy mode; timeout actually calls event.set() so in-flight threads observe at their next checkpoint - TestCancelEventSharedAcrossTurns (1): same event object passed to every turn (object identity check) — late cancel on turn N-1 must affect turn N Regression: 3 existing timeout test mocks updated to accept cancel_event kwarg (mocks that previously had signature (prompt, commands, tools, denials) now have (prompt, commands, tools, denials, cancel_event=None) since runtime passes cancel_event positionally on the timeout path). Full suite: 97 → 114 passing, zero regression. Closes ROADMAP #164 Stage A. What's explicitly NOT in Stage A: - Preemptive cancellation of wedged provider IO (requires asyncio-native provider path; larger refactor) - Timeout on the legacy unbounded run_turn_loop path (by design: legacy callers opt out of cancellation entirely) - CLI exposure of 'cancelled' as a distinct exit code (currently 'cancelled' maps to the same stop_reason != 'completed' break condition as others; CLI surface for cancel is a separate pinpoint if warranted)	2026-04-22 18:14:14 +09:00
YeonGyu-Kim	79a9f0e6f6	fix: #163 — remove [turn N] suffix pollution from run_turn_loop; file #164 timeout-cancellation followup #163: run_turn_loop no longer injects f'{prompt} [turn N]' into follow-up prompts. The suffix was never defined or interpreted anywhere — not by the engine, not by the system prompt, not by any LLM. It looked like a real user-typed annotation in the transcript and made replay/analysis fragile. New behaviour: - turn 0 submits the original prompt (unchanged) - turn > 0 submits caller-supplied continuation_prompt if provided, else the loop stops cleanly — no fabricated user turn - added continuation_prompt: str \| None = None parameter to run_turn_loop - added --continuation-prompt CLI flag for claws scripting multi-turn loops - zero '[turn' strings ever appear in mutable_messages or stdout now Behaviour change for existing callers: - Before: run_turn_loop(prompt, max_turns=3) submitted 3 turns ('prompt', 'prompt [turn 2]', 'prompt [turn 3]') - After: run_turn_loop(prompt, max_turns=3) submits 1 turn ('prompt') - To preserve old multi-turn behaviour, pass continuation_prompt='Continue.' or any structured follow-up text One existing timeout test (test_budget_is_cumulative_across_turns) updated to pass continuation_prompt so the cumulative-budget contract is actually exercised across turns instead of trivially satisfied by a one-turn loop. #164 filed: addresses reviewer feedback on #161. The wall-clock timeout bounds the caller-facing wait, but the underlying submit_message worker thread keeps running and can mutate engine state after the timeout TurnResult is returned. A cooperative cancel_event pattern is sketched in the pinpoint; real asyncio.Task.cancel() support will come once provider IO is async-native (larger refactor). Tests (tests/test_run_turn_loop_continuation.py, 8 tests): - TestNoTurnSuffixInjection (2): zero '[turn' strings in any submitted prompt, both default and explicit-continuation paths - TestContinuationDefaultStopsAfterTurnZero (2): default loops run exactly one turn; engine.submit_message called exactly once despite max_turns=10 - TestExplicitContinuationBehaviour (2): turn 0 = original, turn N = continuation verbatim; max_turns still respected - TestCLIContinuationFlag (2): CLI default emits only '## Turn 1'; --continuation-prompt wires through to multi-turn behaviour Full suite: 67/67 passing. Closes ROADMAP #163. Files #164.	2026-04-22 17:37:22 +09:00
YeonGyu-Kim	3f4d46d7b4	fix: #161 — wall-clock timeout for run_turn_loop; stalled turns now abort with stop_reason='timeout' Previously, run_turn_loop was bounded only by max_turns (turn count). If engine.submit_message stalled — slow provider, hung network, infinite stream — the loop blocked indefinitely with no cancellation path. Claws calling run_turn_loop in CI or orchestration had no reliable way to enforce a deadline; the loop would hang until OS kill or human intervention. Fix: - Add timeout_seconds parameter to run_turn_loop (default None = legacy unbounded). - When set, each submit_message call runs inside a ThreadPoolExecutor and is bounded by the remaining wall-clock budget (total across all turns, not per-turn). - On timeout, synthesize a TurnResult with stop_reason='timeout' carrying the turn's prompt and routed matches so transcripts preserve orchestration context. - Exhausted/negative budget short-circuits before calling submit_message. - Legacy path (timeout_seconds=None) bypasses the executor entirely — zero overhead for callers that don't opt in. CLI: - Added --timeout-seconds flag to 'turn-loop' command. - Exit code 2 when the loop terminated on timeout (vs 0 for completed), so shell scripts can distinguish 'done' from 'budget exhausted'. Tests (tests/test_run_turn_loop_timeout.py, 6 tests): - Legacy unbounded path unchanged (timeout_seconds=None never emits 'timeout') - Hung submit_message aborted within budget (0.3s budget, 5s mock hang → exit <1.5s) - Budget is cumulative across turns (0.6s budget, 0.4s per turn, not per-turn) - timeout_seconds=0 short-circuits first turn without calling submit_message - Negative timeout treated as exhausted (guard against caller bugs) - Timeout TurnResult carries correct prompt, matches, UsageSummary shape Full suite: 49/49 passing, zero regression. Blocker: none. Closes ROADMAP #161.	2026-04-22 17:23:43 +09:00

Author

SHA1

Message

Date

YeonGyu-Kim

524edb2b2e

fix: #164 Stage A — cooperative cancellation via cancel_event in submit_message

Closes the #161 follow-up gap identified in review: wall-clock timeout
bounded caller-facing wait but did not cancel the underlying provider
thread, which could silently mutate mutable_messages / transcript_store /
permission_denials / total_usage after the caller had already observed
stop_reason='timeout'. A ghost turn committed post-deadline would poison
any session that got persisted afterwards.

Stage A scope (this commit): runtime + engine layer cooperative cancel.

Engine layer (src/query_engine.py):
- submit_message now accepts cancel_event: threading.Event | None = None
- Two safe checkpoints:
  1. Entry (before max_turns / budget projection) — earliest possible return
  2. Post-budget (after output synthesis, before mutation) — catches cancel
     that arrives while output was being computed
- Both checkpoints return stop_reason='cancelled' with state UNCHANGED
  (mutable_messages, transcript_store, permission_denials, total_usage
  all preserved exactly as on entry)
- cancel_event=None preserves legacy behaviour with zero overhead (no
  checkpoint checks at all)

Runtime layer (src/runtime.py):
- run_turn_loop creates one cancel_event per invocation when a deadline
  is in play (and None otherwise, preserving legacy fast path)
- Passes the same event to every submit_message call across turns, so a
  late cancel on turn N-1 affects turn N
- On timeout (either pre-call or mid-call), runtime explicitly calls
  cancel_event.set() before future.cancel() + synthesizing the timeout
  TurnResult. This upgrades #161's best-effort future.cancel() (which
  only cancels not-yet-started futures) to cooperative mid-flight cancel.

Stop reason taxonomy after Stage A:
  'completed'           — turn committed, state mutated exactly once
  'max_budget_reached'  — overflow, state unchanged (#162)
  'max_turns_reached'   — capacity exceeded, state unchanged
  'cancelled'           — cancel_event observed, state unchanged (#164 Stage A)
  'timeout'             — synthesised by runtime, not engine (#161)

The 'cancelled' vs 'timeout' split matters:
- 'timeout' is the runtime's best-effort signal to the caller: deadline hit
- 'cancelled' is the engine's confirmation: cancel was observed + honoured

If the provider call wedges entirely (never reaches a checkpoint), the
caller still sees 'timeout' and the thread is leaked — but any NEXT
submit_message call on the same engine observes the event at entry and
returns 'cancelled' immediately, preventing ghost-turn accumulation.
This is the honest cooperative limit in Python threading land; true
preemption requires async-native provider IO (future work, not Stage A).

Tests (29 new tests, tests/test_submit_message_cancellation.py + tests/
test_run_turn_loop_cancellation.py):

Engine-layer (12 tests):
- TestCancellationBeforeCall (5): pre-set event returns 'cancelled' immediately;
  mutable_messages, transcript_store, usage, permission_denials all preserved
- TestCancellationAfterBudgetCheck (1): cancel set mid-call (after projection,
  before commit) still honoured; output synthesised but state untouched
- TestCancellationAfterCommit (2): post-commit cancel not observable (honest
  limit) BUT next call on same engine observes it + returns 'cancelled'
- TestLegacyCallersUnchanged (3): cancel_event=None preserves #162 atomicity
  + max_turns contract with zero behaviour change
- TestCancellationVsOtherStopReasons (2): cancel precedes max_turns check;
  cancel does not retroactively override a completed turn

Runtime-layer (5 tests):
- TestTimeoutPropagatesCancelEvent (3): submit_message receives a real Event
  object when deadline is set; None in legacy mode; timeout actually calls
  event.set() so in-flight threads observe at their next checkpoint
- TestCancelEventSharedAcrossTurns (1): same event object passed to every
  turn (object identity check) — late cancel on turn N-1 must affect turn N

Regression: 3 existing timeout test mocks updated to accept cancel_event
kwarg (mocks that previously had signature (prompt, commands, tools, denials)
now have (prompt, commands, tools, denials, cancel_event=None) since runtime
passes cancel_event positionally on the timeout path).

Full suite: 97 → 114 passing, zero regression.

Closes ROADMAP #164 Stage A.

What's explicitly NOT in Stage A:
- Preemptive cancellation of wedged provider IO (requires asyncio-native
  provider path; larger refactor)
- Timeout on the legacy unbounded run_turn_loop path (by design: legacy
  callers opt out of cancellation entirely)
- CLI exposure of 'cancelled' as a distinct exit code (currently 'cancelled'
  maps to the same stop_reason != 'completed' break condition as others;
  CLI surface for cancel is a separate pinpoint if warranted)

2026-04-22 18:14:14 +09:00

YeonGyu-Kim

79a9f0e6f6

fix: #163 — remove [turn N] suffix pollution from run_turn_loop; file #164 timeout-cancellation followup

#163: run_turn_loop no longer injects f'{prompt} [turn N]' into follow-up
prompts. The suffix was never defined or interpreted anywhere — not by the
engine, not by the system prompt, not by any LLM. It looked like a real
user-typed annotation in the transcript and made replay/analysis fragile.

New behaviour:
- turn 0 submits the original prompt (unchanged)
- turn > 0 submits caller-supplied continuation_prompt if provided, else
  the loop stops cleanly — no fabricated user turn
- added continuation_prompt: str | None = None parameter to run_turn_loop
- added --continuation-prompt CLI flag for claws scripting multi-turn loops
- zero '[turn' strings ever appear in mutable_messages or stdout now

Behaviour change for existing callers:
- Before: run_turn_loop(prompt, max_turns=3) submitted 3 turns
  ('prompt', 'prompt [turn 2]', 'prompt [turn 3]')
- After:  run_turn_loop(prompt, max_turns=3) submits 1 turn ('prompt')
- To preserve old multi-turn behaviour, pass continuation_prompt='Continue.'
  or any structured follow-up text

One existing timeout test (test_budget_is_cumulative_across_turns) updated
to pass continuation_prompt so the cumulative-budget contract is actually
exercised across turns instead of trivially satisfied by a one-turn loop.

#164 filed: addresses reviewer feedback on #161. The wall-clock timeout
bounds the caller-facing wait, but the underlying submit_message worker
thread keeps running and can mutate engine state after the timeout
TurnResult is returned. A cooperative cancel_event pattern is sketched in
the pinpoint; real asyncio.Task.cancel() support will come once provider
IO is async-native (larger refactor).

Tests (tests/test_run_turn_loop_continuation.py, 8 tests):
- TestNoTurnSuffixInjection (2): zero '[turn' strings in any submitted
  prompt, both default and explicit-continuation paths
- TestContinuationDefaultStopsAfterTurnZero (2): default loops run exactly
  one turn; engine.submit_message called exactly once despite max_turns=10
- TestExplicitContinuationBehaviour (2): turn 0 = original, turn N = continuation
  verbatim; max_turns still respected
- TestCLIContinuationFlag (2): CLI default emits only '## Turn 1';
  --continuation-prompt wires through to multi-turn behaviour

Full suite: 67/67 passing.

Closes ROADMAP #163. Files #164.

2026-04-22 17:37:22 +09:00

YeonGyu-Kim

3f4d46d7b4

fix: #161 — wall-clock timeout for run_turn_loop; stalled turns now abort with stop_reason='timeout'

Previously, run_turn_loop was bounded only by max_turns (turn count). If
engine.submit_message stalled — slow provider, hung network, infinite
stream — the loop blocked indefinitely with no cancellation path. Claws
calling run_turn_loop in CI or orchestration had no reliable way to
enforce a deadline; the loop would hang until OS kill or human intervention.

Fix:
- Add timeout_seconds parameter to run_turn_loop (default None = legacy unbounded).
- When set, each submit_message call runs inside a ThreadPoolExecutor and is
  bounded by the remaining wall-clock budget (total across all turns, not per-turn).
- On timeout, synthesize a TurnResult with stop_reason='timeout' carrying the
  turn's prompt and routed matches so transcripts preserve orchestration context.
- Exhausted/negative budget short-circuits before calling submit_message.
- Legacy path (timeout_seconds=None) bypasses the executor entirely — zero
  overhead for callers that don't opt in.

CLI:
- Added --timeout-seconds flag to 'turn-loop' command.
- Exit code 2 when the loop terminated on timeout (vs 0 for completed),
  so shell scripts can distinguish 'done' from 'budget exhausted'.

Tests (tests/test_run_turn_loop_timeout.py, 6 tests):
- Legacy unbounded path unchanged (timeout_seconds=None never emits 'timeout')
- Hung submit_message aborted within budget (0.3s budget, 5s mock hang → exit <1.5s)
- Budget is cumulative across turns (0.6s budget, 0.4s per turn, not per-turn)
- timeout_seconds=0 short-circuits first turn without calling submit_message
- Negative timeout treated as exhausted (guard against caller bugs)
- Timeout TurnResult carries correct prompt, matches, UsageSummary shape

Full suite: 49/49 passing, zero regression.

Blocker: none. Closes ROADMAP #161.

2026-04-22 17:23:43 +09:00

3 Commits