mirror of
https://github.com/ultraworkers/claw-code.git
synced 2026-04-27 15:10:54 +08:00
roadmap: #287 filed
parent
79eeaaeaf6
commit
9b06c98bd6
12
ROADMAP.md
@@ -17239,3 +17239,15 @@ Gap. Agent parallelism has a fire-and-forget in-process thread model but reports
Required fix shape: (a) persist a durable agent job record with `agent_id`, owner process id/start time, heartbeat timestamp, and phase before spawning; (b) either retain/track `JoinHandle`s in a supervisor or move execution to a durable worker queue; (c) update heartbeat during long `run_turn` execution; (d) on startup/tool access, scan manifests stuck in `running` beyond a lease and classify them as `orphaned_worker` / `needs_recovery` instead of `working`; (e) expose stale/orphaned lane state in Agent/Team status and lane events; (f) regression-test crash-after-manifest-before-terminal-state by creating a running manifest with stale heartbeat and verifying the reaper emits a typed blocker. Acceptance: a parallel Agent lane cannot remain silently `running` forever after its executor disappears.
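Step (d) above can be illustrated with a minimal sketch. The `AgentManifest` struct, its field names, the `LaneState` enum, and `classify_lane` are all hypothetical and not taken from the claw-code source; only the `running` phase and the `orphaned_worker` / `working` labels come from the text. The sketch assumes the heartbeat is stored as Unix seconds and that the lease is a plain duration.

```rust
use std::time::Duration;

/// Hypothetical lane classification; variant names mirror the labels in the fix shape.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LaneState {
    Working,
    OrphanedWorker,
}

/// Hypothetical durable manifest record (field names are assumptions).
pub struct AgentManifest {
    pub agent_id: String,
    pub phase: String,
    pub heartbeat_unix_secs: u64,
}

/// Classifies a manifest during a startup/tool-access scan.
/// Returns `None` for manifests that are not in the `running` phase.
pub fn classify_lane(
    manifest: &AgentManifest,
    now_unix_secs: u64,
    lease: Duration,
) -> Option<LaneState> {
    if manifest.phase != "running" {
        return None;
    }
    // saturating_sub guards against a heartbeat timestamp that is ahead of `now`.
    let heartbeat_age_secs = now_unix_secs.saturating_sub(manifest.heartbeat_unix_secs);
    if heartbeat_age_secs > lease.as_secs() {
        Some(LaneState::OrphanedWorker)
    } else {
        Some(LaneState::Working)
    }
}
```

A regression test in the spirit of step (f) would write a `running` manifest with a heartbeat older than the lease and assert that the scan reports `OrphanedWorker` rather than `Working`.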
**Status:** Open. No source code changed. Filed 2026-04-26 18:33 KST. Branch: feat/jobdori-168c-emission-routing. HEAD: `639e1e3` before filing. Cluster delta: parallel-agent-lifecycle-durability +1; concrete user-signal source: Sigrid request to dogfood parallel/async execution mistakes. Concrete delta this cycle: ROADMAP-only pinpoint appended from Agent spawn/lifecycle audit.
## Pinpoint #287 — Auto-compaction is reactive-after-success instead of preflight-before-request, so oversized resumed sessions can hit context-window failure and “session broke / auto-compact did not work” before compaction ever runs
Dogfooded 2026-04-26 18:38 KST after Sigrid reported frequent session breakage where sessions are not maintained and auto-compaction does not appear to work. Static audit of `rust/crates/runtime/src/conversation.rs` shows `run_turn` calls `maybe_auto_compact()` only after the assistant/tool loop completes successfully and after provider usage has been recorded. `maybe_auto_compact` checks `self.usage_tracker.cumulative_usage().input_tokens` against `auto_compaction_input_tokens_threshold`; that usage is reconstructed from prior assistant message usage and updated from successful provider events, not from a preflight estimate of the prompt/session that is about to be sent. If the next request is already too large and the provider returns `context_window_blocked` before a successful usage event, `maybe_auto_compact` is never reached. CLI error formatting then tells the user to run `/compact` manually, which is exactly the visible failure mode: session continuity breaks first, auto-compact never fires.
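The ordering problem described above can be reproduced with a toy model. This is not the code in `conversation.rs`: `ToyRuntime`, `TurnError`, the field names, and the `>=` threshold comparison are assumptions made for illustration, and the provider call is replaced by a pre-computed `Result`. The sketch only shows why a check placed after the successful path is skipped when the provider rejects the request.

```rust
/// Hypothetical typed error standing in for `context_window_blocked`.
#[derive(Debug, PartialEq, Eq)]
pub enum TurnError {
    ContextWindowBlocked,
}

/// Toy stand-in for the runtime; all fields are assumptions.
pub struct ToyRuntime {
    pub cumulative_input_tokens: u64,
    pub threshold: u64,
    pub compactions: u32,
}

impl ToyRuntime {
    /// `provider_result` models the provider call: `Ok(input_tokens)` on success.
    pub fn run_turn(&mut self, provider_result: Result<u64, TurnError>) -> Result<(), TurnError> {
        // A rejected request returns here, before usage is recorded.
        let input_tokens = provider_result?;
        self.cumulative_input_tokens += input_tokens;
        // Reactive check: only reachable after a successful turn.
        self.maybe_auto_compact();
        Ok(())
    }

    pub fn maybe_auto_compact(&mut self) {
        if self.cumulative_input_tokens >= self.threshold {
            self.compactions += 1;
            self.cumulative_input_tokens = 0;
        }
    }
}
```

In this model a turn that fails with `ContextWindowBlocked` leaves `compactions` at zero, while a successful oversized turn compacts only after the fact.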
Concrete failure mode: a long or resumed session grows close to or beyond the model's context window. The next turn is sent without preflight compaction because auto-compaction currently runs only post-turn. The provider rejects the request for exceeding the context window, `run_turn` returns `Err`, the runtime shuts down plugins, and no compaction is persisted. The user sees a broken session/context-window error and must recover manually with `/compact`, even though auto-compaction is advertised as protecting long sessions.
Gap. Auto-compaction lacks a pre-request guard based on `estimate_session_tokens(&session) + estimated_new_prompt_tokens + requested_output_tokens` and lacks a retry path that compacts and resends after a typed context-window failure. This is distinct from #283 (threshold config is env-only): #287 is the timing/trigger semantics that make auto-compaction fail in the exact oversized-session case users expect it to handle. It also intersects with session-maintenance complaints because failed turns do not persist a compacted recovery state.
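The missing pre-request guard amounts to a sum and a comparison. The sketch below is a minimal illustration of that arithmetic, assuming the three estimates are already available as token counts; `PreflightInput`, its fields, and both functions are hypothetical names, and comparing against the full context window (rather than a configured threshold) is one possible choice, not something the audit prescribes.

```rust
/// Hypothetical inputs to a preflight check; all values are token counts.
pub struct PreflightInput {
    pub estimated_session_tokens: u64,
    pub estimated_new_prompt_tokens: u64,
    pub requested_output_tokens: u64,
    pub context_window_tokens: u64,
}

/// Session estimate + new prompt estimate + requested output, as in the gap description.
pub fn estimated_request_tokens(input: &PreflightInput) -> u64 {
    input
        .estimated_session_tokens
        .saturating_add(input.estimated_new_prompt_tokens)
        .saturating_add(input.requested_output_tokens)
}

/// True when the request is estimated not to fit and compaction should run first.
pub fn needs_preflight_compaction(input: &PreflightInput) -> bool {
    estimated_request_tokens(input) > input.context_window_tokens
}
```

For example, a session estimated at 180,000 tokens with a 5,000-token prompt and 16,000 requested output tokens totals 201,000, which exceeds a 200,000-token window and would trigger compaction before dispatch.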
Required fix shape: (a) add a preflight auto-compact phase before provider dispatch using estimated session/request size and model context metadata; (b) include the threshold, estimated session tokens, estimated request tokens, and context window in a typed `auto_compaction_preflight` event/status surface; (c) after `context_window_blocked`, optionally run a safe compact-and-retry once, with an explicit receipt; (d) persist the compacted session before retry so session continuity is recoverable even if the retry fails; (e) surface whether compaction was skipped because the session was below threshold, no messages were removable, or compaction would not fit; (f) add regression coverage where a resumed oversized session compacts before request and does not hit provider context-window rejection first. Acceptance: an oversized maintained session gets compacted or fails with a typed “not compactable” reason before provider context-window failure, never with silent “auto-compact did not run.”
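Steps (a), (b), and (e) could surface their decision as a typed outcome. The following is a rough sketch under stated assumptions: `PreflightOutcome`, its variants, and `preflight_decision` are hypothetical, and compaction is modeled only as subtracting an estimated number of removable tokens, which a real summarizing compactor would not match exactly.

```rust
/// Hypothetical typed result of a preflight auto-compaction check.
#[derive(Debug, PartialEq, Eq)]
pub enum PreflightOutcome {
    /// Estimated request is under the auto-compaction threshold.
    SkippedBelowThreshold,
    /// Over threshold, but nothing in the session can be removed.
    SkippedNoRemovableMessages,
    /// Compaction is expected to bring the request within the context window.
    Compacted { tokens_after: u64 },
    /// Even after compaction the request would not fit.
    NotCompactable { tokens_after: u64 },
}

pub fn preflight_decision(
    estimated_request_tokens: u64,
    threshold: u64,
    removable_tokens: u64,
    context_window_tokens: u64,
) -> PreflightOutcome {
    if estimated_request_tokens < threshold {
        return PreflightOutcome::SkippedBelowThreshold;
    }
    if removable_tokens == 0 {
        return PreflightOutcome::SkippedNoRemovableMessages;
    }
    // Simplification: treat compaction as removing `removable_tokens` outright.
    let tokens_after = estimated_request_tokens.saturating_sub(removable_tokens);
    if tokens_after > context_window_tokens {
        PreflightOutcome::NotCompactable { tokens_after }
    } else {
        PreflightOutcome::Compacted { tokens_after }
    }
}
```

Returning an enum rather than a boolean keeps the three skip reasons from step (e) distinguishable, so a status surface or regression test can assert on the exact reason instead of inferring it from logs.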
**Status:** Open. No source code changed. Filed 2026-04-26 18:39 KST. Branch: feat/jobdori-168c-emission-routing. HEAD: `79eeaae` before filing. Cluster delta: session-continuity-auto-compaction-semantics +1; concrete user-signal source: Sigrid report of frequent session breakage and auto-compaction not working. Concrete delta this cycle: ROADMAP-only pinpoint appended from auto-compaction trigger audit.