From 6db68a2baa85dc7853d7220a75e778eecfd14b2b Mon Sep 17 00:00:00 2001 From: Yeachan-Heo Date: Mon, 27 Apr 2026 09:28:09 +0000 Subject: [PATCH] Expose tool permission gates as structured worker blockers Worker boot could previously stall on an interactive MCP/tool permission prompt while readiness and startup-timeout surfaces only had generic idle/no-evidence shapes. This adds a first-class blocked lifecycle state, structured event payload, startup evidence fields, and regression coverage so callers can report the exact server/tool gate instead of pane-scraping. Constraint: ROADMAP #200 requires tool/server identity, prompt age, and session-only versus always-allow capability in status/evidence surfaces Rejected: Treat MCP/tool prompts as trust gates | conflates distinct prompts and loses tool identity Rejected: Leave allow-scope as pane text only | clawhip still could not classify the blocker without scraping Confidence: high Scope-risk: moderate Directive: Keep tool_permission_required distinct from trust_required; downstream claws rely on server/tool payload plus allow-scope metadata Tested: cargo test -p runtime tool_permission Tested: cargo fmt -p runtime -- --check && cargo clippy -p runtime --all-targets -- -D warnings && cargo test -p runtime Tested: cargo test --workspace Not-tested: live interactive MCP permission prompt in tmux --- ROADMAP.md | 3 +- progress.txt | 12 + rust/crates/runtime/src/recovery_recipes.rs | 4 +- rust/crates/runtime/src/worker_boot.rs | 320 +++++++++++++++++++- 4 files changed, 336 insertions(+), 3 deletions(-) diff --git a/ROADMAP.md b/ROADMAP.md index 0f8a4c3..688fe16 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -6158,7 +6158,8 @@ load_session('nonexistent') # raises FileNotFoundError with no structured error **Blocker.** None. **Source.** Jobdori dogfood sweep 2026-04-22 08:46 KST — inspected `src/session_store.py` public API, confirmed only `save_session` + `load_session` present, no list/delete/exists surface. -200. **Interactive MCP/tool permission prompts are invisible blockers** — dogfooded 2026-04-18 from `clawcode-human`. The session emitted `SessionStart hook (completed)` and `UserPromptSubmit hook (completed)`, then stalled on an interactive MCP permission gate (`Allow the omx_memory MCP server to run tool "project_memory_read"?`). From the outside this looks like a ready-but-quiet lane even though the real state is `blocked waiting for permission`. **Required fix shape:** (a) detect interactive MCP/tool permission prompts as a first-class blocked state instead of generic idle; (b) emit a typed event such as `blocked.mcp_permission` / `blocked.tool_permission` with tool/server name, prompt age, and whether the gate is session-only vs always-allow capable; (c) include this gate in startup/no-evidence evidence bundles and lane status surfaces so clawhip can say "blocked at MCP permission prompt" without pane scraping; (d) add a regression proving a prompt-gated session does not get misclassified as stale/idle/ready. **Why this matters:** prompt acceptance and startup telemetry are still incomplete if an interactive MCP gate can eat the first real action after hooks report success. Source: live dogfood session `clawcode-human` on 2026-04-18. +200. **Interactive MCP/tool permission prompts are invisible blockers** — **done (verified 2026-04-27):** worker boot observation now detects interactive tool permission gates such as `Allow the omx_memory MCP server to run tool "project_memory_read"?` before generic readiness/idle handling, records `tool_permission_required` status, emits a structured `ToolPermissionPrompt` payload with server/tool identity, prompt age, allow-scope capability, and prompt preview, marks readiness snapshots as blocked, and carries `tool_permission_prompt_detected` through startup timeout evidence so the classifier returns `tool_permission_required` instead of a vague stale/idle/ready outcome. Regression coverage locks both the structured prompt-gate event metadata and startup-timeout classification paths. **Original filing below.** +Original filing (2026-04-18): the session emitted `SessionStart hook (completed)` and `UserPromptSubmit hook (completed)`, then stalled on an interactive MCP permission gate (`Allow the omx_memory MCP server to run tool "project_memory_read"?`). From the outside this looks like a ready-but-quiet lane even though the real state is `blocked waiting for permission`. **Required fix shape:** (a) detect interactive MCP/tool permission prompts as a first-class blocked state instead of generic idle; (b) emit a typed event such as `blocked.mcp_permission` / `blocked.tool_permission` with tool/server name, prompt age, and whether the gate is session-only vs always-allow capable; (c) include this gate in startup/no-evidence evidence bundles and lane status surfaces so clawhip can say "blocked at MCP permission prompt" without pane scraping; (d) add a regression proving a prompt-gated session does not get misclassified as stale/idle/ready. **Why this matters:** prompt acceptance and startup telemetry are still incomplete if an interactive MCP gate can eat the first real action after hooks report success. Source: live dogfood session `clawcode-human` on 2026-04-18. 201. **`extract --model-payload` is not inspectable enough for deterministic dogfood: forced mode selection missing, and hybrid/no-snippet cases are opaque** — dogfooded 2026-04-19 from `dogfood-1776184671` against three real-repo files. `node dist/cli/index.js extract --model-payload` succeeded and auto-selected `raw`, `raw`, and `hybrid`, but there is currently no CLI surface to force `raw` / `compressed` / `hybrid` for A/B comparison: `--mode raw` and `--mode compressed` both fail immediately with `Error: Unexpected extract argument: --mode`. That turns payload-shaping validation into guesswork because operators cannot ask the extractor to render the same file through each mode and compare the exact output. The opacity is worse in the observed hybrid case: the Formbricks checkbox file produced a hybrid payload with no snippets, leaving no visible explanation for why the extractor chose hybrid, what evidence it kept vs dropped, or whether the result is correct vs a silent fallback. **Required fix shape:** (a) add an explicit debug/inspection flag that forces extraction mode (`--mode raw|compressed|hybrid` or equivalent) without changing default auto-selection; (b) print/report the chosen mode and the decision reason in a machine-readable field when `--model-payload` is used; (c) when hybrid emits zero snippets, surface an explicit reason/count summary instead of making "no snippets" indistinguishable from silent loss; (d) add regression coverage on at least one real-world hybrid fixture so mode choice and snippet accounting stay stable. **Why this matters:** direct claw-code dogfood needs deterministic payload comparison to debug startup/context quality; without forced-mode inspection and snippet accounting, operators can see the outcome but not the extraction decision that produced it. Source: live dogfood session `dogfood-1776184671` on 2026-04-19. diff --git a/progress.txt b/progress.txt index 8849c95..3914a31 100644 --- a/progress.txt +++ b/progress.txt @@ -74,6 +74,18 @@ US-007 COMPLETE (Phase 5 - Plugin/MCP lifecycle maturity) - DegradedMode behavior - Tests: 11 unit tests passing + +Iteration 2026-04-27 - ROADMAP #200 COMPLETED +------------------------------------------------ +- Selected next actionable backlog item because no active task was in progress. +- ROADMAP #200: Interactive MCP/tool permission prompts are invisible blockers. +- Files: rust/crates/runtime/src/worker_boot.rs, rust/crates/runtime/src/recovery_recipes.rs, ROADMAP.md, progress.txt. +- Added tool_permission_required worker status and event classification for interactive MCP/tool permission gates. +- Added structured ToolPermissionPrompt payload with server/tool identity and prompt preview. +- Startup evidence now records tool_permission_prompt_detected and classifies timeout evidence as tool_permission_required. +- Readiness snapshots now mark tool-permission-gated workers as blocked, not ready/idle. +- Tests: targeted tool_permission regressions, full runtime test/clippy/fmt pending in Ralph verification loop. + VERIFICATION STATUS: ------------------ - cargo build --workspace: PASSED diff --git a/rust/crates/runtime/src/recovery_recipes.rs b/rust/crates/runtime/src/recovery_recipes.rs index a98716a..8189d81 100644 --- a/rust/crates/runtime/src/recovery_recipes.rs +++ b/rust/crates/runtime/src/recovery_recipes.rs @@ -45,7 +45,9 @@ impl FailureScenario { #[must_use] pub fn from_worker_failure_kind(kind: WorkerFailureKind) -> Self { match kind { - WorkerFailureKind::TrustGate => Self::TrustPromptUnresolved, + WorkerFailureKind::TrustGate | WorkerFailureKind::ToolPermissionGate => { + Self::TrustPromptUnresolved + } WorkerFailureKind::PromptDelivery => Self::PromptMisdelivery, WorkerFailureKind::Protocol => Self::McpHandshakeFailure, WorkerFailureKind::Provider | WorkerFailureKind::StartupNoEvidence => { diff --git a/rust/crates/runtime/src/worker_boot.rs b/rust/crates/runtime/src/worker_boot.rs index 9096990..2d9ea07 100644 --- a/rust/crates/runtime/src/worker_boot.rs +++ b/rust/crates/runtime/src/worker_boot.rs @@ -30,6 +30,7 @@ fn now_secs() -> u64 { pub enum WorkerStatus { Spawning, TrustRequired, + ToolPermissionRequired, ReadyForPrompt, Running, Finished, @@ -41,6 +42,7 @@ impl std::fmt::Display for WorkerStatus { match self { Self::Spawning => write!(f, "spawning"), Self::TrustRequired => write!(f, "trust_required"), + Self::ToolPermissionRequired => write!(f, "tool_permission_required"), Self::ReadyForPrompt => write!(f, "ready_for_prompt"), Self::Running => write!(f, "running"), Self::Finished => write!(f, "finished"), @@ -53,6 +55,7 @@ impl std::fmt::Display for WorkerStatus { #[serde(rename_all = "snake_case")] pub enum WorkerFailureKind { TrustGate, + ToolPermissionGate, PromptDelivery, Protocol, Provider, @@ -71,6 +74,7 @@ pub struct WorkerFailure { pub enum WorkerEventKind { Spawning, TrustRequired, + ToolPermissionRequired, TrustResolved, ReadyForPrompt, PromptMisdelivery, @@ -104,6 +108,8 @@ pub enum WorkerPromptTarget { pub enum StartupFailureClassification { /// Trust prompt is required but not detected/resolved TrustRequired, + /// Tool permission prompt is required before startup can continue + ToolPermissionRequired, /// Prompt was delivered to wrong target (shell misdelivery) PromptMisdelivery, /// Prompt was sent but acceptance timed out @@ -130,6 +136,14 @@ pub struct StartupEvidenceBundle { pub prompt_acceptance_state: bool, /// Result of trust prompt detection at timeout pub trust_prompt_detected: bool, + /// Result of tool permission prompt detection at timeout + pub tool_permission_prompt_detected: bool, + /// Age in seconds of the latest tool permission prompt, when observed + #[serde(skip_serializing_if = "Option::is_none")] + pub tool_permission_prompt_age_seconds: Option, + /// Whether the prompt surface exposed only a session allow path or also an always-allow path + #[serde(skip_serializing_if = "Option::is_none")] + pub tool_permission_allow_scope: Option, /// Transport health summary (true = healthy/responsive) pub transport_healthy: bool, /// MCP health summary (true = all servers healthy) @@ -146,6 +160,15 @@ pub enum WorkerEventPayload { #[serde(skip_serializing_if = "Option::is_none")] resolution: Option, }, + ToolPermissionPrompt { + #[serde(skip_serializing_if = "Option::is_none")] + server_name: Option, + #[serde(skip_serializing_if = "Option::is_none")] + tool_name: Option, + prompt_age_seconds: u64, + allow_scope: ToolPermissionAllowScope, + prompt_preview: String, + }, PromptDelivery { prompt_preview: String, observed_target: WorkerPromptTarget, @@ -163,6 +186,14 @@ pub enum WorkerEventPayload { }, } +#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq)] +#[serde(rename_all = "snake_case")] +pub enum ToolPermissionAllowScope { + SessionOnly, + SessionOrAlways, + Unknown, +} + #[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)] pub struct WorkerTaskReceipt { pub repo: String, @@ -276,6 +307,29 @@ impl WorkerRegistry { .ok_or_else(|| format!("worker not found: {worker_id}"))?; let lowered = screen_text.to_ascii_lowercase(); + if let Some(tool_prompt) = detect_tool_permission_prompt(screen_text, &lowered) { + worker.status = WorkerStatus::ToolPermissionRequired; + worker.last_error = Some(WorkerFailure { + kind: WorkerFailureKind::ToolPermissionGate, + message: tool_prompt.message(), + created_at: now_secs(), + }); + push_event( + worker, + WorkerEventKind::ToolPermissionRequired, + WorkerStatus::ToolPermissionRequired, + Some("tool permission prompt detected".to_string()), + Some(WorkerEventPayload::ToolPermissionPrompt { + server_name: tool_prompt.server_name, + tool_name: tool_prompt.tool_name, + prompt_age_seconds: 0, + allow_scope: tool_prompt.allow_scope, + prompt_preview: tool_prompt.prompt_preview, + }), + ); + return Ok(worker.clone()); + } + if !worker.trust_gate_cleared && detect_trust_prompt(&lowered) { worker.status = WorkerStatus::TrustRequired; worker.last_error = Some(WorkerFailure { @@ -503,7 +557,9 @@ impl WorkerRegistry { ready: worker.status == WorkerStatus::ReadyForPrompt, blocked: matches!( worker.status, - WorkerStatus::TrustRequired | WorkerStatus::Failed + WorkerStatus::TrustRequired + | WorkerStatus::ToolPermissionRequired + | WorkerStatus::Failed ), replay_prompt_ready: worker.replay_prompt.is_some(), last_error: worker.last_error.clone(), @@ -624,6 +680,18 @@ impl WorkerRegistry { let now = now_secs(); let elapsed = now.saturating_sub(worker.created_at); + let latest_tool_permission_event = worker + .events + .iter() + .rev() + .find(|event| event.kind == WorkerEventKind::ToolPermissionRequired); + let tool_permission_allow_scope = + latest_tool_permission_event.and_then(|event| match &event.payload { + Some(WorkerEventPayload::ToolPermissionPrompt { allow_scope, .. }) => { + Some(*allow_scope) + } + _ => None, + }); // Build evidence bundle let evidence = StartupEvidenceBundle { @@ -640,6 +708,13 @@ impl WorkerRegistry { .events .iter() .any(|e| e.kind == WorkerEventKind::TrustRequired), + tool_permission_prompt_detected: worker + .events + .iter() + .any(|e| e.kind == WorkerEventKind::ToolPermissionRequired), + tool_permission_prompt_age_seconds: latest_tool_permission_event + .map(|event| now.saturating_sub(event.timestamp)), + tool_permission_allow_scope, transport_healthy, mcp_healthy, elapsed_seconds: elapsed, @@ -694,6 +769,13 @@ fn classify_startup_failure(evidence: &StartupEvidenceBundle) -> StartupFailureC return StartupFailureClassification::TrustRequired; } + // Check for tool permission prompts that were not resolved + if evidence.tool_permission_prompt_detected + && evidence.last_lifecycle_state == WorkerStatus::ToolPermissionRequired + { + return StartupFailureClassification::ToolPermissionRequired; + } + // Check for prompt acceptance timeout if evidence.prompt_sent_at.is_some() && !evidence.prompt_acceptance_state @@ -815,6 +897,140 @@ fn normalize_path(path: &str) -> PathBuf { std::fs::canonicalize(path).unwrap_or_else(|_| Path::new(path).to_path_buf()) } +#[derive(Debug, Clone, PartialEq, Eq)] +struct ToolPermissionPromptObservation { + server_name: Option, + tool_name: Option, + allow_scope: ToolPermissionAllowScope, + prompt_preview: String, +} + +impl ToolPermissionPromptObservation { + fn message(&self) -> String { + match (&self.server_name, &self.tool_name) { + (Some(server), Some(tool)) => { + format!("worker boot blocked on tool permission prompt for {server}.{tool}") + } + (Some(server), None) => { + format!("worker boot blocked on tool permission prompt for {server}") + } + (None, Some(tool)) => { + format!("worker boot blocked on tool permission prompt for {tool}") + } + (None, None) => "worker boot blocked on tool permission prompt".to_string(), + } + } +} + +fn detect_tool_permission_prompt( + screen_text: &str, + lowered: &str, +) -> Option { + let looks_like_prompt = lowered.contains("allow the") + && lowered.contains("server") + && lowered.contains("tool") + && lowered.contains("run"); + let looks_like_tool_gate = lowered.contains("allow tool") && lowered.contains("run"); + if !looks_like_prompt && !looks_like_tool_gate { + return None; + } + + let prompt_line = screen_text + .lines() + .rev() + .find(|line| { + let lowered_line = line.to_ascii_lowercase(); + lowered_line.contains("allow") + && lowered_line.contains("tool") + && (lowered_line.contains("run") || lowered_line.contains("server")) + }) + .unwrap_or(screen_text) + .trim(); + + let tool_name = extract_quoted_value(prompt_line) + .or_else(|| extract_after(prompt_line, "tool ").map(|token| normalize_tool_token(&token))); + let server_name = extract_between(prompt_line, "the ", " server") + .map(|server| server.trim_end_matches(" MCP").to_string()) + .or_else(|| { + tool_name + .as_deref() + .and_then(extract_server_from_qualified_tool) + }); + + Some(ToolPermissionPromptObservation { + server_name, + tool_name, + allow_scope: detect_tool_permission_allow_scope(lowered), + prompt_preview: prompt_preview(prompt_line), + }) +} + +fn detect_tool_permission_allow_scope(lowered: &str) -> ToolPermissionAllowScope { + let always_allow_capable = [ + "always allow", + "allow always", + "allow this tool always", + "allow for all sessions", + ] + .iter() + .any(|needle| lowered.contains(needle)); + + if always_allow_capable { + return ToolPermissionAllowScope::SessionOrAlways; + } + + let session_allow_capable = [ + "allow once", + "allow for this session", + "allow this session", + "yes, allow", + ] + .iter() + .any(|needle| lowered.contains(needle)); + + if session_allow_capable { + ToolPermissionAllowScope::SessionOnly + } else { + ToolPermissionAllowScope::Unknown + } +} + +fn extract_quoted_value(text: &str) -> Option { + let start = text.find('"')? + 1; + let rest = &text[start..]; + let end = rest.find('"')?; + Some(rest[..end].to_string()) +} + +fn extract_between(text: &str, prefix: &str, suffix: &str) -> Option { + let start = text.find(prefix)? + prefix.len(); + let rest = &text[start..]; + let end = rest.find(suffix)?; + let value = rest[..end].trim(); + (!value.is_empty()).then(|| value.to_string()) +} + +fn extract_after(text: &str, prefix: &str) -> Option { + let start = text.to_ascii_lowercase().find(prefix)? + prefix.len(); + let value = text[start..] + .split_whitespace() + .next()? + .trim_matches(|ch: char| ch == '?' || ch == ':' || ch == '"' || ch == '\''); + (!value.is_empty()).then(|| value.to_string()) +} + +fn normalize_tool_token(token: &str) -> String { + token + .trim_matches(|ch: char| ch == '?' || ch == ':' || ch == '"' || ch == '\'') + .to_string() +} + +fn extract_server_from_qualified_tool(tool: &str) -> Option { + let rest = tool.strip_prefix("mcp__")?; + let (server, _) = rest.split_once("__")?; + (!server.is_empty()).then(|| server.to_string()) +} + fn detect_trust_prompt(lowered: &str) -> bool { [ "do you trust the files in this folder", @@ -1134,6 +1350,96 @@ mod tests { assert!(detect_ready_for_prompt("│ >", "│ >")); } + #[test] + fn tool_permission_prompt_blocks_worker_with_structured_event() { + let registry = WorkerRegistry::new(); + let worker = registry.create("/tmp/repo-mcp", &[], true); + + let blocked = registry + .observe( + &worker.worker_id, + "Allow the omx_memory MCP server to run tool \"project_memory_read\"?\n\ + 1. Yes, allow once\n\ + 2. Always allow this tool", + ) + .expect("tool permission observe should succeed"); + + assert_eq!(blocked.status, WorkerStatus::ToolPermissionRequired); + assert_eq!( + blocked + .last_error + .as_ref() + .expect("tool permission error should exist") + .kind, + WorkerFailureKind::ToolPermissionGate + ); + let event = blocked + .events + .iter() + .find(|event| event.kind == WorkerEventKind::ToolPermissionRequired) + .expect("tool permission event should exist"); + assert_eq!( + event.payload, + Some(WorkerEventPayload::ToolPermissionPrompt { + server_name: Some("omx_memory".to_string()), + tool_name: Some("project_memory_read".to_string()), + prompt_age_seconds: 0, + allow_scope: ToolPermissionAllowScope::SessionOrAlways, + prompt_preview: prompt_preview( + "Allow the omx_memory MCP server to run tool \"project_memory_read\"?", + ), + }) + ); + + let readiness = registry + .await_ready(&worker.worker_id) + .expect("ready snapshot should load"); + assert!(readiness.blocked); + assert!(!readiness.ready); + } + + #[test] + fn startup_timeout_classifies_tool_permission_prompt() { + let registry = WorkerRegistry::new(); + let worker = registry.create("/tmp/repo-mcp-timeout", &[], true); + + registry + .observe( + &worker.worker_id, + "Allow the omx_memory MCP server to run tool \"notepad_read\"?\n\ + 1. Yes, allow once", + ) + .expect("tool permission observe should succeed"); + + let timed_out = registry + .observe_startup_timeout(&worker.worker_id, "claw prompt", true, true) + .expect("startup timeout observe should succeed"); + let event = timed_out + .events + .iter() + .find(|event| event.kind == WorkerEventKind::StartupNoEvidence) + .expect("startup no evidence event should exist"); + + match event.payload.as_ref() { + Some(WorkerEventPayload::StartupNoEvidence { + classification, + evidence, + }) => { + assert_eq!( + *classification, + StartupFailureClassification::ToolPermissionRequired + ); + assert!(evidence.tool_permission_prompt_detected); + assert_eq!( + evidence.tool_permission_allow_scope, + Some(ToolPermissionAllowScope::SessionOnly) + ); + assert!(evidence.tool_permission_prompt_age_seconds.is_some()); + } + _ => panic!("expected StartupNoEvidence payload"), + } + } + #[test] fn prompt_misdelivery_is_detected_and_replay_can_be_rearmed() { let registry = WorkerRegistry::new(); @@ -1634,6 +1940,9 @@ mod tests { prompt_sent_at: Some(1_234_567_890), prompt_acceptance_state: false, trust_prompt_detected: true, + tool_permission_prompt_detected: false, + tool_permission_prompt_age_seconds: None, + tool_permission_allow_scope: None, transport_healthy: true, mcp_healthy: false, elapsed_seconds: 60, @@ -1661,6 +1970,9 @@ mod tests { prompt_sent_at: None, prompt_acceptance_state: false, trust_prompt_detected: false, + tool_permission_prompt_detected: false, + tool_permission_prompt_age_seconds: None, + tool_permission_allow_scope: None, transport_healthy: false, mcp_healthy: true, elapsed_seconds: 30, @@ -1678,6 +1990,9 @@ mod tests { prompt_sent_at: None, prompt_acceptance_state: false, trust_prompt_detected: false, + tool_permission_prompt_detected: false, + tool_permission_prompt_age_seconds: None, + tool_permission_allow_scope: None, transport_healthy: true, mcp_healthy: true, elapsed_seconds: 10, @@ -1697,6 +2012,9 @@ mod tests { prompt_sent_at: None, // No prompt sent yet prompt_acceptance_state: false, trust_prompt_detected: false, + tool_permission_prompt_detected: false, + tool_permission_prompt_age_seconds: None, + tool_permission_allow_scope: None, transport_healthy: true, mcp_healthy: false, // MCP unhealthy but transport healthy suggests crash elapsed_seconds: 45,