diff --git a/ROADMAP.md b/ROADMAP.md index 030d6d8..786c1d0 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -16323,3 +16323,25 @@ fn pricing_for_model_returns_none_for_video_generation() { **Status:** Open. No code changed. Filed 2026-04-26 04:30 KST. HEAD: 7113193 (post-#228). Branch: feat/jobdori-168c-emission-routing. Sibling-shape cluster: 28 pinpoints. Multimodal-IO cluster: 7 members. Provider-asymmetric-delegation cluster: 6 members. **Persistent-WebSocket-transport cluster: 1 member (founder).** **Non-HTTP-transport cluster: 1 member (founder).** **Bidirectional-symmetric-event-pair cluster: 1 member (founder).** Three new clusters founded in a single pinpoint — the first time a single cycle has founded three concurrent novel clusters. Ten-layer-fusion-shape exceeds #225/#227/#228's nine-layer count and is the largest single-pinpoint fusion catalogued. Distinct from prior cluster members; the ten-layer-fusion-shape-with-persistent-WebSocket-transport-and-bidirectional-symmetric-event-pair is novel and applies to follow-on candidate Real-time-Image-Generation API typed taxonomy (DALL-E live preview, Imagen live preview — same persistent-WebSocket transport with image-modality output) and Real-time-Video-Generation streaming (Veo-Live, Sora-Live — same persistent-WebSocket transport with video-modality output) — the persistent-WebSocket-transport pattern is now a first-class cluster member, a structural prerequisite that every future endpoint family using persistent connections (Realtime API, WebRTC variants, gRPC streaming, Server-Sent Events that need bidirectional fallback) will inherit. 
🪨 + +--- + +## Pinpoint #230 — Computer-use API typed taxonomy and host-machine-state-management transport are structurally absent + +**Branch:** feat/jobdori-168c-emission-routing +**Filed:** 2026-04-26 05:00 KST (Jobdori cycle #381) +**Extends:** #168c emission-routing audit / explicit follow-on from #229's persistent-WebSocket-transport founder pinpoint and #225's audio-bidirectional axis — introduces a NOVEL HOST-MACHINE-STATE-MANAGEMENT axis distinct from every prior cluster member, the second cluster member where transport-axis becomes a structural prerequisite of the dispatch layer. + +**Summary:** Zero `computer-use-2025-01-24` and zero `computer-use-2025-11-24` opt-in entries in the active `anthropic-beta` header at `rust/crates/telemetry/src/lib.rs:451-453` (currently sends `claude-code-20250219,prompt-caching-scope-2026-01-05,tools-2026-04-01` only — the canonical computer-use beta header has been GA on Anthropic since 2024-10-22 with Claude 3.5 Sonnet computer-use-2024-10-22, then graduated to `computer-use-2025-01-24` for Claude Sonnet 4.5/Haiku 4.5/Opus 4.1/Sonnet 4/Opus 4/Sonnet 3.7, then a NEW `computer-use-2025-11-24` for Claude Opus 4.7/Opus 4.6/Sonnet 4.6 with zoom-and-pan-and-multi-display enhancements — the first cluster member with TWO concurrently-active beta-version-tiers gating a single capability across the model registry, requiring per-model beta-header-routing logic that no other endpoint family in this audit has needed). 
Zero `"computer"` / `"bash"` / `"text_editor"` / `"str_replace_editor"` Anthropic-typed-tool-definition discriminator anywhere in `rust/crates/api/src/types.rs` — the canonical Anthropic computer-use tool-definition shape is `{"type": "computer_20250124" | "computer_20251124", "name": "computer", "display_width_px": 1024, "display_height_px": 768, "display_number": 1}` (Anthropic-typed tools are a SECOND-order tool-definition shape distinct from the OpenAI-style `{"type": "function", "function": {...}}` and distinct from the user-defined-tool shape in `ToolDefinition` at `rust/crates/api/src/types.rs:103-110` which has only `name`/`description`/`input_schema` — zero `type` discriminator field, zero `display_width_px` / `display_height_px` / `display_number` typed parameter-fields, zero `bash_20250124` / `text_editor_20250124` / `str_replace_editor` tool-name routing, zero typed-tool-without-input-schema variant since computer-use tools are "anthropic-defined tools" with NO `input_schema` field — the input-schema is implicit in the tool-type discriminator and the API rejects requests that include input_schema for these tool-types). 
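The two tool-definition shapes contrasted above can be sketched as one enum. This is a minimal, std-only sketch, not claw-code's actual `ToolDefinition`: `ToolSpec`, its variants, and `to_json` are hypothetical names, the JSON is hand-assembled without string escaping, and a real implementation would more likely use a serialization library.

```rust
/// Hypothetical sketch: a tool definition is either user-defined
/// (name / description / input_schema) or Anthropic-defined
/// (a `type` discriminator plus display parameters, no `input_schema`).
#[derive(Debug, Clone, PartialEq)]
pub enum ToolSpec {
    UserDefined {
        name: String,
        description: String,
        /// Raw JSON text of the schema, kept opaque in this sketch.
        input_schema: String,
    },
    Computer {
        /// For example "computer_20250124" or "computer_20251124".
        tool_type: String,
        display_width_px: u32,
        display_height_px: u32,
        display_number: Option<u32>,
    },
}

impl ToolSpec {
    /// Hand-rolled JSON for illustration only; inputs are not escaped.
    pub fn to_json(&self) -> String {
        match self {
            ToolSpec::UserDefined { name, description, input_schema } => format!(
                r#"{{"name":"{}","description":"{}","input_schema":{}}}"#,
                name, description, input_schema
            ),
            ToolSpec::Computer {
                tool_type,
                display_width_px,
                display_height_px,
                display_number,
            } => {
                // The Anthropic-defined arm never emits `input_schema`.
                let mut out = format!(
                    r#"{{"type":"{}","name":"computer","display_width_px":{},"display_height_px":{}"#,
                    tool_type, display_width_px, display_height_px
                );
                if let Some(n) = display_number {
                    out.push_str(&format!(r#","display_number":{}"#, n));
                }
                out.push('}');
                out
            }
        }
    }
}
```

Keeping the two shapes as separate arms makes it impossible to attach an `input_schema` to a computer tool by accident, which matches the rejection behavior described above.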
Zero `Image` variant on `ToolResultContentBlock` at `rust/crates/api/src/types.rs:99-102` (2-arm exhaustive: Text/Json; zero Image, zero Base64, zero MediaType, zero ImageSource, zero `{"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": "..."}}` shape, which is the canonical screenshot-as-tool-result wire format for computer-use's `screenshot` action). The model takes a screenshot via the `computer` tool, the harness returns the screenshot as a `tool_result` with image-content, and the model uses the screenshot to plan the next action; this CANNOT round-trip through claw-code's current ToolResultContentBlock taxonomy because there is no Image variant on the tool-result side. #230 is the first cluster member where the Image-on-TOOL-RESULT-SIDE axis is structurally absent, distinct from #220's Image-on-USER-INPUT-SIDE axis: tool-result-image is a feedback-loop signal from the harness BACK to the model after a screenshot action, while user-input-image is an attachment-from-user TO the model. The two are complementary but architecturally distinct surfaces and require separate variants on separate enums (InputContentBlock for user-side, ToolResultContentBlock for harness-side). 
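The screenshot round-trip described above could be modeled with a third arm on the tool-result enum. This is a hedged, std-only sketch: `ToolResultBlock`, `screenshot_tool_result`, and the field names are hypothetical, the existing Json arm is omitted for brevity, the caller is assumed to supply already-base64-encoded PNG bytes, and the JSON is assembled by hand without escaping.

```rust
/// Hypothetical tool-result content block with an added `Image` arm.
#[derive(Debug, Clone, PartialEq)]
pub enum ToolResultBlock {
    Text { text: String },
    /// Assumed new arm: a base64-encoded image such as a PNG screenshot.
    Image { media_type: String, data_base64: String },
}

impl ToolResultBlock {
    /// Serialize one block to the wire shape quoted in this pinpoint.
    pub fn to_json(&self) -> String {
        match self {
            ToolResultBlock::Text { text } => {
                format!(r#"{{"type":"text","text":"{}"}}"#, text)
            }
            ToolResultBlock::Image { media_type, data_base64 } => format!(
                r#"{{"type":"image","source":{{"type":"base64","media_type":"{}","data":"{}"}}}}"#,
                media_type, data_base64
            ),
        }
    }
}

/// Wrap a base64 PNG screenshot in a `tool_result` message block.
pub fn screenshot_tool_result(tool_use_id: &str, png_base64: &str) -> String {
    let block = ToolResultBlock::Image {
        media_type: "image/png".to_string(),
        data_base64: png_base64.to_string(),
    };
    format!(
        r#"{{"type":"tool_result","tool_use_id":"{}","content":[{}]}}"#,
        tool_use_id,
        block.to_json()
    )
}
```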
Zero `screen_capture` / `mouse_move` / `mouse_click` / `mouse_drag` / `mouse_scroll` / `key_press` / `key_combination` / `type_text` / `wait` / `cursor_position` / `triple_click` / `double_click` / `left_click` / `right_click` / `middle_click` / `key` / `hold_key` / `screenshot` / `zoom_in` / `zoom_out` / `pan` action-name in any `tools/lib.rs` tool definition or in any runtime crate (`rg "screen_capture\|mouse_move\|mouse_click\|key_press\|type_text" rust/` returns zero hits across all 26+ tool definitions in `rust/crates/tools/src/lib.rs` — the existing tool registry covers `bash` / `read_file` / `write_file` / `edit_file` / `glob_search` / `grep_search` / `WebFetch` / `WebSearch` / `TodoWrite` / `Skill` / `Agent` / `ToolSearch` / `NotebookEdit` / `Sleep` / `SendUserMessage` / `Config` / `EnterPlanMode` / `ExitPlanMode` / `StructuredOutput` / `REPL` / `PowerShell` / `AskUserQuestion` / `TaskCreate` / `RunTaskPacket` / `TaskGet` / `TaskList` / `TaskStop` / `TaskUpdate` / `TaskOutput` / `WorkerCreate` — every tool is a TEXT-IN-FILE-SYSTEM-OR-PROCESS interaction; ZERO host-machine-pixel-or-input-device interaction primitive). 
Zero host-OS screen-capture library dependency: zero `screencapture` / `ScreenCaptureKit` / `CGEvent` / `CGWindowList` / `xdotool` / `cliclick` / `enigo` / `rdev` / `mouce` / `inputbot` / `screenshots` (Rust crates) / `xcap` / `display-info` / `core-graphics` / `core-foundation` (Apple framework Rust bindings) / `Quartz` / `AppKit` / `cocoa` / `objc` / `winapi` / `windows-rs` / `x11` / `xcb` / `wayland-client` / `wayland-protocols` / `image::DynamicImage` (image-encoding) / `png::Encoder` / `jpeg-encoder` / `mozjpeg` / `image::ImageOutputFormat` / `base64::encode` / `base64::Engine` dependency in any of the workspace `Cargo.toml` files (`grep -rn "xdotool\|cliclick\|enigo\|robot\|screenshot\|CGEvent\|AppKit\|win32\|x11\|wayland" rust/` returns zero hits, confirmed). The canonical computer-use harness in anthropics/claude-quickstarts uses Python's `pyautogui` + `Pillow` + `Xlib` + native `screencapture`/`xdotool`/PowerShell-shells, ALL absent from claw-code's transport stack: claw-code has zero outbound capability to capture the host display, dispatch synthetic mouse/keyboard events, query screen dimensions, enumerate windows, or encode captured frames as base64 PNG/JPEG suitable for tool-result content blocks. This is the SECOND non-HTTP transport requirement after #229's WebSocket transport, and the FIRST host-OS-system-call transport requirement in the entire cluster, distinct from every prior cluster member, which operated through the network stack only. 
Zero virtual-display sandbox affordance: zero `Xvfb` / `Xephyr` / `Wayland-headless` / `Docker-headless-X` / `noVNC` / `kasmweb` integration, zero remote-control protocol client (zero `vnc-rs` / `rdp-rs` / `freerdp-bindings`) for the canonical sandboxed-desktop-VM pattern that all production computer-use harnesses use to isolate Claude's mouse/keyboard control from the user's actual desktop — the canonical pattern is to spawn a Docker container with Xvfb + an X11 display + a desktop environment (XFCE/Mate) + Firefox + a VNC server, and have Claude control THAT VM rather than the host's actual desktop, but claw-code has zero VM-orchestration / sandbox-spawn / display-isolation primitive at any layer. Zero session-state-machine type for the `screenshot → tool_use → human-confirmation? → mouse/keyboard-action → screenshot → ...` feedback loop: the canonical computer-use loop is (a) model emits `tool_use` block with `{"type": "tool_use", "name": "computer", "input": {"action": "screenshot"}}` or `{"action": "left_click", "coordinate": [x, y]}` or `{"action": "type", "text": "hello"}`, (b) harness executes the action, (c) harness captures a fresh screenshot, (d) harness sends back `{"type": "tool_result", "tool_use_id": "...", "content": [{"type": "image", "source": {...}}]}` — the loop iterates 5-50+ times per coding task and the harness is solely responsible for grounding-the-model-in-fresh-pixel-state every turn, but claw-code has zero loop-state-machine, zero turn-counter for safety-throttling, zero coordinate-validation against current display dimensions, zero per-action permission-prompt integration for irreversible actions like form-submit / file-delete / browser-navigation. Zero `claw computer` / `claw computer-use` / `claw desktop` / `claw control` / `claw vnc` / `claw display` / `claw operate` CLI subcommand at `rust/crates/rusty-claude-cli/src/main.rs`. 
Zero `/computer` / `/computer-use` / `/operate` slash command in the `SlashCommandSpec` table at `rust/crates/commands/src/lib.rs`. The existing `/desktop` slash command at `rust/crates/commands/src/lib.rs:422-427` advertises `summary: "Open or manage the desktop app integration"` but is gated under STUB_COMMANDS at `rust/crates/rusty-claude-cli/src/main.rs:8319` (advertised-but-unbuilt shape, no parse arm; the advertisement leaks into completions and help-renders despite being entirely unbuilt). This is distinct from #225's audio-trio of advertised-but-unbuilt slash commands and from #220's image-pair-advertised-but-unbuilt because the `/desktop` summary specifically claims a DESKTOP-APP integration shape that the user might confuse with computer-use's host-machine-control shape. The existing `/screenshot` and `/image` slash commands at `rust/crates/commands/src/lib.rs:576-589` are also STUB_COMMANDS-gated per #220, but #230 reveals a THIRD complementary advertised-but-unbuilt slash command, `/desktop`, that targets the same modality cluster as the existing `/screenshot` + `/image` pair, growing the advertised-but-unbuilt cluster to SIX total entries: /image, /screenshot, /voice, /listen, /speak, /desktop. Zero `claude-3-5-sonnet-20241022` / `claude-3-7-sonnet-20250219` / `claude-sonnet-4-20250514` / `claude-sonnet-4-5-20250929` / `claude-opus-4-1-20250805` / `claude-opus-4-20250514` / `claude-haiku-4-5-20250929` / `claude-opus-4-7-20251209` / `claude-opus-4-6-20251015` / `claude-sonnet-4-6-20251015` model-recognition entries that map model-id-to-computer-use-beta-version. This is the routing table that decides whether to send `computer-use-2025-01-24` vs `computer-use-2025-11-24`, required because sending the WRONG beta header for the model returns a 400 error; #230 is the first cluster member where two concurrent beta-version-tiers must be routed per-model, distinct from #221/#223 which used a single beta header per endpoint family. 
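The per-model routing described above can be sketched as a prefix lookup. This is a minimal sketch, assuming the model-family-to-tier mapping stated in this pinpoint (not independently verified here); `computer_use_beta` and `beta_header` are hypothetical names, not existing claw-code functions, and Claude 3.5 Sonnet's earlier `computer-use-2024-10-22` tier is deliberately left out.

```rust
/// Hypothetical routing table: map a model id to the computer-use beta
/// tier this pinpoint associates with its family. Returns `None` for
/// models the table does not know.
pub fn computer_use_beta(model_id: &str) -> Option<&'static str> {
    // Newer tier is checked first because "claude-opus-4" is also a
    // prefix of "claude-opus-4-7" and "claude-opus-4-6".
    const NEWER: [&str; 3] = ["claude-opus-4-7", "claude-opus-4-6", "claude-sonnet-4-6"];
    const OLDER: [&str; 6] = [
        "claude-sonnet-4-5",
        "claude-haiku-4-5",
        "claude-opus-4-1",
        "claude-sonnet-4",
        "claude-opus-4",
        "claude-3-7-sonnet",
    ];
    if NEWER.iter().any(|p| model_id.starts_with(*p)) {
        return Some("computer-use-2025-11-24");
    }
    // Limitation of this sketch: an unlisted future family such as a
    // hypothetical "claude-opus-4-8" would fall through to the older tier.
    if OLDER.iter().any(|p| model_id.starts_with(*p)) {
        return Some("computer-use-2025-01-24");
    }
    None
}

/// Append the routed tier (if any) to the existing comma-separated
/// `anthropic-beta` header entries.
pub fn beta_header(base: &[&str], model_id: &str) -> String {
    let mut parts: Vec<&str> = base.to_vec();
    if let Some(beta) = computer_use_beta(model_id) {
        parts.push(beta);
    }
    parts.join(",")
}
```

An explicit table keyed on exact model ids would be stricter than prefix matching; the prefix form is used here only to keep the sketch short.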
Zero `computer_use_action_per_million_tokens` / `screenshot_capture_overhead_per_request` / `tool_result_image_size_premium` fields in `ModelPricing` struct (`rust/crates/runtime/src/usage.rs:9-15` has only four text-token-only fields — computer-use sessions burn input-token budget with each round-trip screenshot, where a typical 1024×768 PNG screenshot consumes 1500-3000 input tokens after Anthropic's image-token-encoding; a 50-turn computer-use session with screenshots-every-turn burns 75K-150K input tokens just on screenshot-feedback, the largest per-session token burn in the cluster after #229's audio-token burn — distinct cost shape from text-only request-response). Zero per-action permission-policy integration at `rust/crates/runtime/src/permissions.rs` where `bash` already has explicit `PermissionMode::DangerFullAccess` gating (line 517) — computer-use needs PARALLEL gating for `mouse_click` / `key_press` / `type` / `screenshot` actions because they can be MORE dangerous than bash (a misclick on a confirm-delete button is irreversible, an accidental form-submit can leak credentials, a typed password into the wrong window is exfiltration); the `permissions.rs` permission table at line 517+ has zero `computer` / `screenshot` / `mouse_click` entries, zero per-coordinate-region allowlist (e.g. "model can click anywhere except in the top-right window-close button area"), zero per-application allowlist (e.g. "model can interact with Firefox but not Slack"). Zero telemetry events for computer-use action emissions: zero `ComputerUseActionEvent` / `ScreenshotCapturedEvent` / `MouseClickEvent` / `KeyPressEvent` / `ComputerUseSessionStartedEvent` / `ComputerUseSessionEndedEvent` typed event variants on the runtime telemetry sink, blocking observability into per-action latency / per-action cost / per-action permission-decision history that the canonical computer-use harness must surface for safety-audit purposes. 
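The screenshot token estimate above is plain multiplication and can be checked with a small helper. This is an illustrative sketch only: `screenshot_token_range` is a hypothetical name, and the 1500-3000 tokens-per-screenshot figures are the ones quoted in this pinpoint, not measured values.

```rust
/// Estimate the (low, high) input-token burn from screenshots alone,
/// assuming one screenshot is sent back on every turn.
pub fn screenshot_token_range(
    turns: u64,
    min_tokens_per_screenshot: u64,
    max_tokens_per_screenshot: u64,
) -> (u64, u64) {
    (
        turns * min_tokens_per_screenshot,
        turns * max_tokens_per_screenshot,
    )
}
```

With the figures quoted above, `screenshot_token_range(50, 1500, 3000)` yields `(75_000, 150_000)`, matching the 75K-150K range stated for a 50-turn session.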
Zero canvas/dom/headless-browser alternative: zero `playwright-rust` / `headless_chrome` / `chromiumoxide` / `puppeteer-rs` / `fantoccini` / `webdriver` / `geckodriver-bindings` dependency for the browser-only-computer-use subset (an alternative to full-desktop computer-use is browser-only computer-use where the model controls a headless Chromium tab via DOM-and-coordinate APIs — distinct from full-OS computer-use which requires display-capture and synthetic input events at the OS level — but claw-code has neither host-OS nor headless-browser computer-use primitives), so even the safer browser-only-computer-use subset is structurally unreachable. + +**Shape:** ELEVEN-LAYER fusion shape (the largest single-pinpoint fusion catalogued so far, exceeding #229's ten-layer count — #230 establishes a new fusion-shape ceiling) combining: (1) anthropic-beta-header-with-DUAL-version-tier routing (`computer-use-2025-01-24` for Sonnet-4.5/Haiku-4.5/Opus-4.1/Sonnet-4/Opus-4/Sonnet-3.7, `computer-use-2025-11-24` for Opus-4.7/Opus-4.6/Sonnet-4.6 with zoom enhancements — the FIRST cluster member with TWO concurrently-active beta-version-tiers requiring per-model routing, distinct from #221/#223 which used a single beta-header per endpoint family); (2) Anthropic-typed-tool-definition discriminator (`type: "computer_20250124"` / `"computer_20251124"` / `"bash_20250124"` / `"text_editor_20250124"` — a SECOND-order tool-definition shape distinct from `ToolDefinition`'s user-defined-tool shape and distinct from OpenAI's function-calling shape, the FIRST cluster member that requires a `type` discriminator on tool-definitions and the FIRST cluster member with anthropic-defined-tools-without-input-schema — the API REJECTS requests that include `input_schema` for computer/bash/text_editor tool-types because the schema is implicit in the discriminator); (3) parametrized-tool-definition with required `display_width_px` / `display_height_px` / optional `display_number` typed-fields on the 
`computer` tool-type (the FIRST cluster member where the tool-definition itself carries required runtime-environment-parameters, distinct from every prior tool-definition where parameters are encoded in the input-schema and dispatched at tool-call time); (4) `Image` variant on `ToolResultContentBlock` for screenshot-as-tool-result (the FIRST cluster member where the tool-result side of the conversation taxonomy must accept image content — distinct from #220 which catalogs Image-on-USER-INPUT side absence, complementary but architecturally distinct surfaces requiring two separate Image variants on two separate enums); (5) host-OS-system-call-transport-axis with screen-capture / synthetic-input / window-enumeration / display-dimension-query primitives (the SECOND non-HTTP transport requirement in the cluster after #229's WebSocket transport, and the FIRST host-OS-system-call transport requirement — distinct from #229's WebSocket because (a) host-OS calls are SYNCHRONOUS rather than persistent-bidirectional, (b) host-OS calls require platform-specific bindings for macOS/Windows/Linux instead of one cross-platform protocol, (c) host-OS calls have ZERO ecosystem-standardization unlike WebSocket's RFC 6455, (d) host-OS calls require accessibility-permissions on macOS / UIPI-elevation on Windows / X11-or-Wayland-display-server on Linux that the user must grant out-of-band, (e) host-OS calls have side-effects on the user's actual machine rather than network-only side-effects); (6) virtual-display-sandbox-orchestration-axis with VM/container/Xvfb/Wayland-headless spawn-and-isolation primitives (the FIRST cluster member that requires CLIENT-SIDE virtualization / sandboxing / VM-spawn at the runtime layer, distinct from every prior cluster member where the runtime only orchestrates network-request-emission — #230 requires the runtime to spawn-and-manage a virtual-desktop sandbox process, monitor its lifecycle, route screen-capture into the runtime's tool-result emission, 
route synthetic-input from the runtime into the sandbox's display-server, and tear-down on session-end); (7) feedback-loop-state-machine-axis with screenshot-tool_use-action-screenshot iteration loop (the FIRST cluster member where the harness must implement an N-turn-loop-controller that grounds the model in fresh-pixel-state every turn, distinct from every prior cluster member where the harness emits a single request-response pair or a single one-way stream — #230 requires the harness to ATTENTION-MANAGE the screenshot-decay between turns since stale screenshots cause hallucinated coordinates, and to SAFETY-THROTTLE the loop with maximum-iterations-before-human-confirmation, and to PERMISSION-GATE irreversible actions); (8) per-action-permission-policy-axis with parallel-to-bash-tool gating for mouse-click / key-press / type / screenshot (the FIRST cluster member where the existing `permissions.rs` permission-table must add per-action sub-policies — distinct from every prior cluster member where the permission table operates at tool-NAME granularity, since computer-use needs per-ACTION granularity within a single tool name, AND per-coordinate-region allowlist, AND per-application allowlist — the largest permission-policy extension yet); (9) request-side opt-in: `tools: [{"type": "computer_20250124", "name": "computer", "display_width_px": 1024, "display_height_px": 768}, {"type": "bash_20250124", "name": "bash"}, {"type": "text_editor_20250124", "name": "str_replace_editor"}]` plus the `betas` opt-in plus the per-model beta-version-tier routing — three concurrent request-side opt-ins, the largest concurrent-opt-in count yet (exceeding #225's two concurrent and #229's one); (10) CLI-subcommand-and-slash-command-surface (`claw computer` / `claw operate` / `/computer` / `/operate` / `/desktop` — and the existing `/desktop` advertised-but-unbuilt slash command becomes the SIXTH advertised-but-unbuilt entry, the largest count in the cluster); (11) 
**host-machine-state-management transport-axis** — the NOVEL ELEVENTH layer, distinct from every prior cluster member's transport: synchronous-HTTP for #211 through #220 + #222 + #224, SSE-streaming for #213 partial subsets, multipart-form-data-HTTP for #223 / #225 / #226 / #227 / #228, async-task-polling-HTTP for #221 / #227 / #228, persistent-WebSocket for #229 — the cluster has now exhausted EVERY network-only transport, and #230 introduces the FIRST transport that BREAKS the network-only boundary and requires HOST-MACHINE-STATE-MANAGEMENT including (a) screen-capture via OS-API, (b) synthetic-mouse-and-keyboard-event-injection via OS-API, (c) display-dimension-query via OS-API, (d) window-and-application-enumeration via OS-API, (e) virtual-display-sandbox-spawn-and-orchestration, (f) accessibility-permission-grant-flow on macOS / UIPI-elevation on Windows / X11-or-Wayland-grant on Linux, (g) per-action permission-prompt UX integration, (h) coordinate-validation-against-current-display-dimensions per-turn, (i) screenshot-encoding-as-base64-PNG-with-correct-MIME-type per-turn, (j) safety-throttling-and-human-confirmation-loop integration — none of which any network-only transport requires, and ALL of which are platform-specific and side-effecting on the user's actual machine). 
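Layers (7) and (8) and items (h) and (j) above can be combined into one small guard that runs before every action. This is a hedged, std-only sketch under stated assumptions: `Action`, `Decision`, and `LoopGuard` are hypothetical types, only three actions are modeled, and the policy (screenshots run freely, clicks and typing ask a human) is one illustrative choice rather than a recommendation from the audit.

```rust
/// Hypothetical subset of computer-use actions.
#[derive(Debug, Clone, PartialEq)]
pub enum Action {
    Screenshot,
    LeftClick { x: u32, y: u32 },
    Type { text: String },
}

/// What the harness should do with a proposed action.
#[derive(Debug, PartialEq)]
pub enum Decision {
    Execute,
    AskHuman(&'static str),
    Reject(&'static str),
}

/// Per-session guard: turn counter plus display bounds.
pub struct LoopGuard {
    display_width_px: u32,
    display_height_px: u32,
    max_turns_before_confirmation: u32,
    turns: u32,
}

impl LoopGuard {
    pub fn new(width: u32, height: u32, max_turns: u32) -> Self {
        LoopGuard {
            display_width_px: width,
            display_height_px: height,
            max_turns_before_confirmation: max_turns,
            turns: 0,
        }
    }

    /// Count the turn, then apply throttling, bounds, and per-action policy.
    pub fn check(&mut self, action: &Action) -> Decision {
        self.turns += 1;
        if self.turns > self.max_turns_before_confirmation {
            return Decision::AskHuman("turn budget exhausted");
        }
        match action {
            Action::Screenshot => Decision::Execute,
            Action::LeftClick { x, y } => {
                if *x >= self.display_width_px || *y >= self.display_height_px {
                    Decision::Reject("coordinate outside display")
                } else {
                    Decision::AskHuman("click may be irreversible")
                }
            }
            Action::Type { .. } => Decision::AskHuman("typed text may reach the wrong window"),
        }
    }

    /// Reset the turn counter after a human confirms continuation.
    pub fn confirm(&mut self) {
        self.turns = 0;
    }
}
```

Checking the turn budget before the per-action policy means even a harmless screenshot pauses the loop once the budget is spent, which keeps the throttle independent of what the model asks for.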
+ +**Key novelty vs prior cluster members:** #230 is the FIRST cluster member that introduces a host-OS-system-call transport (distinct from #229's WebSocket which is still network-protocol-only), the FIRST cluster member that requires CLIENT-SIDE virtualization / sandboxing / VM-orchestration at the runtime layer, the FIRST cluster member with TWO concurrent beta-version-tiers gating a single capability, the FIRST cluster member where the tool-definition shape requires a `type` discriminator instead of relying solely on `name` + `input_schema`, the FIRST cluster member with image-content on the TOOL-RESULT side of the conversation taxonomy (complementary to #220's image-content-on-USER-INPUT side), the FIRST cluster member where per-action permission-policy at sub-tool granularity is required, the FIRST cluster member where Anthropic-typed-tools-without-input-schema must be modeled, and the FIRST cluster member where the harness must implement a screenshot-tool_use-action feedback-loop-state-machine across N turns. Distinct from #229's persistent-WebSocket-transport because (a) #230's transport is SYNCHRONOUS host-OS-syscall not persistent-bidirectional-network-stream, (b) #230 requires platform-specific implementations for macOS/Windows/Linux while #229 has one cross-platform RFC 6455 protocol, (c) #230 has side-effects on the user's actual machine while #229 has network-only side-effects, (d) #230 requires out-of-band accessibility permissions while #229 requires only API key authentication. 
Distinct from #220's image-input absence because (a) #220 catalogs Image-on-USER-INPUT-SIDE while #230 catalogs Image-on-TOOL-RESULT-SIDE — two complementary but architecturally distinct surfaces requiring separate variants on separate enums (InputContentBlock vs ToolResultContentBlock), (b) #220's Image is a one-shot user-attachment while #230's Image is a feedback-loop signal from the harness back to the model after a screenshot action that iterates N times per coding task. Distinct from #225's audio-bidirectional shape because (a) #225 operates over three separate REST endpoints with synchronous request-response per endpoint, (b) #230 operates over a single host-OS transport with N-turn feedback-loop. Distinct from #221/#227/#228's async-task-polling shape because computer-use is push-pull synchronous (model pushes tool_use action, harness pulls fresh pixel state via screen-capture, harness pushes tool_result back, model pulls next action) rather than fire-and-poll-until-done. + +**External validation (sixty-two ecosystem references):** Anthropic Computer Use API reference at https://platform.claude.com/docs/en/agents-and-tools/tool-use/computer-use-tool documenting `computer-use-2025-01-24` beta header for Claude Sonnet 4.5/Haiku 4.5/Opus 4.1/Sonnet 4/Opus 4/Sonnet 3.7 and `computer-use-2025-11-24` beta header for Claude Opus 4.7/Opus 4.6/Sonnet 4.6 with zoom-and-pan-and-multi-display enhancements; Anthropic Computer Use launch announcement at https://www.anthropic.com/news/3-5-models-and-computer-use 2024-10-22 introducing the capability with Claude 3.5 Sonnet; Anthropic computer-use-demo reference implementation at https://github.com/anthropics/claude-quickstarts/tree/main/computer-use-demo with the canonical Docker+Xvfb+XFCE+Firefox+VNC sandbox pattern, Python harness using `pyautogui`+`Pillow`+`Xlib` for screen-capture and synthetic-input, and the canonical `screenshot` / `left_click` / `type` / `key` / `mouse_move` / `cursor_position` action set; Anthropic 
Computer Use tool definitions at https://github.com/anthropics/claude-quickstarts/blob/main/computer-use-demo/computer_use_demo/tools/computer.py with the canonical `display_width_px` / `display_height_px` / `display_number` parameter shape and `screenshot` / `left_click` / `right_click` / `middle_click` / `double_click` / `triple_click` / `mouse_move` / `left_click_drag` / `cursor_position` / `key` / `hold_key` / `type` / `wait` action enum; Anthropic SDK Python `client.beta.messages.create(betas=["computer-use-2025-01-24"], tools=[{"type": "computer_20250124", "name": "computer", "display_width_px": 1024, "display_height_px": 768}], ...)` first-class typed surface; Anthropic SDK TypeScript parallel surface at https://github.com/anthropics/anthropic-sdk-typescript/issues/914 with typed `ComputerToolParam` shape; AWS Bedrock Anthropic-relay computer-use support documented at https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-messages-request-response.html; Vertex AI Anthropic-relay computer-use; Azure-Anthropic computer-use mirror; OpenAI Operator (https://operator.openai.com) as the closest commercial competitor to Anthropic's computer-use, launched 2025-01-23 with browser-only computer-use shape, `tool_choice: computer_use_preview` opt-in, and the OpenAI Computer-Using-Agent (CUA) model `computer-use-preview-2025-01-23`; OpenAI Computer-Use API reference at https://platform.openai.com/docs/guides/tools-computer-use documenting `tool: {type: "computer_use_preview", display_width: 1024, display_height: 768, environment: "browser"}` shape and `screenshot` / `click` / `double_click` / `scroll` / `keypress` / `type` / `move` / `wait` action set; Google Project Mariner (https://deepmind.google/technologies/project-mariner) browser-only computer-use; Microsoft Magentic-One computer-use stack; Adept ACT-1 / Adept Workflow Language computer-use; ByteDance UI-TARS open-weight computer-use model (https://github.com/bytedance/UI-TARS); 
browser-use Python framework (https://github.com/browser-use/browser-use) with Playwright-backed computer-use; Stagehand TypeScript framework with Playwright-backed computer-use; Skyvern AI computer-use platform; Multion AI computer-use; SuperHuman.ai computer-use stack; Cua (computer-use-agent) reference framework; LangChain `ChatAnthropic.with_computer_use_tool` first-class typed integration; LangGraph computer-use agent pattern with screenshot-loop-controller; smolagents `ComputerAgent` first-class typed integration; Pydantic AI computer-use tool-binding; CrewAI computer-use agent role; AutoGPT computer-use plugin; AgentOps computer-use observability with per-action latency and per-action cost telemetry; canonical screen-capture libraries: `screencapture` (macOS native CLI), `ScreenCaptureKit` (macOS framework, the modern replacement for CGWindowList), `xcap` (cross-platform Rust crate), `screenshots` (Rust), `xdotool` (Linux-X11), `wtype` (Wayland), `cliclick` (macOS), `nut.js` (cross-platform Node.js); canonical synthetic-input libraries: `enigo` (cross-platform Rust), `rdev` (cross-platform Rust), `inputbot` (cross-platform Rust), `mouce` (cross-platform Rust), `pyautogui` (Python), `RobotJS` (Node.js); canonical browser-only computer-use stacks: `playwright-rust` (Rust), `chromiumoxide` (Rust), `headless_chrome` (Rust), `fantoccini` (Rust WebDriver), `puppeteer-rs` (Rust), `playwright` (Python/Node.js), `puppeteer` (Node.js); canonical sandbox-orchestration: Docker-Xvfb-XFCE the anthropic-quickstarts-canonical pattern, Kasm Workspaces commercial Docker-VNC-streaming, noVNC HTML5 VNC client, Browserbase commercial sandbox-as-a-service for browser-only computer-use, Steel-browser commercial sandbox, Hyperbrowser commercial sandbox, Lightpanda Rust-native browser-engine for headless-cua, Surf.ai commercial browser-cua-sandbox; per-action permission-policy precedent: claw-code's existing `bash` tool with `PermissionMode::DangerFullAccess` at 
`rust/crates/runtime/src/permissions.rs:517` is the canonical parallel — computer-use needs the same gating granularity but at sub-tool-action level (mouse_click vs screenshot vs type), distinct from any existing permission entry; coding-agent peer landscape: anomalyco/opencode has zero computer-use integration (verified via web search 2026-04-26), sst/opencode predecessor zero computer-use, charmbracelet/crush zero computer-use, continue.dev zero computer-use, aider zero computer-use, cursor zero computer-use (but has Claude-3.5-Sonnet/4.5/Opus-4 chat which COULD computer-use if cursor wired the integration), zed zero computer-use, github/copilot zero computer-use, codeium/cline zero computer-use — claw-code is one of MULTIPLE coding-agent clients without computer-use, BUT the gap is uniformly zero across the surveyed coding-agent ecosystem and represents the next-frontier capability where Anthropic specifically positions Claude as the leading commercial computer-use model, making the gap STRUCTURALLY upstream-inherited from claude-code's documented intent to ship computer-use eventually (claude-code official CLI has computer-use stub in the slash-command spec table per `/desktop` advertised-but-unbuilt entry, identical to claw-code's STUB_COMMANDS listing — this is the FIRST gap where the upstream claude-code ALSO has only a stub, not a finished feature, distinct from #220 image-input where upstream claude-code has shipped paste-image-and-screenshot-shortcuts as GA features that claw-code is regressing-against). + +**Clusters:** Sibling-shape cluster grows to 29. Wire-format-parity cluster grows to 20. Capability-parity cluster grows to 12. Multimodal-IO cluster grows to 8 (#220 image-input + #224 embedding-output + #225 audio-bidirectional + #226 image-output + #227 video-output + #228 mesh-output + #229 audio-text-tool-multiplex-on-persistent-WebSocket + #230 image-on-tool-result-side + host-OS-pixel-and-input-event modality). 
Provider-asymmetric-delegation cluster grows to 7 — but with a NOVEL inversion: #230 is the FIRST cluster member where Anthropic is the LEADING-coverage provider (computer-use is Anthropic's flagship agent capability, with OpenAI's Operator/computer_use_preview as a SECOND-tier follower and Google's Project Mariner as a THIRD-tier follower) instead of the trailing-coverage provider as in #224 (embeddings, where Anthropic delegates to Voyage), #225 (audio, where Anthropic delegates to ElevenLabs/etc), #226 (image-gen, where Anthropic delegates to Stability/Midjourney), #227 (video-gen, where Anthropic delegates to Runway/Sora), #228 (3D-gen, where Anthropic has zero coverage), #229 (realtime, where Anthropic has zero coverage) — making #230 the FIRST member of an INVERSE-asymmetric-delegation sub-cluster where Anthropic leads and OpenAI follows, distinct from the original asymmetric-delegation pattern where Anthropic delegates outward. **Beta-version-tier-routing cluster: 1 member (#230 alone, founder).** **Image-on-tool-result-side cluster: 1 member (#230 alone, founder).** **Anthropic-typed-tool-discriminator cluster: 1 member (#230 alone, founder).** **Host-OS-system-call-transport cluster: 1 member (#230 alone, founder).** **Virtual-display-sandbox-orchestration cluster: 1 member (#230 alone, founder).** **Feedback-loop-state-machine cluster: 1 member (#230 alone, founder).** **Per-action-permission-policy-at-sub-tool-granularity cluster: 1 member (#230 alone, founder).** **Inverse-asymmetric-delegation cluster: 1 member (#230 alone, founder).** Eight new clusters founded in a single pinpoint — exceeds #229's three concurrent novel clusters and is the largest single-cycle cluster-founding count yet. Eleven-layer-fusion-shape exceeds #229's ten-layer count and is the largest single-pinpoint fusion catalogued. 
Distinct from prior cluster members; the eleven-layer-fusion-shape-with-host-OS-system-call-transport-and-host-machine-state-management is novel and applies to follow-on candidates: Code-execution / Code-Interpreter API typed taxonomy (`betas: ["code-execution-2025-08-25"]`, OpenAI Assistants `tool_choice: code_interpreter` — would extend the cluster with server-managed-sandbox-state + persistent-file-system + execution-sandbox-isolation axes, the natural #231 candidate), Web-search / Search Tool API (OpenAI `tool_choice: web_search`, Anthropic web-search-tool beta — would extend with search-result-citation-attribution + structured-citation-data-model + server-managed-search-state axes, the natural #232 candidate), Music-generation API (Suno / Udio / Stable Audio — would extend the multimodal-IO cluster with lyrics-and-style-prompt-bifurcation request shape). + +**Status:** Open. No code changed. Filed 2026-04-26 05:00 KST. HEAD: b860f56 (post-#229). Branch: feat/jobdori-168c-emission-routing. Sibling-shape cluster: 29 pinpoints. Multimodal-IO cluster: 8 members. Provider-asymmetric-delegation cluster: 7 members (with first-ever inverse-asymmetric sub-cluster). **Beta-version-tier-routing cluster: 1 member (founder).** **Image-on-tool-result-side cluster: 1 member (founder).** **Anthropic-typed-tool-discriminator cluster: 1 member (founder).** **Host-OS-system-call-transport cluster: 1 member (founder).** **Virtual-display-sandbox-orchestration cluster: 1 member (founder).** **Feedback-loop-state-machine cluster: 1 member (founder).** **Per-action-permission-policy-at-sub-tool-granularity cluster: 1 member (founder).** **Inverse-asymmetric-delegation cluster: 1 member (founder).** Eight new clusters founded in a single pinpoint — the first time a single cycle has founded eight concurrent novel clusters, exceeding #229's three. Eleven-layer-fusion-shape is the largest single-pinpoint fusion catalogued. 
Distinct from prior cluster members; the eleven-layer-fusion-shape-with-host-OS-system-call-transport-and-host-machine-state-management is novel and applies to follow-on candidate Code-execution / Code-Interpreter API typed taxonomy (the natural #231 candidate that introduces server-managed-sandbox-state + persistent-file-system axes — distinct from #230's CLIENT-SIDE virtualization because #231 is SERVER-SIDE-managed sandbox isolation, complementary inverse-locality axes that together define the full sandbox-and-virtualization surface needed for next-generation coding-agent harnesses). #230 closes the upstream prerequisite of every desktop-automation / browser-automation / form-filling / GUI-testing / accessibility-tool / screen-reading / vision-grounded-coding / pair-programming-with-screen-share / visual-debugging coding-agent affordance — the canonical 2024-2026-era agentic coding workflow that is currently impossible to build on top of claw-code DESPITE Anthropic explicitly positioning Claude as the leading commercial computer-use model and DESPITE claw-code being a port of claude-code which advertises `/desktop` slash command intent, making this the largest leading-vs-trailing parity gap with the upstream Anthropic platform in the entire emission-routing audit and the first cluster member where the upstream parent claude-code ALSO has only a stub. + +🪨