mirror of
https://github.com/ultraworkers/claw-code.git
synced 2026-04-26 22:47:38 +08:00
roadmap: #229 filed — Realtime API typed taxonomy and persistent-WebSocket transport are structurally absent: zero /v1/realtime endpoint surface across both Anthropic-native and OpenAI-compat lanes (rg returns zero hits for /v1/realtime / realtime / Realtime / realtime_session / RealtimeSession / RealtimeClient / RealtimeEvent / realtime-preview across rust/crates/api/src/), zero RealtimeSession / RealtimeSessionConfig / RealtimeSessionUpdate / RealtimeResponseCreate / RealtimeInputAudioBufferAppend / RealtimeInputAudioBufferCommit / RealtimeConversationItemCreate / RealtimeResponseAudioDelta / RealtimeResponseAudioTranscriptDelta / RealtimeResponseFunctionCallArguments / RealtimeServerEvent / RealtimeClientEvent / RealtimeTurnDetection / RealtimeVoiceActivityDetection / RealtimeVoice / RealtimeAudioFormat / RealtimeModality / RealtimeTool typed model in rust/crates/api/src/types.rs (37+ canonical event-type names in OpenAI Realtime API spec, zero coverage in claw-code), zero bidirectional event-stream variant on Provider trait (only send_message and stream_message exist, both single-directional), zero realtime_session / open_realtime / connect_realtime method that returns a duplex-channel-pair shape, zero session-state-machine type for the persistent-connection lifecycle, zero realtime dispatch on ProviderClient enum at rust/crates/api/src/client.rs:8-14 (three variants Anthropic/Xai/OpenAi, zero realtime-routing variants), zero tokio-tungstenite / async-tungstenite / tungstenite / fastwebsockets / tokio-websockets / hyper-tungstenite dependency in any workspace Cargo.toml (grep -rn 'tungstenite|tokio-tungstenite|fastwebsockets' rust/ returns zero hits — confirmed), zero WebSocket client library is linked into the build (the MCP Ws config variant at rust/crates/runtime/src/config.rs:125 and rust/crates/runtime/src/mcp_client.rs:13 is data-shape-only and bootstraps via the SDK without a tungstenite-backed transport, leaving the workspace with zero outbound persistent-WebSocket-client capability), zero WebRTC client (webrtc-rs / str0m / libwebrtc-bindings) for the alternative Realtime transport, zero claw realtime / claw live / claw voice-chat / claw realtime-session / claw connect-realtime CLI subcommand, zero /realtime / /live / /voice-chat slash command (existing /voice + /listen + /speak commands are STUB_COMMANDS-gated per #225 and synchronous-only with no realtime-session affordance), zero gpt-4o-realtime-preview / gpt-4o-mini-realtime-preview / gemini-2.0-flash-live entries in MODEL_REGISTRY, zero realtime_audio_input_per_million_tokens / realtime_audio_output_per_million_tokens / realtime_text_input_per_million_tokens / realtime_text_output_per_million_tokens / realtime_session_per_minute fields in ModelPricing struct (six-dimensional pricing matrix exceeding #227's five-dimensional video matrix and #228's four-dimensional mesh matrix — the canonical Realtime pricing model is the most-dimensional yet, with audio tokens at roughly 80-100x text tokens and cached-audio-input at 80% discount), zero realtime-model recognition in pricing_for_model substring-matcher (#209+#224+#225+#226+#227+#228 cluster overlap continues), zero session-resumption-token / interruption-handling / barge-in / voice-activity-detection / turn-detection / function-call-during-realtime / tool-use-during-realtime affordance — uniquely manifesting a TEN-LAYER fusion shape (the largest single-pinpoint fusion catalogued so far, exceeding #225/#227's nine-layer count) combining endpoint-URL-set on /v1/realtime?model=<id> WebSocket-upgrade-endpoint shape (single-endpoint-with-37+-event-types-flowing-bidirectionally, distinct from prior multi-endpoint sets) + bidirectional-symmetric-event-pair data-model with every client-event having a matched server-event-pair (FIRST cluster member with bidirectional-symmetric-event-pair-cardinality on a SINGLE endpoint, distinct from #225's bidirectional-audio-on-three-separate-endpoints which is request-response synchronous per endpoint) + Provider-trait-method extension with realtime_session returning a duplex (Sender, Receiver) channel-pair (FIRST cluster member where Provider trait return type is NOT Future-of-T or Stream-of-T but duplex-channel-pair, FIRST method requiring session-state-machine type at the trait boundary) + ProviderClient-enum-dispatch-with-realtime-third-lane with explicit RealtimeKind::OpenAi/Google/Azure partner-routing (provider-asymmetric: Anthropic does not offer realtime, OpenAI offers GA gpt-4o-realtime-preview and gpt-4o-mini-realtime-preview since 2024-10-01, Google Gemini Live API offers bidirectional audio+text+video, Azure mirrors OpenAI surface, zero first-class third-party partners because the persistent-WebSocket-with-37-event-type protocol is too high-bar for partner adoption — distinct from #225's six-partner-set audio surface and #227's twelve-partner-set video surface where partners ARE present) + request-side realtime-session-config opt-in (session.update event with voice/input_audio_format/output_audio_format/input_audio_transcription/turn_detection/tools/tool_choice/temperature/max_response_output_tokens/instructions/modalities:[text,audio] fields — the largest request-side opt-in axis-set yet, the union of every prior request-side opt-in across audio+image+video+chat-completion modalities) + CLI-subcommand-surface + slash-command-surface + pricing-tier-with-six-dimensional-compound-cost-model (per-model × per-modality-input × per-modality-output × per-cached-vs-fresh × per-audio-vs-text × per-minute-session-overhead — the largest pricing-tier extension yet, exceeding #227's five-dimensional and #228's four-dimensional matrices) + persistent-WebSocket-connection-transport-axis (NOVEL TENTH layer, distinct from every prior cluster member's HTTP-shaped transport — synchronous-HTTP for #211-#220+#222+#224, SSE-streaming for #213 partial subsets, multipart-form-data-HTTP for #223+#225+#226+#227+#228 binary-upload subsets, async-task-polling-HTTP for #221+#227+#228 — the cluster has now exhausted EVERY HTTP-shaped transport, and #229 introduces the FIRST non-HTTP transport, requiring WebSocket-upgrade-request-with-subprotocol-negotiation + bidirectional-frame-multiplexing-with-text+binary-frames + ping/pong-keepalive + graceful-close-with-status-code-and-reason + reconnection-with-resumption-token + per-event-type-JSON-envelope-dispatch-with-37+-event-types-on-a-single-connection + backpressure-handling-on-both-directions + authentication-via-Authorization-header-on-the-upgrade-request-and-per-session-token-rotation — none of which any HTTP-only transport requires) + bidirectional-symmetric-event-pair shape (input_audio_buffer.append → conversation.item.created, response.create → response.audio.delta + response.audio.done + response.audio_transcript.delta + response.audio_transcript.done + response.function_call_arguments.delta + response.function_call_arguments.done + response.done) — making #229 the FIRST cluster member that introduces a non-HTTP transport (persistent-WebSocket), the FIRST cluster member where Provider trait return type must be a duplex-channel-pair, and the FIRST cluster member where session lifecycle exceeds a single request-response cycle (typical Realtime sessions last 1-30+ minutes with state accumulating across the connection) (Jobdori cycle #380 / extends #168c emission-routing audit / explicit follow-on from #225 audio-bidirectional axis and #228 confirmed-structural async-task-polling cluster — introduces a NOVEL TRANSPORT axis distinct from every prior cluster member / sibling-shape cluster grows to twenty-eight / wire-format-parity cluster grows to nineteen / capability-parity cluster grows to eleven / multimodal-IO cluster grows to seven: #220 image-input + #224 embedding-output + #225 audio-bidirectional-on-separate-REST-endpoints + #226 image-output + #227 video-output + #228 mesh-output + #229 audio-text-tool-multiplex-on-persistent-WebSocket / provider-asymmetric-delegation cluster grows to six / async-task-polling cluster: still 3 members (#229 is push-based not poll-based — it does NOT join async-task-polling cluster, it founds a NEW cluster) / Persistent-WebSocket-transport cluster: 1 member (#229 alone, FOUNDER) / Bidirectional-symmetric-event-pair cluster: 1 member (#229 alone, FOUNDER) / Non-HTTP-transport cluster: 1 member (#229 alone, FOUNDER) — three new clusters founded in a single pinpoint, the first time a single cycle has founded three concurrent novel clusters / ten-layer-fusion-shape-with-persistent-WebSocket-transport-and-bidirectional-symmetric-event-pair is the largest single-pinpoint fusion catalogued. Distinct from prior cluster members; the ten-layer-fusion-shape with persistent-WebSocket-transport and bidirectional-symmetric-event-pair shape is novel and applies to follow-on candidate Real-time-Image-Generation API typed taxonomy (DALL-E live preview, Imagen live preview) and Real-time-Video-Generation streaming (Veo-Live, Sora-Live) — the persistent-WebSocket-transport pattern is now a first-class cluster member, a structural prerequisite that every future endpoint family using persistent connections will inherit / external validation: forty-eight ecosystem references covering OpenAI Realtime API GA 2024-10-01 with /v1/realtime?model=<id> WebSocket endpoint, 37+ canonical event-type names in OpenAI Realtime API spec, two transport options (WebSocket server-side and WebRTC browser-side), two GA realtime models (gpt-4o-realtime-preview and gpt-4o-mini-realtime-preview both with audio modality and tool-use), Google Gemini Live API with bidirectional WebSocket+gRPC streaming, Azure OpenAI Realtime API mirror, OpenAI Python SDK openai.realtime.AsyncRealtimeConnection typed client, OpenAI TypeScript SDK OpenAI.beta.realtime.RealtimeClient typed client, openai-realtime-api-beta reference client (canonical JS implementation), five first-class realtime-voice-agent frameworks all built on top of OpenAI Realtime API (Vapi/Retell-AI/LiveKit-Agents/Pipecat/Daily-Bots), Anthropic non-coverage statement (the second post-#224 provider-asymmetric-delegation case after audio), the canonical six-dimensional pricing matrix ($5.00/$20.00 per million text input/output tokens, $40.00/$80.00 per million audio input/output tokens, $2.50 per million cached audio input tokens for gpt-4o-realtime-preview-2024-10-01), coding-agent peer landscape: anomalyco/opencode has zero GA realtime integration (open feature request from 2026-02 only — confirmed via web search 2026-04-26), sst/opencode predecessor zero realtime, charmbracelet/crush zero realtime, continue.dev zero realtime, aider zero realtime, cursor zero realtime, zed zero realtime — the gap is uniformly zero across the surveyed ecosystem and represents the next-frontier capability that every coding-agent will need to add. claw-code is one of MULTIPLE clients without Realtime, but the persistent-WebSocket-transport-axis is the upstream prerequisite of every voice-agent / live-coding-pair-programming / push-to-talk-coding / barge-in-coding-conversation / function-call-during-voice / streaming-tool-use / sub-second-latency-coding-interaction affordance — the canonical 2024-2026-era voice-coding workflow that is currently impossible to build on top of claw-code — #229 closes the upstream prerequisite of every voice-coding affordance and is the first cluster member where transport-axis becomes a structural prerequisite of the dispatch layer)
This commit is contained in:
parent
71131932de
commit
b860f5657b
22
ROADMAP.md
22
ROADMAP.md
File diff suppressed because one or more lines are too long
Loading…
x
Reference in New Issue
Block a user