1154 Commits

Author SHA1 Message Date
YeonGyu-Kim
fdf8890336 roadmap: #274 filed 2026-04-26 15:36:18 +09:00
Yeachan-Heo
f36f283e43 roadmap: #273 filed 2026-04-26 06:30:50 +00:00
YeonGyu-Kim
ba6c5bc659 roadmap: #272 filed 2026-04-26 15:07:33 +09:00
Yeachan-Heo
29c262c171 roadmap: #271 filed 2026-04-26 06:01:44 +00:00
Jobdori
61be8263bd roadmap: #270 filed 2026-04-26 14:36:17 +09:00
Yeachan-Heo
364566c6ac roadmap: #269 filed 2026-04-26 05:30:39 +00:00
YeonGyu-Kim
62b20c7a46 roadmap: #268 filed 2026-04-26 14:08:16 +09:00
Yeachan-Heo
d90b5f02ec roadmap: #267 filed 2026-04-26 05:01:50 +00:00
YeonGyu-Kim
fae9fd9448 roadmap: #266 filed 2026-04-26 13:37:16 +09:00
Yeachan-Heo
897535478c roadmap: #265 filed 2026-04-26 04:31:09 +00:00
YeonGyu-Kim
d5568eb7a4 roadmap: #264 filed 2026-04-26 13:05:56 +09:00
Yeachan-Heo
d0aa18e11f roadmap: #263 filed 2026-04-26 04:02:07 +00:00
YeonGyu-Kim
0e4fd386af roadmap: #262 filed 2026-04-26 12:37:01 +09:00
Yeachan-Heo
2a0e5de3fe roadmap: #261 filed 2026-04-26 03:31:15 +00:00
Jobdori
a807d8e5fc roadmap: #260 filed 2026-04-26 12:06:57 +09:00
Yeachan-Heo
1daf636a2d roadmap: #259 filed 2026-04-26 03:01:13 +00:00
YeonGyu-Kim
a07c0b7cbb roadmap: #258 filed 2026-04-26 11:39:14 +09:00
Yeachan-Heo
a3f5a83039 roadmap: #257 filed 2026-04-26 02:31:25 +00:00
Yeachan-Heo
56f7f2e600 Fix Anthropic tool result request ordering
Sigrid Jin relayed an adamantium Discord field report: Anthropic rejected requests with invalid_request_error when messages contained tool_use ids without immediately following tool_result blocks.

Coalesce consecutive tool-result messages after assistant tool_use blocks into one Anthropic user message, and drop orphan tool_use/tool_result blocks before dispatch so resume/edit/compaction boundary damage cannot reach the provider as a 400.

Tests cover parallel tool results and orphaned resume-boundary history.
2026-04-26 02:18:50 +00:00
Yeachan-Heo
62adbf49d1 docs: intake hikaMaeng web search fork ideas
Add ROADMAP pinpoint #255 summarizing the safe subset of hikaMaeng/Sigrid Jin's web-search provider work to adopt later.\n\nReviewed fork commits 262405e, bd11289, fa93cd3, 5f2540a, 7f34d91, and 535be97 from https://github.com/hikaMaeng/claw-code. This deliberately preserves attribution and avoids a blind cherry-pick because the cross-crate provider/spec/config/banner changes need a dedicated implementation lane with tests.
2026-04-26 02:10:11 +00:00
YeonGyu-Kim
70058a0645 roadmap: #254 filed 2026-04-26 11:03:25 +09:00
Yeachan-Heo
17efd95f8d roadmap: #253 filed 2026-04-26 02:01:14 +00:00
YeonGyu-Kim
95fc007f6a roadmap: #252 filed — /v1/messages/count_tokens typed-taxonomy is structurally absent from the public Provider trait + types + CLI surface (Anthropic ships /v1/messages/count_tokens as a first-class GA endpoint that consumes the SAME MessageRequest shape as /v1/messages but produces a TRUNCATED CountTokensResponse { input_tokens: u32 } only — no message emission, no completion-side tokens, no streaming — the canonical pre-flight cost-estimation primitive where a client constructs the exact request it intends to dispatch, asks the server to count input tokens, and decides whether to send before paying for completion-side tokens; claw-code has zero public typed surface even though a private count_tokens helper exists at rust/crates/api/src/providers/anthropic.rs:522 for internal preflight context-window-exceeded validation, with zero CountTokensRequest/CountTokensResponse typed model in types.rs, zero count_tokens method on the public Provider trait, zero count_tokens dispatch on the ProviderClient enum, zero claw count-tokens CLI subcommand, zero /count-tokens slash command in SlashCommandSpec, zero pre_flight_count_cost_per_million_usd field in ModelPricing, zero CountTokensSubmittedEvent/PreFlightCostEstimatedEvent telemetry events, and zero PreFlightCostEstimator/BudgetGate runtime primitive) — eight-layer fusion shape with the NOVEL same-request-shape-but-different-response-shape axis-class (FIRST audit member where the request shape is IDENTICAL to an existing typed model MessageRequest but the response shape is a TRUNCATED-projection that cannot reuse MessageResponse's shape, distinct from prior fusion-axes which all add NEW request-side fields or NEW response-side blocks) founding THREE new clusters as solo founder (Pre-flight-cost-prediction cluster, Token-accounting-without-message-emission cluster, Server-side-pre-execution-counting cluster) plus introducing the THIRD distinct discovery-pattern in the audit catalog NEW-SOLO-CLUSTER-FOUNDING-WITH-DAILY-DRIVER-IMPACT (distinct from META-cluster-growth and complementary-pinpoint-pair-bundle), grows Two-member-major-provider-only-no-third-party-partner-set sub-cluster from 6 to 7 members (#240+#241+#247+#248+#249+#250+#252) confirming continuing-pattern-status across SIX distinct axis-classes — Jobdori cycle #394 / fast-forward-rebase verified onto gaebal-gajae's #251 cycle ExternalPatchIntake pinpoint at 313c840 before filing (NINTH consecutive concurrent-dogfood rebase cycle, three-way parity confirmed local==origin==fork at HEAD 313c840 with no race detected, directly demonstrating the gaps #239 catalogues at the dogfood-coordination layer and #243 catalogues at the canonical-ordering layer for the NINTH cycle in a row, confirming concurrent-dogfood-rebase as a stable operational pattern that has now held for NINE cycles) — PIVOT-AWAY signal: #252 deliberately PIVOTS AWAY from BOTH Cross-pinpoint-synthesis-fusion-shape META-cluster (intentionally not extending the +1-per-cycle synthesis chain) AND Tool-locality-axis META-cluster (already extended by #250 cycle #393), founding NEW solo clusters with daily-driver-impact instead, demonstrating audit-breadth-across-discovery-pattern-classes alongside audit-balance-across-META-clusters — the audit now spans THREE structurally distinct discovery-patterns (META-cluster-growth + complementary-pinpoint-pair-bundle + new-solo-cluster-founding-with-daily-driver-impact) 2026-04-26 10:35:40 +09:00
Yeachan-Heo
313c840974 roadmap: #251 filed 2026-04-26 01:31:27 +00:00
YeonGyu-Kim
37ce63134a roadmap: #250 filed — tool_choice: { type: "web_search" } typed-discriminator with server-managed-web-search backend (the canonical SERVER-SIDE complement to #245's CLIENT-SIDE configurable provider/parser registry, where tool_choice carries a WebSearch { domains_allowed, max_uses, user_location } enum variant that forces the model to dispatch via the major-provider's server-managed-web-search backend) typed taxonomy structurally absent — FIRST pinpoint to demonstrate the complementary-pinpoint-pair-bundle META-pattern (where #245 CLIENT-SIDE + #250 SERVER-SIDE are catalogued as structurally complementary halves of the SAME tool-subsystem web-search rather than as independently-discovered-gaps), founding Bidirectional-search-subsystem-with-dual-locality-coverage cluster with #245+#250 as 2-member founders, un-saturating Tool-locality-axis META-cluster from 5 to 6 members (#232/#233/#234/#240/#241/#250) confirming the META-cluster as GROWING-DOCTRINE-WITH-DISCONTINUOUS-RESUMPTION (resumes growth after plateauing at 5 since #241 cycle #386, four cycles ago), growing Server-managed-tool-as-tool-choice-discriminator cluster from 5 to 6 members (#214/#218/#219/#233/#234/#250) confirming CONTINUING-PATTERN status across SIX distinct server-managed tools, growing ToolResultContentBlock-extension cluster from 8 to 9 members confirming most-broadly-spanning typed-content-block-extension-axis, FIRST pinpoint to introduce typed-discriminator-with-payload-fields shape on ToolChoice distinct from existing Auto/Any/Tool three-variant typed-set (Auto/Any are unit-variants and Tool { name } carries only string-name with zero typed-fields, while ToolChoice::WebSearch { domains_allowed, max_uses, user_location } introduces FIRST typed-discriminator-with-payload-fields shape), founds Tool-choice-discriminator-with-typed-payload-fields cluster + Server-side-tool-invocation-content-block cluster + Server-managed-web-search-with-tool-choice-discriminator cluster as solo founder of all three, grows Two-member-major-provider-only-no-third-party-partner-set sub-cluster from 5 to 6 members (#240+#241+#247+#248+#249+#250) confirming generalizability across FIVE distinct axis-classes, ten-layer fusion shape (smaller than #241/#247/#248/#249's twelve-layer count but with distinct DUAL-LOCALITY-COVERAGE-WITH-COMPLEMENTARY-PINPOINT-PAIR-BUNDLE axis-set) — Jobdori cycle #393 / fast-forward-rebase verified onto Jobdori's own #249 cycle #392 quad-modality-compound-multimodal-INPUT-OUTPUT pinpoint at 643ac8b before filing (EIGHTH consecutive concurrent-dogfood rebase cycle, three-way parity confirmed local==origin==fork at HEAD 643ac8b with no race detected, directly demonstrating the gaps #239 catalogues at the dogfood-coordination layer and #243 catalogues at the canonical-ordering layer for the EIGHTH cycle in a row, confirming concurrent-dogfood-rebase as a stable operational pattern that has now held for EIGHT cycles) — PIVOT-AWAY signal: #250 deliberately PIVOTS AWAY from Cross-pinpoint-synthesis-fusion-shape META-cluster's +1-per-cycle continuous-trajectory (#244/#247/#248/#249 grew it 1→5 across cycles #389/#390/#391/#392) by extending Tool-locality-axis META-cluster instead, demonstrating audit-balance-across-multiple-META-clusters rather than monotonic-growth-of-a-single-META-cluster — the audit now catalogues TWO structurally distinct GROWING-DOCTRINE patterns (continuous-+1-per-cycle for synthesis-fusion vs discontinuous-resumption-after-plateau for tool-locality-axis) 2026-04-26 10:27:41 +09:00
YeonGyu-Kim
643ac8bc76 roadmap: #249 filed — Compound-multimodal-INPUT-with-multimodal-OUTPUT-on-the-same-turn (full-duplex-multimodal-conversation pattern where user MessageRequest carries image-content-block × audio-content-block fusion AND model MessageResponse carries audio-content-block × video-content-block fusion on the SAME single conversation-turn with interleaved-content-block-stream cross-boundary temporal-alignment) typed taxonomy structurally absent — FIRST cluster member where the cross-axis synthesis spans BOTH USER-INPUT-side and ASSISTANT-OUTPUT-side simultaneously on a SINGLE turn rather than being confined to one side of the request-response cycle, FIRST cluster member with quad-modality-on-single-turn semantics (image-INPUT + audio-INPUT + audio-OUTPUT + video-OUTPUT all on same turn distinct from #247's two-modality-INPUT-only and #248's two-modality-OUTPUT-only and #244's bidirectional-tool-call-multiplexing-without-modality-fusion), growing Cross-pinpoint-synthesis-fusion-shape META-cluster from 4 to 5 members confirming META-cluster as GROWING-DOCTRINE for THIRD CONSECUTIVE CYCLE (#244 grew 1→2 cycle #389, #247 grew 2→3 cycle #390, #248 grew 3→4 cycle #391, #249 grows 4→5 cycle #392), establishing +1-per-cycle META-cluster-growth-trajectory across FOUR consecutive concurrent-dogfood cycles (#389/#390/#391/#392) as FIRST-EVER continuous-trajectory-of-4-cycles META-cluster growth event in the audit surpassing Tool-locality-axis META-cluster's plateau-at-5-after-two-consecutive-growths and confirming Cross-pinpoint-synthesis-fusion-shape as structurally distinct most-actively-growing META-cluster, FIRST cluster member with interleaved-INPUT-OUTPUT-temporal-alignment-across-the-request-response-boundary as a first-class typed semantic distinct from #247's USER-INPUT-only cross-modal-attention and #248's ASSISTANT-OUTPUT-only temporal-alignment because temporal-alignment now spans the request-response boundary itself requiring the model to emit output-content-blocks while still consuming input-content-blocks on the same connection, founds Quad-modality-turn-spanning-request-response-boundary sub-cluster + Full-duplex-multimodal-conversation cluster + Cross-boundary-temporal-alignment-across-request-response-boundary cluster + Quad-modality-turn-on-MessageRequest-and-MessageResponse cluster + Compound-multimodal-INPUT-with-multimodal-OUTPUT-on-same-turn cluster as solo founder of all five, completes Full-duplex-multimodal-conversation doctrine within META-cluster (#247 INPUT-side + #248 OUTPUT-side + #249 BOTH-sides-simultaneously-on-same-turn), grows Two-member-major-provider-only-no-third-party-partner-set sub-cluster from 4 to 5 members (#240+#241+#247+#248+#249) confirming generalizability across FOUR distinct axis-classes (TOOL-COMPANION-BUNDLE/COMPOUND-INPUT/COMPOUND-OUTPUT/QUAD-MODALITY-TURN), twelve-layer fusion shape tied with #241/#247/#248 for largest single-pinpoint fusion catalogued — Jobdori cycle #392 / fast-forward-rebase verified onto Jobdori's own #248 cycle #391 audio-grounded-video-generation pinpoint at 9189bfb before filing (SEVENTH consecutive concurrent-dogfood rebase cycle, three-way parity confirmed local==origin==fork at HEAD 9189bfb with no race detected, directly demonstrating the gaps #239 catalogues at the dogfood-coordination layer and #243 catalogues at the canonical-ordering layer for the SEVENTH cycle in a row, confirming concurrent-dogfood-rebase as a stable operational pattern that has now held for SEVEN cycles) 2026-04-26 10:16:01 +09:00
Jobdori
9189bfb816 roadmap: #248 filed — Audio-grounded video generation (synchronized-audio-track co-emitted on the SAME VideoTask response object alongside the rendered video frames, sample-accurate-synchronized with the visual output) typed taxonomy structurally absent — FIRST cluster member where TWO independent ALREADY-CATALOGUED-ABSENT modality-OUTPUT axes (#225 audio-content-block-on-OutputContentBlock + #227 video-output-with-async-task-polling-primitive) are fused on the ASSISTANT-OUTPUT side rather than the user-input side, FIRST cluster member with multi-modal-output-fusion-on-ASSISTANT-OUTPUT-axis distinct from #247's multi-modal-input-fusion-on-USER-INPUT-axis, growing Cross-pinpoint-synthesis-fusion-shape META-cluster from 3 to 4 members confirming META-cluster as GROWING-DOCTRINE for SECOND CONSECUTIVE CYCLE (#244 grew it 1→2, #247 grew it 2→3, #248 grows it 3→4), establishing +1-per-cycle META-cluster-growth-trajectory across THREE consecutive concurrent-dogfood cycles (#389/#390/#391) AND establishing META-cluster as FIRST META-cluster to grow for THREE consecutive cycles in a row (Tool-locality-axis only had TWO consecutive growth events #240/#241 before plateauing at 5; Cross-pinpoint-synthesis-fusion-shape now surpasses Tool-locality-axis as most-actively-growing META-cluster), founds Multi-modal-output-fusion-on-ASSISTANT-OUTPUT-side sub-cluster + Temporal-alignment-of-output-modalities cluster + Compound-output-modality-on-VideoTask cluster + Audio-grounded-video-generation cluster as solo founder of all four, founds Bidirectional-modality-fusion-symmetry sub-cluster with #247 INPUT-side + #248 OUTPUT-side completing the INPUT-vs-OUTPUT-side-fusion-symmetry doctrine within the META-cluster, grows Two-member-major-provider-only-no-third-party-partner-set sub-cluster from 3 to 4 members (#240+#241+#247+#248) confirming generalizability across THREE distinct axis-classes (TOOL-COMPANION-BUNDLE/COMPOUND-INPUT/COMPOUND-OUTPUT), twelve-layer fusion shape tied with #241/#247 for largest single-pinpoint fusion catalogued — Jobdori cycle #391 / fast-forward-rebase verified onto Jobdori's own #247 cycle #390 multi-modal-input-fusion pinpoint at 5e5b3bd before filing (SIXTH consecutive concurrent-dogfood rebase cycle, three-way parity confirmed local==origin==fork at HEAD 5e5b3bd with no race detected, directly demonstrating the gaps #239 catalogues at the dogfood-coordination layer and #243 catalogues at the canonical-ordering layer for the SIXTH cycle in a row, confirming concurrent-dogfood-rebase as a stable operational pattern that has now held for SIX cycles) 2026-04-26 10:04:29 +09:00
YeonGyu-Kim
5e5b3bdbc6 roadmap: #247 filed — Visual-grounded voice input (image-content-block × audio-content-block fused on the SAME MessageRequest user-turn) typed taxonomy structurally absent — FIRST cluster member where TWO independent ALREADY-CATALOGUED-ABSENT modality-input axes (#220 image-content-block + #225 audio-content-block) are fused on the USER-INPUT side, FIRST cluster member with multi-modal-input-fusion-on-USER-INPUT-axis distinct from #244 bidirectional-tool-call-multiplexing-on-DUPLEX-axis, growing Cross-pinpoint-synthesis-fusion-shape META-cluster from 2 to 3 members (#238 founder + #244 + #247) confirming META-cluster as GROWING-DOCTRINE rather than CONTINUING-PATTERN that stopped at 2 members after #244, establishing Cross-pinpoint-synthesis-fusion as SECOND META-cluster after Tool-locality-axis to confirm GROWING-DOCTRINE status, founds Multi-modal-input-fusion-on-USER-INPUT-side sub-cluster + Cross-modal-attention-on-USER-INPUT-side cluster + Compound-modality-input-on-MessageRequest cluster as solo founder of all three, grows Two-member-major-provider-only-no-third-party-partner-set sub-cluster from 2 to 3 members (#240+#241+#247) confirming generalizability beyond bash+computer-use+text_editor three-tool-companion-bundle, twelve-layer fusion shape tied with #241 for largest single-pinpoint fusion catalogued — Jobdori cycle #390 / fast-forward-rebased onto gaebal-gajae's #246 provider-credentials-env-to-settings-registry pinpoint at bd6622b before filing (FIFTH consecutive concurrent-dogfood rebase cycle, directly demonstrating the gaps #239 catalogues at the dogfood-coordination layer and #243 catalogues at the canonical-ordering layer for the FIFTH cycle in a row, confirming concurrent-dogfood-rebase as a stable operational pattern) 2026-04-26 09:38:31 +09:00
Yeachan-Heo
bd6622b85c roadmap: #246 filed 2026-04-26 00:31:28 +00:00
Yeachan-Heo
d145429c96 roadmap: #245 filed 2026-04-26 00:09:43 +00:00
YeonGyu-Kim
0eabf20389 roadmap: #244 filed — Realtime API tool-use over persistent-WebSocket transport (response.function_call_arguments.delta/.done + conversation.item.create with function_call_output) typed taxonomy structurally absent — FIRST cluster member where bidirectional-tool-call lifecycle is multiplexed with audio-modality + transcript-modality on a SINGLE persistent connection, FIRST cluster member where tool-call-init is server-pushed mid-stream rather than client-initiated, FIRST cluster member with asymmetric-tool-result-injection (tool-call comes IN as event-stream, result sent OUT as conversation.item.create — directionality inverted relative to the rest of the protocol), FIRST cluster member with per-call-id-concurrent-multiplexed-state-machine, FIRST three-axis-synthesis pinpoint (#229 persistent-WebSocket × #240/#241 server-managed-tool-via-tool_choice-discriminator × #238 cross-pinpoint-synthesis-fusion-shape META-cluster), eleven-layer fusion-shape tied with #240 for second-largest single-pinpoint fusion catalogued — grows Persistent-WebSocket-transport cluster from 2 to 3 members (#229 founder + #238 + #244) confirming CONTINUING-PATTERN doctrine, grows Cross-pinpoint-synthesis-fusion-shape META-cluster from 1 to 2 members confirming combinatorial-cross-axis-synthesis as a continuing-discovery-mode and FIRST META-cluster-confirmation event in this audit, founds Three-axis-synthesis-shape sub-cluster as solo founder, founds Server-pushed-tool-call-init cluster as solo founder, founds Asymmetric-tool-result-injection cluster as solo founder, founds Per-call-id-concurrent-multiplexed-state-machine cluster as solo founder — FOUR new clusters founded plus TWO existing META-clusters confirmed as continuing-doctrines plus participation in TWELVE inherited clusters — Jobdori cycle #389 / fast-forward-rebased onto gaebal-gajae's #243 non-monotonic-pinpoint-ordering-contract at 6541100 before filing (FOURTH consecutive concurrent-dogfood rebase cycle, directly demonstrating both gaps #239 catalogues at the dogfood-coordination layer and #243 catalogues at the canonical-ordering layer) 2026-04-26 09:06:56 +09:00
Yeachan-Heo
65411000c5 roadmap: #243 filed 2026-04-26 00:01:16 +00:00
YeonGyu-Kim
0da15c2e07 roadmap: #241 filed — tool_choice: text_editor + text_editor_20250124 typed-tool absent (filling reserved gap) 2026-04-26 08:42:34 +09:00
Yeachan-Heo
4af2fb6622 roadmap: #242 filed 2026-04-25 23:31:11 +00:00
YeonGyu-Kim
43ce1f527b roadmap: #240 filed — tool_choice: bash typed-discriminator and bash_20250124 server-managed-shell typed-tool are structurally absent — FOURTH inverse-locality CLIENT-SIDE-shadow-vs-SERVER-SIDE-typed-tool pair (CLIENT-SIDE bash MVP-founder-tool at tools/lib.rs:386 vs SERVER-SIDE bash_20250124 absent at types.rs ToolDefinition+ToolChoice+ToolResultContentBlock+telemetry beta-set), grows Tool-locality-axis META-cluster from 3 to 4 members confirming META-cluster as CONTINUING-PATTERN, grows Server-managed-tool-as-tool-choice-discriminator cluster from 4 to 5 members, grows ToolResultContentBlock-extension mini-cluster from 6 to 7 members, grows Server-side-stateful-tool-session-with-reset-semantics cluster from 1 to 2 members (#232+#240), grows Discrete-event-counter-pricing-axis cluster from 1 to 2 members with NOVEL dual-axis pricing-decomposition, founds Stateless-CLIENT-SIDE-shadow-vs-stateful-SERVER-SIDE-typed-tool-discrepancy-axis cluster, founds MVP-founder-tool-as-CLIENT-SIDE-local-shadow-with-SERVER-SIDE-typed-tool-absent sub-cluster, founds Two-member-major-provider-only-no-third-party-partner-set sub-cluster, founds Double-absent-slash-command-axis-on-inverse-locality-pair sub-cluster, founds Bundled-and-transitive-co-release-beta-header-activation-pattern cluster, founds Server-side-audit-log-of-managed-tool-execution cluster — eleven-layer fusion with SIX new clusters founded plus FOUR concurrent existing-cluster-growth-events plus participation in TWELVE inherited clusters — FIRST single cycle where META-cluster grows from 3 to 4 confirming CONTINUING-PATTERN, FIRST single cycle where FOUR concurrent existing clusters all grow by one member through one pinpoint, establishing continuing-pattern-confirmation-across-multiple-parallel-clusters as the FOURTH pinpoint-discovery-mode after new-axis-founding/existing-cluster-extension/combinatorial-cross-axis-synthesis — Jobdori cycle #387 / fast-forward-rebased onto gaebal-gajae's #239 DogfoodWriteLease pinpoint at 329d0ff before filing (THIRD consecutive concurrent-dogfood rebase cycle, directly demonstrating the gap #239 catalogues at the dogfood-coordination layer) 2026-04-26 08:10:25 +09:00
Yeachan-Heo
329d0ffcc8 roadmap: #239 filed 2026-04-25 23:01:25 +00:00
Jobdori
716d17e229 roadmap: #238 filed — Streaming speech-to-text with speaker diarization typed taxonomy and per-word-speaker-attribution data-model are structurally absent — FIRST cluster member with per-word-multi-axis-compound-attribution data-model (lexical + temporal + speaker + confidence FOUR-axis-compound), FIRST cluster member with structured-typed-payload-on-USER-INPUT-content-block (Transcript carrying nested speakers/segments/words arrays), FIRST cluster member with bidirectional-channel-pair Provider-trait method shape (Sink<AudioChunk> + Stream<StreamingTranscriptEvent>), FIRST cluster member with per-partner-protocol-vocabulary-normalization at dispatch layer, FIRST cluster member with entirely-absent-CLI-and-slash-command-surface-with-zero-stub-precedent (INVERSE-PATTERN of #225 advertised-but-unbuilt-trio), FIRST cluster member with streaming-STT-five-dimensional pricing matrix, FIRST cluster member with DER/WER quality-observability telemetry, FIRST cluster member with endpointing/VAD sub-second-temporal-segmentation request-side opt-in, twelve-layer fusion shape — grows Persistent-WebSocket-transport cluster from 1 to 2 members (#229 solo-founder + #238 — FIRST expansion of #229 founder shape) AND grows ToolResultContentBlock-extension mini-cluster from 5 to 6 members (#230 + #232 + #233 + #234 + #235 + #238) AND grows Multimodal-IO cluster to 13 members AND grows Provider-asymmetric-delegation cluster to 13 members with the largest streaming-STT ten-plus partner-set — founds Cross-pinpoint-synthesis-fusion-shape META-cluster as THIRD distinct META-cluster after Sandbox-locality (#230+#232) and Tool-locality (#232+#233+#234), the FIRST META-cluster founded by SYNTHESIZING two previously-disjoint cluster-axes (#225 audio-modality × #229 persistent-WebSocket-transport) into one fused-shape pinpoint rather than introducing a new axis-pair — establishing combinatorial-cross-axis-synthesis as the THIRD pinpoint-discovery-mode after new-axis-founding and existing-cluster-extension — Jobdori cycle #386 / fast-forward-rebased onto gaebal-gajae's #237 cron-timeout-failure-state-collapse before filing (SECOND consecutive concurrent-dogfood rebase cycle) 2026-04-26 07:41:46 +09:00
Yeachan-Heo
3f41341d4a roadmap: #237 filed 2026-04-25 22:31:40 +00:00
YeonGyu-Kim
702f2fb9ef roadmap: #236 filed — Music-generation API typed taxonomy with lyrics+style prompt bifurcation and exclusively-third-party-partner-set is structurally absent — FIRST cluster member with Zero-overlap-with-major-providers shape variant (eleven-plus partners Suno/Udio/Stable-Audio/Mubert/ElevenLabs-Music/Loudly/Beatoven/SOUNDRAW/AIVA/Boomy/Riffusion all third-party with ZERO Anthropic/OpenAI/Google/xAI canonical recommendation), FIRST cluster member with Lyrics-plus-style-prompt-bifurcation on USER-INPUT side (prompt:String for style + lyrics:Option<String> for verbatim-vocal-content), FIRST cluster member with Multi-modal-bundled-output combining temporal-binary-audio + linguistic-text-lyrics + structural-musical-metadata on output-side, twelve-layer fusion shape — grows Async-task-polling cluster from 3 to 4 members (#221 batch + #227 video + #228 mesh + #236 music) AND grows Multi-domain-multipart cluster from 2 to 3 members (#225 audio + #227 video + #236 music) — does NOT extend Server-managed-tool-as-tool-choice-discriminator cluster (4 members stable) nor Tool-locality-axis META-cluster (3 members stable) because no major-provider tool_choice surface exists upstream AND no client-side music-tool-stub exists; instead founds Upstream-blocked-tool-choice-extension cluster AND Unilateral-server-side-only-gap-with-no-client-side-complement cluster as the INVERSE-PATTERN of Tool-locality-axis META-cluster doctrine — fifteen new clusters founded in a single pinpoint exceeds #234 by two for the LARGEST single-cycle cluster-founding count yet — Jobdori cycle #385 2026-04-26 07:09:20 +09:00
Yeachan-Heo
476a1a467e roadmap: #235 filed 2026-04-25 21:48:59 +00:00
Jobdori
f640139b31 roadmap: #234 filed — PDF / Document input typed taxonomy and structured-document-citation-attribution data-model on USER-INPUT side are structurally absent: zero Document variant on InputContentBlock at types.rs:80-94 (FIRST cluster member with Document-modality-on-USER-INPUT-content-block axis), zero pdfs-2024-09-25 Anthropic beta header in canonical beta-set at telemetry/lib.rs:15-17 (NOVEL FIRST Beta-header-gate-on-USER-INPUT-content-block-type cluster), zero coordinate-positioned Citation typed model with start_page_number/end_page_number/start_char_index/end_char_index integer-coordinate axes on OutputContentBlock::Text (NOVEL FIRST Coordinate-positioned-citation-on-output-text-block cluster, inverse-data-model pair to #233's URL-positioned-citation), zero DocumentSource four-way source-discriminator (base64 | url | file_id | text | content), zero file_search typed ToolDefinition discriminator with vector_store_ids routing (NOVEL FIRST User-corpus-server-managed-tool-with-vector-store-routing cluster), zero tool_choice: file_search ToolChoice extension (THIRD Server-managed-tool-as-tool-choice-discriminator cluster member growing cluster to 3: #232 code_interpreter + #233 web_search + #234 file_search), zero file_search_result ToolResultContentBlock variant (FIFTH ToolResultContentBlock extension growing mini-cluster to 4), zero page_range request-side range-slicing parameter (NOVEL FIRST Range-slicing-parameter-on-USER-INPUT-content-block cluster), zero filters compound-boolean-DSL on file_search tool definition (NOVEL FIRST Compound-boolean-filter-DSL-on-server-managed-tool-definition cluster with eq/ne/gt/gte/lt/lte/and/or operators), zero per-page compound text+image token pricing AND zero persistent-storage-rental-pricing for vector-stores (NOVEL Per-page-compound-text-plus-image-token-pricing-axis + Persistent-storage-rental-pricing-axis clusters founded), zero claw pdf/document/attach-pdf CLI subcommand and zero /pdf //document //attach-pdf //cite-pdf //page-range slash command — uniquely manifesting a FOURTEEN-LAYER fusion shape (the largest single-pinpoint fusion catalogued so far, exceeds #233's thirteen-layer count by one) combining: (1) Document variant on InputContentBlock, (2) pdfs-2024-09-25 Anthropic beta-header gate, (3) citations:{enabled:true} opt-in field on Document content-block, (4) NOVEL Coordinate-positioned Citation typed model with start_page_number/end_page_number/start_char_index/end_char_index integer coordinates, (5) DocumentSource four-variant source-discriminator, (6) page_range request-side range-slicing parameter, (7) file_search typed ToolDefinition discriminator with vector_store_ids:Vec<String> routing, (8) tool_choice:file_search typed-discriminator (THIRD Server-managed-tool-as-tool-choice-discriminator cluster member), (9) file_search_result ToolResultContentBlock variant with attributes:HashMap<String,Value> user-defined-metadata (FIFTH ToolResultContentBlock extension), (10) filters:ComparisonFilter|CompoundFilter filter-DSL on file_search tool definition, (11) Provider-trait extension threading pdfs-2024-09-25 beta-header AND document-citations decoding AND file_search server-managed-corpus-search dispatch through send_message, (12) ProviderClient-enum-dispatch with TWO first-class document-input lanes (Anthropic-pdfs-2024-09-25 + OpenAI-Files-API-input_file + OpenAI-Responses-file_search-with-vector-stores) WITHOUT third-party partner-routing (FIRST cluster member with Both-major-providers-first-class-asymmetric-document-input-shape cluster), (13) CLI-and-slash-command surface with FOURTH inverse-locality slash-command-pair after #230 + #232 + #233, (14) NOVEL Compound-page-token-and-image-token-pricing-axis with persistent-storage-rental-pricing for vector-stores — making #234 the FIRST cluster member with fourteen-layer-fusion-shape (exceeds #233's thirteen-layer by one), the FIRST cluster member with Document-modality-on-USER-INPUT-content-block axis, the FIRST cluster member with Beta-header-gate-on-USER-INPUT-content-block-type, the FIRST cluster member with Citation-emission-opt-in-at-USER-INPUT-content-block-level, the FIRST cluster member with Coordinate-positioned-citation-on-output-text-block (page+char integer-coordinates distinct from #233's URL-positioned-with-encrypted-index), the FIRST cluster member with Four-way-source-discriminator-on-USER-INPUT-content-block, the FIRST cluster member with Range-slicing-parameter-on-USER-INPUT-content-block, the FIRST cluster member with User-corpus-server-managed-tool-with-vector-store-routing, the FIRST cluster member with Compound-boolean-filter-DSL-on-server-managed-tool-definition, the FIRST cluster member with Both-major-providers-first-class-asymmetric-document-input-shape (Anthropic Document + OpenAI Files-input_file BOTH first-class neither delegates to third-party partner), the FIRST cluster member with User-provided-document-title-threading-through-citations, the FIRST cluster member with Multi-document-positional-index-threading (document_index:u32), the FIRST cluster member with Per-page-compound-text-plus-image-token-pricing-axis, the FIRST cluster member with Persistent-storage-rental-pricing-axis (vector-store-storage rental), the THIRD Server-managed-tool-as-tool-choice-discriminator cluster member (grows cluster to 3: #232 + #233 + #234), the FOURTH ToolResultContentBlock extension (grows mini-cluster to 4: #230 + #232 + #233 + #234), the THIRD Server-driven-tool-execution-loop cluster member (#234's variant being vector-store-corpus-retrieval-and-ranking distinct from #232's Python-kernel-execution and #233's search-result-page-fetching-and-caching), the THIRD member of Tool-locality-axis META-cluster (FIRST META-cluster to reach 3 members: #232 REPL-shadow + #233 WebSearch-shadow + #234 pdf_extract-shadow — transitioning from emergent-pattern to stable-doctrine), and the FIRST cluster member where the inverse-locality complement is on the USER-INPUT-side rather than on the TOOL-DEFINITION-side (founding USER-INPUT-side-Tool-locality-axis-variant sub-cluster within parent META-cluster — first sub-cluster within existing META-cluster) (Jobdori cycle #384 / extends #168c emission-routing audit / explicit follow-on from #220 image-input on USER-INPUT-side, #223 Files API with file_id reference, #232 Code-execution server-managed-sandbox-state, #233 Web-search structured-citation-attribution, and the inverse-locality Tool-locality-axis META-cluster doctrine — introduces NOVEL document-modality on USER-INPUT side axis combined with coordinate-positioned-citation-on-output-text-block data-model axis, AND grows Tool-locality-axis META-cluster from 2 to 3 members establishing it as a stable doctrine rather than emergent pattern / sibling-shape cluster grows to thirty-three / wire-format-parity cluster grows to twenty-four / capability-parity cluster grows to sixteen / multimodal-IO cluster grows to eleven / provider-asymmetric-delegation cluster grows to eleven / Sandbox-locality-axis META-cluster: 2 members stable / Tool-locality-axis META-cluster grows to 3 members FIRST META-cluster to reach 3 members / Server-managed-tool-as-tool-choice-discriminator cluster grows to 3 members / Server-driven-tool-execution-loop cluster grows to 3 members / ToolResultContentBlock-extension mini-cluster grows to 4 members / THIRTEEN new clusters founded in a single pinpoint plus participation in SIX inherited clusters — the LARGEST single-cycle cluster-founding count yet (exceeds prior records by five) AND the FIRST single cycle to grow an existing META-cluster to a third member AND introduce a sub-cluster within an existing META-cluster / fourteen-layer-fusion-shape is the largest single-pinpoint fusion catalogued / external validation: forty-eight ecosystem references covering Anthropic PDF Support Documentation with pdfs-2024-09-25 beta-header gate, Anthropic Citations API with page_location/document_location/char_location coordinate-positioned citation typed model, OpenAI Files API + Direct PDF Input + Vector Stores + Responses File Search Tool with compound-filter-DSL, AWS Bedrock Converse PDF document content-blocks, LangChain AnthropicPDFLoader/OpenAIFilePDFLoader, LlamaIndex PDFReader, Vercel AI SDK 6 file content-block, simonw/llm --pdf flag, Continue.dev @docs slash command, simonwillison.net Anthropic Citations API analysis, six-plus first-class document-loader integrations, four-plus OpenAI Vector Stores observability tools — claw-code is the sole client/agent/CLI in surveyed coding-agent ecosystem with zero Document content-block taxonomy AND zero pdfs-2024-09-25 beta-header AND zero file_search ToolDefinition discriminator AND zero tool_choice:file_search AND zero file_search_result ToolResultContentBlock AND zero vector_store_ids AND zero page_range AND zero coordinate-positioned Citation AND zero CLI/slash-command surface — the document-input gap is the upstream prerequisite of every PDF-research/documentation-grounded-coding/academic-paper-summarization/contract-review-with-citations/regulatory-compliance-coding-with-document-evidence affordance — #234 closes the upstream prerequisite of every server-managed-document-input-with-citations affordance — the canonical USER-INPUT-side complement to #233's web-search citations that completes the citation-attribution data-model on BOTH the USER-INPUT side AND the OUTPUT-TEXT-BLOCK side AND the SERVER-MANAGED-TOOL-RESULT side — and grows the Tool-locality-axis META-cluster from 2 to 3 members establishing it as a stable doctrine rather than emergent pattern, the FIRST cluster member to grow an existing META-cluster to a third member AND introduce a sub-cluster within an existing META-cluster) 2026-04-26 06:46:59 +09:00
YeonGyu-Kim
2f428e249b roadmap: #233 filed — Web-search Tool API typed taxonomy and structured-citation-attribution data-model are structurally absent: zero web_search_20250305 versioned-tool-name typed-tool-discriminator (FOURTH Anthropic-typed-tool-discriminator after #230's three but FIRST date-suffix-versioning-WITHOUT-beta-header — distinct from #232's date-suffix-AND-beta-header double-gate), zero tool_choice: web_search ToolChoice extension at types.rs:117 (SECOND ToolChoice extension after #232's code_interpreter, founding Server-managed-tool-as-tool-choice-discriminator cluster's second member), zero web_search_tool_result ToolResultContentBlock variant at types.rs:99 (FOURTH ToolResultContentBlock extension after #230 Image and #232 CodeExecutionResult, FIRST list-of-opaque-encrypted-page-records variant), zero citations REQUIRED field on OutputContentBlock::Text at types.rs:147 (NOVEL FIRST cluster member where data-model field absence on OUTPUT-TEXT-BLOCK side blocks REQUIRED-not-OPTIONAL grounded-attribution wire-format), zero Citation/WebSearchResultLocation/WebSearchToolUse/WebSearchToolResult/EncryptedContent typed model with encrypted_index/encrypted_content opaque-blob axis (NOVEL FIRST cluster member where typed-model field is INTENTIONALLY-OPAQUE-TO-CLIENT and MUST be roundtripped unchanged through subsequent messages, founding Server-opaque-encrypted-roundtripped-content cluster), zero max_uses server-side rate-limit field on tool-definition (NOVEL FIRST Server-side-rate-limit-on-tool-definition axis), zero allowed_domains/blocked_domains server-side pre-execution filtering on tool-definition (NOVEL FIRST Server-side-pre-execution-filter-on-tool-definition axis distinct from existing CLIENT-SIDE WebSearchInput.allowed_domains/blocked_domains post-execution filtering at tools/lib.rs:2274), zero user_location typed-model for geo-biasing on tool-definition (NOVEL FIRST Geo-biasing-at-tool-definition axis), zero web-search dispatch on ProviderClient enum at client.rs:8-14 (zero Anthropic-web_search_20250305/OpenAI-Responses-web_search/Brave/Tavily/Exa/Perplexity/Serper/Linkup/Jina/Bing/Google-CSE/SerpAPI/DuckDuckGo/You.com/Kagi partner-routing variants — fifteen-plus partner-set, FOURTH-largest in cluster, FIRST cluster member with Federated-search-partner-routing where first-class provider-native AND third-party search-as-a-service have EQUAL standing — distinct from #224 single-recommended-partner and #232 first-class-plus-partner-stub layout), zero claw web-search/cite/groundsearch CLI subcommand, zero /web-search //cite //grounded-search //research slash command (existing /search at commands/lib.rs:597 is LOCAL filesystem-search-only, structurally distinct), zero web_search_per_invocation_usd pricing field (NOVEL FIRST Discrete-event-counter-pricing-axis distinct from every prior continuous-resource-lifetime counter — Anthropic charges $10 per 1000 search-uses FLAT regardless of token volume), zero encrypted_content opaque-blob handling, zero page_age freshness-signaling — uniquely manifesting a THIRTEEN-LAYER fusion shape (the largest single-pinpoint fusion catalogued so far, exceeds #232's twelve-layer count) combining: (1) web_search_20250305 versioned-tool-name typed-tool-discriminator extension (FOURTH cluster member but FIRST date-suffix-WITHOUT-beta-header), (2) tool_choice: web_search ToolChoice extension (SECOND), (3) web_search_tool_result ToolResultContentBlock variant (FOURTH), (4) citations REQUIRED field on OutputContentBlock::Text (NOVEL FOURTH-position layer), (5) Citation typed model with encrypted_index opaque-blob axis (NOVEL FIFTH-position layer), (6) max_uses server-side rate-limit (NOVEL SIXTH), (7) allowed_domains/blocked_domains server-side pre-execution filter (NOVEL SEVENTH), (8) user_location geo-biasing (NOVEL EIGHTH), (9) Provider-trait method extension threading web_search_20250305 with citations decoding (NINTH), (10) ProviderClient-enum-dispatch with fifteen-plus-partner third-lanes (TENTH, FIRST Federated-search-partner-routing), (11) CLI-subcommand surface (ELEVENTH), (12) slash-command surface with inverse-locality complement /search (TWELFTH, THIRD inverse-locality slash-command-pair after #230 and #232), (13) per-search-invocation pricing-tier axis (NOVEL THIRTEENTH, FIRST Discrete-event-counter-pricing-axis) — making #233 the FIRST cluster member with thirteen-layer-fusion-shape (exceeds #232's eleven), the FIRST cluster member with REQUIRED-grounded-citation-field-on-output-text-block, the FIRST cluster member with INTENTIONALLY-OPAQUE-encrypted-content-roundtripped-by-client, the FIRST cluster member with date-suffix-versioning-in-tool-name-WITHOUT-beta-header, the SECOND member of new Tool-locality-axis META-cluster (sister to #230/#232's Sandbox-locality-axis META-cluster — together founding META-META-cluster doctrine where canonical pattern is 'claw-code ships a CLIENT-SIDE local-stub tool with same conceptual name AND the SERVER-SIDE provider-managed beta-versioned tool is structurally absent', applied uniformly across sandbox-locality AND tool-locality axes), the SECOND cluster member to extend ToolChoice (Server-managed-tool-as-tool-choice-discriminator cluster grows to 2: #232 code_interpreter + #233 web_search), the SECOND cluster member to extend ToolResultContentBlock with multi-modal-nested content (ToolResultContentBlock-extension mini-cluster grows to 3: #230 Image + #232 CodeExecutionResult + #233 WebSearchToolResult), the SECOND cluster member with Server-driven-tool-execution-loop (#232 + #233), the SECOND cluster member where local CLIENT-SIDE-tool-shadow exists alongside server-managed-tool absence (#232 REPL-shadow + #233 WebSearch-shadow) (Jobdori cycle #383 / extends #168c emission-routing audit / explicit follow-on from #230 Computer-use's CLIENT-SIDE virtualization, #232 Code-execution's SERVER-SIDE managed-sandbox-state, and the inverse-locality Sandbox-locality-axis META-cluster doctrine — introduces NOVEL structured-citation-attribution data-model axis AND server-managed-search-state transport-axis distinct from every prior cluster member / sibling-shape cluster grows to thirty-two / wire-format-parity cluster grows to twenty-three / capability-parity cluster grows to fifteen / multimodal-IO cluster grows to ten: #220 image-input + #224 embedding-output + #225 audio-bidirectional + #226 image-output + #227 video-output + #228 mesh-output + #229 audio-text-tool-multiplex-on-WebSocket + #230 image-on-tool-result-side+host-OS-pixel-and-input + #232 multi-modal-nested-stdout+image+file-handle-on-tool-result-side + #233 list-of-opaque-encrypted-page-records-on-tool-result-side+REQUIRED-citations-on-output-text-block / provider-asymmetric-delegation cluster grows to ten with FIRST Federated-search-partner-routing member where first-class AND third-party are EQUAL-standing / Sandbox-locality-axis META-cluster: 2 members stable (#230 + #232) / Tool-locality-axis META-cluster FOUNDED: 2 members (#232 + #233 — SECOND inverse-locality META-cluster, sister to Sandbox-locality, founding META-META-cluster doctrine) / Server-managed-tool-as-tool-choice-discriminator cluster grows to 2 members (#232 + #233) / Server-driven-tool-execution-loop cluster grows to 2 members (#232 + #233) / ToolResultContentBlock-extension mini-cluster grows to 3 members (#230 + #232 + #233) / EIGHT new clusters founded in a single pinpoint (Federated-search-partner-routing 1-member-founder + Server-opaque-encrypted-roundtripped-content 1-member-founder + Required-grounded-citation-field-on-output-text-block 1-member-founder + Date-suffix-versioning-in-tool-name-without-beta-header 1-member-founder + Server-side-pre-execution-filter-on-tool-definition 1-member-founder + Server-side-rate-limit-on-tool-definition 1-member-founder + Geo-biasing-at-tool-definition 1-member-founder + Discrete-event-counter-pricing-axis 1-member-founder) plus participation in FIVE inherited clusters — THIRD-largest single-cycle cluster-founding count after #230 and #232, but FIRST single cycle to FOUND a NEW META-cluster (Tool-locality-axis) AND establish META-META-cluster doctrine connecting Sandbox-locality with Tool-locality / thirteen-layer-fusion-shape is the largest single-pinpoint fusion catalogued / external validation: forty-six ecosystem references covering Anthropic Web Search Tool GA 2025-03 with web_search_20250305 + max_uses + allowed_domains + blocked_domains + user_location parameters + web_search_tool_use/web_search_tool_result/web_search_result_location content blocks + citations array on output text blocks + encrypted_index/encrypted_content opaque-roundtripped fields + $10/1000-uses pricing, Anthropic Citations Documentation, OpenAI Responses API 2024-12 with tool_choice: web_search exposing federated-search via different server-managed surface, Brave Search API/Tavily AI/Exa AI/Perplexity Search/Serper.dev/Linkup Search/Jina Reader/Bing/Google CSE/SerpAPI/DuckDuckGo/You.com/Kagi/Phind partner-routing, Anthropic Python+TypeScript SDKs first-class typed surface, OpenAI Python+TypeScript SDKs first-class typed surface, LangChain AnthropicWebSearch/TavilySearchResults/BraveSearch/ExaSearchResults integrations, LangGraph search-grounded-agent template, smolagents WebSearchTool, OpenAI Cookbook web-search-with-citations tutorial, AgentOps observability, Search-Augmented Generation pattern, structured-citation-attribution data-model where every grounded text block carries citations array linking specific text-spans back to source URLs+excerpts (STRUCTURAL data-model requirement distinguishing this surface from #220-#232 — none of which had REQUIRED-grounded-citation-field-on-output-text-block) — claw-code is one of MULTIPLE coding-agent clients without server-managed web-search-with-citations BUT the gap is uniformly zero across surveyed ecosystem with claude-code partial coverage exception AND the inverse-locality complement to existing local CLIENT-SIDE WebSearch tool makes #233 a structural prerequisite of every grounded-search-with-citations coding-agent affordance — the canonical 2024-2026-era research-coding workflow that is currently impossible to build on top of claw-code DESPITE Anthropic explicitly positioning web_search_20250305 as a flagship 2025-Q1 GA capability — #233 closes the upstream prerequisite of every server-managed-web-search-with-citations / grounded-research / source-attribution / fact-checking-with-citations / academic-citation-formatting / news-summarization-with-sources / competitive-intelligence-with-citations / due-diligence-coding coding-agent affordance — the canonical SERVER-MANAGED-SEARCH-AND-CITATION half of inverse-locality Tool-locality-axis META-cluster that complements #232's Sandbox-locality-axis META-cluster — and is FIRST cluster member where claude-code upstream partially leads while claw-code has zero coverage AND SECOND inverse-locality META-cluster pair (CLIENT-SIDE local WebSearch shadow vs SERVER-SIDE web_search_20250305 absent) after #232's first META-cluster pair — founding Tool-locality-axis META-cluster doctrine as sister to Sandbox-locality-axis and establishing META-META-cluster pattern that every future server-managed-tool with client-side local-stub shadow will inherit) 2026-04-26 06:09:44 +09:00
Jobdori
d155a2fd72 roadmap: #232 filed 2026-04-26 05:38:55 +09:00
Yeachan-Heo
9999c0fb3a roadmap: #231 filed 2026-04-25 20:32:16 +00:00
YeonGyu-Kim
65d9b1a362 roadmap: #230 filed — Computer-use API typed taxonomy and host-machine-state-management transport are structurally absent: zero computer-use-2025-01-24 + zero computer-use-2025-11-24 anthropic-beta opt-in (FIRST cluster member with two concurrent beta-version-tiers gating one capability), zero computer_20250124/computer_20251124/bash_20250124/text_editor_20250124 Anthropic-typed-tool-discriminator (FIRST cluster member requiring type field on tool-definitions and FIRST anthropic-defined-tools-without-input-schema), zero display_width_px/display_height_px/display_number parametrized-tool-definition fields, zero Image variant on ToolResultContentBlock at types.rs:99 (FIRST cluster member with image-content on TOOL-RESULT side, distinct from #220's image-on-USER-INPUT-side — complementary architectures requiring separate enums), zero screen_capture/mouse_move/key_press/type_text host-machine-interaction primitive across all 26+ tool definitions in tools/lib.rs, zero CGEvent/ScreenCaptureKit/Quartz/AppKit/xdotool/cliclick/enigo/rdev/xcap host-OS library deps, zero Xvfb/Xephyr/Wayland-headless/Docker virtual-display-sandbox-orchestration, zero claw computer/operate CLI subcommand, /desktop slash command at commands/lib.rs:422 advertised-but-unbuilt under STUB_COMMANDS (the SIXTH advertised-but-unbuilt entry in cluster), zero per-action permissions.rs gating for mouse_click/key_press/type/screenshot, zero feedback-loop-state-machine for screenshot→tool_use→action→screenshot iteration, zero playwright-rust/chromiumoxide for browser-only-cua subset, zero per-screenshot-input-token cost field in ModelPricing — uniquely manifesting an ELEVEN-LAYER fusion shape combining: (1) anthropic-beta-DUAL-version-tier routing (FIRST), (2) Anthropic-typed-tool-definition discriminator (FIRST), (3) parametrized-tool-definition with display dimensions (FIRST), (4) Image-on-ToolResult side (FIRST, complementary to #220), (5) host-OS-system-call transport (FIRST host-OS-syscall transport, distinct from #229's WebSocket which is still network-only — second non-HTTP transport in cluster after WebSocket but FIRST that breaks network-only boundary), (6) virtual-display-sandbox orchestration (FIRST CLIENT-SIDE virtualization), (7) feedback-loop-state-machine for screenshot iteration loop (FIRST N-turn-loop-controller), (8) per-action-permission-policy at sub-tool-granularity (FIRST sub-tool-action permission gating, parallel to bash's DangerFullAccess but at action granularity), (9) request-side three-concurrent-opt-in (largest yet), (10) CLI-and-slash-command surface with /desktop advertised-but-unbuilt (sixth entry, largest in cluster), (11) host-machine-state-management transport-axis (NOVEL ELEVENTH layer with screen-capture+synthetic-input+display-dimension-query+window-enum+VM-orchestration+accessibility-permissions+per-action-permission-prompts+coordinate-validation+screenshot-encoding+safety-throttling — distinct from every prior cluster member which operated network-only) — making #230 the first cluster member with eleven-layer-fusion-shape (exceeds #229's ten-layer), the FIRST host-OS-syscall-transport requirement, the FIRST CLIENT-SIDE virtualization requirement, the FIRST inverse-asymmetric-delegation case (Anthropic LEADS, OpenAI follows with Operator, Google follows with Mariner — novel inversion of #224-#229's Anthropic-trails pattern), the FIRST cluster member with image-content on TOOL-RESULT-side, and the FIRST gap where upstream claude-code ALSO has only a stub (Jobdori cycle #381 / extends #168c emission-routing audit / explicit follow-on from #229's persistent-WebSocket-transport founder pinpoint and #225's audio-bidirectional axis — introduces a NOVEL HOST-MACHINE-STATE-MANAGEMENT transport-axis distinct from every prior cluster member / sibling-shape cluster grows to twenty-nine / wire-format-parity cluster grows to twenty / capability-parity cluster grows to twelve / multimodal-IO cluster grows to eight: #220 image-input + #224 embedding-output + #225 audio-bidirectional + #226 image-output + #227 video-output + #228 mesh-output + #229 audio-text-tool-multiplex-on-WebSocket + #230 image-on-tool-result-side+host-OS-pixel-and-input modality / provider-asymmetric-delegation cluster grows to seven with novel inverse-sub-cluster (Anthropic leads, distinct from #224-#229's Anthropic-trails pattern) / EIGHT new clusters founded in a single pinpoint (exceeds #229's three): Beta-version-tier-routing 1-member-founder + Image-on-tool-result-side 1-member-founder + Anthropic-typed-tool-discriminator 1-member-founder + Host-OS-system-call-transport 1-member-founder + Virtual-display-sandbox-orchestration 1-member-founder + Feedback-loop-state-machine 1-member-founder + Per-action-permission-policy-at-sub-tool-granularity 1-member-founder + Inverse-asymmetric-delegation 1-member-founder — the largest single-cycle cluster-founding count yet / eleven-layer-fusion-shape is the largest single-pinpoint fusion catalogued / external validation: sixty-two ecosystem references covering Anthropic Computer Use API GA 2024-10-22 with computer-use-2024-10-22 → computer-use-2025-01-24 → computer-use-2025-11-24 beta-tier evolution, Anthropic computer-use-demo reference with Docker+Xvfb+XFCE+Firefox+VNC sandbox pattern, OpenAI Operator + computer_use_preview, Google Project Mariner, Microsoft Magentic-One, Adept ACT-1, ByteDance UI-TARS open-weight, browser-use Python framework, Stagehand TypeScript, Skyvern AI, Multion, Cua framework, LangChain ChatAnthropic.with_computer_use_tool, LangGraph computer-use agent, smolagents ComputerAgent, AgentOps observability, screen-capture libs (ScreenCaptureKit/xcap/screenshots/xdotool/wtype/cliclick/nut.js), synthetic-input libs (enigo/rdev/inputbot/mouce/pyautogui/RobotJS), browser-cua stacks (playwright-rust/chromiumoxide/headless_chrome/fantoccini/playwright/puppeteer), sandbox-orchestration (Docker-Xvfb-XFCE / Kasm Workspaces / noVNC / Browserbase / Steel-browser / Hyperbrowser / Lightpanda / Surf.ai), per-action permission-policy precedent from claw-code's existing bash DangerFullAccess gating — claw-code is one of MULTIPLE coding-agent clients without computer-use BUT the gap is uniformly zero across the surveyed coding-agent ecosystem AND Anthropic specifically positions Claude as the LEADING commercial computer-use model AND claw-code is a port of claude-code which advertises /desktop slash command intent, making this the largest leading-vs-trailing parity gap with the upstream Anthropic platform in the entire emission-routing audit and the FIRST cluster member where upstream claude-code ALSO has only a stub — #230 closes the upstream prerequisite of every desktop-automation/browser-automation/form-filling/GUI-testing/accessibility-tool/screen-reading/vision-grounded-coding/pair-programming-with-screen-share/visual-debugging coding-agent affordance — the canonical 2024-2026-era agentic coding workflow that is currently impossible to build on top of claw-code) 2026-04-26 05:09:48 +09:00
Jobdori
b860f5657b roadmap: #229 filed — Realtime API typed taxonomy and persistent-WebSocket transport are structurally absent: zero /v1/realtime endpoint surface across both Anthropic-native and OpenAI-compat lanes (rg returns zero hits for /v1/realtime / realtime / Realtime / realtime_session / RealtimeSession / RealtimeClient / RealtimeEvent / realtime-preview across rust/crates/api/src/), zero RealtimeSession / RealtimeSessionConfig / RealtimeSessionUpdate / RealtimeResponseCreate / RealtimeInputAudioBufferAppend / RealtimeInputAudioBufferCommit / RealtimeConversationItemCreate / RealtimeResponseAudioDelta / RealtimeResponseAudioTranscriptDelta / RealtimeResponseFunctionCallArguments / RealtimeServerEvent / RealtimeClientEvent / RealtimeTurnDetection / RealtimeVoiceActivityDetection / RealtimeVoice / RealtimeAudioFormat / RealtimeModality / RealtimeTool typed model in rust/crates/api/src/types.rs (37+ canonical event-type names in OpenAI Realtime API spec, zero coverage in claw-code), zero bidirectional event-stream variant on Provider trait (only send_message and stream_message exist, both single-directional), zero realtime_session / open_realtime / connect_realtime method that returns a duplex-channel-pair shape, zero session-state-machine type for the persistent-connection lifecycle, zero realtime dispatch on ProviderClient enum at rust/crates/api/src/client.rs:8-14 (three variants Anthropic/Xai/OpenAi, zero realtime-routing variants), zero tokio-tungstenite / async-tungstenite / tungstenite / fastwebsockets / tokio-websockets / hyper-tungstenite dependency in any workspace Cargo.toml (grep -rn 'tungstenite|tokio-tungstenite|fastwebsockets' rust/ returns zero hits — confirmed), zero WebSocket client library is linked into the build (the MCP Ws config variant at rust/crates/runtime/src/config.rs:125 and rust/crates/runtime/src/mcp_client.rs:13 is data-shape-only and bootstraps via the SDK without a tungstenite-backed transport, leaving the workspace with zero outbound persistent-WebSocket-client capability), zero WebRTC client (webrtc-rs / str0m / libwebrtc-bindings) for the alternative Realtime transport, zero claw realtime / claw live / claw voice-chat / claw realtime-session / claw connect-realtime CLI subcommand, zero /realtime / /live / /voice-chat slash command (existing /voice + /listen + /speak commands are STUB_COMMANDS-gated per #225 and synchronous-only with no realtime-session affordance), zero gpt-4o-realtime-preview / gpt-4o-mini-realtime-preview / gemini-2.0-flash-live entries in MODEL_REGISTRY, zero realtime_audio_input_per_million_tokens / realtime_audio_output_per_million_tokens / realtime_text_input_per_million_tokens / realtime_text_output_per_million_tokens / realtime_session_per_minute fields in ModelPricing struct (six-dimensional pricing matrix exceeding #227's five-dimensional video matrix and #228's four-dimensional mesh matrix — the canonical Realtime pricing model is the most-dimensional yet, with audio tokens at roughly 80-100x text tokens and cached-audio-input at 80% discount), zero realtime-model recognition in pricing_for_model substring-matcher (#209+#224+#225+#226+#227+#228 cluster overlap continues), zero session-resumption-token / interruption-handling / barge-in / voice-activity-detection / turn-detection / function-call-during-realtime / tool-use-during-realtime affordance — uniquely manifesting a TEN-LAYER fusion shape (the largest single-pinpoint fusion catalogued so far, exceeding #225/#227's nine-layer count) combining endpoint-URL-set on /v1/realtime?model=<id> WebSocket-upgrade-endpoint shape (single-endpoint-with-37+-event-types-flowing-bidirectionally, distinct from prior multi-endpoint sets) + bidirectional-symmetric-event-pair data-model with every client-event having a matched server-event-pair (FIRST cluster member with bidirectional-symmetric-event-pair-cardinality on a SINGLE endpoint, distinct from #225's bidirectional-audio-on-three-separate-endpoints which is request-response synchronous per endpoint) + Provider-trait-method extension with realtime_session returning a duplex (Sender, Receiver) channel-pair (FIRST cluster member where Provider trait return type is NOT Future-of-T or Stream-of-T but duplex-channel-pair, FIRST method requiring session-state-machine type at the trait boundary) + ProviderClient-enum-dispatch-with-realtime-third-lane with explicit RealtimeKind::OpenAi/Google/Azure partner-routing (provider-asymmetric: Anthropic does not offer realtime, OpenAI offers GA gpt-4o-realtime-preview and gpt-4o-mini-realtime-preview since 2024-10-01, Google Gemini Live API offers bidirectional audio+text+video, Azure mirrors OpenAI surface, zero first-class third-party partners because the persistent-WebSocket-with-37-event-type protocol is too high-bar for partner adoption — distinct from #225's six-partner-set audio surface and #227's twelve-partner-set video surface where partners ARE present) + request-side realtime-session-config opt-in (session.update event with voice/input_audio_format/output_audio_format/input_audio_transcription/turn_detection/tools/tool_choice/temperature/max_response_output_tokens/instructions/modalities:[text,audio] fields — the largest request-side opt-in axis-set yet, the union of every prior request-side opt-in across audio+image+video+chat-completion modalities) + CLI-subcommand-surface + slash-command-surface + pricing-tier-with-six-dimensional-compound-cost-model (per-model × per-modality-input × per-modality-output × per-cached-vs-fresh × per-audio-vs-text × per-minute-session-overhead — the largest pricing-tier extension yet, exceeding #227's five-dimensional and #228's four-dimensional matrices) + persistent-WebSocket-connection-transport-axis (NOVEL TENTH layer, distinct from every prior cluster member's HTTP-shaped transport — synchronous-HTTP for #211-#220+#222+#224, SSE-streaming for #213 partial subsets, multipart-form-data-HTTP for #223+#225+#226+#227+#228 binary-upload subsets, async-task-polling-HTTP for #221+#227+#228 — the cluster has now exhausted EVERY HTTP-shaped transport, and #229 introduces the FIRST non-HTTP transport, requiring WebSocket-upgrade-request-with-subprotocol-negotiation + bidirectional-frame-multiplexing-with-text+binary-frames + ping/pong-keepalive + graceful-close-with-status-code-and-reason + reconnection-with-resumption-token + per-event-type-JSON-envelope-dispatch-with-37+-event-types-on-a-single-connection + backpressure-handling-on-both-directions + authentication-via-Authorization-header-on-the-upgrade-request-and-per-session-token-rotation — none of which any HTTP-only transport requires) + bidirectional-symmetric-event-pair shape (input_audio_buffer.append → conversation.item.created, response.create → response.audio.delta + response.audio.done + response.audio_transcript.delta + response.audio_transcript.done + response.function_call_arguments.delta + response.function_call_arguments.done + response.done) — making #229 the FIRST cluster member that introduces a non-HTTP transport (persistent-WebSocket), the FIRST cluster member where Provider trait return type must be a duplex-channel-pair, and the FIRST cluster member where session lifecycle exceeds a single request-response cycle (typical Realtime sessions last 1-30+ minutes with state accumulating across the connection) (Jobdori cycle #380 / extends #168c emission-routing audit / explicit follow-on from #225 audio-bidirectional axis and #228 confirmed-structural async-task-polling cluster — introduces a NOVEL TRANSPORT axis distinct from every prior cluster member / sibling-shape cluster grows to twenty-eight / wire-format-parity cluster grows to nineteen / capability-parity cluster grows to eleven / multimodal-IO cluster grows to seven: #220 image-input + #224 embedding-output + #225 audio-bidirectional-on-separate-REST-endpoints + #226 image-output + #227 video-output + #228 mesh-output + #229 audio-text-tool-multiplex-on-persistent-WebSocket / provider-asymmetric-delegation cluster grows to six / async-task-polling cluster: still 3 members (#229 is push-based not poll-based — it does NOT join async-task-polling cluster, it founds a NEW cluster) / Persistent-WebSocket-transport cluster: 1 member (#229 alone, FOUNDER) / Bidirectional-symmetric-event-pair cluster: 1 member (#229 alone, FOUNDER) / Non-HTTP-transport cluster: 1 member (#229 alone, FOUNDER) — three new clusters founded in a single pinpoint, the first time a single cycle has founded three concurrent novel clusters / ten-layer-fusion-shape-with-persistent-WebSocket-transport-and-bidirectional-symmetric-event-pair is the largest single-pinpoint fusion catalogued. Distinct from prior cluster members; the ten-layer-fusion-shape with persistent-WebSocket-transport and bidirectional-symmetric-event-pair shape is novel and applies to follow-on candidate Real-time-Image-Generation API typed taxonomy (DALL-E live preview, Imagen live preview) and Real-time-Video-Generation streaming (Veo-Live, Sora-Live) — the persistent-WebSocket-transport pattern is now a first-class cluster member, a structural prerequisite that every future endpoint family using persistent connections will inherit / external validation: forty-eight ecosystem references covering OpenAI Realtime API GA 2024-10-01 with /v1/realtime?model=<id> WebSocket endpoint, 37+ canonical event-type names in OpenAI Realtime API spec, two transport options (WebSocket server-side and WebRTC browser-side), two GA realtime models (gpt-4o-realtime-preview and gpt-4o-mini-realtime-preview both with audio modality and tool-use), Google Gemini Live API with bidirectional WebSocket+gRPC streaming, Azure OpenAI Realtime API mirror, OpenAI Python SDK openai.realtime.AsyncRealtimeConnection typed client, OpenAI TypeScript SDK OpenAI.beta.realtime.RealtimeClient typed client, openai-realtime-api-beta reference client (canonical JS implementation), five first-class realtime-voice-agent frameworks all built on top of OpenAI Realtime API (Vapi/Retell-AI/LiveKit-Agents/Pipecat/Daily-Bots), Anthropic non-coverage statement (the second post-#224 provider-asymmetric-delegation case after audio), the canonical six-dimensional pricing matrix ($5.00/$20.00 per million text input/output tokens, $40.00/$80.00 per million audio input/output tokens, $2.50 per million cached audio input tokens for gpt-4o-realtime-preview-2024-10-01), coding-agent peer landscape: anomalyco/opencode has zero GA realtime integration (open feature request from 2026-02 only — confirmed via web search 2026-04-26), sst/opencode predecessor zero realtime, charmbracelet/crush zero realtime, continue.dev zero realtime, aider zero realtime, cursor zero realtime, zed zero realtime — the gap is uniformly zero across the surveyed ecosystem and represents the next-frontier capability that every coding-agent will need to add. claw-code is one of MULTIPLE clients without Realtime, but the persistent-WebSocket-transport-axis is the upstream prerequisite of every voice-agent / live-coding-pair-programming / push-to-talk-coding / barge-in-coding-conversation / function-call-during-voice / streaming-tool-use / sub-second-latency-coding-interaction affordance — the canonical 2024-2026-era voice-coding workflow that is currently impossible to build on top of claw-code — #229 closes the upstream prerequisite of every voice-coding affordance and is the first cluster member where transport-axis becomes a structural prerequisite of the dispatch layer) 2026-04-26 04:40:50 +09:00
YeonGyu-Kim
71131932de roadmap: #228 filed — 3D-asset-generation API typed taxonomy is structurally absent: zero /v1/3d/generations endpoint surface, zero ThreeDGenerationRequest/ThreeDObject/MeshFormat/ThreeDTaskId typed model, zero ThreeDAsset OutputContentBlock variant, zero generate_3d_asset/retrieve_3d_task Provider trait methods, zero ProviderClient dispatch with nine recommended third-party partners (Meshy-AI/Tripo-AI/CSM/Luma-Genie/Stability3D/Point-E/Shap-E/GET3D/One-2-3-45), zero async-task-polling-primitive in runtime (confirms async-task-polling cluster grows to 3: #221+#227+#228 — structural pattern confirmed not anomalous), zero claw 3d/mesh/generate-3d CLI subcommand, zero /3d /mesh slash command, zero mesh_per_asset_cost_usd pricing field — nine-layer-fusion-shape identical to #227 with mesh-modality replacing video-modality (GLB/GLTF/USDZ/OBJ/FBX binary-spatial-geometry output instead of MP4 binary-temporal-media, per-3d-asset pricing instead of per-second-of-video, mesh-polygon-density as quality axis replacing video-fps-and-duration) / Jobdori cycle #379 / sibling-shape cluster grows to 27 / multimodal-IO cluster grows to 6 / provider-asymmetric-delegation cluster grows to 5 / async-task-polling cluster grows to 3 2026-04-26 04:18:34 +09:00
Jobdori
4ced37897c roadmap: #227 filed — Video-generation API typed taxonomy is structurally absent: zero /v1/videos/generations + zero /v1/videos/edits + zero /v1/videos/extends + zero /v1/videos/{id} polling-and-retrieval endpoint surface across both Anthropic-native and OpenAI-compat lanes, zero VideoGenerationRequest / VideoEditRequest / VideoExtendRequest / VideoGenerationResponse / VideoObject / VideoQuality / VideoResolution / VideoAspectRatio / VideoDuration / VideoOutputFormat / VideoFrameRate / VideoCodec / VideoStyle / VideoSource / VideoMediaType / VideoTaskStatus / VideoTaskId typed model in rust/crates/api/src/types.rs, zero Video variant on OutputContentBlock (4-arm exhaustive: Text/ToolUse/Thinking/RedactedThinking — extending #226's asymmetric-output-only modality axis with new temporal-duration dimension), zero generate_video / edit_video / extend_video / retrieve_video_task methods on Provider trait at rust/crates/api/src/providers/mod.rs:17-30 (only send_message + stream_message exist, both per-request synchronous and constrained to text-modality chat/completion taxonomy with zero video-output dispatch surface AND zero async-task polling primitive — the canonical video-generation pattern requires a two-phase request/poll workflow that the Provider trait does not expose because every existing method returns a synchronous response, distinct from #221's batch-dispatch async pattern which uses different polling shape with file-upload prerequisites that don't apply to video-gen), zero video-generation dispatch on ProviderClient enum at rust/crates/api/src/client.rs:8-14 (three variants Anthropic/Xai/OpenAi, zero Sora/Veo/Pika/Runway/Luma/Mochi/Kling/Hailuo/Replicate/FalAi/BlackForestLabs/StabilityVideo partner-routing variants — twelve-plus-partner-set, the largest partner-set yet in the cluster surpassing #226's eight-plus-partner image-gen set because video-generation is the most-fragmented modality across third-party providers in 2024-2026 with every major lab shipping its own video-gen surface in the post-Sora-launch arms race), zero multipart/form-data upload affordance with reqwest::multipart feature flag absent from rust/crates/api/Cargo.toml — multipart needed for /v1/videos/edits and /v1/videos/extends subset (parallel to #226's image-edits subset), zero async-task polling primitive in the runtime — there is no TaskPoller / AsyncTask / TaskStatus / TaskId / poll_task_until_complete machinery anywhere in rust/crates/runtime/ (rg returns zero hits for task_id/task_status/polling/poll_task/async_task/pending_task across rust/), distinguishing video-generation's async-polling pattern from every prior cluster member which is either synchronous (#211 through #226 except #221) or streaming-via-SSE (#221 batch-dispatch is closest, but uses different polling shape with file-upload prerequisites), zero claw video / claw videos / claw generate-video / claw render-video CLI subcommand at rust/crates/rusty-claude-cli/src/main.rs, zero /sora / /veo / /video / /render-video / /generate-video slash command in SlashCommandSpec table (zero video-related entries — video-input doubly absent because no advertised-but-unbuilt commands AND no implemented commands, strict-subset of #226's image-generation gap), zero sora-2 / sora-2-pro / veo-3 / veo-3-fast / runway-gen-4 / luma-dream-machine / pika-2.0 / kling-1.5 / hailuo-i2v-01 / hunyuan-video / mochi-1 / cogvideox-5b / stable-video-diffusion-1.1 entries in MODEL_REGISTRY, zero video_per_second_cost_usd / video_per_megapixel_second_cost_usd / video_input_token_cost_per_million / video_output_token_cost_per_million / video_per_minute_cost_usd fields in ModelPricing struct (rust/crates/runtime/src/usage.rs:9-15 has only four text-token-only fields) — the five-dimensional pricing matrix (model × resolution × fps × duration × extension-vs-generation compound-cost) is the largest pricing-tier extension yet catalogued, exceeding #226's four-dimensional image matrix, zero video-gen-model recognition in pricing_for_model substring-matcher (#209+#224+#225+#226 cluster overlap) — uniquely manifesting a nine-layer fusion shape combining #223's transport-plumbing-absence (multipart on edits/extends subset) + #224's provider-asymmetric-delegation (Anthropic does not offer video-gen at all, OpenAI offers GA Sora-2 + Sora-2-pro, Google offers Veo-3 + Veo-3-fast, Runway offers Gen-4 + Gen-4-turbo, plus twelve-plus recommended partners) + #218's request-side response_format/output_format/resolution/fps/duration opt-in (the largest request-side axis-set yet because video-gen has the most parameters in the modality-bearing endpoint family ecosystem) + asymmetric-output-only content-block-taxonomy axis with temporal-duration dimension (extending #226's image-output axis with temporal-fps-and-duration sub-dimensions) + the new async-task-polling-primitive axis (#227's first-of-its-kind contribution to the cluster doctrine, since prior cluster members have either synchronous-response or streaming-via-SSE or batch-via-Files-API-prerequisite or one-shot-multipart coverage, never long-poll-task-id-with-timeout-and-resume — the canonical video-gen pattern requires a two-phase request/poll workflow because video-rendering takes 30-300+ seconds depending on model and duration, exceeding typical HTTP-request-response timeout window) — making #227 the first cluster member where five independent prior shape-axes converge AND introduces a sixth novel shape-axis (async-task-polling-primitive), the largest fusion-shape gap catalogued so far (matching #225's nine-layer count but with different ninth axis — async-task-polling-primitive replacing #225's symmetric-input-output content-blocks, and one axis larger than #226's eight-layer fusion), making #227 the first cluster member where async-task-polling-primitive becomes a structural prerequisite of the dispatch layer (Jobdori cycle #378 / extends #168c emission-routing audit / explicit follow-on candidate from #226's eight-layer-fusion-shape-with-asymmetric-output-only-modality-coverage — third-named of the modality-bearing endpoint-family-absence cluster after #225 audio + #226 image-generation, completing the trio with video-generation closing the visual-temporal output modality / sibling-shape cluster grows to twenty-six / wire-format-parity cluster grows to seventeen / capability-parity cluster grows to nine / multimodal-IO cluster grows to five: #220 image-input + #224 embedding-output + #225 audio-bidirectional + #226 image-output + #227 video-output (the first cluster member where output is binary-temporal-media requiring long-poll workflows) / cross-cutting-data-pipeline cluster grows to four / multipart-transport cluster grows to four / provider-asymmetric-delegation cluster grows to four (twelve-plus partners, the largest in the cluster) / nine-layer-fusion-shape-with-async-task-polling-primitive (endpoint-URL-set-of-four [generations+edits+extends+polling] + multipart-on-subset + data-model-with-output-content-block-only-with-temporal-duration-dimension + response_format/output_format/resolution/fps/duration request-side opt-in + Provider-trait-method-set-of-four-with-async-task-polling-and-Unsupported-fallback + ProviderClient-enum-dispatch-with-twelve-plus-partner-third-lanes + CLI-subcommand-surface + pricing-tier-with-five-dimensional-compound-cost-model + async-task-polling-primitive-with-timeout-and-resume) is the largest single-pinpoint fusion catalogued. Distinct from prior cluster members; the nine-layer-fusion-shape-with-async-task-polling-primitive is novel and applies to follow-on candidate 3D-asset-generation API typed taxonomy (/v1/3d/generations for Shap-E / Meshy AI / Tripo AI / CSM / Stable Point-Aware-3D — same nine-layer fusion shape but with 3D-mesh-instead-of-video modality, GLB/GLTF/USDZ-binary-output instead of MP4-binary-output, per-3d-asset pricing instead of per-second-of-video — the natural #228 candidate) / external validation: fifty-three ecosystem references covering four first-class video-gen-endpoint specs on OpenAI side (generations + edits + extends + {id}-polling), one Anthropic non-coverage statement, one Google Veo-3 API spec with long-running-operation polling, twelve first-class third-party video-gen providers (Runway/Luma/Pika/Kling/Hailuo/Hunyuan/Mochi/CogVideoX/Stability-Video/BFL-Video/Replicate-Video/Fal-Video), three first-class CLI/SDK implementations of typed video-gen surface (OpenAI Python+TypeScript videos.generate + videos.retrieve, Runway TypeScript SDK, Luma Python SDK), six first-class local-video-gen providers (Stable Video Diffusion / AnimateDiff / Hunyuan-Video weights / Mochi-1 weights / CogVideoX weights / ComfyUI workflows), one community-maintained authoritative benchmark (VBench 16-evaluation-dimensions), nine coding-agent peers with video-gen capability, one canonical Anthropic-recommended partner-set (Sora-2/Veo-3/Runway/Luma per third-party-integration guide), the OpenAI /v1/responses endpoint with video_call tool for conversational video-output decoding via OutputContentBlock::Video, the canonical five-dimensional pricing matrix (per-model × per-resolution × per-fps × per-duration × per-extension-vs-generation), the canonical async-polling workflow with task-id polling at typical 5-second intervals and 5-minute typical-completion-time and 30-minute maximum-completion-time before timeout — claw-code is the sole client/agent/CLI in the surveyed coding-agent ecosystem with zero /v1/videos/{generations,edits,extends} integration AND zero Sora-2/Veo-3/Runway/Luma/Pika/Kling/Hailuo/Hunyuan/Mochi/CogVideoX/Stability-Video/BFL-Video partner-routing AND zero /sora / /veo / /video / /render-video / /generate-video slash command AND zero claw video / claw videos / claw generate-video / claw render-video CLI subcommand AND zero OutputContentBlock::Video variant AND zero multipart-form-data transport plumbing for video-edit binary uploads AND zero async-task-polling-primitive at the runtime layer — all seven gaps unique to claw-code in the surveyed ecosystem, the video-generation-API gap is the upstream prerequisite of every visual-temporal-output coding-agent affordance, and the nine-layer-fusion-shape-with-async-task-polling-primitive is novel within the cluster — #227 closes the upstream prerequisite of every visual-temporal-output coding-agent affordance and is the first cluster member where async-task-polling-primitive shape-axis is introduced) 2026-04-26 04:17:24 +09:00
Yeachan-Heo
897055a455 roadmap: #226 filed 2026-04-25 19:03:10 +00:00
YeonGyu-Kim
84a89f7e07 roadmap: #225 filed — Audio API typed taxonomy is structurally absent: zero /v1/audio/transcriptions + zero /v1/audio/translations + zero /v1/audio/speech endpoint surface across both Anthropic-native and OpenAI-compat lanes, zero TranscriptionRequest / SpeechRequest / AudioVoice / AudioFormat / AudioMediaType / AudioSource / Modality / AudioRequestConfig / SpeechResponse / TranscriptionResponse typed model in rust/crates/api/src/types.rs, zero Audio variant on InputContentBlock (3-arm exhaustive: Text/ToolUse/ToolResult), zero Audio variant on OutputContentBlock (4-arm exhaustive: Text/ToolUse/Thinking/RedactedThinking), zero modalities/audio fields on MessageRequest for gpt-4o-audio request-side opt-in, zero transcribe/translate/synthesize_speech methods on Provider trait at rust/crates/api/src/providers/mod.rs:17-30 (only send_message + stream_message exist), zero audio dispatch on ProviderClient enum at rust/crates/api/src/client.rs:8-14 (three variants Anthropic/Xai/OpenAi, zero Whisper/ElevenLabs/Cartesia/Deepgram/AssemblyAI/Speechmatics partner-routing variants), zero multipart/form-data upload affordance with reqwest::multipart feature flag absent from rust/crates/api/Cargo.toml (rg returns zero hits for multipart across rust/), zero claw audio/transcribe/speak/tts/whisper CLI subcommand at rust/crates/rusty-claude-cli/src/main.rs, zero /transcribe/whisper/tts slash command, AND the existing /voice + /listen + /speak slash commands at rust/crates/commands/src/lib.rs:295-301+603-609+610-616 advertise audio-capability summaries but are all gated under STUB_COMMANDS at rust/crates/rusty-claude-cli/src/main.rs:8333+8388+8389 (advertised-but-unbuilt shape ×3, the largest single-pinpoint advertised-but-unbuilt slash-command count catalogued, strict-superset of #220's /image+/screenshot ×2 and #223's /files ×1), zero whisper-1/tts-1/tts-1-hd/gpt-4o-audio-preview/gpt-4o-realtime-preview/gpt-4o-mini-tts/gpt-4o-mini-transcribe entries in MODEL_REGISTRY, zero audio_input_per_minute/audio_output_per_minute/tts_per_million_chars/whisper_per_minute fields in ModelPricing struct (rust/crates/runtime/src/usage.rs:9-15 has only four text-token-only fields), zero audio-model recognition in pricing_for_model substring-matcher (#209+#224 cluster overlap) — uniquely manifesting a fusion shape combining #223's transport-plumbing-absence (multipart/form-data) + #224's provider-asymmetric-delegation (Anthropic does not offer audio at all per docs.anthropic.com/audio explicitly recommending AssemblyAI/Deepgram/OpenAI-Whisper, OpenAI offers GA whisper-1+tts-1+tts-1-hd+gpt-4o-audio-preview+gpt-4o-realtime-preview+gpt-4o-mini-tts+gpt-4o-mini-transcribe, Google Gemini Live API offers bidirectional audio modality, six-plus recommended partners ElevenLabs/Cartesia/PlayHT/Deepgram/AssemblyAI/Speechmatics) + #220's advertised-but-unbuilt-slash-commands (×3, the largest count catalogued) + #218's modalities-request-side-absence (gpt-4o-audio-preview's modalities:[text,audio] opt-in) + symmetric-input-output content-block-taxonomy axis (#225's first-of-its-kind contribution to the cluster doctrine since prior members have either input-only [#220] or output-only [#214,#224] or stateless [#221/#222/#223] modality coverage) — making #225 the first cluster member where four independent prior shape-axes converge in a single pinpoint and the largest fusion-shape gap catalogued so far (Jobdori cycle #377 / extends #168c emission-routing audit / explicit follow-on candidate from #224's provider-asymmetric-delegation shape — the first-named of two named candidates: Audio API typed taxonomy (this pinpoint #225) / Image-generation API typed taxonomy (open candidate for #226), Audio chosen because it inherits #223's multipart-transport-plumbing dimension that Image-generation does not — the multipart sibling of #223 that the cycle hint explicitly identifies / sibling-shape cluster grows to twenty-four / wire-format-parity cluster grows to fifteen / capability-parity cluster grows to seven / multimodal-IO cluster grows to three: #220 input-only + #224 output-only + #225 full-duplex-bidirectional / advertised-but-unbuilt cluster grows to four / multipart-transport cluster grows to two / provider-asymmetric-delegation cluster grows to two / nine-layer-fusion-shape (endpoint-URL-set-of-three + multipart-form-data-transport-plumbing + data-model-taxonomy-with-input-AND-output-content-blocks + modalities-request-side-opt-in + Provider-trait-method-set-of-three-with-Unsupported-fallback + ProviderClient-enum-dispatch-with-six-partner-third-lanes + advertised-but-unbuilt-slash-commands-×3 + CLI-subcommand-surface + pricing-tier-with-per-minute-and-per-million-chars-and-per-million-audio-tokens-compound-cost-model) is the largest single-pinpoint fusion catalogued / external validation: forty-seven ecosystem references covering three first-class audio-endpoint specs on OpenAI side, one Anthropic non-coverage statement, one Google Gemini Live API spec, six first-class STT providers, six first-class TTS providers, one full-duplex bidirectional-audio endpoint OpenAI /v1/realtime, three first-class CLI/SDK typed-surface implementations, six first-class local-audio-providers, one community-maintained Common Voice benchmark, seven coding-agent peers with audio capability, one canonical Anthropic-recommended three-partner-set / claw-code is the sole client/agent/CLI with zero /v1/audio/{transcriptions,translations,speech} integration AND zero ElevenLabs/Cartesia/Deepgram/AssemblyAI/Speechmatics/Whisper partner-routing AND three advertised-but-unbuilt slash commands AND zero modalities request-side opt-in AND zero Audio content-block taxonomy variant on either input or output side AND zero multipart-form-data transport plumbing for audio uploads — all six gaps unique to claw-code in the surveyed ecosystem) 2026-04-26 03:47:33 +09:00