diff --git a/ROADMAP.md b/ROADMAP.md
index 99657a9..6462cc1 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -16176,3 +16176,11 @@ The minimal fix is a nine-touch architectural extension that is structurally dis
 **Status:** Open. No code changed. Filed 2026-04-26 03:36 KST. Branch: feat/jobdori-168c-emission-routing. HEAD: c01b470. Sibling-shape cluster (silent-fallback / silent-drop / silent-strip / silent-misnomer / silent-shadow / silent-prefix-mismatch / structural-absence / silent-zero-coercion / silent-content-discard / silent-header-discard / silent-tier-absence / silent-finish-mistranslation / silent-capability-absence / silent-false-positive-opt-in / advertised-but-unbuilt / endpoint-family-level-absence / advertised-but-rerouted / endpoint-family-level-absence-with-transport-plumbing-absence / endpoint-family-level-absence-with-provider-asymmetric-delegation / nine-layer-fusion-shape): #201/#202/#203/#206/#207/#208/#209/#210/#211/#212/#213/#214/#215/#216/#217/#218/#219/#220/#221/#222/#223/#224/#225 — twenty-four pinpoints. Wire-format-parity cluster grows to fifteen: #211 (max_completion_tokens) + #212 (parallel_tool_calls) + #213 (cached_tokens response-side) + #214 (reasoning_content) + #215 (Retry-After) + #216 (service_tier + system_fingerprint) + #217 (finish_reason taxonomy) + #218 (response_format / output_config / refusal) + #219 (cache_control request-side) + #220 (image content block + media_type) + #221 (Message Batches API) + #222 (Models list endpoint) + #223 (Files API + multipart-form-data transport plumbing) + #224 (Embeddings API + EmbeddingRequest + EmbeddingResponse + Voyage AI third-lane routing + provider-asymmetric-delegation pattern) + #225 (Audio API + TranscriptionRequest + SpeechRequest + AudioVoice + AudioFormat + AudioMediaType + AudioSource + Modality + AudioRequestConfig + InputContentBlock::Audio + OutputContentBlock::Audio + multipart-form-data audio-upload + six-partner provider-asymmetric-delegation + nine-layer-fusion-shape).
Capability-parity cluster grows to seven: #218 (structured outputs) + #220 (multimodal input) + #221 (batch dispatch) + #222 (model discovery) + #223 (file management) + #224 (embeddings + RAG prerequisite) + #225 (audio + voice-loop prerequisite, the first cluster member with full-duplex symmetric-input-output modality coverage) — seven members, all four-or-more-layer structural absences. Cross-cutting-data-pipeline cluster grows to two: #224 (RAG prerequisite, semantic-similarity manifold) + #225 (voice-loop prerequisite, full-duplex audio bidirectional modality, the upstream root cause of every speech-driven coding-agent affordance). Multimodal-IO cluster grows to three: #220 (image input only, output is JSON markdown) + #224 (embedding output only, fixed-dimensional float vector) + #225 (audio input AND output, the first cluster member with full-duplex bidirectional modality where the same content-block-taxonomy axis applies to both InputContentBlock and OutputContentBlock variants). Advertised-but-unbuilt cluster grows to four: #220 (`/image`+`/screenshot` ×2) + #223 (`/files` ×1) + #225 (`/voice`+`/listen`+`/speak` ×3, the largest single-pinpoint count catalogued — strict-superset of #220's ×2 and #223's ×1). Multipart-transport cluster grows to two: #223 (Files API binary upload via /v1/files) + #225 (Audio transcription binary upload via /v1/audio/transcriptions, a strict-prerequisite-disjoint extension because audio-files do not need to be persisted via Files API for one-shot transcription — they're streamed inline as multipart/form-data per Whisper API spec, meaning #225 needs multipart-transport-plumbing even if #223's Files API surface is shipped first). 
Provider-asymmetric-delegation cluster grows to two: #224 (Voyage-AI single-partner-recommendation for embeddings) + #225 (ElevenLabs/Cartesia/PlayHT/Deepgram/AssemblyAI/Speechmatics six-plus-partner-set for TTS+STT, the largest partner-set in the surveyed ecosystem because audio is the most-fragmented modality across third-party providers). Nine-layer-fusion-shape (endpoint-URL-set-of-three [/v1/audio/transcriptions + /v1/audio/translations + /v1/audio/speech] + multipart-form-data-transport-plumbing + data-model-taxonomy-with-input-AND-output-content-blocks + modalities-request-side-opt-in + Provider-trait-method-set-of-three-with-Unsupported-fallback + ProviderClient-enum-dispatch-with-six-partner-third-lanes + advertised-but-unbuilt-slash-commands-×3 + CLI-subcommand-surface + pricing-tier-with-per-minute-and-per-million-chars-and-per-million-audio-tokens-compound-cost-model) is the largest single-pinpoint fusion catalogued, fusing #223's transport-plumbing axis + #224's provider-asymmetric-delegation axis + #220's advertised-but-unbuilt-slash-commands axis + #218's modalities-request-side axis + the new symmetric-input-output content-block-taxonomy axis (#225's first-of-its-kind contribution to the cluster doctrine, since prior cluster members have either input-only [#220] or output-only [#214] or stateless [#221/#222/#223] or input-with-fixed-output-vector [#224] modality coverage). 
Distinct from prior single-field (#211/#212/#214) / response-only (#213/#207) / header-only (#215) / three-dimensional (#216) / classifier-leakage (#217) / four-layer (#218) / false-positive-opt-in (#219) / five-layer-feature-absence (#220) / seven-layer-endpoint-family-absence (#221) / eight-layer-endpoint-family-absence-with-misleading-alias (#222) / seven-layer-endpoint-family-absence-with-transport-plumbing-absence (#223) / seven-layer-endpoint-family-absence-with-provider-asymmetric-delegation (#224) members; the nine-layer-fusion-shape is novel and applies to follow-on candidate Image-generation API typed taxonomy (`/v1/images/generations` + `/v1/images/edits` + `/v1/images/variations`, also provider-asymmetric — Anthropic does not offer image generation, OpenAI offers GA dall-e-3 + dall-e-2 + gpt-image-1, Google offers Imagen, recommended-partners include Stability AI / Midjourney / Black Forest Labs / Ideogram, and `/v1/images/edits` requires multipart-form-data with binary image+mask uploads — sibling fusion shape but with image-instead-of-audio modality, JSON-with-base64-or-url-output instead of binary-audio-output, and no symmetric input-AND-output content-block-taxonomy axis because images are output-only in the gpt-image-1 generation flow rather than full-duplex like gpt-4o-audio's bidirectional voice loop) — open candidate for #226. The fusion-shape pattern recurs across every modality-bearing endpoint family that combines provider-asymmetric coverage with multipart-transport needs and advertised-but-unbuilt-slash-command-clusters and symmetric-modality-input-output coverage, and #225 is the first cluster member where all five axes converge in a single pinpoint — the largest fusion-shape gap catalogued so far, the upstream prerequisite of every voice-driven coding-agent affordance, and the first cluster member where the symmetric-modality-input-output content-block-taxonomy axis is introduced. 
 🪨
+
+## Pinpoint #226 — Image-generation API typed taxonomy is structurally absent
+
+Dogfooded 2026-04-26 04:03 KST on branch `feat/jobdori-168c-emission-routing` after #225 left Image-generation API typed taxonomy as the next named provider-asymmetric-delegation candidate. Repo scan confirms the same structural absence pattern for generated-image endpoints: zero `/v1/images/generations`, `/v1/images/edits`, or `/v1/images/variations` endpoint surface across `rust/`; zero `ImageGenerationRequest` / `ImageGenerationResponse` / `GeneratedImage` / `ImageEditRequest` / `ImageVariationRequest` / `ImageSize` / `ImageQuality` / `ImageStyle` / `ImageResponseFormat` typed model in `rust/crates/api/src/types.rs`; zero provider-trait method such as `generate_image`, `edit_image`, or `create_image_variation`; zero image-generation dispatch in `ProviderClient`; zero OpenAI `gpt-image-1` / `dall-e-3` / `dall-e-2` model-registry and pricing entries; zero `claw image generate` / `claw image edit` / `claw image variation` CLI surface; and the existing advertised image-adjacent slash commands remain capability-stubbed rather than wired to an image generation lane.
+
+This is a sibling fusion shape to #225 but with image-generation-specific transport/output semantics: Anthropic does not offer native image generation and delegates users to external partners, while OpenAI offers first-class `/v1/images/*` endpoints and Google/partner ecosystems offer Imagen / Stability AI / Midjourney / Black Forest Labs / Ideogram-style generation lanes. `/v1/images/generations` is JSON-in with URL/base64 JSON-out, while `/v1/images/edits` and `/v1/images/variations` require multipart image/mask upload plumbing, so the fix inherits #223/#225's multipart transport axis without #225's full-duplex audio content-block symmetry. The missing taxonomy blocks canonical coding-agent workflows such as “generate UI mockup / asset / diagram from prompt”, “edit screenshot/mockup with mask”, and “return generated image artifacts with stable provenance instead of prose-only descriptions.”
+
+Required fix shape: (a) add typed request/response structs for image generation, edit, and variation endpoints, including model, prompt, size, quality, style, response format, background/transparent-output options where supported, and generated-image provenance metadata; (b) extend provider capabilities with explicit unsupported/recommendation returns for Anthropic and OpenAI/partner implementations for image endpoints; (c) add multipart transport support for edit/variation image+mask uploads if not already landed by Files/Audio work; (d) expose CLI and slash-command surfaces that distinguish image input (#220) from image output generation (#226); (e) add pricing/model-registry coverage for `gpt-image-1`, `dall-e-3`, `dall-e-2`, Imagen/partner equivalents, and generated-image usage accounting; (f) add regression coverage for JSON generation, multipart edit/variation, Anthropic unsupported recommendation, and artifact provenance. **Status:** Open. No source code changed. Filed as ROADMAP-only dogfood pinpoint from the 2026-04-25 19:00 UTC claw-code nudge. Cluster delta: sibling-shape +1 (now 25), wire-format parity +1 (now 16), capability parity +1 (now 8), provider-asymmetric-delegation +1 (now 3), multipart-transport follow-on remains coupled to #223/#225 for edit/variation paths.
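
Reviewer sketch (not part of the patch): fix-shape items (a) through (c) might look roughly like the Rust below. The names `ImageGenerationRequest`, `GeneratedImage`, `ImageSize`, and `generate_image` are taken from the pinpoint text. Everything else (`ImageEndpoint`, `Transport`, `ImageError`, `ImageProvider`, `AnthropicLane`, `StubOpenAiLane`, and all field layouts) is hypothetical and does not describe the repository's existing `Provider` trait or `ProviderClient`. The stub lane makes no network call.

```rust
/// The three endpoint paths named in the pinpoint.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ImageEndpoint {
    Generations,
    Edits,
    Variations,
}

/// Hypothetical transport marker: JSON body vs multipart upload.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Transport {
    Json,
    MultipartFormData,
}

impl ImageEndpoint {
    pub fn path(self) -> &'static str {
        match self {
            Self::Generations => "/v1/images/generations",
            Self::Edits => "/v1/images/edits",
            Self::Variations => "/v1/images/variations",
        }
    }

    /// Generation is JSON-in; edit and variation need image/mask upload.
    pub fn transport(self) -> Transport {
        match self {
            Self::Generations => Transport::Json,
            Self::Edits | Self::Variations => Transport::MultipartFormData,
        }
    }
}

/// Assumed shape; the real type may model sizes as named presets instead.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ImageSize {
    pub width: u32,
    pub height: u32,
}

/// Assumed minimal field set (model, prompt, size).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ImageGenerationRequest {
    pub model: String,
    pub prompt: String,
    pub size: Option<ImageSize>,
}

/// URL or base64 payload, mirroring the "URL/base64 JSON-out" wording.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum GeneratedImage {
    Url(String),
    Base64(String),
}

/// Hypothetical error carrying an explicit recommendation.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ImageError {
    Unsupported {
        provider: &'static str,
        recommendation: &'static str,
    },
}

/// Hypothetical trait; the default method is the explicit unsupported return.
pub trait ImageProvider {
    fn name(&self) -> &'static str;

    fn generate_image(
        &self,
        _req: &ImageGenerationRequest,
    ) -> Result<Vec<GeneratedImage>, ImageError> {
        Err(ImageError::Unsupported {
            provider: self.name(),
            recommendation: "route to a provider lane that offers image generation",
        })
    }
}

/// Inherits the default, so callers get a typed error, not a silent fallback.
pub struct AnthropicLane;

impl ImageProvider for AnthropicLane {
    fn name(&self) -> &'static str {
        "anthropic"
    }
}

/// Offline stand-in for a lane that does support generation.
pub struct StubOpenAiLane;

impl ImageProvider for StubOpenAiLane {
    fn name(&self) -> &'static str {
        "openai"
    }

    fn generate_image(
        &self,
        req: &ImageGenerationRequest,
    ) -> Result<Vec<GeneratedImage>, ImageError> {
        // Placeholder URL only; a real lane would POST to ImageEndpoint::Generations.path().
        Ok(vec![GeneratedImage::Url(format!("stub://{}", req.model))])
    }
}
```

Putting the unsupported case in the trait's default method means a lane that never overrides `generate_image` surfaces a typed error with a recommendation, which is the behavior item (f) asks regression tests to cover for Anthropic.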