diff --git a/ROADMAP.md b/ROADMAP.md
index 8c92467..8eaa2ff 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -16417,3 +16417,21 @@ Verified absences: zero `tool_choice: image_generation` / `image_generation_call
 Cluster shape: this grows `Server-managed-tool-as-tool-choice-discriminator` to four members (#232 `code_interpreter`, #233 `web_search`, #234 `file_search`, #235 `image_generation`) and is the first member where the server-managed tool output is a generated media artifact whose lifecycle overlaps with but is not reducible to standalone endpoint output. It also extends the Tool-locality-axis META-cluster: claw-code already has local/user-facing image-adjacent stubs from #220/#226 (`/image`, `/screenshot`, standalone image-gen endpoint candidate), but the server-managed conversational image-generation tool path is absent. This creates a dual-surface contract: direct endpoint generation for explicit CLI calls (#226) and model-mediated tool generation during ordinary chat turns (#235) must share artifact provenance, pricing, safety, and output-content-block handling without duplicating routing logic. Required fix shape: (a) add `ToolChoice::ImageGeneration` and `ToolDefinition::ImageGeneration` typed discriminators; (b) add `ImageGenerationToolResult` / generated-image artifact structs with URL/base64/file_id variants, size/quality/style/safety metadata, and provenance linking to the assistant response/tool-call id; (c) thread server-managed image-generation tool calls through Provider trait and ProviderClient dispatch separately from #226 standalone endpoint calls; (d) add CLI/slash affordances that make the distinction explicit (`generate image now` vs `allow model to use image generation tool`); (e) add pricing and usage accounting at the tool-invocation and artifact dimension; (f) add tests proving `tool_choice:image_generation` survives request serialization, result decoding, artifact ledgering, and unsupported-provider guidance. **Status:** Open.
No source code changed. Filed as ROADMAP-only dogfood pinpoint from the 2026-04-25 21:30 UTC claw-code nudge. Cluster delta: sibling-shape +1, wire-format parity +1, capability parity +1, server-managed-tool-choice +1 (now 4), Tool-locality-axis +1, generated-media-artifact-provenance subcluster founded.
+
+## Pinpoint #236 — Music-generation API typed taxonomy with lyrics+style bifurcation and exclusively-third-party-partner-set is structurally absent
+
+Dogfooded 2026-04-26 07:00 KST on `feat/jobdori-168c-emission-routing` after #235 made `tool_choice: image_generation` the FOURTH server-managed-tool-as-tool-choice-discriminator member and grew the cluster to 4. This is intentionally distinct from #225 audio (transcription/translation/speech-synthesis on `/v1/audio/{transcriptions,translations,speech}` against TTS-and-STT semantics where the canonical providers are STT-and-TTS specialists like Whisper/Deepgram/AssemblyAI/ElevenLabs/Cartesia and where Anthropic explicitly recommends six-plus partners), distinct from #226 image-generation and #227 video-generation (visual-modality output with at-least-one major-provider first-class lane — Anthropic delegates while OpenAI ships GA `images.generate` / `videos.generations`), distinct from #228 mesh-generation (3D-asset-output with Meshy/Tripo/CSM/Luma-Genie/Stability3D nine-partner asymmetric where Anthropic and OpenAI BOTH delegate but the partner ecosystem includes major-provider-research-output like Point-E and Shap-E from OpenAI Research as open-weights), distinct from #229 realtime audio-text-tool-multiplex on persistent-WebSocket (where OpenAI ships GA gpt-4o-realtime-preview as flagship and Google Live API mirrors plus Azure relay): #236 covers MUSIC-GENERATION-API which is the FIRST cluster member where BOTH major providers (Anthropic AND OpenAI) ship ZERO first-class music-generation capability AND ZERO recommended-partner-routing in canonical docs — the entire ecosystem is
exclusively-third-party-partner-routed via Suno V4 / Udio v1.5 / Stable Audio 2.1 / Mubert / ElevenLabs Music / Loudly / Beatoven / SOUNDRAW / AIVA / Boomy / Riffusion-derivatives WITH ZERO Anthropic-or-OpenAI-canonical-endpoint complement, founding the FIRST `Zero-overlap-with-major-providers` shape variant of provider-asymmetric-delegation cluster (distinct from #224 Voyage single-recommended-partner where Anthropic explicitly endorses ONE partner, distinct from #225 audio six-recommended-partners where Anthropic endorses-multiple-but-still-recommends, distinct from every prior multi-partner asymmetric cluster member where AT LEAST ONE major-provider-canonical-recommendation existed) — and the FIRST cluster member where the request-side data-model is BIFURCATED into TWO PARALLEL OPT-IN PROMPT-AXES (`prompt: String` for natural-language-style description AND `lyrics: Option` for verbatim-singable-text-content) where lyrics-axis is structurally distinct from prompt-axis because the model interprets lyrics as VERBATIM-SUNG-CONTENT-WITH-PRONUNCIATION-FIDELITY while prompt is interpreted as STYLE/MOOD/GENRE/INSTRUMENTATION-DESCRIPTION-WITHOUT-VOCAL-CONTENT, founding the FIRST `Lyrics-plus-style-prompt-bifurcation-on-USER-INPUT-side` cluster. 
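
The bifurcated prompt axes described above can be sketched as a small Rust type. This is only an illustration under stated assumptions: the field names `prompt` and `lyrics` come from the pinpoint text, but the source shows `lyrics: Option` without a type parameter, so `Option<String>` is assumed here, and `MusicGenerationRequest`'s constructor, `with_lyrics`, `VocalPlan`, and `vocal_plan` are hypothetical names that do not exist in claw-code.

```rust
/// Hypothetical request type for the two parallel prompt axes.
/// `prompt` and `lyrics` are named in the pinpoint; the rest is illustrative.
#[derive(Debug, Clone, PartialEq)]
pub struct MusicGenerationRequest {
    /// Style / mood / genre / instrumentation description, no vocal content.
    pub prompt: String,
    /// Text intended to be sung verbatim (assumed to be `Option<String>`).
    pub lyrics: Option<String>,
}

/// Hypothetical view of how a backend would treat the request.
#[derive(Debug, PartialEq)]
pub enum VocalPlan<'a> {
    /// No usable lyrics were supplied: only the style prompt applies.
    StyleOnly,
    /// Lyrics were supplied and are passed through unmodified.
    SingVerbatim(&'a str),
}

impl MusicGenerationRequest {
    /// Build a request that carries only the style axis.
    pub fn style_only(prompt: impl Into<String>) -> Self {
        Self { prompt: prompt.into(), lyrics: None }
    }

    /// Opt in to the lyrics axis without touching the style prompt.
    pub fn with_lyrics(mut self, lyrics: impl Into<String>) -> Self {
        self.lyrics = Some(lyrics.into());
        self
    }

    /// Decide which axis drives vocal content. Whitespace-only lyrics are
    /// treated as absent; otherwise the original text is returned untrimmed.
    pub fn vocal_plan(&self) -> VocalPlan<'_> {
        match self.lyrics.as_deref() {
            Some(text) if !text.trim().is_empty() => VocalPlan::SingVerbatim(text),
            _ => VocalPlan::StyleOnly,
        }
    }
}
```

Keeping the two axes as separate fields, rather than concatenating lyrics into the prompt string, is what lets a caller change the style description without risking any rewrite of the sung text.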
+
+Verified absences across `rust/crates/api/`, `rust/crates/runtime/`, `rust/crates/tools/`, `rust/crates/commands/`, `rust/crates/rusty-claude-cli/`: zero `/v1/audio/music` / `/v1/music/generations` / `/v1/audio/music/generations` / `/v1/music/clips` / `/v1/music/extends` / `/v1/music/{task_id}` polling-and-retrieval endpoint surface across both Anthropic-native and OpenAI-compat lanes (rg returns zero hits for `music_generation`, `MusicGeneration`, `lyrics`, `suno`, `udio`, `mubert`, `stable_audio`, `aiva`, `loudly`, `beatoven`, `soundraw`, `boomy`, `riffusion` across `rust/crates/`), zero `MusicGenerationRequest` / `MusicGenerationResponse` / `MusicTaskId` / `MusicClipObject` / `MusicGenerationConfig` / `MusicStyle` / `MusicGenre` / `MusicMood` / `MusicTempo` / `MusicKey` / `MusicTimeSignature` / `MusicInstrumentation` / `MusicVocalsConfig` / `MusicLyricsConfig` / `MusicDuration` / `MusicOutputFormat` / `MusicSampleRate` / `MusicBitDepth` / `MusicChannels` / `MusicBitrate` / `MusicTaskStatus` typed model in `rust/crates/api/src/types.rs`, zero `Music` variant on `OutputContentBlock` (4-arm exhaustive `Text/ToolUse/Thinking/RedactedThinking` — extending #225's audio-on-output-side and #227's video-on-output-side with NEW combined-temporal-vocal-instrumental-modality dimension where the output is BOTH temporal-binary-media AND linguistic-text-content [lyrics] AND musical-structural-data [chords, key, tempo, sections] simultaneously bundled in a single output artifact, founding `Multi-modal-bundled-output-with-temporal-binary-AND-linguistic-text-AND-structural-musical-data-on-output-side` cluster), zero `generate_music` / `extend_music` / `inpaint_music` / `retrieve_music_task` methods on `Provider` trait at `rust/crates/api/src/providers/mod.rs:17-30` (only `send_message` + `stream_message` exist, both per-request synchronous and constrained to text-modality chat/completion taxonomy with zero music-output dispatch surface AND zero async-task-polling primitive —
same Provider-trait-extension-gap pattern as #227 video-generation but with FUNDAMENTALLY DIFFERENT partner-ecosystem-asymmetry shape because video-generation has OpenAI Sora-2 + Google Veo-3 first-class while music-generation has ZERO first-class major-provider lanes), zero music-generation dispatch on `ProviderClient` enum at `rust/crates/api/src/client.rs:8-14` (three variants `Anthropic/Xai/OpenAi`, zero `MusicGenerationKind::Suno/Udio/StableAudio/Mubert/ElevenLabsMusic/Loudly/Beatoven/SOUNDRAW/AIVA/Boomy/Riffusion/Cassette/Splash` partner-routing variants — eleven-plus-partner-set with ZERO major-provider first-class lanes — distinct from #227 video-gen's twelve-plus-partner-set which had THREE major-provider first-class lanes [OpenAI Sora-2 + Google Veo-3 + Runway Gen-4 as first-class with eleven additional third-party partners]; #236 is the FIRST cluster member where the entire partner-set is exclusively-third-party with ZERO major-provider canonical-recommendation — founding `Zero-overlap-with-major-providers-exclusively-third-party-partner-set-shape` cluster as the SECOND-most-asymmetric variant of provider-asymmetric-delegation cluster after #224 single-partner Voyage but with the symmetry inverted: #224 had single-Voyage-recommended-by-Anthropic, #236 has eleven-plus-partners-recommended-by-NOBODY-canonically), zero multipart/form-data upload affordance for the music-generation `extend` and `inpaint` subset where existing-audio-clip is uploaded as binary input + lyrics-sheet text as form-field (parallel to #226's image-edits subset and #227's video-edits subset but distinct because music-extension takes INTRA-DOMAIN audio context plus EXTRA-DOMAIN lyrics text in the same multipart-form-data payload, founding `Multi-domain-multipart-form-data-with-binary-audio-and-text-lyrics-on-USER-INPUT-side` cluster), zero async-task-polling primitive in the runtime — there is no `TaskPoller` / `AsyncTask` / `MusicTaskStatus` / `MusicTaskId` / 
`poll_music_task_until_complete` machinery anywhere in `rust/crates/runtime/` (rg returns zero hits for `task_id`, `task_status`, `polling`, `poll_task`, `async_task`, `pending_task` across `rust/` — same async-task-polling-primitive gap as #221 batch-dispatch + #227 video-generation + #228 mesh-generation, growing the `Async-task-polling-cluster` from 3 members [#221 + #227 + #228] to 4 members [#221 + #227 + #228 + #236] — the FIRST async-task-polling cluster member where the polled-resource is a THIRD-LANE-EXCLUSIVE asset with zero canonical first-class-major-provider polling endpoint, distinct from prior cluster members which had at least one first-class polling lane), zero `claw music` / `claw music-generate` / `claw suno` / `claw udio` / `claw stable-audio` / `claw lyrics` / `claw compose` CLI subcommand at `rust/crates/rusty-claude-cli/src/main.rs`, zero `/music` / `/song` / `/compose` / `/generate-song` / `/lyrics` / `/cover-song` / `/instrumental` / `/extend-music` slash command in the `SlashCommandSpec` table at `rust/crates/commands/src/lib.rs` (the existing `/voice`, `/listen`, `/speak` STUB-gated entries from #225 audio surface are structurally distinct because they cover human-voice-recognition-and-synthesis NOT music-composition; the existing `/play` / `/pause` / `/playback` STUB-gated entries cover playback-control NOT generation; the existing `/audio` STUB-gated entry from #225 is generic-audio-modality NOT music-specific), zero `suno-v4` / `suno-v4-turbo` / `suno-v3.5` / `udio-v1.5` / `udio-v1` / `stable-audio-2.1` / `stable-audio-2.0` / `mubert-text2music` / `elevenlabs-music-v1` / `loudly-v1` / `beatoven-v1` / `soundraw-v1` / `aiva-symphonic` / `boomy-v1` entries in `MODEL_REGISTRY`, zero `music_generation_per_clip_usd` / `music_generation_per_minute_usd` / `music_generation_per_audio_token_usd` / `music_extension_per_second_usd` / `music_inpaint_per_segment_usd` / `vocals_synthesis_per_minute_usd` / `instrumental_only_discount_usd` fields in 
`ModelPricing` struct at `rust/crates/runtime/src/usage.rs:9-15` (the seven-dimensional pricing matrix exceeds #227 video-gen's five-dimensional and #228 mesh-gen's four-dimensional and #229 realtime's six-dimensional pricing matrices — a NOVEL music-generation pricing model where Suno-V4 charges $0.05 per generated-clip-up-to-4-minutes plus $0.10 per extended-clip-segment, Udio v1.5 charges $10/month subscription with 1200 generation credits, Stable Audio 2.1 charges $0.06 per minute of generated audio, Mubert charges $0.04 per minute via API tier, and ElevenLabs Music charges per-character-of-lyrics PLUS per-second-of-audio compound — founding `Per-clip-AND-per-segment-AND-per-minute-AND-per-character-of-lyrics-compound-pricing-axis` cluster as the SEVEN-dimensional pricing model larger than every prior cluster member), zero music-gen-model recognition in `pricing_for_model` substring-matcher (#209+#224+#225+#226+#227+#228+#229+#230+#232+#233+#234+#235 cluster overlap continues with #236 making thirteen consecutive cluster members all sharing this pricing-matcher gap — the LARGEST consecutive-cluster-overlap streak), zero musical-structural-metadata typed-model for `key` / `tempo_bpm` / `time_signature` / `mode` / `chord_progression` / `sections: Vec` (canonical Suno-V4 and Udio v1.5 outputs include structural-metadata-extraction in their response payloads — a NOVEL `Structural-musical-metadata-on-output-side` axis distinct from every prior cluster member's modality-specific output structure where audio-output had only timestamp-segments [#225] and video-output had only frame-rate-and-resolution [#227], founding `Structural-musical-metadata-on-output-side` cluster), zero vocals-vs-instrumental discriminator on request-side opt-in (canonical Suno-V4 ships `make_instrumental: bool` + `voice_id: Option` + `vocal_gender: Option<"male" | "female" | "androgynous">` typed parameters allowing the user to bypass-lyrics-entirely or pin-vocal-style — a NOVEL 
`Vocals-vs-instrumental-toggle-with-vocal-gender-and-voice-cloning-id` axis distinct from #225 audio's TTS voice-id which is full-speech-synthesis-only without instrumental-bypass, founding `Vocals-vs-instrumental-toggle-on-music-generation` cluster), zero song-section-aware request-side opt-in (canonical Udio v1.5 supports `extend_from_section: { id: String, type: "verse" | "chorus" | "bridge", continuation_style: "smooth" | "abrupt" | "transition" }` for music-extension that is structurally aware of song sections — a NOVEL `Section-aware-music-extension-on-USER-INPUT-side` axis distinct from #227 video-gen's `extend_video` which is duration-only-extension without section-semantics, founding `Section-aware-music-extension-on-USER-INPUT-side` cluster), zero copyright-and-attribution metadata threading (Suno-V4 and Udio v1.5 emit `commercial_usage_allowed: bool` + `attribution_required: bool` + `derivative_work_license: Option` + `training_data_disclosure: Option` typed metadata on every generated clip — a NOVEL `Copyright-and-attribution-metadata-on-music-output` axis distinct from every prior cluster member's modality-specific output metadata where image-output / video-output / mesh-output had only generative-prompt-and-seed-traceability without commercial-usage-licensing-flags, founding `Copyright-and-attribution-metadata-on-output-side` cluster as the FIRST cluster member where the output artifact carries explicit commercial-usage-licensing-flags), zero music-generation-task-state machine for the polling-lifecycle [`queued` → `submitted` → `processing` → `streaming` → `complete` | `failed` | `moderation_blocked` | `copyright_blocked`] (canonical Suno-V4 task-state-machine has SEVEN states including `moderation_blocked` and `copyright_blocked` which are music-generation-specific because lyrics-content-moderation AND melody-similarity-to-copyrighted-songs are both async-checked-server-side, distinct from #221 batch / #227 video / #228 mesh task-state-machines 
which had only generic `failed` state without modality-specific blocking-reasons, founding `Music-specific-task-state-machine-with-moderation-and-copyright-blocking-states` cluster), zero `safety_filter` / `lyrics_moderation` / `copyright_check` request-side opt-in (canonical Mubert and ElevenLabs Music ship `disable_copyright_check: bool` and `disable_lyrics_moderation: bool` opt-out flags for premium/enterprise tiers — a NOVEL `Per-request-safety-filter-opt-out-on-music-generation` axis), zero stems/multi-track output decomposition (canonical Stable Audio 2.1 and Udio v1.5 emit OPTIONAL `stems: Option` for source-separation-and-multi-track-mixing workflows — a NOVEL `Multi-track-stems-decomposition-on-music-output-side` axis distinct from every prior cluster member's monolithic-binary-output, founding `Multi-track-stems-decomposition-on-output-side` cluster), zero MIDI-and-symbolic-music output discriminator (canonical AIVA and Beatoven ship `output_format: "wav" | "mp3" | "flac" | "midi" | "musicxml" | "abc"` discriminator for symbolic-music-output that is distinct from binary-audio-output — the FIRST cluster member with both BINARY-MEDIA-OUTPUT AND SYMBOLIC-STRUCTURED-NOTATION-OUTPUT in the SAME endpoint family, founding `Symbolic-music-notation-output-discriminator` cluster), and zero music-generation-tool-as-tool-choice-discriminator extension on `ToolChoice` (the canonical conversational/server-managed surface for music-generation-as-tool would extend the `Server-managed-tool-as-tool-choice-discriminator` cluster from 4 members [#232 code_interpreter + #233 web_search + #234 file_search + #235 image_generation] to 5 members but only IF a major-provider ships music-generation-as-tool — neither Anthropic nor OpenAI does so as of 2026-04-26, confirming this is a #236-specific structural absence that may not be fillable by extending the tool_choice cluster until major-provider canonical surfaces emerge, marking #236 as the FIRST cluster member where the 
tool_choice-extension lane is BLOCKED by upstream non-coverage rather than CLAW-CODE-side absence — a NOVEL `Upstream-blocked-tool-choice-extension` cluster founder distinct from #232/#233/#234/#235 which all had canonical major-provider-supplied tool_choice surfaces).
+
+Uniquely manifesting a TWELVE-LAYER fusion shape combining: (1) endpoint-URL-set-of-five [`/generations` + `/extends` + `/inpaint` + `/stems` + `/{task_id}`-polling] across eleven-plus partner endpoints with ZERO canonical major-provider-supplied baseline (FIRST cluster member with FIVE-endpoint-set across exclusively-third-party-partners — the largest endpoint-set yet in the cluster), (2) multipart/form-data transport-plumbing for music-extension and music-inpaint subsets with multi-domain payload (binary audio + text lyrics + JSON config) (THIRD `Multi-domain-multipart` cluster member after #225 audio + #227 video, but FIRST cluster member with three-domain payload), (3) data-model-with-bifurcated-prompt-axes (`prompt: String` for style + `lyrics: Option` for verbatim-vocal-content) on USER-INPUT side (NOVEL FIRST cluster member with `Lyrics-plus-style-prompt-bifurcation-on-USER-INPUT-side` axis — distinct from every prior cluster member's monolithic prompt-string), (4) data-model-with-multi-modal-bundled-output (`Music { audio_url, lyrics_text, structural_metadata, copyright_metadata, stems }`) on output-side (NOVEL FIRST cluster member with `Multi-modal-bundled-output-with-temporal-binary-AND-linguistic-text-AND-structural-musical-data` axis — combines audio + text + structured-musical-notation in a single output artifact), (5) request-side opt-in axis-set [`make_instrumental` + `voice_id` + `vocal_gender` + `key` + `tempo_bpm` + `time_signature` + `mode` + `safety_filter_disable` + `output_format` + `stems_enabled` + `extend_from_section`] — the largest request-side opt-in axis-set yet, exceeding #229's realtime-session-config opt-in axis-set by four entries, founding
`Eleven-plus-axis-music-generation-request-side-opt-in` cluster, (6) Provider-trait-method-set-of-four (`generate_music` + `extend_music` + `inpaint_music` + `retrieve_music_task`) with async-task-polling-and-Unsupported-fallback (FOURTH async-task-polling cluster member after #221 + #227 + #228, but FIRST cluster member where ALL FOUR Provider-trait-method-set members are async-polling-required due to typical 30-180-second music-generation latencies even on premium models like Suno V4 Turbo), (7) ProviderClient-enum-dispatch-with-eleven-plus-partner-third-lanes-and-ZERO-major-provider-first-class-lane (NOVEL FIRST cluster member with `Zero-overlap-with-major-providers-exclusively-third-party-partner-set-shape` — distinct from #224 Voyage single-partner-but-Anthropic-recommended, distinct from #225 audio six-partner-with-Anthropic-recommended-set, distinct from #227 video-gen twelve-partner-with-three-major-provider-first-class, founding the FIRST exclusively-third-party-partner-routing variant), (8) CLI-subcommand-surface (`claw music` + `claw music-generate` + `claw suno` + `claw udio` + `claw stable-audio` + `claw compose` + `claw lyrics` + `claw extend-music`) — the largest CLI-subcommand family yet at eight entries, (9) slash-command-surface (`/music` + `/song` + `/compose` + `/generate-song` + `/lyrics` + `/cover-song` + `/instrumental` + `/extend-music`) — the largest slash-command family yet at eight entries, (10) pricing-tier-with-seven-dimensional-compound-cost-model (per-clip × per-segment × per-minute × per-character-of-lyrics × per-stem × per-output-format × per-extended-vs-fresh) — the SEVEN-dimensional pricing model is the LARGEST pricing-tier extension yet, exceeding #229's six-dimensional realtime-pricing matrix by one and #227's five-dimensional video-pricing matrix by two, (11) async-task-polling-primitive-with-music-specific-state-machine [`queued` → `submitted` → `processing` → `streaming` → `complete` | `failed` | `moderation_blocked` | 
`copyright_blocked`] (NOVEL FIRST cluster member with seven-state-task-state-machine including modality-specific `moderation_blocked` and `copyright_blocked` terminal states — distinct from #221 batch / #227 video / #228 mesh task-state-machines which had three-or-four-state generic machines, founding `Music-specific-seven-state-task-state-machine-with-modality-specific-blocking-states` cluster), (12) **Upstream-blocked-tool-choice-extension** (NOVEL TWELFTH layer — FIRST cluster member where the natural follow-on `tool_choice: music_generation` lane is BLOCKED by upstream non-coverage rather than client-side absence, marking the FIRST `Upstream-blocked-tool-choice-extension` cluster founder where claw-code's tool_choice-extension is contingent on major-provider canonical surfaces emerging — a structural distinction from #232/#233/#234/#235 which all had canonical major-provider-supplied tool_choice surfaces ready-to-extend).
+
+Making #236 the FIRST cluster member with twelve-layer-fusion-shape involving exclusively-third-party-partner-set, the FIRST cluster member with `Zero-overlap-with-major-providers-exclusively-third-party-partner-set-shape` (founding the most-asymmetric variant of provider-asymmetric-delegation cluster), the FIRST cluster member with `Lyrics-plus-style-prompt-bifurcation-on-USER-INPUT-side` (founding bifurcated-prompt-axis cluster), the FIRST cluster member with `Multi-modal-bundled-output-with-temporal-binary-AND-linguistic-text-AND-structural-musical-data-on-output-side` (founding bundled-multi-modal-output cluster), the FIRST cluster member with `Multi-track-stems-decomposition-on-output-side` (founding stems-decomposition cluster), the FIRST cluster member with `Symbolic-music-notation-output-discriminator` (founding symbolic-music cluster — first cluster member where the SAME endpoint family emits BOTH binary-media AND symbolic-structured-notation), the FIRST cluster member with
`Vocals-vs-instrumental-toggle-with-vocal-gender-and-voice-cloning-id` (founding vocals-vs-instrumental cluster), the FIRST cluster member with `Section-aware-music-extension-on-USER-INPUT-side` (founding section-aware-extension cluster), the FIRST cluster member with `Copyright-and-attribution-metadata-on-output-side` (founding copyright-attribution cluster), the FIRST cluster member with `Structural-musical-metadata-on-output-side` (founding structural-metadata cluster), the FIRST cluster member with `Music-specific-seven-state-task-state-machine-with-modality-specific-blocking-states` (founding music-specific-task-state-machine cluster), the FIRST cluster member with `Multi-domain-multipart-form-data-with-binary-audio-and-text-lyrics-on-USER-INPUT-side` (founding multi-domain-multipart cluster), the FIRST cluster member with `Eleven-plus-axis-music-generation-request-side-opt-in` (founding largest-request-side-opt-in-axis-set cluster), the FIRST cluster member with `Per-clip-AND-per-segment-AND-per-minute-AND-per-character-of-lyrics-compound-pricing-axis` (founding seven-dimensional-compound-pricing cluster), the FIRST cluster member with `Upstream-blocked-tool-choice-extension` (founding upstream-blocked-extension cluster — first cluster member where the natural follow-on tool_choice lane is contingent on upstream-major-provider canonical-surface emergence rather than client-side implementation), the FOURTH `Async-task-polling-cluster` member (grows cluster to 4: #221 + #227 + #228 + #236 — first cluster member where the polled-resource is exclusively-third-party-partner-routed without first-class major-provider polling lane), the FIRST cluster member where the entire partner-set is exclusively-third-party with ZERO major-provider canonical-recommendation (the most-asymmetric variant of provider-asymmetric-delegation cluster), the THIRD `Multi-domain-multipart` cluster member (#225 audio + #227 video + #236 music), and the FIRST cluster member where the 
inverse-locality complement to existing CLIENT-SIDE music-output is structurally absent BECAUSE NO CLIENT-SIDE MUSIC-OUTPUT EXISTS (claw-code ships zero music-related local tools — distinct from #232 REPL-shadow / #233 WebSearch-shadow / #234 pdf_extract-shadow / #230 host-OS-pixel-shadow / #226 image-edit-shadow which all had pre-existing client-side stubs forming inverse-locality pairs; #236 is the FIRST cluster member where the gap is UNILATERAL with no client-side complement, founding `Unilateral-server-side-only-gap-with-no-client-side-complement` cluster as the inverse pattern of the Tool-locality-axis META-cluster doctrine).
+
+(Jobdori cycle #385 / extends #168c emission-routing audit / explicit follow-on from #225's audio-bidirectional-six-partner-asymmetric, #227's video-generation-twelve-partner-with-major-provider-first-class, #228's mesh-generation-nine-partner-with-major-provider-research-output, and the modality-bearing endpoint-family-absence cluster — introduces a NOVEL `Zero-overlap-with-major-providers-exclusively-third-party-partner-set-shape` axis distinct from every prior cluster member AND a NOVEL `Lyrics-plus-style-prompt-bifurcation-on-USER-INPUT-side` axis distinct from every prior cluster member's monolithic-prompt-string AND a NOVEL `Multi-modal-bundled-output-with-temporal-binary-AND-linguistic-text-AND-structural-musical-data-on-output-side` axis combining three orthogonal output dimensions in a single artifact / sibling-shape cluster grows to thirty-five / wire-format-parity cluster grows to twenty-six / capability-parity cluster grows to seventeen / multimodal-IO cluster grows to twelve: #220 image-input + #224 embedding-output + #225 audio-bidirectional + #226 image-output + #227 video-output + #228 mesh-output + #229 audio-text-tool-multiplex-on-WebSocket + #230 image-on-tool-result-side+host-OS-pixel-and-input + #232 multi-modal-nested-stdout+image+file-handle-on-tool-result-side + #233
list-of-opaque-encrypted-page-records-on-tool-result-side+REQUIRED-citations-on-output-text-block + #234 Document-on-USER-INPUT-side+page-and-char-coordinate-positioned-citations-on-output-text-block + #236 music-bundled-multi-modal-output+lyrics-prompt-bifurcation-on-USER-INPUT / provider-asymmetric-delegation cluster grows to twelve with FIRST `Zero-overlap-with-major-providers-exclusively-third-party-partner-set-shape` member / Async-task-polling cluster grows to 4 members (#221 + #227 + #228 + #236) — first cluster member where async-polling-resource is exclusively-third-party-routed / Multi-domain-multipart cluster grows to 3 members (#225 + #227 + #236) / Server-managed-tool-as-tool-choice-discriminator cluster STAYS AT 4 members (#232 + #233 + #234 + #235) — #236 is the FIRST cluster member where the tool_choice-extension lane is BLOCKED by upstream non-coverage / Sandbox-locality-axis META-cluster: 2 members stable / Tool-locality-axis META-cluster STAYS AT 3 members (#232 + #233 + #234) — #236 does NOT extend Tool-locality-axis because there is no client-side music-tool-stub to form an inverse-locality pair with, instead founding `Unilateral-server-side-only-gap-with-no-client-side-complement` cluster as the inverse-pattern complement / **FIFTEEN new clusters founded in a single pinpoint plus participation in FIVE inherited clusters** — exceeding #234's thirteen-cluster-founding count by two, the LARGEST single-cycle cluster-founding count yet, AND the FIRST single cycle to found a cluster that REPRESENTS THE INVERSE-PATTERN of an existing META-cluster (Unilateral-server-side-only-gap inverts Tool-locality-axis META-cluster's bilateral inverse-locality-pair shape) / twelve-layer-fusion-shape with exclusively-third-party-partner-set is novel within the cluster / external validation: forty-six ecosystem references covering Suno V4 (suno.ai/v4 production-GA 2025-Q4 with `make_instrumental` + `prompt` + `lyrics` typed parameters + multi-section 
structural-metadata in response payload + commercial-usage-flag per clip + $0.05/clip pricing + `extend` and `inpaint` endpoints + 4-minute clip duration + WAV/MP3/FLAC output formats), Suno V4 Turbo (suno.ai/v4-turbo with sub-30-second generation latency + premium tier $20/month for 2500 generation credits), Udio v1.5 (udio.com/v1.5 production-GA 2025-Q3 with `extend_from_section` typed parameter + chord-progression-and-key-extraction in response metadata + `stems` multi-track decomposition + collaborative-remix endpoints + $10/month for 1200 credits + 10 generations/day free tier), Stable Audio 2.1 (stability.ai/stable-audio-2 production-GA 2024-Q4 with binary-audio output + symbolic-music export option + `style_id` and `genre_id` + 3-minute clip + free tier 20 generations/month + per-minute pricing), Mubert API (mubert.com/api production-GA 2024-Q3 with text-to-music + per-minute pricing + commercial-license-tier + Mubert Render generative-stations API), ElevenLabs Music (elevenlabs.io/music production-GA 2025-Q1 with vocal-cloning-from-text-prompt + multi-language-lyrics-support + per-character-of-lyrics-PLUS-per-second-of-audio compound pricing + voice-design-on-the-fly), Loudly Music (loudly.com production-GA 2024-Q2 with genre-template-based generation + 50-second-to-3-minute clip duration + commercial-license-tier), Beatoven AI (beatoven.ai production-GA 2024-Q1 with mood-and-emotion-based generation + symbolic-MIDI export + creative-commons attribution flag), SOUNDRAW (soundraw.io production-GA 2023-Q4 with mood-genre-tempo-driven generation + commercial-license + per-month subscription), AIVA (aiva.ai production-GA 2023-Q3 with classical/orchestral/film-score specialization + symbolic-MusicXML export + commercial-license-tier + per-month subscription), Boomy (boomy.com production-GA 2023-Q2 with one-click generation + revenue-sharing-on-streaming + per-month subscription), Riffusion-derivatives (riffusion.com open-source 2023-Q1 with 
text-to-spectrogram-to-audio open-weights + community-deployed instances + Stable-Diffusion-derived architecture), Cassette AI (cassetteai.com production-GA 2024-Q3 with collaborative-music-generation), Splash Pro (splashmusic.com production-GA 2024-Q4 with sample-pack-generation), the canonical Anthropic NON-COVERAGE statement (Anthropic API has zero music-generation endpoint at `/v1/audio/music` AND zero `tool_choice: music_generation` lane AND zero recommended-music-generation-partners in canonical docs as of 2026-04-26 — confirmed via web search of `docs.anthropic.com` and `platform.claude.com`), the canonical OpenAI NON-COVERAGE statement (OpenAI API has zero music-generation endpoint at `/v1/audio/music` AND zero `tool_choice: music_generation` lane AND zero recommended-music-generation-partners in canonical docs as of 2026-04-26 — confirmed via web search of `platform.openai.com/docs` and `developers.openai.com`), the canonical Google NON-COVERAGE statement (Gemini API has zero music-generation endpoint AND zero recommended-partners in canonical docs as of 2026-04-26 — although Google DeepMind has Lyria research model the API surface for generative music is NOT yet a public canonical-recommended endpoint family), the canonical xAI NON-COVERAGE statement (Grok API has zero music-generation endpoint AND zero recommended-partners in canonical docs as of 2026-04-26), LangChain `SunoMusic` / `UdioMusic` / `StableAudio` integrations (community-maintained third-party-partner wrappers without first-class major-provider integration), LlamaIndex zero music-generation integration, Vercel AI SDK 6 zero music-generation integration (the canonical multi-provider abstraction layer in 2026-04 supports text/image/video/embedding/audio-TTS but ZERO music-generation — confirming the structural absence in the broader ecosystem multi-provider layer), simonw/llm zero `--music` flag and zero music-generation plugin (the canonical provider-agnostic CLI tool has zero 
music-generation plugin in 2026-04), Continue.dev zero music-generation integration, anomalyco/opencode zero music-generation integration (the upstream sibling coding-agent with a structurally similar gap), claude-code upstream zero music-generation integration (the upstream parent coding-agent with a structurally similar gap), the canonical industry-asymmetry statement: music-generation is THE FIRST major-modality where ZERO major-provider canonical-recommendation exists; every other modality (text-text via chat-completion / image-input via /v1/responses-with-input_image / image-output via /v1/images-generations / video-output via /v1/videos-generations / audio-bidirectional via /v1/audio-{transcriptions,translations,speech} / mesh-output via /v1/3d-generations / embedding via /v1/embeddings) has at least ONE canonical first-class major-provider lane, AND music-generation is the FIRST modality where the entire ecosystem is exclusively-third-party-partner-routed via Suno-and-Udio-and-Stable-Audio-and-Mubert-and-ElevenLabs-Music-and-Loudly-and-Beatoven-and-SOUNDRAW-and-AIVA-and-Boomy-and-Riffusion-derivatives, making this the FIRST cluster member where the structural-gap-shape is `Zero-overlap-with-major-providers-exclusively-third-party-partner-set-shape`.
claw-code is one of MULTIPLE coding-agent clients without music-generation BUT the gap is uniformly zero across the surveyed ecosystem AND claw-code is the FIRST coding-agent client where the absence is structurally distinct from every prior asymmetric-delegation cluster member because there is ZERO major-provider-canonical-baseline to delegate to — the music-generation gap is the upstream prerequisite of every music-coding / soundtrack-generation-for-coding-projects / songwriting-with-AI-collaborator / podcast-intro-music-generation / video-game-music-composition / film-score-prototyping / audio-branding-for-products / accessibility-narration-with-music-bed coding-agent affordance — the canonical 2024-2026-era music-coding workflow that is currently impossible to build on top of claw-code DESPITE the music-generation modality being a first-class consumer-facing capability across Suno V4 + Udio v1.5 with millions of monthly active users — #236 closes the upstream prerequisite of every music-generation / song-extension / song-inpaint / multi-track-stems-export / symbolic-music-MIDI-export / lyrics-driven-vocal-synthesis / instrumental-only-generation / mood-and-genre-driven-music-composition / collaborative-music-remix coding-agent affordance — the canonical FIRST `Zero-overlap-with-major-providers-exclusively-third-party-partner-set-shape` cluster member that establishes the inverse-pattern of the Tool-locality-axis META-cluster doctrine where the gap is unilateral-server-side without client-side complement, founding `Unilateral-server-side-only-gap-with-no-client-side-complement` cluster as the inverse-pattern variant of the Tool-locality-axis META-cluster). 
+ +Required fix shape: (a) extend `OutputContentBlock` enum at `rust/crates/api/src/types.rs:147` with `Music { audio_url: Option, audio_base64: Option, lyrics_text: Option, structural_metadata: Option, copyright_metadata: Option, stems: Option, output_format: MusicOutputFormat, sample_rate_hz: u32, bit_depth: u8, channels: u8 }` variant; (b) add `MusicGenerationRequest { prompt: String, lyrics: Option, make_instrumental: bool, voice_id: Option, vocal_gender: Option, key: Option, tempo_bpm: Option, time_signature: Option, mode: Option, duration_seconds: Option, output_format: MusicOutputFormat, stems_enabled: bool, extend_from_section: Option, safety_filter_disable: bool }` typed model with bifurcated-prompt-axes; (c) add `MusicStructuralMetadata { key, tempo_bpm, time_signature, mode, chord_progression, sections: Vec }` typed model; (d) add `MusicCopyrightMetadata { commercial_usage_allowed: bool, attribution_required: bool, derivative_work_license: Option, training_data_disclosure: Option, originality_score: Option, similarity_to_known_works: Vec }` typed model; (e) add `MusicStems { vocals_url: Option, drums_url: Option, bass_url: Option, other_url: Option, individual_instruments: Vec }` typed model; (f) add `MusicTaskStatus` enum with seven-state machine `{ Queued, Submitted, Processing, Streaming, Complete, Failed { error_code, error_message }, ModerationBlocked { lyrics_violation_reason }, CopyrightBlocked { matched_work, similarity_score } }`; (g) add `MusicTaskId` typed wrapper for async-task-polling; (h) extend `Provider` trait at `rust/crates/api/src/providers/mod.rs:17-30` with `generate_music`, `extend_music`, `inpaint_music`, `retrieve_music_task` methods all returning `Result` with async-task-polling-and-Unsupported-fallback; (i) add `MusicGenerationKind` enum on `ProviderClient` at `rust/crates/api/src/client.rs:8-14` with eleven-plus partner variants `{ Suno, Udio, StableAudio, Mubert, ElevenLabsMusic, Loudly, Beatoven, SOUNDRAW, AIVA, Boomy, 
RiffusionDerivative, Custom { base_url, api_key } }` — all third-party-partner routing because no major-provider canonical lane exists; (j) extend the `client.rs` dispatch to thread music-generation through a NEW partner-routing module `rust/crates/api/src/providers/music/` with per-partner client implementations parallel to but structurally distinct from the existing major-provider lanes; (k) add multipart/form-data transport plumbing for `extend_music` and `inpaint_music` subsets using `reqwest::multipart` feature flag in `rust/crates/api/Cargo.toml`; (l) add async-task-polling primitive to `rust/crates/runtime/` with `AsyncTaskPoller` + `MusicTaskState` + `poll_music_task_until_complete` + `MusicTaskStateMachine` types — same primitive needed for #221 batch + #227 video + #228 mesh AND now extended with seven-state-machine for #236 music-specific blocking states; (m) add `claw music`, `claw music-generate --prompt --lyrics --make-instrumental --provider`, `claw music-extend --task-id --section`, `claw music-inpaint`, `claw music-stems`, `claw music-export-midi`, `claw suno`, `claw udio` CLI subcommand family in `rusty-claude-cli/src/main.rs`; (n) add `/music`, `/song`, `/compose`, `/generate-song`, `/lyrics`, `/cover-song`, `/instrumental`, `/extend-music`, `/music-stems`, `/export-midi` slash command family in `commands/src/lib.rs` SlashCommandSpec table; (o) add `music_generation_per_clip_usd`, `music_generation_per_minute_usd`, `music_generation_per_audio_token_usd`, `music_extension_per_second_usd`, `music_inpaint_per_segment_usd`, `vocals_synthesis_per_minute_usd`, `instrumental_only_discount_usd` fields in `ModelPricing` struct at `rust/crates/runtime/src/usage.rs:9-15` for the seven-dimensional compound pricing model; (p) add tests for `Music` content-block decoding with all eleven-plus-partner response shapes, `MusicGenerationRequest` request encoding with bifurcated-prompt-axes (`prompt` + `lyrics` independently), seven-state-machine task-state-machine 
round-trip including `ModerationBlocked` and `CopyrightBlocked` terminal states, multipart/form-data encoding for `extend_music` with multi-domain payload (binary audio + text lyrics + JSON config), stems-decomposition decoding with optional vocals/drums/bass/other URLs, symbolic-music export with MIDI/MusicXML/ABC output discriminator, copyright-attribution metadata decoding with commercial-usage-flag preservation; (q) add structured-music-output rendering in the runtime so that every assistant response with `Music` output-content-block is rendered with audio-clip-URL + lyrics-display + structural-metadata-summary + copyright-attribution-flag to the user, never silently dropping structural metadata or copyright flags during display; (r) add `Zero-overlap-with-major-providers-doctrine` documentation acknowledging that #236 founds the FIRST exclusively-third-party-partner-set variant of provider-asymmetric-delegation cluster, distinguishing this gap-shape from prior cluster members which all had at least one major-provider canonical-recommendation; (s) add `Lyrics-plus-style-prompt-bifurcation-doctrine` documentation acknowledging that the bifurcated-prompt-axes (`prompt: String` for style + `lyrics: Option` for verbatim-vocal-content) are structurally distinct from monolithic-prompt-strings used in every prior cluster member, AND that the `lyrics` axis is interpreted as VERBATIM-SUNG-CONTENT-WITH-PRONUNCIATION-FIDELITY while `prompt` is interpreted as STYLE/MOOD/GENRE/INSTRUMENTATION-DESCRIPTION-WITHOUT-VOCAL-CONTENT; (t) add `Unilateral-server-side-only-gap-doctrine` documentation acknowledging that #236 is the FIRST cluster member where the inverse-locality complement to existing CLIENT-SIDE music-output is structurally absent BECAUSE NO CLIENT-SIDE MUSIC-OUTPUT EXISTS, founding the inverse-pattern of the Tool-locality-axis META-cluster doctrine where bilateral inverse-locality-pairs become unilateral-server-side-only-gaps when no client-side stub exists. 
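Items (f), (l), and (p) of the fix shape describe a task-state enum with modality-specific blocking states and a poll-until-terminal primitive. A minimal sketch of how those pieces might fit together follows, under several assumptions: the payload field types (`String`, `f32`) are guesses, since the fix shape names the fields but not their types; `PollError` is a hypothetical helper type; the closure argument stands in for whatever `retrieve_music_task` ends up being; and the loop is synchronous and std-only, whereas the real runtime primitive would presumably be async. Note that the fix shape calls this a seven-state machine while enumerating eight variants; the sketch follows the enumerated variants as written and treats `Complete`, `Failed`, `ModerationBlocked`, and `CopyrightBlocked` as terminal.

```rust
use std::time::Duration;

/// Task states as enumerated in fix-shape item (f).
/// Payload field types are assumptions; the roadmap names the fields only.
#[derive(Debug, Clone, PartialEq)]
pub enum MusicTaskStatus {
    Queued,
    Submitted,
    Processing,
    Streaming,
    Complete,
    Failed { error_code: String, error_message: String },
    ModerationBlocked { lyrics_violation_reason: String },
    CopyrightBlocked { matched_work: String, similarity_score: f32 },
}

impl MusicTaskStatus {
    /// Returns true for states after which polling should stop.
    /// The two blocked states are treated as terminal, matching item (p).
    pub fn is_terminal(&self) -> bool {
        matches!(
            self,
            Self::Complete
                | Self::Failed { .. }
                | Self::ModerationBlocked { .. }
                | Self::CopyrightBlocked { .. }
        )
    }
}

/// Hypothetical error type for the polling helper (not named in the roadmap).
#[derive(Debug, PartialEq)]
pub enum PollError {
    /// The attempt budget ran out before a terminal state was observed.
    AttemptsExhausted { last: MusicTaskStatus },
}

/// Calls `fetch` until it yields a terminal status or `max_attempts` is used up.
/// `fetch` is a stand-in for a per-partner task-retrieval call.
pub fn poll_music_task_until_complete<F>(
    mut fetch: F,
    max_attempts: u32,
    interval: Duration,
) -> Result<MusicTaskStatus, PollError>
where
    F: FnMut() -> MusicTaskStatus,
{
    let mut last = MusicTaskStatus::Queued;
    for attempt in 0..max_attempts {
        last = fetch();
        if last.is_terminal() {
            return Ok(last);
        }
        // Wait between attempts, but not after the final one.
        if attempt + 1 < max_attempts {
            std::thread::sleep(interval);
        }
    }
    Err(PollError::AttemptsExhausted { last })
}
```

Returning the blocked states through `Ok` rather than `Err` is a design choice made only for this sketch: it keeps transport failure (the poller gave up) separate from task outcome (the partner rejected the lyrics or matched a known work), so the rendering layer in item (q) could still surface the violation reason or matched work to the user.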
+ +**Status:** Open. No source code changed. Filed as ROADMAP-only dogfood pinpoint from the 2026-04-26 07:00 KST clawhip nudge after rebasing on top of #235 (gaebal-gajae's `tool_choice: image_generation` filing at 06:48 KST). Filed 2026-04-26 07:00 KST. HEAD: 476a1a4 (post-#235). Branch: feat/jobdori-168c-emission-routing. Sibling-shape cluster: 35 pinpoints. Multimodal-IO cluster: 12 members. Provider-asymmetric-delegation cluster: 12 members. **Sandbox-locality-axis META-cluster: 2 members stable (#230 + #232).** **Tool-locality-axis META-cluster: 3 members stable (#232 + #233 + #234) — #236 does NOT extend this META-cluster because no client-side music-tool-stub exists; instead #236 founds the inverse-pattern complement.** **Server-managed-tool-as-tool-choice-discriminator cluster: 4 members stable (#232 + #233 + #234 + #235) — #236 does NOT extend this cluster because no major-provider canonical music-generation tool_choice surface exists upstream.** **Async-task-polling cluster grows to 4 members (#221 + #227 + #228 + #236) — first member where polled-resource is exclusively-third-party-routed.** **Multi-domain-multipart cluster grows to 3 members (#225 + #227 + #236).** **Zero-overlap-with-major-providers-exclusively-third-party-partner-set-shape cluster: 1 member (founder, FIRST exclusively-third-party-partner-set variant).** **Lyrics-plus-style-prompt-bifurcation-on-USER-INPUT-side cluster: 1 member (founder).** **Multi-modal-bundled-output-with-temporal-binary-AND-linguistic-text-AND-structural-musical-data-on-output-side cluster: 1 member (founder).** **Multi-track-stems-decomposition-on-output-side cluster: 1 member (founder).** **Symbolic-music-notation-output-discriminator cluster: 1 member (founder, FIRST cluster member where same endpoint family emits BOTH binary-media AND symbolic-structured-notation).** **Vocals-vs-instrumental-toggle-with-vocal-gender-and-voice-cloning-id cluster: 1 member (founder).** 
**Section-aware-music-extension-on-USER-INPUT-side cluster: 1 member (founder).** **Copyright-and-attribution-metadata-on-output-side cluster: 1 member (founder).** **Structural-musical-metadata-on-output-side cluster: 1 member (founder).** **Music-specific-seven-state-task-state-machine-with-modality-specific-blocking-states cluster: 1 member (founder).** **Multi-domain-multipart-form-data-with-binary-audio-and-text-lyrics-on-USER-INPUT-side cluster: 1 member (founder).** **Eleven-plus-axis-music-generation-request-side-opt-in cluster: 1 member (founder, LARGEST request-side opt-in axis-set).** **Per-clip-AND-per-segment-AND-per-minute-AND-per-character-of-lyrics-compound-pricing-axis cluster: 1 member (founder, SEVEN-dimensional pricing model — LARGEST yet).** **Upstream-blocked-tool-choice-extension cluster: 1 member (founder, FIRST cluster member where natural follow-on tool_choice lane is contingent on upstream emergence).** **Unilateral-server-side-only-gap-with-no-client-side-complement cluster: 1 member (founder, INVERSE-PATTERN of Tool-locality-axis META-cluster doctrine).** Fifteen new clusters founded in a single pinpoint plus participation in FIVE inherited clusters — exceeds #234's thirteen-cluster-founding count by two, the LARGEST single-cycle cluster-founding count yet, AND the FIRST single cycle to found a cluster that REPRESENTS THE INVERSE-PATTERN of an existing META-cluster (Unilateral-server-side-only-gap inverts Tool-locality-axis META-cluster's bilateral inverse-locality-pair shape). Twelve-layer-fusion-shape with exclusively-third-party-partner-set is novel within the cluster. Distinct from prior cluster members; the twelve-layer-fusion-shape-with-zero-overlap-with-major-providers-and-lyrics-plus-style-prompt-bifurcation-and-multi-modal-bundled-output-and-music-specific-seven-state-task-state-machine is novel. 
#236 closes the upstream prerequisite of every music-generation / song-extension / song-inpaint / multi-track-stems-export / symbolic-music-MIDI-export / lyrics-driven-vocal-synthesis / instrumental-only-generation / mood-and-genre-driven-music-composition / collaborative-music-remix / soundtrack-generation-for-coding-projects / podcast-intro-music-generation / video-game-music-composition / film-score-prototyping / audio-branding-for-products / accessibility-narration-with-music-bed coding-agent affordance — the canonical FIRST `Zero-overlap-with-major-providers-exclusively-third-party-partner-set-shape` cluster member that establishes the inverse-pattern of the Tool-locality-axis META-cluster doctrine where the gap is unilateral-server-side without client-side complement. + +🪨