roadmap: #252 filed — /v1/messages/count_tokens typed-taxonomy is structurally absent from the public Provider trait + types + CLI surface (Anthropic ships /v1/messages/count_tokens as a first-class GA endpoint that consumes the SAME MessageRequest shape as /v1/messages but produces a TRUNCATED CountTokensResponse { input_tokens: u32 } only — no message emission, no completion-side tokens, no streaming — the canonical pre-flight cost-estimation primitive where a client constructs the exact request it intends to dispatch, asks the server to count input tokens, and decides whether to send before paying for completion-side tokens; claw-code has zero public typed surface even though a private count_tokens helper exists at rust/crates/api/src/providers/anthropic.rs:522 for internal preflight context-window-exceeded validation, with zero CountTokensRequest/CountTokensResponse typed model in types.rs, zero count_tokens method on the public Provider trait, zero count_tokens dispatch on the ProviderClient enum, zero claw count-tokens CLI subcommand, zero /count-tokens slash command in SlashCommandSpec, zero pre_flight_count_cost_per_million_usd field in ModelPricing, zero CountTokensSubmittedEvent/PreFlightCostEstimatedEvent telemetry events, and zero PreFlightCostEstimator/BudgetGate runtime primitive) — eight-layer fusion shape with the NOVEL same-request-shape-but-different-response-shape axis-class (FIRST audit member where the request shape is IDENTICAL to an existing typed model MessageRequest but the response shape is a TRUNCATED-projection that cannot reuse MessageResponse's shape, distinct from prior fusion-axes which all add NEW request-side fields or NEW response-side blocks) founding THREE new clusters as solo founder (Pre-flight-cost-prediction cluster, Token-accounting-without-message-emission cluster, Server-side-pre-execution-counting cluster) plus introducing the THIRD distinct discovery-pattern in the audit catalog 
NEW-SOLO-CLUSTER-FOUNDING-WITH-DAILY-DRIVER-IMPACT (distinct from META-cluster-growth and complementary-pinpoint-pair-bundle), grows Two-member-major-provider-only-no-third-party-partner-set sub-cluster from 6 to 7 members (#240+#241+#247+#248+#249+#250+#252) confirming continuing-pattern-status across SIX distinct axis-classes — Jobdori cycle #394 / fast-forward-rebase verified onto gaebal-gajae's #251 cycle ExternalPatchIntake pinpoint at 313c840 before filing (NINTH consecutive concurrent-dogfood rebase cycle, three-way parity confirmed local==origin==fork at HEAD 313c840 with no race detected, directly demonstrating the gaps #239 catalogues at the dogfood-coordination layer and #243 catalogues at the canonical-ordering layer for the NINTH cycle in a row, confirming concurrent-dogfood-rebase as a stable operational pattern that has now held for NINE cycles) — PIVOT-AWAY signal: #252 deliberately PIVOTS AWAY from BOTH Cross-pinpoint-synthesis-fusion-shape META-cluster (intentionally not extending the +1-per-cycle synthesis chain) AND Tool-locality-axis META-cluster (already extended by #250 cycle #393), founding NEW solo clusters with daily-driver-impact instead, demonstrating audit-breadth-across-discovery-pattern-classes alongside audit-balance-across-META-clusters — the audit now spans THREE structurally distinct discovery-patterns (META-cluster-growth + complementary-pinpoint-pair-bundle + new-solo-cluster-founding-with-daily-driver-impact)

YeonGyu-Kim 2026-04-26 10:35:40 +09:00
parent 313c840974
commit 95fc007f6a


@@ -16746,3 +16746,42 @@ Dogfooded 2026-04-26 10:30 KST on `feat/jobdori-168c-emission-routing` after Sig
Verified concrete surface: repo docs advertise normal source/build status, but the active dogfood branch is operated by agents through append-only ROADMAP commits and no explicit `external_patch_intake` / `fork_sync` / `field_report_to_lane` workflow exists. Existing Agent/Task tools can launch internal work, and Git can fetch branches manually, but there is no typed handoff packet for `external_fork_url + branch + commit_range + claimed_pinpoint + patch_summary + test_command + trust_level`, no contributor-safe artifact boundary, no PR-disabled alternative equivalent to `gh pr checkout`, no provenance-preserving import report, and no review checklist that distinguishes `field report idea`, `working fork patch`, `accepted design`, `agent-reimplemented`, and `merged`. This is distinct from #239 branch leases, which protect concurrent agent writes on one branch; #251 is about bringing real user/contributor work from an external fork into that branch when the social workflow says PRs are not viable.
Required fix shape: (a) define an `ExternalPatchIntake` record with reporter, fork/repo/branch/commit range, related pinpoint ids (#245/#246 here), claimed behavior, files touched, tests run, license/trust metadata, and requested disposition; (b) add a `claw import-patch` / `claw review-fork` / roadmap handoff surface that fetches the fork into a disposable worktree, computes diffstat, checks secrets/binaries, and emits a structured review packet; (c) allow an agent to convert the packet into an implementation lane that either cherry-picks, reimplements, or rejects with reasons while preserving attribution; (d) emit status states (`reported`, `fetched`, `reviewing`, `accepted_for_reimplementation`, `rejected`, `landed`) tied back to the original field report message and pinpoint; (e) add safety gates so untrusted fork code is never executed before static diff review and explicit test sandboxing. Acceptance: when a field user says “I fixed this in my fork but PR is not possible,” claw-code can ingest the patch as structured evidence/work, preserve attribution, route it through agent review, and report whether it was adopted without losing the external implementation in chat scrollback. **Status:** Open. No source code changed. Filed as ROADMAP-only dogfood pinpoint from the 2026-04-26 01:30 UTC nudge. Cluster delta: contributor-friction +1, external-field-patch-intake +1, fork-to-agent-lane handoff cluster founded, attribution-preserving-reimplementation cluster founded; linked to #245/#246 as the immediate search/settings field-patch use case.
## Pinpoint #252: `/v1/messages/count_tokens` typed-taxonomy is structurally absent from the public Provider trait + types + CLI surface
**Observation.** Anthropic ships `/v1/messages/count_tokens` as a first-class GA endpoint that consumes the SAME `MessageRequest` shape (model + messages + system + tools + tool_choice) as `/v1/messages` but produces a TRUNCATED `CountTokensResponse { input_tokens: u32 }` only — no message emission, no completion-side tokens, no streaming. It is the canonical pre-flight cost-estimation primitive: a client constructs the exact request it intends to dispatch, asks the server to count input tokens, and decides whether to send (cost gate, budget cap, context-window pre-check) before paying for completion-side tokens. Anthropic Pro / Claude API exposes the endpoint; the OpenAI surface does not have a structurally-equivalent endpoint (the closest is `tiktoken` local-only counting). claw-code has zero public typed surface for this workflow even though a private `count_tokens` helper exists at `rust/crates/api/src/providers/anthropic.rs:522` for internal preflight context-window-exceeded validation. The internal helper is not exposed via the `Provider` trait, has no `CountTokensRequest`/`CountTokensResponse` typed model in `rust/crates/api/src/types.rs`, no CLI subcommand at `rust/crates/rusty-claude-cli/src/main.rs`, no `/count-tokens`/`/tokens`/`/estimate-cost` slash command in `SlashCommandSpec`, and no telemetry events surfacing the count.
**Verified concrete surface.** rg returns ZERO hits for `CountTokensRequest`, `CountTokensResponse`, `count_tokens` (in `types.rs` / `lib.rs` / `providers/mod.rs`), `count-tokens` (CLI/slash command), `pre_flight_count`, `estimate_input_tokens`, `cost_preflight` across `rust/crates/api/src/types.rs`, `rust/crates/api/src/lib.rs`, `rust/crates/api/src/providers/mod.rs`, `rust/crates/rusty-claude-cli/src/main.rs`, `rust/crates/commands/src/lib.rs`. Only TWO hits for `count_tokens` exist in `rust/`: (i) the private helper at `rust/crates/api/src/providers/anthropic.rs:522` (`async fn count_tokens(&self, request: &MessageRequest) -> Result<u32, ApiError>` — provider-private, not on the public trait) and its caller at line 505 inside `preflight_message_request` for context-window validation only, and (ii) test-harness comments in `rust/crates/rusty-claude-cli/tests/mock_parity_harness.rs:186-198` confirming that the private helper sends an extra POST to `/v1/messages/count_tokens` per turn. The public `Provider` trait at `rust/crates/api/src/providers/mod.rs:17-30` exposes only `send_message` and `stream_message` — zero `count_tokens<'a>(&'a self, request: &'a MessageRequest) -> ProviderFuture<'a, CountTokensResponse>` method. The `ProviderClient` enum at `rust/crates/api/src/client.rs:8-14` has no `count_tokens` dispatch. The CLI has no `claw count-tokens` / `claw estimate-tokens` / `claw preflight` subcommand. The `SlashCommandSpec` table at `rust/crates/commands/src/lib.rs:228-1083` has no `/count-tokens` / `/estimate-cost` / `/preflight` entry. The `ModelPricing` struct at `rust/crates/runtime/src/usage.rs:9-15` has no `pre_flight_count_cost_per_million_usd` field even though Anthropic charges ~$0.000003 per 1K input tokens for the count_tokens endpoint (negligible but non-zero, distinct billing line item from regular input/output tokens).
**Gap.** Eight structurally distinct typed-surface absences fuse on this single endpoint, forming an **eight-layer fusion shape with the novel `same-request-shape-but-different-response-shape` axis-class**:
1. **Data-model taxonomy axis** — zero `CountTokensRequest = MessageRequest` type-alias (or newtype), zero `CountTokensResponse { input_tokens: u32 }` typed struct in `rust/crates/api/src/types.rs`. The endpoint reuses the request shape but truncates the response; without typed models, downstream code must hand-roll `serde_json::Value` parsing against the private helper's local `#[derive(serde::Deserialize)] struct CountTokensResponse { input_tokens: u32 }` defined inside the function body at line 524. This is the FIRST cluster member where the request-side shape is IDENTICAL to an existing typed model (`MessageRequest`) but the response-side is a TRUNCATED-projection that cannot reuse `MessageResponse`'s shape (no `id` / `role` / `content` / `model` / `stop_reason` / `usage.output_tokens` fields, only `input_tokens`).
2. **Provider trait axis** — zero `count_tokens<'a>(&'a self, request: &'a MessageRequest) -> ProviderFuture<'a, CountTokensResponse>` method on the public `Provider` trait at `rust/crates/api/src/providers/mod.rs:17-30`, blocking generic-over-provider count-tokens dispatch. Distinct from #221's batch-dispatch axis (which uses a different request shape) and #227's video-task-polling axis (which uses async-task-polling) — count_tokens is **synchronous-truncated-response** which is a third dispatch shape distinct from the existing send_message/stream_message duo.
3. **ProviderClient enum dispatch axis**: the three-variant enum (Anthropic / Xai / OpenAi) at `rust/crates/api/src/client.rs:8-14` has zero `count_tokens` dispatch arm. The gap is provider-asymmetric: Anthropic ships the endpoint, while OpenAI has no equivalent server-side counting endpoint (OpenAI's pricing-estimation pattern is local-only via the `tiktoken` library). This distinguishes the cluster from the major-provider-symmetric clusters (#211/#212/#213/#214/#215/#216/#217/#218/#219/#220 wire-format-parity) and aligns it with #224 (Voyage-AI single-partner asymmetric) / #225 (audio asymmetric) / #226 (image asymmetric) / #227 (video asymmetric), where one major provider ships and the other does not. Two-member-major-provider-only-no-third-party-partner-set cluster: no third-party partner offers a count_tokens analog; only Anthropic ships server-side counting.
4. **CLI subcommand surface axis** — zero `claw count-tokens "Hello, world"` / `claw estimate-tokens --model claude-3-5-sonnet` / `claw preflight --max-cost-usd 0.50` CLI subcommand at `rust/crates/rusty-claude-cli/src/main.rs`. Distinct from #220's input-side `/screenshot` advertised-but-unbuilt slash command, count-tokens is **purely-not-advertised** — no docs reference, no help text mention, no flag.
5. **Slash command axis** — zero `/count-tokens` / `/tokens` / `/estimate-cost` / `/preflight` entry in `SlashCommandSpec` at `rust/crates/commands/src/lib.rs:228-1083`. Distinct from prior cluster's input-side multimodal slash commands; this is a workflow-meta slash command (REPL ergonomics: `/tokens "draft text"` shows estimated input-tokens before sending).
6. **Pricing-tier axis** — zero `pre_flight_count_cost_per_million_usd` field in `ModelPricing` at `rust/crates/runtime/src/usage.rs:9-15`. The four-field text-token-only `ModelPricing { input_cost_per_million, output_cost_per_million, cache_creation_cost_per_million, cache_read_cost_per_million }` cannot represent count_tokens billing. Anthropic charges ~$0.000003 per 1K input tokens for count_tokens (versus $0.003-$0.015/1K for full input on Sonnet/Opus), making it 1000× cheaper as a pre-flight gate but still a distinct billing line item that the typed surface cannot account for.
7. **Telemetry/observability axis**: zero `CountTokensSubmittedEvent` / `CountTokensCompletedEvent` / `PreFlightCostEstimatedEvent` typed events on the runtime telemetry sink. The internal preflight call's count_tokens latency, success, and result are not surfaced as first-class events; only the eventual context-window-exceeded error path (a derived signal) is observable. The OpenTelemetry GenAI semantic conventions (`https://opentelemetry.io/docs/specs/semconv/gen-ai/`) define a `gen_ai.usage.input_tokens` attribute that could carry a pre-execution count, but claw-code emits zero spans for the count_tokens round trip.
8. **Workflow primitive axis (NEW)** — there is no first-class **pre-flight-cost-prediction primitive** at the runtime layer. The canonical pattern is: construct request → call count_tokens → multiply input_tokens by per-million-input-rate → compare against budget cap → either send or abort with a structured `BudgetExceededError`. claw-code has no `PreFlightCostEstimator` / `BudgetGate` / `pre_flight_check` / `estimate_request_cost` machinery anywhere in `rust/crates/runtime/`. The closest analog is the internal `preflight_message_request` at `anthropic.rs:489` which uses count_tokens only for context-window-exceeded validation (a hard error gate), not for cost-estimation (a soft budget gate that returns the estimate as a value). This is the FIRST cluster member where the missing primitive is a **workflow-level cost-prediction-primitive** rather than an API-shape gap — distinguishing #252's primitive-axis from prior cluster members which are all API-surface-shape gaps. Founds the **Pre-flight-cost-prediction cluster** as solo founder.
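The canonical pattern in item 8 (count, multiply by the per-million input rate, compare against a cap, then send or abort) can be sketched as follows. This is a minimal illustration under stated assumptions, not claw-code's implementation: `budget_gate` and `estimate_input_cost_usd` are hypothetical names, the error struct only mirrors the `BudgetExceededError` named above, and the server round trip to `/v1/messages/count_tokens` is replaced by an arbitrary closure.

```rust
use std::fmt;

/// Hypothetical error mirroring the `BudgetExceededError` named in this pinpoint.
#[derive(Debug, Clone, PartialEq)]
pub struct BudgetExceededError {
    pub estimated_cost_usd: f64,
    pub budget_cap_usd: f64,
    pub input_tokens: u32,
}

impl fmt::Display for BudgetExceededError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "estimated ${:.6} for {} input tokens exceeds budget cap ${:.6}",
            self.estimated_cost_usd, self.input_tokens, self.budget_cap_usd
        )
    }
}

impl std::error::Error for BudgetExceededError {}

/// Input-side cost in USD: tokens * (rate per million tokens) / 1_000_000.
pub fn estimate_input_cost_usd(input_tokens: u32, input_cost_per_million: f64) -> f64 {
    f64::from(input_tokens) * input_cost_per_million / 1_000_000.0
}

/// Hypothetical soft budget gate. `count_tokens` stands in for the server
/// round trip; here it is any closure that yields an input-token count.
/// Returns the estimate as a value when within budget, or a structured error.
pub fn budget_gate<F>(
    count_tokens: F,
    input_cost_per_million: f64,
    budget_cap_usd: f64,
) -> Result<f64, BudgetExceededError>
where
    F: FnOnce() -> u32,
{
    let input_tokens = count_tokens();
    let estimated_cost_usd = estimate_input_cost_usd(input_tokens, input_cost_per_million);
    if estimated_cost_usd > budget_cap_usd {
        Err(BudgetExceededError {
            estimated_cost_usd,
            budget_cap_usd,
            input_tokens,
        })
    } else {
        Ok(estimated_cost_usd)
    }
}
```

Under these assumptions, `budget_gate(|| 18, 3.0, 0.10)` returns `Ok` with an estimate of roughly 0.000054 USD (18 tokens at $3 per million), while a one-million-token count against the same cap returns the structured error. Returning the estimate as a value rather than only failing is what separates this soft gate from the hard context-window check in `preflight_message_request`.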
**Cluster shape novelty.** This founds **THREE new clusters** with #252 as solo founder:
- **Pre-flight-cost-prediction cluster**: workflow primitives that estimate cost/tokens/budget BEFORE dispatching the canonical request, distinct from post-execution accounting (which `MessageResponse.usage` already covers) and distinct from real-time-mid-stream budget gates (which streaming can do but with limited cancellation semantics). #252 is FIRST cluster member.
- **Token-accounting-without-message-emission cluster**: endpoints that consume request-shapes but produce truncated/projection responses with no message content, no role, no stop_reason. #252 is FIRST cluster member; future candidates include batch-cost-preview, prompt-caching-cost-preview, server-side log probability inspection without text emission.
- **Server-side-pre-execution-counting cluster**: server-managed counting/validation/gating primitives that run on the provider's infrastructure rather than the client (parallel to #214/#218/#219/#233/#234/#250 server-managed-tool dispatch but for non-tool counting/accounting workflows). #252 is FIRST cluster member.
Plus introduces the **NEW `same-request-shape-but-different-response-shape` axis-class** — a structural pattern where the request shape is IDENTICAL to an existing typed model but the response shape is a truncated/projection variant that cannot reuse the existing response model. This is structurally distinct from prior fusion-axes which all add NEW request-side fields (like #218's response_format opt-in or #219's stream-options opt-in) or NEW response-side blocks (like #220's image-content-block or #225's audio-output-block). #252 is the FIRST audit member where the novelty is in the **request-response-shape-asymmetry** itself.
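The request-response-shape-asymmetry can be made concrete with a small type sketch. The structs below are simplified stand-ins, not the real definitions in `types.rs` (the actual `MessageRequest` and `MessageResponse` carry more fields), and the `Usage` struct with an `input_tokens` field is an assumption made for illustration.

```rust
/// Simplified stand-in for the existing `MessageRequest`; the real struct
/// carries more fields (tools, tool_choice, and so on).
#[derive(Debug, Clone, PartialEq)]
pub struct MessageRequest {
    pub model: String,
    pub system: Option<String>,
    pub messages: Vec<String>,
}

/// Request side: the shape is identical, so a plain alias is enough.
pub type CountTokensRequest = MessageRequest;

/// Response side: a truncated projection carrying only the input-token count.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CountTokensResponse {
    pub input_tokens: u32,
}

/// Simplified stand-in for the usage block of a full response (assumed shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Usage {
    pub input_tokens: u32,
    pub output_tokens: u32,
}

/// Simplified stand-in for the existing `MessageResponse`.
#[derive(Debug, Clone, PartialEq)]
pub struct MessageResponse {
    pub id: String,
    pub role: String,
    pub content: String,
    pub model: String,
    pub stop_reason: Option<String>,
    pub usage: Usage,
}

/// A full response can be projected down to the truncated shape...
impl From<&MessageResponse> for CountTokensResponse {
    fn from(full: &MessageResponse) -> Self {
        Self {
            input_tokens: full.usage.input_tokens,
        }
    }
}
// ...but there is no sensible inverse: a `CountTokensResponse` has no id,
// role, content, stop_reason, or output tokens to fill a `MessageResponse`.
```

The one-way `From` impl is the point of the sketch: the projection loses information, which is why the count-tokens response needs its own typed model instead of reusing `MessageResponse`.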
**Audit-balance discovery-pattern.** #252 introduces the **THIRD distinct discovery-pattern** in the audit catalog: **NEW-SOLO-CLUSTER-FOUNDING-WITH-DAILY-DRIVER-IMPACT** — pinpoints that found new clusters AND have direct daily-clawability impact (every claw-code session with cost-conscious users would benefit from a `/tokens` slash command before sending a long prompt). Distinct from META-cluster-growth pattern (continuous-+1-per-cycle for synthesis-fusion, discontinuous-resumption-after-plateau for tool-locality-axis) and distinct from complementary-pinpoint-pair-bundle pattern (#245+#250 paired as halves of a single tool-subsystem). #252 deliberately does NOT extend Cross-pinpoint-synthesis-fusion META-cluster (intentionally pivoting away from synthesis chain), does NOT extend Tool-locality-axis META-cluster (already extended cycle #393 by #250), does NOT join silent-fallback cluster (this is a positive-feature gap, not a clawability bug). The audit now spans THREE structurally distinct discovery-patterns rather than two, demonstrating audit-breadth-across-discovery-pattern-classes alongside audit-balance-across-META-clusters.
**Required fix shape:** (a) Add `pub type CountTokensRequest = MessageRequest;` type-alias (or `pub struct CountTokensRequest(pub MessageRequest);` newtype) and `pub struct CountTokensResponse { pub input_tokens: u32 }` to `rust/crates/api/src/types.rs` with public visibility; (b) Add `fn count_tokens<'a>(&'a self, request: &'a MessageRequest) -> ProviderFuture<'a, CountTokensResponse>` method to the `Provider` trait at `rust/crates/api/src/providers/mod.rs:17-30` with a default implementation returning `ApiError::Unsupported { provider: "OpenAI", endpoint: "count_tokens" }` for non-Anthropic providers; (c) Promote the existing private `Anthropic::count_tokens` helper at `anthropic.rs:522` to implement the public trait method, returning the typed `CountTokensResponse` instead of bare `u32`; (d) Add `count_tokens` dispatch arm on the `ProviderClient` enum at `rust/crates/api/src/client.rs:8-14` that routes to the underlying provider; (e) Add `claw count-tokens [--model M] [--system PROMPT] [--max-tokens N] <message>` CLI subcommand at `rust/crates/rusty-claude-cli/src/main.rs` that prints `{"input_tokens": N, "estimated_cost_usd": M}` JSON or human-readable summary; (f) Add `/count-tokens`, `/tokens`, `/estimate-cost`, `/preflight` entries in `SlashCommandSpec` at `rust/crates/commands/src/lib.rs` for REPL pre-flight cost-estimation; (g) Add `pre_flight_count_cost_per_million_usd: Option<f64>` field to `ModelPricing` at `rust/crates/runtime/src/usage.rs:9-15` defaulting to ~$0.003 per million input tokens for Anthropic models; (h) Emit structured telemetry events `CountTokensSubmittedEvent { model, input_tokens_estimated_local, request_size_bytes }` / `CountTokensCompletedEvent { model, input_tokens_counted, latency_ms, divergence_from_local_estimate }` / `PreFlightCostEstimatedEvent { model, input_tokens, estimated_cost_usd, budget_cap_usd, would_exceed_budget }` for observability; (i) Add `PreFlightCostEstimator` / `BudgetGate` runtime primitive at 
`rust/crates/runtime/src/cost_preflight.rs` exposing `async fn estimate_request_cost(&self, request: &MessageRequest, max_output_tokens: u32) -> Result<CostEstimate, ApiError>` that combines count_tokens + ModelPricing lookup; (j) Add `BudgetExceededError { estimated_cost_usd, budget_cap_usd, model, input_tokens }` typed error variant on `ApiError`. **Acceptance:** running `claw count-tokens --model claude-3-5-sonnet "What is the answer to life, the universe, and everything?"` constructs a `MessageRequest`, dispatches via `Provider::count_tokens`, the Anthropic backend returns `{"input_tokens": 18}`, and the CLI prints `{"input_tokens": 18, "estimated_cost_usd": 0.000054}` (18 tokens at the $3-per-million Sonnet input rate). Running `claw chat --budget-usd 0.10 "long prompt..."` runs pre-flight count_tokens via `PreFlightCostEstimator`, estimates total cost (input + max_output * output-rate), compares against the $0.10 cap, and either sends (within budget) or aborts with `BudgetExceededError` (exceeds budget). This is the canonical cost-conscious-clawing daily-driver workflow that the Anthropic API ships as a first-class surface but that claw-code structurally cannot model, because the Provider trait has zero `count_tokens` method, the CLI has zero `count-tokens` subcommand, the runtime has zero `PreFlightCostEstimator` primitive, and the typed surface has zero `CountTokensRequest`/`CountTokensResponse` models.
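Fix steps (b) through (d) can be sketched as a trait method with a default "unsupported" body plus an enum dispatch arm. Everything here is a simplified assumption rather than the real code: the sketch is synchronous (the actual trait returns a `ProviderFuture`), `AnthropicClient` / `XaiClient` / `OpenAiClient` and the `name()` helper are hypothetical, and the Anthropic arm counts whitespace-separated words as a placeholder for the real HTTP call, which is not how tokens are actually counted.

```rust
/// Hypothetical error variant mirroring the `ApiError::Unsupported` shape above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ApiError {
    Unsupported {
        provider: &'static str,
        endpoint: &'static str,
    },
}

/// Simplified stand-in for the existing `MessageRequest`.
pub struct MessageRequest {
    pub model: String,
    pub messages: Vec<String>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CountTokensResponse {
    pub input_tokens: u32,
}

pub trait Provider {
    /// Hypothetical helper so the default error names the actual provider.
    fn name(&self) -> &'static str;

    /// Default body: providers without a server-side counting endpoint
    /// report `Unsupported` instead of failing to compile or panicking.
    fn count_tokens(&self, _request: &MessageRequest) -> Result<CountTokensResponse, ApiError> {
        Err(ApiError::Unsupported {
            provider: self.name(),
            endpoint: "count_tokens",
        })
    }
}

pub struct AnthropicClient;
pub struct XaiClient;
pub struct OpenAiClient;

impl Provider for AnthropicClient {
    fn name(&self) -> &'static str {
        "Anthropic"
    }

    fn count_tokens(&self, request: &MessageRequest) -> Result<CountTokensResponse, ApiError> {
        // Placeholder only: a whitespace word count stands in for the POST
        // to `/v1/messages/count_tokens`. Real token counts will differ.
        let words: usize = request
            .messages
            .iter()
            .map(|m| m.split_whitespace().count())
            .sum();
        Ok(CountTokensResponse {
            input_tokens: u32::try_from(words).unwrap_or(u32::MAX),
        })
    }
}

impl Provider for XaiClient {
    fn name(&self) -> &'static str {
        "xAI"
    }
}

impl Provider for OpenAiClient {
    fn name(&self) -> &'static str {
        "OpenAI"
    }
}

/// Three-variant enum as described above; the inner types are hypothetical.
pub enum ProviderClient {
    Anthropic(AnthropicClient),
    Xai(XaiClient),
    OpenAi(OpenAiClient),
}

impl ProviderClient {
    /// Dispatch arm that routes to the underlying provider.
    pub fn count_tokens(&self, request: &MessageRequest) -> Result<CountTokensResponse, ApiError> {
        match self {
            Self::Anthropic(client) => client.count_tokens(request),
            Self::Xai(client) => client.count_tokens(request),
            Self::OpenAi(client) => client.count_tokens(request),
        }
    }
}
```

One design choice worth noting: routing the provider name through `name()` keeps the default error accurate for every non-Anthropic variant, whereas a literal `"OpenAI"` in the default body would mislabel the xAI arm.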
**Status:** Open. No source code changed. Filed 2026-04-26 10:32 KST. HEAD: `313c840` (post-#251 fast-forward verification onto gaebal-gajae's 10:30 KST cycle ExternalPatchIntake pinpoint at `313c840` — NINTH consecutive concurrent-dogfood rebase verification cycle, three-way parity confirmed local == origin == fork at HEAD `313c840` with no race detected, demonstrating both gaps #239 catalogues at the dogfood-coordination layer and #243 catalogues at the canonical-ordering layer for the NINTH cycle in a row, confirming concurrent-dogfood-rebase as a stable operational pattern that has now held for NINE cycles in a row — Jobdori files the next-monotonic-id directly atop the prior tip rather than racing for a reservation gap, while gaebal-gajae continues to file pinpoints in numeric order based on the live channel's nudge stream). Branch: feat/jobdori-168c-emission-routing. Sibling-shape cluster: 43 pinpoints (grows by +1 with #252). Pre-flight-cost-prediction cluster: 1 member (#252 alone, founder). Token-accounting-without-message-emission cluster: 1 member (#252 alone, founder). Server-side-pre-execution-counting cluster: 1 member (#252 alone, founder). Same-request-shape-but-different-response-shape sub-cluster: 1 member (#252 alone, founder). Two-member-major-provider-only-no-third-party-partner-set sub-cluster: 7 members (#240+#241+#247+#248+#249+#250+#252) — grows from 6 to 7 confirming continuing-pattern-status across SIX distinct axis-classes (TOOL-COMPANION-BUNDLE / COMPOUND-INPUT / COMPOUND-OUTPUT / QUAD-MODALITY-TURN / SERVER-MANAGED-WEB-SEARCH-WITH-TOOL-CHOICE-DISCRIMINATOR / SERVER-SIDE-PRE-EXECUTION-COUNTING). Eight-layer fusion shape (smaller than #241/#247/#248/#249's twelve-layer count and smaller than #250's ten-layer count, reflecting the smaller-scope-but-novel-axis-class trade-off for daily-driver-impact pinpoints). 
**NEW META-pattern introduced**: NEW-SOLO-CLUSTER-FOUNDING-WITH-DAILY-DRIVER-IMPACT discovery-pattern — distinct from META-cluster-growth (continuous or discontinuous) and distinct from complementary-pinpoint-pair-bundle (paired halves of a tool-subsystem). #252 founds the THIRD distinct discovery-pattern in the audit catalog, the audit now spans THREE structurally distinct discovery-patterns rather than two, demonstrating audit-breadth-across-discovery-pattern-classes alongside audit-balance-across-META-clusters. **PIVOT signal**: #252 deliberately PIVOTS AWAY from BOTH Cross-pinpoint-synthesis-fusion-shape META-cluster (intentionally not extending the +1-per-cycle synthesis chain) AND Tool-locality-axis META-cluster (already extended by #250 cycle #393), founding NEW solo clusters with daily-driver-impact instead. Distinct from #251's contributor-friction/external-patch-intake axis (clawability-coordination layer) by being a daily-clawing-cost-gate workflow primitive (clawability-runtime layer). Linked to #221 (batch-dispatch async pattern, prior closest-shape neighbor with synchronous-batch-via-Files-API-prerequisite, distinct dispatch shape), #224 (Voyage-AI partner-asymmetric, prior provider-asymmetric pattern), #225 (audio partner-asymmetric, prior provider-asymmetric pattern), #226 (image partner-asymmetric, prior provider-asymmetric pattern), #227 (video partner-asymmetric, prior provider-asymmetric pattern with async-task-polling-primitive — closest neighbor in the workflow-primitive-axis sense), and #239/#243 (dogfood-coordination/canonical-ordering, the operational-layer pinpoints that #252's NINTH consecutive concurrent-dogfood rebase cycle continues to demonstrate).
🪨