From 0e9cff588da03a4fccc91fb81265d4d90c3617da Mon Sep 17 00:00:00 2001
From: YeonGyu-Kim <code.yeon.gyu@gmail.com>
Date: Sat, 25 Apr 2026 18:20:04 +0900
Subject: [PATCH] =?UTF-8?q?roadmap:=20#206=20filed=20=E2=80=94=20normalize?=
 =?UTF-8?q?=5Ffinish=5Freason=20covers=202/5=20OpenAI=20finish=20reasons,?=
 =?UTF-8?q?=20length/content=5Ffilter/function=5Fcall=20unmapped=20(Jobdor?=
 =?UTF-8?q?i=20cycle=20#357)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Pinpoint #206: normalize_finish_reason() in openai_compat.rs only maps
stop→end_turn and tool_calls→tool_use. The 'other => other' pass-through
arm silently leaks length, content_filter, function_call to downstream
consumers expecting Anthropic vocabulary (max_tokens, refusal, tool_use).

Sibling of #201/#202/#203/#204 (silent fallbacks at provider boundary).
No structured event for unmapped values; test coverage locks only the
two-case happy path.

Branch: feat/jobdori-168c-emission-routing
HEAD: dba4f28
---
 ROADMAP.md | 91 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 91 insertions(+)

diff --git a/ROADMAP.md b/ROADMAP.md
index 0b7d2d2..8fcc482 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -12506,3 +12506,94 @@ claw doctor
 **Status:** Open. No code changed. Filed 2026-04-25 17:16 KST by Q *YeonGyu Kim, formalized to ROADMAP by Jobdori cycle #351. Branch: feat/jobdori-168c-emission-routing.
 
 🪨
+
+---
+
+## Pinpoint #206 — `normalize_finish_reason` covers 2 of 5 OpenAI finish reasons: `length`, `content_filter`, `function_call` pass through unmapped, breaking Anthropic-vocabulary parity that downstream consumers assume (Jobdori, cycle #357)
+
+**Observed:** In `rust/crates/api/src/providers/openai_compat.rs`, `normalize_finish_reason()` (line 1389) translates OpenAI-compat `finish_reason` strings into Anthropic-compatible `stop_reason` vocabulary. The mapping only handles two cases:
+
+```rust
+fn normalize_finish_reason(value: &str) -> String {
+    match value {
+        "stop" => "end_turn",
+        "tool_calls" => "tool_use",
+        other => other,
+    }
+    .to_string()
+}
+```
+
+The `other => other` arm passes any other value through unchanged. OpenAI's actual finish_reason enum includes at minimum five values: `stop`, `length`, `content_filter`, `tool_calls`, `function_call` (legacy). Three of those — `length`, `content_filter`, `function_call` — silently bypass normalization.
+
+**Gap:**
+
+1. **`length` is not mapped to `max_tokens`.** When the model hits the max_tokens cap, OpenAI returns `finish_reason: "length"`. Anthropic returns `stop_reason: "max_tokens"`. claw-code surfaces a raw `"length"` to downstream consumers expecting Anthropic vocabulary. Any orchestrator branching on `stop_reason == "max_tokens"` to detect output truncation will silently miss the OpenAI-compat path.
+
+2. **`content_filter` is not mapped to `refusal` or a typed safety state.** When OpenAI's content filter blocks output (`finish_reason: "content_filter"`), claw-code surfaces a raw `"content_filter"` instead of routing it through a refusal/safety taxonomy. Anthropic uses `stop_reason: "refusal"` for the equivalent state. Downstream code that watches for refusal/safety stops cannot detect the OpenAI safety stop without a separate code path. Worse: no event is emitted indicating the model was blocked — the message just ends with an unfamiliar `stop_reason` value.
+
+3. **`function_call` (legacy single-tool format) is not mapped to `tool_use`.** Some OpenAI-compat backends still emit `finish_reason: "function_call"` instead of `tool_calls`. claw-code passes this through unchanged, but most consumers branch on `stop_reason == "tool_use"` (the Anthropic vocabulary) which never matches the legacy form. Tool-use detection silently fails on those backends.
+
+4. **No structured event for unmapped values.** The pass-through path emits no log, no warning, no `unknown_finish_reason` event. A claw cannot tell from the JSON output alone whether a `stop_reason` value came from a recognized mapping or fell through unchanged. This compounds with #201/#202/#203 — silent fallbacks at the provider boundary keep widening the observability gap that the structured-event taxonomy is supposed to close.
+
+5. **Test coverage locks in only the two-case happy path.** `normalizes_stop_reasons` test (line 1635-1638) asserts `"stop" → "end_turn"` and `"tool_calls" → "tool_use"`. There is no test for `length`, `content_filter`, or `function_call`, so the missing mappings are invisible to CI. The test also does not assert pass-through behavior for unknown values, leaving room for silent regressions when new finish_reason values are added by OpenAI.
+
+**Repro:**
+
+```rust
+// rust/crates/api/src/providers/openai_compat.rs:1389
+fn normalize_finish_reason(value: &str) -> String {
+    match value {
+        "stop" => "end_turn",
+        "tool_calls" => "tool_use",
+        other => other,  // <-- silent pass-through
+    }
+    .to_string()
+}
+
+// At call site (line 1202):
+stop_reason: choice
+    .finish_reason
+    .map(|value| normalize_finish_reason(&value)),
+
+// When OpenAI returns finish_reason="length":
+// stop_reason in MessageResponse becomes Some("length")
+// Downstream code branching on stop_reason == "max_tokens" never matches
+```
+
+**Verification check:** `grep -n "normalize_finish_reason" rust/crates/api/src/providers/openai_compat.rs` returns 5 hits — implementation, two call sites (streaming + non-streaming), and two test imports/assertions. The pass-through path has no caller-side guard or post-normalization assertion.
+
+**Expected:**
+
+- `normalize_finish_reason` exhaustively covers OpenAI's documented finish_reason enum: `stop` → `end_turn`, `length` → `max_tokens`, `content_filter` → `refusal`, `tool_calls` → `tool_use`, `function_call` → `tool_use`
+- Unknown finish_reason values emit a structured `unknown_finish_reason` event with the raw value, source provider, and request id, before falling through to the raw string (or to a typed `unknown` placeholder)
+- A `claw doctor` check or session diagnostic surfaces sessions where unmapped finish_reasons were observed
+- Test coverage extends to all five mapped cases plus an explicit pass-through-with-event assertion for unknown values
+
+**Fix sketch:**
+
+1. Extend `normalize_finish_reason` to a full mapping table covering `length`, `content_filter`, `function_call`. ~5 LOC.
+2. Change the `other` arm to either (a) return a typed `(String, bool)` where the bool indicates whether normalization was recognized, or (b) emit an `unknown_finish_reason` event before returning the raw string. ~10 LOC + event channel plumbing.
+3. Add per-arm regression tests in the existing `normalizes_stop_reasons` test. ~15 LOC test additions.
+4. Add to classifier event taxonomy and document the mapping table in SCHEMAS.md (companion to #200's derive-from-source enforcement gap).
+5. Verify Anthropic-side `stop_reason` enum: confirm `max_tokens` and `refusal` are the canonical Anthropic spellings before locking the OpenAI → Anthropic mapping.
+
+**Why this matters for clawability:**
+
+- Principle #2 ("Truth is split across layers") — finish_reason is the canonical signal for *why* the model stopped. Half of the truth values silently leak through unmapped to downstream consumers that expect Anthropic vocabulary.
+- Principle #5 ("Partial success is first-class — degraded-mode reporting") — `content_filter` is a partial-success / safety-blocked state; the current code path emits zero structured signal for it.
+- Sibling of #201 (silent tool-arg parse fallback), #202 (silent tool-message drop), #203 (auto-compaction no SSE event), #204 (reasoning_tokens not surfaced) — all four are silent-fallback gaps at the provider boundary that the structured-event taxonomy was designed to close, and #206 is the same pattern at the finish_reason translation layer.
+- Real-world impact: a claw consuming `--output-format json` and branching on `stop_reason` to decide "was output truncated, was it refused, was a tool used" gets wrong answers on three of OpenAI's five finish_reason values when that path runs against an OpenAI-compat backend.
+
+**Acceptance criteria:**
+
+- `normalize_finish_reason("length")` returns `"max_tokens"`
+- `normalize_finish_reason("content_filter")` returns `"refusal"` (or a documented Anthropic-canonical equivalent)
+- `normalize_finish_reason("function_call")` returns `"tool_use"`
+- Unknown values emit an `unknown_finish_reason` event before returning the raw string
+- `normalizes_stop_reasons` test asserts all five mappings plus the unknown-event behavior
+- Downstream consumers branching on Anthropic stop_reason vocabulary work correctly against OpenAI-compat backends without per-provider special-casing
+
+**Status:** Open. No code changed. Filed 2026-04-25 18:20 KST. Branch: feat/jobdori-168c-emission-routing. HEAD: dba4f28.
+
+🪨