diff --git a/ROADMAP.md b/ROADMAP.md
index 9e9fa5e..0ca5a0b 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -8372,3 +8372,70 @@ if let Some(head_path) = resolve_git_head_path() {
 
 ---
 
+
+---
+
+## Cluster Update: #161 Elevated to Diagnostic-Strictness Family
+
+**Source:** gaebal-gajae validation on cycle #65 closure (2026-04-23 03:32 Seoul). Key quote (translated from Korean): "This is not a mere build quirk: because 'the version surface misdescribes runtime reality,' it is a direct violation of the #57 principle."
+
+### The Reclassification
+
+**Before (cycle #65 initial filing):** #161 was grouped as "build-pipeline truthfulness" — a tooling-adjacent category.
+
+**After (cycle #67 reframe):** #161 is a first-class member of the **diagnostic-strictness family** (originally cycles #57–#59).
+
+### Why The Reclass Matters
+
+`claw version` is a **diagnostic surface**. It exists precisely to answer "what is the state of this binary?" When it reports a stale Git SHA in a git worktree, it is:
+
+1. **Describing runtime reality incorrectly** — #57 principle violation ("diagnostic surfaces must be at least as strict as runtime reality")
+2. **Misleading downstream consumers** — bug reports, CI provenance, dogfood validation all inherit the stale SHA
+3. **Silent about the failure mode** — nothing in the output signals "this may be stale"
+
+The failure mode is identical in shape to #122 (doctor doesn't check stale-base) and #122b (doctor doesn't check broad-cwd): **diagnostic surface reports success/state, but underlying reality diverges**.
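+
+One plausible mechanism for the worktree case, stated here as an assumption rather than a confirmed root cause: in a linked worktree, `.git` is a regular file containing a `gitdir: <path>` pointer, so any logic that reads `<root>/.git/HEAD` directly never sees the worktree's real `HEAD`. The sketch below shows worktree-aware resolution. `resolve_head_path` is a hypothetical helper for illustration, not the existing `resolve_git_head_path`.
+
+```rust
+use std::fs;
+use std::path::{Path, PathBuf};
+
+/// Hypothetical helper: locate the HEAD file for a checkout rooted at `root`.
+/// Handles a normal clone (`.git` is a directory) and a linked worktree
+/// (`.git` is a file whose first line is `gitdir: <path>`).
+fn resolve_head_path(root: &Path) -> Option<PathBuf> {
+    let dot_git = root.join(".git");
+    if dot_git.is_dir() {
+        return Some(dot_git.join("HEAD"));
+    }
+    // Linked worktree: follow the `gitdir:` pointer stored in the `.git` file.
+    let contents = fs::read_to_string(&dot_git).ok()?;
+    let pointer = contents.lines().next()?.strip_prefix("gitdir:")?.trim();
+    let git_dir = PathBuf::from(pointer);
+    // The pointer may be relative to the worktree root.
+    let git_dir = if git_dir.is_absolute() {
+        git_dir
+    } else {
+        root.join(git_dir)
+    };
+    Some(git_dir.join("HEAD"))
+}
+```
+
+Returning `None` when `.git` is missing or unreadable keeps the "could not determine" case visible to the caller instead of silently falling back to a cached value.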
+
+### The Diagnostic-Strictness Family — Updated Membership
+
+| # | Surface | Runtime Reality | Gap | Status |
+|---|---|---|---|---|
+| #122 | `claw doctor` | Stale-base preflight (prompt path) | Doctor skipped stale-base check | 🟢 REVIEW-READY |
+| #122b | `claw doctor` | Broad-cwd check (prompt path) | Doctor green in home/root | 🟢 REVIEW-READY |
+| **#161** | **`claw version`** | **Current binary's Git SHA (real HEAD)** | **Reports stale SHA in worktrees** | **📋 FILED (new family member)** |
+
+All three:
+- Report a state that diverges from runtime reality (config or build metadata vs. live state)
+- Mislead the user who reads the diagnostic output
+- Can be fixed by making the diagnostic surface probe the actual state
+
+### Why This Is A Cluster, Not A Series Of One-Offs
+
+At cycle #57, we observed: `doctor` has one gap. At cycle #58, a second gap. At cycle #59, we formalized: **"diagnostic-strictness" is a principle, with an audit checklist.**
+
+Cycle #65 found a third instance. **This validates the cycle #59 investment.** Instead of treating #161 as novel, the audit lens immediately classified it: "This is the same failure mode as #122/#122b, just on a different surface."
+
+### Pattern Formalized: Diagnostic Surfaces Must Probe Current Reality
+
+Any surface whose purpose is to answer "what is the state?" must:
+1. Read **live state** (not cached build metadata)
+2. Detect **mode-specific failures** (worktree vs. non-worktree, broad-cwd, stale-base)
+3. Warn when underlying reality diverges from what's reported
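+
+The requirements above can be sketched as one small check. This is a minimal illustration, assuming the surface has both a cached value (for example, build metadata) and an optional live probe result. `DiagnosticReport` and `report_with_live_check` are hypothetical names, not existing `claw` APIs, and mode detection (point 2) is left to whatever produces the live probe.
+
+```rust
+/// Hypothetical report shape for a "what is the state?" surface.
+#[derive(Debug, PartialEq)]
+struct DiagnosticReport {
+    /// The value shown to the user.
+    value: String,
+    /// Set whenever the shown value cannot be confirmed against live state.
+    warning: Option<String>,
+}
+
+/// Compare a cached value with a live probe and attach a warning on divergence.
+/// `live` is `None` when the probe itself failed.
+fn report_with_live_check(cached: &str, live: Option<&str>) -> DiagnosticReport {
+    let warning = match live {
+        // Live state confirms the cached value: nothing to flag.
+        Some(l) if l == cached => None,
+        // Divergence: say so explicitly instead of reporting silently.
+        Some(l) => Some(format!(
+            "reported value {} differs from live state {}",
+            cached, l
+        )),
+        // Probe failure is also a reason to warn, not a reason to stay quiet.
+        None => Some(format!(
+            "live state could not be probed; {} may be stale",
+            cached
+        )),
+    };
+    DiagnosticReport {
+        value: cached.to_string(),
+        warning,
+    }
+}
+```
+
+The sketch deliberately keeps the cached value in the output and adds a warning, rather than swapping in the live value, so a consumer can see both that a value was reported and that it is not trustworthy.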
+
+**Surfaces on watch list** (not yet probed):
+- `claw state` — does it probe live session state?
+- `claw status` — does it probe auth/sandbox live?
+- `claw sandbox` — does it probe actual sandbox capability?
+- `claw config` — does it reflect active config or just raw file?
+
+### Implication For Future Cycles
+
+**Cycle #67 and onward:** When dogfooding, apply the diagnostic-strictness lens first.
+
+- See a diagnostic output? Ask: "Does this reflect runtime reality?"
+- See a stale value? Ask: "Is this a one-off, or a #122-family gap?"
+- See a success report? Ask: "Would the corresponding runtime call actually succeed?"
+
+This audit lens has now found 3 instances (#122, #122b, #161) in fewer than 10 cycles. The principle is **evidence-backed, not aspirational**.
+
+---
+