ROADMAP #102: mcp list/show/doctor surface MCP config-time only; no preflight, no liveness, not even command-exists check

Dogfooded 2026-04-18 on main HEAD eabd257 from /tmp/cdW2. A .claw.json pointing at command='/does/not/exist' as an MCP server cheerfully reports: mcp show unreachable → found: true mcp list → configured_servers: 1, status field absent doctor → config: ok, MCP servers: 1, has_failures: false The broken server is invisible until agent tries to call a tool from it mid-turn — burning tokens on failed tool call and forcing retry loop. Trace: main.rs:1701-1780 check_config_health counts via runtime_config.mcp().servers().len() No which(). No TcpStream::connect(). No filesystem touch. render_doctor_report has 6 checks (auth/config/install_source/ workspace/sandbox/system). No check_mcp_health exists. commands/src/lib.rs mcp list/show emit config-side repr only. No status field, no reachable field, no startup_state. runtime/mcp_stdio.rs HAS startup machinery with error types, but only invoked at turn-execution time — too late for preflight. Roadmap prescribes this exact surface: - Phase 1 §3.5 Boot preflight / doctor contract explicitly lists 'MCP config presence and server reachability expectations' - Phase 2 §4 canonical lane event schema includes lane.ready - Phase 4.4.4 event provenance / environment labeling - Product Principle #5 'Partial success is first-class' — 'MCP startup can succeed for some servers and fail for others, with structured degraded-mode reporting' All four unimplementable without preflight + per-server status. Fix shape (~110 lines): - check_mcp_health: which(command) for stdio, 1s TcpStream connect for http/sse. Aggregate ok/warn/fail with per-server detail lines. - mcp list/show: add status field (configured/resolved/command_not_found/connect_refused/ startup_failed). --probe flag for deeper handshake. - doctor top-level: degraded_mode: bool, startup_summary. - Wire preflight into prompt/repl bootstrap; emit one-time mcp_preflight event. Joins unplumbed-subsystem cross-cluster (#78, #100, #102) — subsystem exists, diagnostic surface JSON-invisible. Joins truth-audit (#80-#84, #86, #87, #89, #100) — doctor: ok lies when MCP broken. Natural bundle: #78 + #96 + #100 + #102 unplumbed-surface quartet. Also #100 + #102 as pure doctor-surface-coverage 2-way. Filed in response to Clawhip pinpoint nudge 1494797126041862285 in #clawcode-building-in-public.
2026-06-10 11:13:33 +08:00 · 2026-04-18 05:34:30 +09:00 · 2026-04-18 05:34:30 +09:00 · 6a16f0824d
commit 6a16f0824d
parent eabd257968
1 changed files with 76 additions and 0 deletions
--- a/ROADMAP.md
+++ b/ROADMAP.md
@ -2451,3 +2451,79 @@ ear], /color [scheme], /effort [low|medium|high], /fast, /summary, /tag [label],
     **Blocker.** None. Parser-unification is ~30 lines. Env rejection is ~15 lines. Docs are ~10 lines. The broad-vs-narrow accepted-set decision is the only architectural question and can be resolved by checking existing user configs for alias usage; if `dontAsk` / `plan` / etc. are uncommon, narrow the set; if common, keep broad.

     **Source.** Jobdori dogfood 2026-04-18 against `/tmp/cdV` on main HEAD `d63d58f` in response to Clawhip pinpoint nudge at `1494789577687437373`. Joins the **permission-audit sweep** (#50 / #87 / #91 / #94 / #97 / #101) on the env-var axis — the third and final permission-mode input surface. #50 (merge-edge cases), #87 (fresh-workspace default), #91 (CLI↔config parser mismatch), #94 (permission-rule validation), #97 (tool-allow-list), and now #101 (env-var silent fail-open) together audit every input surface for permission configuration. Cross-cluster with **silent-flag / documented-but-unenforced** (#96–#100) but qualitatively worse than that bundle: this is fail-OPEN, not fail-inert. And cross-cluster with **truth-audit** (#80–#87, #89, #100) because the operator has no way to verify the resolved permission_mode's source. Natural bundle: the six-way permission-audit sweep (#50 + #87 + #91 + #94 + #97 + **#101**) — the end-state cleanup that closes the entire permission-input attack surface in one pass.
+
+102. **`claw mcp list` / `claw mcp show` / `claw doctor` surface MCP servers at *configure-time* only — no preflight, no liveness probe, not even a `command-exists-on-PATH` check. A `.claw.json` pointing at `/does/not/exist` as an MCP server command cheerfully reports `found: true` in `mcp show`, `configured_servers: 1` in `mcp list`, `MCP servers: 1` in `doctor` config check, and `status: ok` overall. The actual reachability / startup failure only surfaces when the agent tries to *use* a tool from that server mid-turn — exactly the diagnostic surprise the Roadmap's Phase 2 §4 "Canonical lane event schema" and Product Principle #5 "Partial success is first-class" were written to avoid** — dogfooded 2026-04-18 on main HEAD `eabd257` from `/tmp/cdW2`. A three-server config with 2 broken commands currently shows up everywhere as "Config: ok, MCP servers: 3." An orchestrating claw cannot tell from JSON alone which of its tool surfaces will actually respond.
+
+     **Concrete repro.**
+     ```
+     $ cd /tmp/cdW2 && git init -q .
+     $ cat > .claw.json <<'JSON'
+     {
+       "mcpServers": {
+         "unreachable": {
+           "command": "/does/not/exist",
+           "args": []
+         }
+       }
+     }
+     JSON
+     $ claw --output-format json mcp list | jq '.servers[0].summary, .configured_servers'
+     "/does/not/exist"
+     1
+     # mcp list reports 1 configured server, no status field, no reachability probe
+
+     $ claw --output-format json mcp show unreachable | jq '.found, .server.details.command'
+     true
+     "/does/not/exist"
+     # `found: true` for a command that doesn't exist on disk — the "finding" is purely config-level
+
+     $ claw --output-format json doctor | jq '.checks[] | select(.name == "config") | {status, summary, details}'
+     {
+       "status": "ok",
+       "summary": "runtime config loaded successfully",
+       "details": [
+         "Config files      loaded 1/1",
+         "MCP servers       1",
+         "Discovered file   /private/tmp/cdW2/.claw.json"
+       ]
+     }
+     # doctor: all ok. The broken server is invisible.
+
+     $ claw --output-format json doctor | jq '.summary, .has_failures'
+     {"failures": 0, "ok": 4, "total": 6, "warnings": 2}
+     false
+     # has_failures: false, despite a 100%-unreachable MCP server
+     ```
+
+     **Trace path.**
+     - `rust/crates/rusty-claude-cli/src/main.rs:1701-1780` — `check_config_health` is the doctor check that touches MCP config. It counts configured servers via `runtime_config.mcp().servers().len()` and emits `MCP servers: {n}` in the detail list. It does not invoke any MCP startup helper, not even a "does this command resolve on PATH" stub. No separate `check_mcp_health` exists.
+     - `rust/crates/rusty-claude-cli/src/main.rs` — `render_doctor_report` assembles six checks: `auth`, `config`, `install_source`, `workspace`, `sandbox`, `system`. No MCP-specific check. No plugin-liveness check. No tool-surface-health check.
+     - `rust/crates/commands/src/lib.rs` — the `mcp list` / `mcp show` handlers format the config-side representation of each server (transport, command, args, env_keys, tool_call_timeout_ms). The output includes `summary: <command>` and `scope: {id, label}` but no `status` / `reachable` / `startup_state` field. `found` in `mcp show` is strictly config-presence, not runtime presence.
+     - `rust/crates/runtime/src/mcp_stdio.rs` — the MCP startup machinery exists and has its own error types. It knows how to `spawn()` and how to detect startup failures. But these paths are only invoked at turn-execution time, when the agent actually calls an MCP tool — too late for a pre-flight.
+     - `rust/crates/runtime/src/config.rs:953-1000` — `parse_mcp_server_config` and `parse_mcp_remote_server_config` validate the shape of the config entry (required fields, valid transport kinds) but perform no filesystem or network touch. A `command: "/does/not/exist"` parses fine.
+     - Verified absence: `grep -rn "Command::new\(...\).arg\(.*--version\).*mcp\|which\|std::fs::metadata\(.*command\)" rust/crates/commands/ rust/crates/runtime/src/mcp_stdio.rs rust/crates/rusty-claude-cli/src/main.rs` returns zero hits. No code exists anywhere that cheaply checks "does this MCP command exist on the filesystem or PATH?"
+
+     **Why this is specifically a clawability gap.**
+     1. *Roadmap Phase 2 §4 prescribes this exact surface.* The canonical lane event schema includes `lane.ready` and contract-level startup signals. Phase 1 §3.5 ("Boot preflight / doctor contract") explicitly lists "MCP config presence and server reachability expectations" as a required preflight check. Phase 4.4.4 ("Event provenance / environment labeling") expects MCP startup to emit typed success/failure events. The doctor surface is today the machine-readable foothold for all three of those product principles and it reports config presence only.
+     2. *Product Principle #5 "Partial success is first-class"* says "MCP startup can succeed for some servers and fail for others, with structured degraded-mode reporting." Today's doctor JSON has no field to express per-server liveness. There is no `servers[].startup_state`, `servers[].reachable`, `servers[].last_error`, `degraded_mode: bool`, or `partial_startup_count`.
+     3. *Sibling of #100.* #100 is "commit identity missing from status/doctor JSON — machinery exists but is JSON-invisible." #102 is the same shape on the MCP axis: the startup machinery exists in `runtime::mcp_stdio`, doctor only surfaces config-time counts. Both are "subsystem present, JSON-invisible."
+     4. *A trivial first tranche is free.* `which(command)` on stdio servers, `TcpStream::connect(url, 1s timeout)` on http/sse servers — each is <10 lines and would already classify every "totally broken" vs "actually wired up" server. No full MCP handshake required to give a huge clawability win.
+     5. *Undetected-breakage amplification.* A claw that reads `doctor` → `ok` and relies on an MCP tool will discover the breakage only when the LLM actually tries to call that tool, burning tokens on a failed tool call and forcing a retry loop. Preflight would catch this at lane-spawn time, before any tokens are spent.
+     6. *Config parser already validated shape, never content.* `parse_mcp_server_config` catches type errors (`url: 123` rejected, per the tests at `config.rs:1745`). But it never reaches out of the JSON to touch the filesystem. A typo like `command: "/usr/local/bin/mcp-servr"` (missing `e`) is indistinguishable from a working config.
+
+     **Fix shape — add a cheap MCP preflight to doctor + expose per-server reachability in `mcp list`.**
+     1. *Add `check_mcp_health` to the doctor check set.* Iterate over `runtime_config.mcp().servers()`. For stdio transport, run `which(command)` (or `std::fs::metadata(command)` if the command looks like an absolute path). For http/sse transport, attempt a 1s-timeout TCP connect (not a full handshake). Aggregate results: `ok` if all servers resolve, `warn` if some resolve, `fail` if none resolve. Emit per-server detail lines:
+        ```
+        MCP server       {name}        {resolved|command_not_found|connect_timeout|...}
+        ```
+        ~50 lines.
+     2. *Expose per-server `status` in `mcp list` / `mcp show` JSON.* Add a `status: "configured"|"resolved"|"command_not_found"|"connect_refused"|"startup_failed"` field to each server entry. Do NOT do a full handshake in list/show by default — those are meant to be cheap. Add a `--probe` flag for callers that want the deeper check. ~30 lines.
+     3. *Populate `degraded_mode: bool` and `startup_summary` at the top-level doctor JSON.* Matches Product Principle #5's "partial success is first-class." ~10 lines.
+     4. *Wire the preflight into the prompt/repl bootstrap path.* When a lane starts, emit a one-time `mcp_preflight` event with the resolved status of each configured server. Feeds the Phase 2 §4 lane event schema directly. ~20 lines.
+     5. *Regression tests.* One per reachability state. One for partial startup (one server resolves, one fails). One for all-resolved. One for zero-servers (should not invent a warning).
+
+     **Acceptance.** `claw doctor --output-format json` on a workspace with a broken MCP server (`command: "/does/not/exist"`) emits `{status: "warn"|"fail", degraded_mode: true, servers: [{name, status: "command_not_found", ...}]}`. `claw mcp list` exposes per-server `status` distinguishing `configured` from `resolved`. A lane that reads `doctor` can tell whether all its MCP surfaces will respond before burning its first token on a tool call.
+
+     **Blocker.** None. The cheapest tier (`which` / absolute-path existence check) is ~10 lines per server transport class and closes the "command doesn't exist on disk" gap entirely. Deeper handshake probes can be added later behind an opt-in `--probe` flag.
+
+     **Source.** Jobdori dogfood 2026-04-18 against `/tmp/cdW2` on main HEAD `eabd257` in response to Clawhip pinpoint nudge at `1494797126041862285`. Joins the **unplumbed-subsystem** cross-cluster with #78 (`claw plugins` route never constructed) and #100 (stale-base JSON-invisible) — same shape: machinery exists, diagnostic surface doesn't expose it. Joins **truth-audit / diagnostic-integrity** (#80-#84, #86, #87, #89, #100) because `doctor: ok` is a lie when MCP servers are unreachable. Directly implements the roadmap's own Phase 1 §3.5 (boot preflight), Phase 2 §4 (canonical lane events), Phase 4.4.4 (event provenance), and Product Principle #5 (partial success is first-class). Natural bundle: **#78 + #100 + #102** (unplumbed-surface quartet, now with #96) — four surfaces where the subsystem exists but the JSON diagnostic doesn't expose it; tight family PR. Also **#100 + #102** as the pure "doctor surface coverage" 2-way: #100 surfaces commit identity, #102 surfaces MCP reachability, together they let `claw doctor` actually live up to its name.