diff --git a/ROADMAP.md b/ROADMAP.md index 010269e..6a9b5e4 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -4815,7 +4815,19 @@ ear], /color [scheme], /effort [low|medium|high], /fast, /summary, /tag [label], **Source.** Jobdori dogfood 2026-04-20 against `/tmp/claw-dogfood` (env-cleaned, no git, no config) on main HEAD `7370546` in response to Clawhip pinpoint nudge at `1495620050424434758`. Joins **Silent-flag / documented-but-unenforced** (#96–#101, #104, #108, #111, #115, #116, #117, #118, #119, #121, #122, #123, #124, #126) as 18th — `--json` silently swallowed into Prompt dispatch instead of being recognized or rejected. Joins **Parser-level trust gap quintet** (#108, #117, #119, #122, **#127**) as 5th — same `_other => Prompt` fall-through arm, fifth distinct entry case (#108 = typoed verb, #117 = `-p` greedy, #119 = bare slash + arg, #122 = `--base-commit` greedy, **#127 = valid verb + unrecognized suffix arg**). Joins **Cred-error misdirection / failure-classification gaps** as a sibling of #99 (system-prompt unvalidated) — same family of "local diagnostic verb pretends to need API creds." Joins **Truth-audit / diagnostic-integrity** (#80–#87, #89, #100, #102, #103, #105, #107, #109, #110, #112, #114, #115, #125) — `claw --help` lies about per-verb accepted flags. Joins **Parallel-entry-point asymmetry** (#91, #101, #104, #105, #108, #114, #117, #122, #123, #124) as 11th — three working forms and one broken form for the same logical intent (`--json` doctor output). Joins **Claude Code migration parity** (#103, #109, #116) as 4th — Claude Code's `--json` convention shorthand is unrecognized in claw-code's verb-suffix position; users migrating get cred errors instead. Cross-cluster with **README/USAGE doc-vs-implementation gap** — README explicitly recommends `claw doctor` as the first health check; the natural JSON form of that exact command is broken. Natural bundle: **#108 + #117 + #119 + #122 + #127** — parser-level trust gap quintet: complete `_other => Prompt` fall-through audit (typoed verb + greedy `-p` + bare slash-verb + greedy `--base-commit` + valid verb + unrecognized suffix). Also **#99 + #127** — local-diagnostic cred-error misdirection pair: `system-prompt` and verb-suffix `--json` both pretend to need creds for pure-local operations. Also **#126 + #127** — diagnostic-verb surface integrity pair: `/config` section args ignored (#126) + verb-suffix args silently mis-dispatched (#127). Session tally: ROADMAP #127. -128. **`claw --model ` (spaces, empty string, special chars, invalid provider/model syntax) silently falls through to API-layer cred error instead of rejecting at parse time** — dogfooded 2026-04-20 on main HEAD `d284ef7` from a fresh environment (no config, no auth). The `--model` flag accepts any string without syntactic validation: spaces (`claw --model "bad model"`), empty strings (`claw --model ""`), special characters (`claw --model "@invalid"`), non-existent provider/model combinations all parse successfully. The malformed model string then flows into the runtime's provider-detection layer, which silently accepts it as Anthropic fallback or passes it to an API layer that fails with `missing Anthropic credentials` (misdirection) rather than a clear "invalid model syntax" error at parse time. With API credentials configured, a malformed model string gets sent to the API, billing tokens against a request that should have failed client-side. +128. **[CLOSED 2026-04-21]** **`claw --model ` (spaces, empty string, special chars, invalid provider/model syntax) silently falls through to API-layer cred error instead of rejecting at parse time** — dogfooded 2026-04-20 on main HEAD `d284ef7` from a fresh environment (no config, no auth). The `--model` flag accepts any string without syntactic validation: spaces (`claw --model "bad model"`), empty strings (`claw --model ""`), special characters (`claw --model "@invalid"`), non-existent provider/model combinations all parse successfully. The malformed model string then flows into the runtime's provider-detection layer, which silently accepts it as Anthropic fallback or passes it to an API layer that fails with `missing Anthropic credentials` (misdirection) rather than a clear "invalid model syntax" error at parse time. With API credentials configured, a malformed model string gets sent to the API, billing tokens against a request that should have failed client-side. + + **Closure (2026-04-21):** Re-verified on main HEAD `4cb8fa0`. All cases now rejected at parse time: + ``` + $ claw --model '' status → error: model string cannot be empty + $ claw --model 'bad model' status → error: invalid model syntax: 'bad model' contains spaces + $ claw --model 'sonet' status → error: invalid model syntax: 'sonet'. Expected provider/model ... + $ claw --model '@invalid' status → error: invalid model syntax: '@invalid'. Expected provider/model ... + $ claw --model 'totally-not-real-xyz' status → error: invalid model syntax ... + $ claw --model sonnet status → ok, resolves to claude-sonnet-4-6 + $ claw --model anthropic/claude-opus-4-6 status → ok, passes through + ``` + Validation happens in `validate_model_syntax()` before `resolve_model_alias_with_config()`. All `--model` and `--model=` parse paths call it. No API call ever reached with malformed input. Residual gap (model provenance in status JSON — raw input vs resolved value) was split off as #148 (see below). 129. **MCP server startup blocks credential validation — `claw ` with any `.claw.json` `mcpServers` entry awaits the MCP server's stdio handshake BEFORE checking whether the operator has Anthropic credentials. With no `ANTHROPIC_AUTH_TOKEN` / `ANTHROPIC_API_KEY` set and `mcpServers.everything = { command: "npx", args: ["-y", "@modelcontextprotocol/server-everything"] }` configured, the CLI hangs forever (verified via `timeout 30s` — still in MCP startup at 30s with three repeated `"Starting default (STDIO) server..."` lines), instead of fail-fasting with the same `missing Anthropic credentials` error that fires in milliseconds when no MCP is configured. A misconfigured-but-running MCP server (one that spawns successfully but never completes its `initialize` handshake) wedges every `claw ` invocation permanently. A misconfigured MCP server with a slow-but-eventually-succeeding init (npx download, container pull, network roundtrip) burns startup latency on every Prompt invocation regardless of whether the LLM call would even succeed. This is the runtime-side companion to #102's config-time MCP diagnostic gap: #102 says doctor doesn't surface MCP reachability; #129 says the Prompt path's reachability check is implicit, blocking, retried, and runs *before* the cheaper auth precondition that should run first** — dogfooded 2026-04-20 on main HEAD `d284ef7` from `/tmp/claw-mcp-test` with `env -i PATH=$PATH HOME=$HOME` (all auth env vars unset). @@ -5665,3 +5677,46 @@ With valid credentials, the empty string would be sent to Claude as a user promp **Blocker.** None. 5-line change in `parse_subcommand()`. **Source.** Jobdori dogfood 2026-04-21 20:32 KST on main HEAD `f877aca` in response to Clawhip nudge. Joins **prompt misdelivery** cluster (#145 sibling). Session tally: ROADMAP #147. + +## Pinpoint #148. `claw status` JSON shows resolved model but not raw input or source — post-hoc "why did my --model flag behave this way?" requires re-reading argv + +**Gap.** After #128 closed (malformed model strings now rejected at parse time), the residual provenance gap from the original #124 pinpoint remains: `claw status --output-format json` surfaces only the resolved model string. No trace of whether the user passed `--model sonnet` (alias → resolved), `--model anthropic/claude-opus-4-6` (pass-through), or relied on env/config default. A claw debugging "which model actually runs if I invoke this?" has to inspect argv instead of reading a structured field. + +**Verified on main HEAD `4cb8fa0` (2026-04-21 20:40 KST):** + +``` +$ claw --model sonnet --output-format json status | jq '{model}' +{"model": "claude-sonnet-4-6"} + +$ claw --model anthropic/claude-opus-4-6 --output-format json status | jq '{model}' +{"model": "anthropic/claude-opus-4-6"} + +# Same resolved value can come from three different sources; +# JSON envelope gives no way to distinguish. +``` + +**Why this is a clawability gap.** +1. **Loss of origin information**: alias resolution collapses `sonnet` and `claude-sonnet-4-6` and `{"aliases":{"x":"claude-sonnet-4-6"}}` + `--model x` into one string. Debug forensics has to read argv. +2. **Clawhip orchestration**: a clawhip dispatcher sending `--model` wants to confirm its flag was honored, not that the default kicked in (#105 model-resolution-source disagreement is adjacent). +3. **Truth-audit / diagnostic-integrity**: the status envelope is supposed to be the single source of truth for "what would this process run as". Missing provenance weakens the contract. + +**Fix shape (~50 lines).** Add two fields to status JSON: +- `model_source`: `"flag" | "env" | "config" | "default"` — where the model string came from. +- `model_raw`: the user's original input (pre-alias-resolution). Null when source is `default`. + +Text mode appends a line: `Model source flag (raw: sonnet)` or `Model source default`. + +Threading: parser already knows the source (it's the arm that sets `model`). Propagate `(model, model_raw, model_source)` tuple through `CliAction::Status` and into `StatusContext`. Env/default resolution paths are in `resolve_repl_model*` helpers. + +**Acceptance.** +- `claw --model sonnet --output-format json status` → `model: "claude-sonnet-4-6"`, `model_raw: "sonnet"`, `model_source: "flag"`. +- `claw --model anthropic/claude-opus-4-6 --output-format json status` → `model_raw: "anthropic/claude-opus-4-6"`, `model_source: "flag"`. +- `claw --output-format json status` (no flag) → `model_raw: null`, `model_source: "default"` (or `"env"` if `ANTHROPIC_MODEL` set; or `"config"` if `.claw.json` set `model`). +- Text mode shows same provenance. +- Regression test: parse_args + status_json_value roundtrip asserts each source value. + +**Blocker.** None. All resolution sites already exist; only plumbing + one serialization addition. + +**Not a regression of #128.** #128 was about rejecting malformed strings (now closed). #148 is about labeling the valid ones after resolution. + +**Source.** Jobdori dogfood 2026-04-21 20:40 KST on main HEAD `4cb8fa0` in response to Q's bundle hint. Split from historical #124 residual. Joins **truth-audit / diagnostic-integrity** cluster. Session tally: ROADMAP #148. diff --git a/rust/crates/rusty-claude-cli/src/main.rs b/rust/crates/rusty-claude-cli/src/main.rs index 7625ca4..f70902f 100644 --- a/rust/crates/rusty-claude-cli/src/main.rs +++ b/rust/crates/rusty-claude-cli/src/main.rs @@ -57,6 +57,96 @@ use tools::{ }; const DEFAULT_MODEL: &str = "claude-opus-4-6"; + +/// #148: Model provenance for `claw status` JSON/text output. Records where +/// the resolved model string came from so claws don't have to re-read argv +/// to audit whether their `--model` flag was honored vs falling back to env +/// or config or default. +#[derive(Debug, Clone, PartialEq, Eq)] +enum ModelSource { + /// Explicit `--model` / `--model=` CLI flag. + Flag, + /// ANTHROPIC_MODEL environment variable (when no flag was passed). + Env, + /// `model` key in `.claw.json` / `.claw/settings.json` (when neither + /// flag nor env set it). + Config, + /// Compiled-in DEFAULT_MODEL fallback. + Default, +} + +impl ModelSource { + fn as_str(&self) -> &'static str { + match self { + ModelSource::Flag => "flag", + ModelSource::Env => "env", + ModelSource::Config => "config", + ModelSource::Default => "default", + } + } +} + +#[derive(Debug, Clone, PartialEq, Eq)] +struct ModelProvenance { + /// Resolved model string (after alias expansion). + resolved: String, + /// Raw user input before alias resolution. None when source is Default. + raw: Option, + /// Where the resolved model string originated. + source: ModelSource, +} + +impl ModelProvenance { + fn default_fallback() -> Self { + Self { + resolved: DEFAULT_MODEL.to_string(), + raw: None, + source: ModelSource::Default, + } + } + + fn from_flag(raw: &str) -> Self { + Self { + resolved: resolve_model_alias_with_config(raw), + raw: Some(raw.to_string()), + source: ModelSource::Flag, + } + } + + fn from_env_or_config_or_default(cli_model: &str) -> Self { + // Only called when no --model flag was passed. Probe env first, + // then config, else fall back to default. Mirrors the logic in + // resolve_repl_model() but captures the source. + if cli_model != DEFAULT_MODEL { + // Already resolved from some prior path; treat as flag. + return Self { + resolved: cli_model.to_string(), + raw: Some(cli_model.to_string()), + source: ModelSource::Flag, + }; + } + if let Some(env_model) = env::var("ANTHROPIC_MODEL") + .ok() + .map(|value| value.trim().to_string()) + .filter(|value| !value.is_empty()) + { + return Self { + resolved: resolve_model_alias_with_config(&env_model), + raw: Some(env_model), + source: ModelSource::Env, + }; + } + if let Some(config_model) = config_model_for_current_dir() { + return Self { + resolved: resolve_model_alias_with_config(&config_model), + raw: Some(config_model), + source: ModelSource::Config, + }; + } + Self::default_fallback() + } +} + fn max_tokens_for_model(model: &str) -> u32 { if model.contains("opus") { 32_000 @@ -217,9 +307,10 @@ fn run() -> Result<(), Box> { } => resume_session(&session_path, &commands, output_format), CliAction::Status { model, + model_flag_raw, permission_mode, output_format, - } => print_status_snapshot(&model, permission_mode, output_format)?, + } => print_status_snapshot(&model, model_flag_raw.as_deref(), permission_mode, output_format)?, CliAction::Sandbox { output_format } => print_sandbox_status_snapshot(output_format)?, CliAction::Prompt { prompt, @@ -351,6 +442,10 @@ enum CliAction { }, Status { model: String, + // #148: raw `--model` flag input (pre-alias-resolution), if any. + // None means no flag was supplied; env/config/default fallback is + // resolved inside `print_status_snapshot`. + model_flag_raw: Option, permission_mode: PermissionMode, output_format: CliOutputFormat, }, @@ -447,6 +542,9 @@ impl CliOutputFormat { #[allow(clippy::too_many_lines)] fn parse_args(args: &[String]) -> Result { let mut model = DEFAULT_MODEL.to_string(); + // #148: when user passes --model/--model=, capture the raw input so we + // can attribute source: "flag" later. None means no flag was supplied. + let mut model_flag_raw: Option = None; let mut output_format = CliOutputFormat::Text; let mut permission_mode_override = None; let mut wants_help = false; @@ -496,12 +594,14 @@ fn parse_args(args: &[String]) -> Result { .ok_or_else(|| "missing value for --model".to_string())?; validate_model_syntax(value)?; model = resolve_model_alias_with_config(value); + model_flag_raw = Some(value.clone()); // #148 index += 2; } flag if flag.starts_with("--model=") => { let value = &flag[8..]; validate_model_syntax(value)?; model = resolve_model_alias_with_config(value); + model_flag_raw = Some(value.to_string()); // #148 index += 1; } "--output-format" => { @@ -683,7 +783,7 @@ fn parse_args(args: &[String]) -> Result { return action; } if let Some(action) = - parse_single_word_command_alias(&rest, &model, permission_mode_override, output_format) + parse_single_word_command_alias(&rest, &model, model_flag_raw.as_deref(), permission_mode_override, output_format) { return action; } @@ -885,6 +985,8 @@ fn is_help_flag(value: &str) -> bool { fn parse_single_word_command_alias( rest: &[String], model: &str, + // #148: raw --model flag input for status provenance. None = no flag. + model_flag_raw: Option<&str>, permission_mode_override: Option, output_format: CliOutputFormat, ) -> Option> { @@ -922,6 +1024,7 @@ fn parse_single_word_command_alias( "version" => Some(Ok(CliAction::Version { output_format })), "status" => Some(Ok(CliAction::Status { model: model.to_string(), + model_flag_raw: model_flag_raw.map(str::to_string), // #148 permission_mode: permission_mode_override.unwrap_or_else(default_permission_mode), output_format, })), @@ -3009,6 +3112,7 @@ fn run_resume_command( }, default_permission_mode().as_str(), &context, + None, // #148: resumed sessions don't have flag provenance )), json: Some(status_json_value( session.model.as_deref(), @@ -3021,6 +3125,7 @@ fn run_resume_command( }, default_permission_mode().as_str(), &context, + None, // #148: resumed sessions don't have flag provenance )), }) } @@ -4375,6 +4480,7 @@ impl LiveCli { }, self.permission_mode.as_str(), &status_context(Some(&self.session.path)).expect("status context should load"), + None, // #148: REPL /status doesn't carry flag provenance ) ); } @@ -5208,6 +5314,7 @@ fn render_repl_help() -> String { fn print_status_snapshot( model: &str, + model_flag_raw: Option<&str>, permission_mode: PermissionMode, output_format: CliOutputFormat, ) -> Result<(), Box> { @@ -5219,18 +5326,30 @@ fn print_status_snapshot( estimated_tokens: 0, }; let context = status_context(None)?; + // #148: resolve model provenance. If user passed --model, source is + // "flag" with the raw input preserved. Otherwise probe env -> config + // -> default and record the winning source. + let provenance = match model_flag_raw { + Some(raw) => ModelProvenance { + resolved: model.to_string(), + raw: Some(raw.to_string()), + source: ModelSource::Flag, + }, + None => ModelProvenance::from_env_or_config_or_default(model), + }; match output_format { CliOutputFormat::Text => println!( "{}", - format_status_report(model, usage, permission_mode.as_str(), &context) + format_status_report(&provenance.resolved, usage, permission_mode.as_str(), &context, Some(&provenance)) ), CliOutputFormat::Json => println!( "{}", serde_json::to_string_pretty(&status_json_value( - Some(model), + Some(&provenance.resolved), usage, permission_mode.as_str(), &context, + Some(&provenance), ))? ), } @@ -5242,6 +5361,12 @@ fn status_json_value( usage: StatusUsage, permission_mode: &str, context: &StatusContext, + // #148: optional provenance for `model` field. Surfaces `model_source` + // ("flag" | "env" | "config" | "default") and `model_raw` (user input + // before alias resolution, or null when source is "default"). Callers + // that don't have provenance (legacy resume paths) pass None, in which + // case both new fields are omitted. + provenance: Option<&ModelProvenance>, ) -> serde_json::Value { // #143: top-level `status` marker so claws can distinguish // a clean run from a degraded run (config parse failed but other fields @@ -5249,11 +5374,15 @@ fn status_json_value( // when present; it's a string rather than a typed object in Phase 1 and // will join the typed-error taxonomy in Phase 2 (ROADMAP §4.44). let degraded = context.config_load_error.is_some(); + let model_source = provenance.map(|p| p.source.as_str()); + let model_raw = provenance.and_then(|p| p.raw.clone()); json!({ "kind": "status", "status": if degraded { "degraded" } else { "ok" }, "config_load_error": context.config_load_error, "model": model, + "model_source": model_source, + "model_raw": model_raw, "permission_mode": permission_mode, "usage": { "messages": usage.message_count, @@ -5352,6 +5481,10 @@ fn format_status_report( usage: StatusUsage, permission_mode: &str, context: &StatusContext, + // #148: optional model provenance to surface in a `Model source` line. + // Callers without provenance (legacy resume paths) pass None and the + // source line is omitted for backward compat. + provenance: Option<&ModelProvenance>, ) -> String { // #143: if config failed to parse, surface a degraded banner at the top // of the text report so humans see the parse error before the body, while @@ -5368,10 +5501,21 @@ fn format_status_report( "Config load error\n Status fail\n Summary runtime config failed to load; reporting partial status\n Details {err}\n Hint `claw doctor` classifies config parse errors; fix the listed field and rerun" )); } + // #148: render Model source line after Model, showing where the string + // came from (flag / env / config / default) and the raw input if any. + let model_source_line = provenance + .map(|p| match &p.raw { + Some(raw) if raw != model => { + format!("\n Model source {} (raw: {raw})", p.source.as_str()) + } + Some(_) => format!("\n Model source {}", p.source.as_str()), + None => format!("\n Model source {}", p.source.as_str()), + }) + .unwrap_or_default(); blocks.extend([ format!( "{status_line} - Model {model} + Model {model}{model_source_line} Permission mode {permission_mode} Messages {} Turns {} @@ -9786,6 +9930,50 @@ mod tests { typo_err.starts_with("unknown subcommand:"), "typo guard should fire for 'sttaus', got: {typo_err}" ); + // #148: `--model` flag must be captured as model_flag_raw so status + // JSON can report provenance (source: flag, raw: ). + match parse_args(&[ + "--model".to_string(), + "sonnet".to_string(), + "status".to_string(), + ]) + .expect("--model sonnet status should parse") + { + CliAction::Status { + model, + model_flag_raw, + .. + } => { + assert_eq!(model, "claude-sonnet-4-6", "sonnet alias should resolve"); + assert_eq!( + model_flag_raw.as_deref(), + Some("sonnet"), + "raw flag input should be preserved" + ); + } + other => panic!("expected CliAction::Status, got: {other:?}"), + } + // --model= form should also capture raw. + match parse_args(&[ + "--model=anthropic/claude-opus-4-6".to_string(), + "status".to_string(), + ]) + .expect("--model=... status should parse") + { + CliAction::Status { + model, + model_flag_raw, + .. + } => { + assert_eq!(model, "anthropic/claude-opus-4-6"); + assert_eq!( + model_flag_raw.as_deref(), + Some("anthropic/claude-opus-4-6"), + "--model= form should also preserve raw input" + ); + } + other => panic!("expected CliAction::Status, got: {other:?}"), + } } #[test] @@ -9971,7 +10159,7 @@ mod tests { cumulative: runtime::TokenUsage::default(), estimated_tokens: 0, }; - let json = super::status_json_value(Some("test-model"), usage, "workspace-write", &context); + let json = super::status_json_value(Some("test-model"), usage, "workspace-write", &context, None); assert_eq!( json.get("status").and_then(|v| v.as_str()), Some("degraded"), @@ -9999,7 +10187,7 @@ mod tests { }); assert!(clean_context.config_load_error.is_none()); let clean_json = - super::status_json_value(Some("test-model"), usage, "workspace-write", &clean_context); + super::status_json_value(Some("test-model"), usage, "workspace-write", &clean_context, None); assert_eq!( clean_json.get("status").and_then(|v| v.as_str()), Some("ok"), @@ -10073,6 +10261,7 @@ mod tests { parse_args(&["status".to_string()]).expect("status should parse"), CliAction::Status { model: DEFAULT_MODEL.to_string(), + model_flag_raw: None, // #148: no --model flag passed permission_mode: PermissionMode::DangerFullAccess, output_format: CliOutputFormat::Text, } @@ -11147,6 +11336,7 @@ mod tests { sandbox_status: runtime::SandboxStatus::default(), config_load_error: None, }, + None, // #148 ); assert!(status.contains("Status")); assert!(status.contains("Model claude-sonnet"));