658 Commits

Author SHA1 Message Date
YeonGyu-Kim
79352a2d20 feat: #152 — hint --output-format json when user types --json on diagnostic verbs
## Problem

Users commonly type `claw doctor --json`, `claw status --json`, or
`claw system-prompt --json` expecting JSON output. These fail with
`unrecognized argument \`--json\` for subcommand` with no hint that
`--output-format json` is the correct flag.

## Discovery

Filed as #152 during 21:17 dogfood nudge. The #127 worktree contained
a more comprehensive patch but conflicted with #141 (unified --help).
On re-investigation of main, Bugs 1 and 3 from #127 are already closed
(positional arg rejection works, no double "error:" prefix). Only
Bug 2 (the `--json` hint) remained.

## Fix

Two call sites add the hint:

1. `parse_single_word_command_alias`'s diagnostic-verb suffix path:
   when rest[1] == "--json", append "Did you mean \`--output-format json\`?"

2. `parse_system_prompt_options` unknown-option path: same hint when
   the option is exactly `--json`.

## Verification

Before:
  $ claw doctor --json
  error: unrecognized argument `--json` for subcommand `doctor`
  Run `claw --help` for usage.

After:
  $ claw doctor --json
  error: unrecognized argument `--json` for subcommand `doctor`
  Did you mean `--output-format json`?
  Run `claw --help` for usage.

Covers: `doctor --json`, `status --json`, `sandbox --json`,
`system-prompt --json`, and any other diagnostic verb that routes
through `parse_single_word_command_alias`.

Other unrecognized args (`claw doctor garbage`) correctly don't
trigger the hint.

## Tests

- 2 new assertions in `parses_multiple_diagnostic_subcommands`:
  - `claw doctor --json` produces hint
  - `claw doctor garbage` does NOT produce hint
- 177 rusty-claude-cli tests pass
- Workspace tests green

Closes ROADMAP #152.
2026-04-21 21:23:17 +09:00
YeonGyu-Kim
7bc66e86e8 feat: #151 — canonicalize workspace path in SessionStore::from_cwd/data_dir
## Problem

`workspace_fingerprint(path)` hashes the raw path string without
canonicalization. Two equivalent paths (e.g. `/tmp/foo` vs
`/private/tmp/foo` on macOS) produce different fingerprints and
therefore different session stores. #150 fixed the test-side symptom;
this fixes the underlying product contract.

## Discovery path

#150 fix (canonicalize in test) was a workaround. Q's ack on #150
surfaced the deeper gap: the function itself is still fragile for
any caller passing a non-canonical path:

1. Embedded callers with a raw `--data-dir` path
2. Programmatic `SessionStore::from_cwd(user_path)` calls
3. NixOS store paths, Docker bind mounts, case-insensitive normalization

The REPL's default flow happens to work because `env::current_dir()`
returns canonical paths on macOS. But any caller passing a raw path
risks silent session-store divergence.

## Fix

Canonicalize inside `SessionStore::from_cwd()` and `from_data_dir()`
before computing the fingerprint. Kept `workspace_fingerprint()` itself
as a pure function for determinism — canonicalization is the entry
point's responsibility.

```rust
let canonical_cwd = fs::canonicalize(cwd).unwrap_or_else(|_| cwd.to_path_buf());
let sessions_root = canonical_cwd.join(".claw").join("sessions").join(workspace_fingerprint(&canonical_cwd));
```

Falls back to the raw path if canonicalize fails (directory doesn't
exist yet).

## Test-side updates

Three legacy-session tests expected the non-canonical base path to
match the store's workspace_root. Updated them to canonicalize
`base` after creation — same defensive pattern as #150, now
explicit across all three tests.

## Regression test

Added `session_store_from_cwd_canonicalizes_equivalent_paths` that
creates two stores from equivalent paths (raw vs canonical) and
asserts they resolve to the same sessions_dir.

## Verification

- `cargo test -p runtime session_store_` — 9/9 pass
- `cargo test --workspace` — all green, no FAILED markers
- No behavior change for existing users (REPL default flow already
  used canonical paths)

## Backward compatibility

Users on macOS who always went through `env::current_dir()`:
no hash change, sessions resume identically.

Users who ever called with a non-canonical path: hash would change,
but those sessions were already broken (couldn't be resumed from a
canonical-path cwd). Net improvement.

Closes ROADMAP #151.
2026-04-21 21:06:09 +09:00
YeonGyu-Kim
eaa077bf91 fix: #150 — eliminate symlink canonicalization flake in resume_latest test + file #246 (reminder outcome ambiguity)
## #150 Fix: resume_latest test flake

**Problem:** `resume_latest_restores_the_most_recent_managed_session` intermittently
fails when run in the workspace suite or multiple times in sequence, but passes in
isolation.

**Root cause:** `workspace_fingerprint(path)` hashes the path string without
canonicalization. On macOS, `/tmp` is a symlink to `/private/tmp`. The test
creates a temp dir via `std::env::temp_dir().join(...)` which returns
`/var/folders/...` (non-canonical). When the subprocess spawns,
`env::current_dir()` returns the canonical path `/private/var/folders/...`.
The two fingerprints differ, so the subprocess looks in
`.claw/sessions/<hash1>` while files are in `.claw/sessions/<hash2>`.
Session discovery fails.

**Fix:** Call `fs::canonicalize(&project_dir)` after creating the directory
to ensure test and subprocess use identical path representations.

**Verification:** 5 consecutive runs of the full test suite — all pass.
Previously: 5/5 failed when run in sequence.

## #246 Filing: Reminder cron outcome ambiguity (control-loop blocker)

The `clawcode-dogfood-cycle-reminder` cron times out repeatedly with no
structured feedback on whether the nudge was delivered, skipped, or died in-flight.

**Phase 1 outcome schema** — add explicit field to cron result:
- `delivered` — nudge posted to Discord
- `timed_out_before_send` — died before posting
- `timed_out_after_send` — posted but cleanup timed out
- `skipped_due_to_active_cycle` — previous cycle active
- `aborted_gateway_draining` — daemon shutdown

Assigned to gaebal-gajae (cron/orchestration domain). Unblocks trustworthy
dogfood cycle observability.

Closes ROADMAP #150. Filed ROADMAP #246.
2026-04-21 21:01:09 +09:00
YeonGyu-Kim
bc259ec6f9 fix: #149 — eliminate parallel-test flake in runtime::config tests
## Problem

`runtime::config::tests::validates_unknown_top_level_keys_with_line_and_field_name`
intermittently fails during `cargo test --workspace` (witnessed during
#147 and #148 workspace runs) but passes deterministically in isolation.

Example failure from workspace run:
  test result: FAILED. 464 passed; 1 failed

## Root cause

`runtime/src/config.rs::tests::temp_dir()` used nanosecond timestamp
alone for namespace isolation:

  std::env::temp_dir().join(format!("runtime-config-{nanos}"))

Under parallel test execution on fast machines with coarse clock
resolution, two tests start within the same nanosecond bucket and
collide on the same path. One test's `fs::remove_dir_all(root)` then
races another's in-flight `fs::create_dir_all()`.

Other crates already solved this pattern:
- plugins::tests::temp_dir(label) — label-parameterized
- runtime::git_context::tests::temp_dir(label) — label-parameterized

runtime/src/config.rs was missed.

## Fix

Added process id + monotonically-incrementing atomic counter to the
namespace, making every callsite provably unique regardless of clock
resolution or scheduling:

  static COUNTER: AtomicU64 = AtomicU64::new(0);
  let pid = std::process::id();
  let seq = COUNTER.fetch_add(1, Ordering::Relaxed);
  std::env::temp_dir().join(format!("runtime-config-{pid}-{nanos}-{seq}"))

Chose counter+pid over the label-parameterized pattern to avoid
touching all 20 callsites in the same commit (mechanical noise with
no added safety — counter alone is sufficient).

## Verification

Before: one failure per workspace run (config test flake).
After: 5 consecutive `cargo test --workspace` runs — zero config
test failures. Only pre-existing `resume_latest` flake remains
(orthogonal, unrelated to this change).

  for i in 1 2 3 4 5; do cargo test --workspace; done
  # All 5 runs: config tests green. Only resume_latest flake appears.

  cargo test -p runtime
  # 465 passed; 0 failed

## ROADMAP.md

Added Pinpoint #149 documenting the gap, root cause, and fix.

Closes ROADMAP #149.
2026-04-21 20:54:12 +09:00
YeonGyu-Kim
f84c7c4ed5 feat: #148 + #128 closure — model provenance in claw status JSON/text
## Scope

Two deltas in one commit:

### #128 closure (docs)

Re-verified on main HEAD `4cb8fa0`: malformed `--model` strings already
rejected at parse time (`validate_model_syntax` in parse_args). All
historical repro cases now produce specific errors:

  claw --model ''                       → error: model string cannot be empty
  claw --model 'bad model'              → error: invalid model syntax: 'bad model' contains spaces
  claw --model 'sonet'                  → error: invalid model syntax: 'sonet'. Expected provider/model or known alias
  claw --model '@invalid'               → error: invalid model syntax: '@invalid'. Expected provider/model ...
  claw --model 'totally-not-real-xyz'   → error: invalid model syntax: ...
  claw --model sonnet                   → ok, resolves to claude-sonnet-4-6
  claw --model anthropic/claude-opus-4-6 → ok, passes through

Marked #128 CLOSED in ROADMAP with repro block. Residual provenance gap
split off as #148.

### #148 implementation

**Problem.** After #128 closure, `claw status --output-format json`
still surfaces only the resolved model string. No way for a claw to
distinguish whether `claude-sonnet-4-6` came from `--model sonnet`
(alias resolution) vs `--model claude-sonnet-4-6` (pass-through) vs
`ANTHROPIC_MODEL` env vs `.claw.json` config vs compiled-in default.

Debug forensics had to re-read argv instead of reading a structured
field. Clawhip orchestrators sending `--model` couldn't confirm the
flag was honored vs falling back to default.

**Fix.** Added two fields to status JSON envelope:
- `model_source`: "flag" | "env" | "config" | "default"
- `model_raw`: user's input before alias resolution (null on default)

Text mode appends a `Model source` line under `Model`, showing the
source and raw input (e.g. `Model source     flag (raw: sonnet)`).

**Resolution order** (mirrors resolve_repl_model but with source
attribution):
1. If `--model` / `--model=` flag supplied → source: flag, raw: flag value
2. Else if ANTHROPIC_MODEL set → source: env, raw: env value
3. Else if `.claw.json` model key set → source: config, raw: config value
4. Else → source: default, raw: null

## Changes

### rust/crates/rusty-claude-cli/src/main.rs

- Added `ModelSource` enum (Flag/Env/Config/Default) with `as_str()`.
- Added `ModelProvenance` struct (resolved, raw, source) with
  three constructors: `default_fallback()`, `from_flag(raw)`, and
  `from_env_or_config_or_default(cli_model)`.
- Added `model_flag_raw: Option<String>` field to `CliAction::Status`.
- Parse loop captures raw input in `--model` and `--model=` arms.
- Extended `parse_single_word_command_alias` to thread
  `model_flag_raw: Option<&str>` through.
- Extended `print_status_snapshot` signature to accept
  `model_flag_raw: Option<&str>`. Resolves provenance at dispatch time
  (flag provenance from arg; else probe env/config/default).
- Extended `status_json_value` signature with
  `provenance: Option<&ModelProvenance>`. On Some, adds `model_source`
  and `model_raw` fields; on None (legacy resume paths), omits them
  for backward compat.
- Extended `format_status_report` signature with optional provenance.
  On Some, renders `Model source` line after `Model`.
- Updated all existing callers (REPL /status, resume /status, tests)
  to pass None (legacy paths don't carry flag provenance).
- Added 2 regression assertions in parse_args test covering both
  `--model sonnet` and `--model=...` forms.

### ROADMAP.md

- Marked #128 CLOSED with re-verification block.
- Filed #148 documenting the provenance gap split, fix shape, and
  acceptance criteria.

## Live verification

$ claw --model sonnet --output-format json status | jq '{model,model_source,model_raw}'
{"model": "claude-sonnet-4-6", "model_source": "flag", "model_raw": "sonnet"}

$ claw --output-format json status | jq '{model,model_source,model_raw}'
{"model": "claude-opus-4-6", "model_source": "default", "model_raw": null}

$ ANTHROPIC_MODEL=haiku claw --output-format json status | jq '{model,model_source,model_raw}'
{"model": "claude-haiku-4-5-20251213", "model_source": "env", "model_raw": "haiku"}

$ echo '{"model":"claude-opus-4-7"}' > .claw.json && claw --output-format json status | jq '{model,model_source,model_raw}'
{"model": "claude-opus-4-7", "model_source": "config", "model_raw": "claude-opus-4-7"}

$ claw --model sonnet status
Status
  Model            claude-sonnet-4-6
  Model source     flag (raw: sonnet)
  Permission mode  danger-full-access
  ...

## Tests

- rusty-claude-cli bin: 177 tests pass (2 new assertions for #148)
- Full workspace green except pre-existing resume_latest flake (unrelated)

Closes ROADMAP #128, #148.
2026-04-21 20:48:46 +09:00
YeonGyu-Kim
4cb8fa059a feat: #147 — reject empty / whitespace-only prompts at CLI fallthrough
## Problem

The `"prompt"` subcommand arm enforced `if prompt.trim().is_empty()`
and returned a specific error. The fallthrough `other` arm in the same
match block — which routes any unrecognized first positional arg to
`CliAction::Prompt` — had no such guard. Result:

$ claw ""
error: missing Anthropic credentials; export ANTHROPIC_AUTH_TOKEN ...

$ claw "   "
error: missing Anthropic credentials; ...

$ claw "" ""
error: missing Anthropic credentials; ...

$ claw --output-format json ""
{"error":"missing Anthropic credentials; ...","type":"error"}

An empty prompt should never reach the credentials check. Worse: with
valid credentials, the literal empty string gets sent to Claude as a
user prompt, either burning tokens for nothing or triggering a model-
side refusal. Same prompt-misdelivery family as #145.

## Root cause

In `parse_subcommand()`, the final `other =>` arm in the top-level
match only guards against typos (#108 guard via `looks_like_subcommand_typo`)
and then unconditionally builds `CliAction::Prompt { prompt: rest.join(" ") }`.
An empty/whitespace-only join passes through.

## Changes

### rust/crates/rusty-claude-cli/src/main.rs

Added the same `if joined.trim().is_empty()` guard already used in the
`"prompt"` arm to the fallthrough path. Error message distinguishes it
from the `prompt` subcommand path:

  empty prompt: provide a subcommand (run `claw --help`) or a
  non-empty prompt string

Runs AFTER the typo guard (so `claw sttaus` still suggests `status`)
and BEFORE CliAction::Prompt construction (so no network call ever
happens for empty inputs).

### Regression tests

Added 4 assertions in the existing parse_args test:
- parse_args([""]) → Err("empty prompt: ...")
- parse_args(["   "]) → Err("empty prompt: ...")
- parse_args(["", ""]) → Err("empty prompt: ...")
- parse_args(["sttaus"]) → Err("unknown subcommand: ...") [verifies #108 typo guard still takes precedence]

### ROADMAP.md

Added Pinpoint #147 documenting the gap, verification, root cause,
fix shape, and acceptance. Joins the prompt-misdelivery cluster
alongside #145.

## Live verification

$ claw ""
error: empty prompt: provide a subcommand (run `claw --help`) or a non-empty prompt string

$ claw "   "
error: empty prompt: provide a subcommand (run `claw --help`) or a non-empty prompt string

$ claw --output-format json ""
{"error":"empty prompt: provide a subcommand ...","type":"error"}

$ claw prompt ""   # unchanged: subcommand-specific error preserved
error: prompt subcommand requires a prompt string

$ claw hello        # unchanged: typo guard still fires
error: unknown subcommand: hello.
  Did you mean     help

$ claw "real prompt here"   # unchanged: real prompts still reach API
error: api returned 401 Unauthorized (with dummy key, as expected)

All empty/whitespace-only paths exit 1. No network call. No misleading
credentials error.

## Tests

- rusty-claude-cli bin: 177 tests pass (4 new assertions)
- Full workspace green except pre-existing resume_latest flake (unrelated)

Closes ROADMAP #147.
2026-04-21 20:35:17 +09:00
YeonGyu-Kim
f877acacbf feat: #146 — wire claw config and claw diff as standalone subcommands
## Problem

`claw config` and `claw diff` are pure-local read-only introspection
commands (config merges .claw.json + .claw/settings.json from disk; diff
shells out to `git diff --cached` + `git diff`). Neither needs a session
context, yet both rejected direct CLI invocation:

$ claw config
error: `claw config` is a slash command. Use `claw --resume SESSION.jsonl /config` ...

$ claw diff
error: `claw diff` is a slash command. ...

This forced clawing operators to spin up a full session just to inspect
static disk state, and broke natural pipelines like
`claw config --output-format json | jq`.

## Root cause

Sibling of #145: `SlashCommand::Config { section }` and
`SlashCommand::Diff` had working renderers (`render_config_report`,
`render_config_json`, `render_diff_report`, `render_diff_json_for`)
exposed for resume sessions, but the top-level CLI parser in
`parse_subcommand()` had no arms for them. Zero-arg `config`/`diff`
hit `parse_single_word_command_alias`'s fallback to
`bare_slash_command_guidance`, producing the misleading guidance.

## Changes

### rust/crates/rusty-claude-cli/src/main.rs

- Added `CliAction::Config { section, output_format }` and
  `CliAction::Diff { output_format }` variants.
- Added `"config"` / `"diff"` arms to the top-level parser in
  `parse_subcommand()`. `config` accepts an optional section name
  (env|hooks|model|plugins) matching SlashCommand::Config semantics.
  `diff` takes no positional args. Both reject extra trailing args
  with a clear error.
- Added `"config" | "diff" => None` to
  `parse_single_word_command_alias` so bare invocations fall through
  to the new parser arms instead of the slash-guidance error.
- Added dispatch in run() that calls existing renderers: text mode uses
  `render_config_report` / `render_diff_report`; JSON mode uses
  `render_config_json` / `render_diff_json_for` with
  `serde_json::to_string_pretty`.
- Added 5 regression assertions in parse_args test covering:
  parse_args(["config"]), parse_args(["config", "env"]),
  parse_args(["config", "--output-format", "json"]),
  parse_args(["diff"]), parse_args(["diff", "--output-format", "json"]).

### ROADMAP.md

Added Pinpoint #146 documenting the gap, verification, root cause,
fix shape, and acceptance. Explicitly notes which other slash commands
(`hooks`, `usage`, `context`, etc.) are NOT candidates because they
are session-state-modifying.

## Live verification

$ claw config   # no config files
Config
  Working directory /private/tmp/cd-146-verify
  Loaded files      0
  Merged keys       0
Discovered files
  user    missing ...
  project missing ...
  local   missing ...
Exit 0.

$ claw config --output-format json
{
  "cwd": "...",
  "files": [...],
  ...
}

$ claw diff   # no git
Diff
  Result           no git repository
  Detail           ...
Exit 0.

$ claw diff --output-format json   # inside claw-code
{
  "kind": "diff",
  "result": "changes",
  "staged": "",
  "unstaged": "diff --git ..."
}
Exit 0.

## Tests

- rusty-claude-cli bin: 177 tests pass (5 new assertions in parse_args)
- Full workspace green except pre-existing resume_latest flake (unrelated)

## Not changed

`hooks`, `usage`, `context`, `tasks`, `theme`, `voice`, `rename`,
`copy`, `color`, `effort`, `branch`, `rewind`, `ide`, `tag`,
`output-style`, `add-dir` — all session-mutating or interactive-only;
correctly remain slash-only.

Closes ROADMAP #146.
2026-04-21 20:07:28 +09:00
YeonGyu-Kim
7d63699f9f feat: #145 — wire claw plugins subcommand to CLI parser (prompt misdelivery fix)
## Problem

`claw plugins` (and `claw plugins list`, `claw plugins --help`,
`claw plugins info <name>`, etc.) fell through the top-level subcommand
match and got routed into the prompt-execution path. Result: a purely
local introspection command triggered an Anthropic API call and surfaced
`missing Anthropic credentials` to the user. With valid credentials, it
would actually send the literal string "plugins" as a user prompt to
Claude, burning tokens for a local query.

$ claw plugins
error: missing Anthropic credentials; export ANTHROPIC_AUTH_TOKEN or ANTHROPIC_API_KEY before calling the Anthropic API

$ ANTHROPIC_API_KEY=dummy claw plugins
⠋ 🦀 Thinking...
✘  Request failed
error: api returned 401 Unauthorized

Meanwhile siblings (`agents`, `mcp`, `skills`) all worked correctly:

$ claw agents
No agents found.
$ claw mcp
MCP
  Working directory ...
  Configured servers 0

## Root cause

`CliAction::Plugins` exists, has a working dispatcher
(`LiveCli::print_plugins`), and is produced inside the REPL via
`SlashCommand::Plugins`. But the top-level CLI parser in
`parse_subcommand()` had arms for `agents`, `mcp`, `skills`, `status`,
`doctor`, `init`, `export`, `prompt`, etc., and **no arm for
`plugins`**. The dispatch never ran from the CLI entry point.

## Changes

### rust/crates/rusty-claude-cli/src/main.rs

Added a `"plugins"` arm to the top-level match in `parse_subcommand()`
that produces `CliAction::Plugins { action, target, output_format }`,
following the same positional convention as `mcp` (`action` = first
positional, `target` = second). Rejects >2 positional args with a clear
error.

Added four regression assertions in the existing `parse_args` test:
- `plugins` alone → `CliAction::Plugins { action: None, target: None }`
- `plugins list` → action: Some("list"), target: None
- `plugins enable <name>` → action: Some("enable"), target: Some(...)
- `plugins --output-format json` → action: None, output_format: Json

### ROADMAP.md

Added Pinpoint #145 documenting the gap, verification, root cause,
fix shape, and acceptance.

## Live verification

$ claw plugins   # no credentials set
Plugins
  example-bundled      v0.1.0      disabled
  sample-hooks         v0.1.0      disabled

$ claw plugins --output-format json   # no credentials set
{
  "action": "list",
  "kind": "plugin",
  "message": "Plugins\n  example-bundled ...\n  sample-hooks ...",
  "reload_runtime": false,
  "target": null
}

Exit 0 in all modes. No network call. No "missing credentials" error.

## Tests

- rusty-claude-cli bin: 177 tests pass (new plugin assertions included)
- Full workspace green except pre-existing resume_latest flake (unrelated)

Closes ROADMAP #145.
2026-04-21 19:36:49 +09:00
YeonGyu-Kim
faeaa1d30c feat: #144 phase 1 + ROADMAP filing — claw mcp degrades gracefully on malformed config
Filing + Phase 1 fix in one commit (sibling of #143).

## Context

With #143 Phase 1 landed (`claw status` degrades), `claw mcp` was the
remaining diagnostic surface that hard-failed on a malformed `.claw.json`.
Same input, same parse error, same partial-success violation. Fresh
dogfood at 18:59 KST caught it on main HEAD `e2a43fc`.

## Changes

### ROADMAP.md
Added Pinpoint #144 documenting the gap and acceptance criteria. Joins
the partial-success / Principle #5 cluster with #143.

### rust/crates/commands/src/lib.rs
`render_mcp_report_for()` + `render_mcp_report_json_for()` now catch the
ConfigError at loader.load() instead of propagating:

- **Text mode** prepends a "Config load error" block (same shape as
  #143's status output) before the MCP listing. The listing still renders
  with empty servers so the output structure is preserved.
- **JSON mode** adds top-level `status: "ok" | "degraded"` +
  `config_load_error: string | null` fields alongside existing fields
  (`kind`, `action`, `working_directory`, `configured_servers`,
  `servers[]`). On clean runs, `status: "ok"` and
  `config_load_error: null`. On parse failure, `status: "degraded"`,
  `config_load_error: "..."`, `servers: []`, exit 0.
- Both list and show actions get the same treatment.

### Regression test
`commands::tests::mcp_degrades_gracefully_on_malformed_mcp_config_144`:
- Injects the same malformed .claw.json as #143 (one valid + one broken
  mcpServers entry).
- Asserts mcp list returns Ok (not Err).
- Asserts top-level status: "degraded" and config_load_error names the
  malformed field path.
- Asserts show action also degrades.
- Asserts clean path returns status: "ok" with config_load_error null.

## Live verification

$ claw mcp --output-format json
{
  "action": "list",
  "kind": "mcp",
  "status": "degraded",
  "config_load_error": ".../.claw.json: mcpServers.missing-command: missing string field command",
  "working_directory": "/Users/yeongyu/clawd",
  "configured_servers": 0,
  "servers": []
}
Exit 0.

## Contract alignment after this commit

All three diagnostic surfaces match now:
- `doctor` — degraded envelope with typed check entries 
- `status` — degraded envelope with config_load_error  (#143)
- `mcp` — degraded envelope with config_load_error  (this commit)

Phase 2 (typed-error object joining taxonomy §4.44) tracked separately
across all three surfaces.

Full workspace test green except pre-existing resume_latest flake (unrelated).

Closes ROADMAP #144 phase 1.
2026-04-21 19:07:17 +09:00
YeonGyu-Kim
e2a43fcd49 feat: #143 phase 1 — claw status degrades gracefully on malformed config
Previously `claw status` hard-failed on any config parse error, emitting
a bare error string and exiting 1. This took down the entire health
surface for a single malformed MCP entry, even though workspace, git,
model, permission, and sandbox state could all be reported independently.

`claw doctor` already degraded gracefully on the exact same input.
This commit matches `claw status` to that contract.

Changes:
- Add `StatusContext::config_load_error: Option<String>` to capture parse
  errors without aborting.
- Rewrite `status_context()` to match on `ConfigLoader::load()`: on Err,
  fall back to default `SandboxConfig` for sandbox resolution and record
  the parse error, then continue populating workspace/git/memory fields.
- JSON output gains top-level `status: "ok" | "degraded"` marker and a
  `config_load_error` string (null on clean runs). All other existing
  fields preserved for backward compat.
- Text output prepends a "Config load error" block with Details + Hint
  when config failed to parse, then a "Status (degraded)" header on the
  main block. Clean runs show the usual "Status" header.
- Doctor path updated to pass the config load error through StatusContext.

Regression test `status_degrades_gracefully_on_malformed_mcp_config_143`:
- Injects a .claw.json with one valid + one malformed mcpServers entry
- Asserts status_context() returns Ok (not Err)
- Asserts config_load_error names the malformed field path
- Asserts workspace/sandbox fields still populated in JSON
- Asserts top-level status is 'degraded'
- Asserts clean config path still returns status: 'ok'

Verified live on /Users/yeongyu/clawd (contains deliberately broken MCP entries):
  $ claw status --output-format json
  { "status": "degraded",
    "config_load_error": ".../mcpServers.missing-command: missing string field command",
    "model": "claude-opus-4-6",
    "workspace": {...},
    "sandbox": {...},
    ... }

Phase 2 (typed error object joining #4.44 taxonomy) tracked separately.

Full workspace test green except pre-existing resume_latest flake (unrelated).

Closes ROADMAP #143 phase 1.
2026-04-21 18:37:42 +09:00
YeonGyu-Kim
541c5bb95d feat: #139 actionable worker-state guidance in claw state error + help
Previously `claw state` errored with "no worker state file found ... — run a
worker first" but there is no `claw worker` subcommand, so claws had no
discoverable path from the error to a fix.

Changes:
- Rewrite the missing-state error to name the two concrete commands that
  produce .claw/worker-state.json:
    * `claw` (interactive REPL, writes state on first turn)
    * `claw prompt <text>` (one non-interactive turn)
  Also tell the user what to rerun: `claw state [--output-format json]`.
- Expand the State --help topic with "Produces state", "Observes state",
  and "Exit codes" lines so the worker-state contract is discoverable
  before the user hits the error.
- Add regression test state_error_surfaces_actionable_worker_commands_139
  asserting the error contains `claw prompt`, REPL mention, and the
  rerun path, plus that the help topic documents the producer contract.

Verified live:
  $ claw state
  error: no worker state file found at .claw/worker-state.json
    Hint: worker state is written by the interactive REPL or a non-interactive prompt.
    Run:   claw               # start the REPL (writes state on first turn)
    Or:    claw prompt <text> # run one non-interactive turn
    Then rerun: claw state [--output-format json]

JSON mode preserves the full hint inside the error envelope so CI/claws
can match on `claw prompt` without losing the canonical prefix.

Full workspace test green except pre-existing resume_latest flake (unrelated).

Closes ROADMAP #139.
2026-04-21 18:04:04 +09:00
YeonGyu-Kim
611eed1537 feat: #142 structured fields in claw init --output-format json
Previously `claw init --output-format json` emitted a valid JSON envelope but
packed the entire human-formatted output into a single `message` string. Claw
scripts had to substring-match human language to tell `created` from `skipped`.

Changes:
- Add InitStatus::json_tag() returning machine-stable "created"|"updated"|"skipped"
  (unlike label() which includes the human " (already exists)" suffix).
- Add InitReport::NEXT_STEP constant so claws can read the next-step hint
  without grepping the message string.
- Add InitReport::artifacts_with_status() to partition artifacts by state.
- Add InitReport::artifact_json_entries() for the structured artifacts[] array.
- Rewrite run_init + init_json_value to emit first-class fields alongside the
  legacy message string (kept for text consumers): project_path, created[],
  updated[], skipped[], artifacts[], next_step, message.
- Update the slash-command Init dispatch to use the same structured JSON.
- Add regression test artifacts_with_status_partitions_fresh_and_idempotent_runs
  asserting both fresh + idempotent runs produce the right partitioning and
  that the machine-stable tag is bare 'skipped' not label()'s phrasing.

Verified output:
- Fresh dir: created[] has 4 entries, skipped[] empty
- Idempotent call: created[] empty, skipped[] has 4 entries
- project_path, next_step as first-class keys
- message preserved verbatim for backward compat

Full workspace test green except pre-existing resume_latest flake (unrelated).

Closes ROADMAP #142.
2026-04-21 17:42:00 +09:00
YeonGyu-Kim
7763ca3260 feat: #141 unify claw <subcommand> --help contract across all 14 subcommands
Previously, `claw <subcommand> --help` had 5 different behaviors:
- 7 subcommands returned subcommand-specific help (correct)
- init/export/state/version silently fell back to global `claw --help`
- system-prompt/dump-manifests errored with `unknown <cmd> option: --help`
- bootstrap-plan printed its phase list instead of help text

Changes:
- Extend LocalHelpTopic enum with Init, State, Export, Version, SystemPrompt,
  DumpManifests, BootstrapPlan variants.
- Extend parse_local_help_action() to resolve those 7 subcommands to their
  local help topic instead of falling through to the main dispatch.
- Remove init/state/export/version from the explicit wants_help=true matcher
  so they reach parse_local_help_action() before being routed to global help.
- Add render_help_topic() entries for the 7 new topics with consistent
  Usage/Purpose/Output/Formats/Related structure.
- Add regression test subcommand_help_flag_has_one_contract_across_all_subcommands_141
  asserting every documented subcommand + both --help and -h variants resolve
  to a HelpTopic with non-empty text that contains a Usage line.

Verification:
- All 14 subcommands now return subcommand-specific help (live dogfood).
- Full workspace test green except pre-existing resume_latest flake.

Closes ROADMAP #141.
2026-04-21 17:36:48 +09:00
YeonGyu-Kim
27ffd75f03 fix: #140 isolate test cwd + env in punctuation_bearing_single_token test
Previously this test inherited the cargo test runner's CWD, which could contain
a stale .claw/settings.json with "permissionMode": "acceptEdits" written by
another test. The deprecated-field resolver then silently downgraded the
default permission mode to WorkspaceWrite, breaking the test's assertion.

Fix: wrap the assertion in with_current_dir() + env_lock() so the test runs in
an isolated temp directory with no stale config.

Full workspace test now passes except for pre-existing resume_latest flake
(unrelated to #140, environment-dependent, tracked separately).

Closes ROADMAP #140.
2026-04-21 16:34:58 +09:00
YeonGyu-Kim
f3f6643fb9 feat: #108 add did-you-mean guard for subcommand typos (prevents silent LLM dispatch)
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-21 15:37:58 +09:00
YeonGyu-Kim
a8beca1463 fix: #136 support --output-format json with --compact flag
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-21 14:47:15 +09:00
YeonGyu-Kim
21adae9570 fix: #137 update test fixtures to use canonical 'opus' alias for main branch consistency
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-21 14:32:49 +09:00
YeonGyu-Kim
50e3fa3a83 docs: add --output-format to diagnostic verb help text
Updated LocalHelpTopic help strings to surface --output-format support:
- Status, Sandbox, Doctor, Acp all now show [--output-format <format>]
- Added 'Formats: text (default), json' line to each

Diagnostic verbs support JSON output but help text didn't advertise it.
Post-#127 fix: help text now matches actual CLI surface.

Verified: cargo build passes, claw doctor --help shows output-format.

Refs: #127
2026-04-20 21:32:02 +09:00
YeonGyu-Kim
a3270db602 fix: #127 reject unrecognized suffix args for diagnostic verbs
Diagnostic verbs (help, version, status, sandbox, doctor, state) now
reject unrecognized suffix arguments at parse time instead of silently
falling through to Prompt dispatch.

Fixes: claw doctor --json (and similar) no longer accepts --json silently
and attempts to send it to the LLM as a prompt. Now properly emits:
'unrecognized argument `--json` for subcommand `doctor`'

Joined parser-level trust gap quintet #108 + #117 + #119 + #122 + #127.
Prevents token burn on rejected arguments.

Verified: cargo build --workspace passes, claw doctor --json errors cleanly.

Refs: #127, ROADMAP
2026-04-20 19:23:35 +09:00
YeonGyu-Kim
12f1f9a74e feat: wire ship.prepared provenance emission at bash execution boundary
Adds ship provenance detection and emission in execute_bash_async():
- Detects git push to main/master commands
- Captures current branch, HEAD commit, git user as actor
- Emits ship.prepared event with ShipProvenance payload
- Logs to stderr as interim routing (event stream integration pending)

This is the first wired provenance event — schema (§4.44.5) now has
runtime emission at actual git operation boundary.

Verified: cargo build --workspace passes.
Next: wire ship.commits_selected, ship.merged, ship.pushed_main events.

Refs: §4.44.5.1, ROADMAP #4.44.5
2026-04-20 17:03:28 +09:00
YeonGyu-Kim
2678fa0af5 fix: #124 --model validation rejects malformed syntax at parse time
Adds validate_model_syntax() that rejects:
- Empty strings
- Strings with spaces (e.g., 'bad model')
- Invalid provider/model format

Accepts:
- Known aliases (opus, sonnet, haiku)
- Valid provider/model format (provider/model)

Wired into parse_args for both --model <value> and --model=<value> forms.
Errors exit with clear message before any API calls (no token burn).

Verified:
- 'claw --model "bad model" version' → error, exit 1
- 'claw --model "" version' → error, exit 1
- 'claw --model opus version' → works
- 'claw --model anthropic/claude-opus-4-6 version' → works

Refs: ROADMAP #124 (debbcbe cluster — parser-level trust gap family)
2026-04-20 16:32:17 +09:00
YeonGyu-Kim
b9990bb27c fix: #122 + #125 doctor consistency and git_state clarity
#122: doctor invocation now checks stale-base condition
- Calls run_stale_base_preflight(None) in render_doctor_report()
- Emits stale-base warnings to stderr when branch is behind main
- Fixes inconsistency: doctor 'ok' vs prompt 'stale base' warning

#125: git_state field reflects non-git directories
- When !in_git_repo, git_state = 'not in git repo' instead of 'clean'
- Fixes contradiction: in_git_repo: false but git_state: 'clean'
- Applied in both doctor text output and status JSON

Verified: cargo build --workspace passes.

Refs: ROADMAP #122 (dd73962), #125 (debbcbe)
2026-04-20 16:13:43 +09:00
YeonGyu-Kim
f33c315c93 fix: #122 doctor invocation now checks stale-base condition
Adds run_stale_base_preflight(None) call to render_doctor_report() so that
claw doctor emits stale-base warnings to stderr when the current branch is
behind main. Previously doctor reported 'ok' even when branch was stale,
creating inconsistency with prompt path warnings.

Fixes silent-state inventory gap: doctor now consistent with prompt/repl
stale-base checking. No behavior change for non-stale branches.

Verified: cargo build --workspace passes, no test failures.

Ref: ROADMAP #122 dogfood filing @ dd73962
2026-04-20 15:49:56 +09:00
YeonGyu-Kim
8a8ca8a355 ROADMAP #4.44.5: Ship/provenance events — implement §4.44.5
Adds structured ship provenance surface to eliminate delivery-path opacity:

New lane events:
- ship.prepared — intent to ship established
- ship.commits_selected — commit range locked
- ship.merged — merge completed with provenance
- ship.pushed_main — delivery to main confirmed

ShipProvenance struct carries:
- source_branch, base_commit
- commit_count, commit_range
- merge_method (direct_push/fast_forward/merge_commit/squash_merge/rebase_merge)
- actor, pr_number

Constructor methods added to LaneEvent for all four ship events.

Tests:
- Wire value serialization for ship events
- Round-trip deserialization
- Canonical event name coverage

Runtime: 465 tests pass
ROADMAP updated with IMPLEMENTED status

This closes the gap where 56 commits pushed to main had no structured
provenance trail — now emits first-class events for clawhip consumption.
2026-04-20 15:06:50 +09:00
YeonGyu-Kim
b0b579ebe9 ROADMAP #133: Blocked-state subphase contract — implement §6.5
Adds BlockedSubphase enum with 7 variants for structured blocked-state reporting:
- blocked.trust_prompt — trust gate blockers
- blocked.prompt_delivery — prompt misdelivery
- blocked.plugin_init — plugin startup failures
- blocked.mcp_handshake — MCP connection issues
- blocked.branch_freshness — stale branch blockers
- blocked.test_hang — test timeout/hang
- blocked.report_pending — report generation stuck

LaneEventBlocker now carries optional subphase field that gets serialized
into LaneEvent data. Enables clawhip to route recovery without pane scraping.

Updates:
- lane_events.rs: BlockedSubphase enum, LaneEventBlocker.subphase field
- lane_events.rs: blocked()/failed() constructors with subphase serialization
- lib.rs: Export BlockedSubphase
- tools/src/lib.rs: classify_lane_blocker() with subphase: None
- Test imports and fixtures updated

Backward-compatible: subphase is Option<>, existing events continue to work.
2026-04-20 15:04:08 +09:00
YeonGyu-Kim
c956f78e8a ROADMAP #4.44.5: Ship/provenance opacity — filed from dogfood
Added structured delivery-path contract to surface branch → merge → main-push
provenance as first-class events. Filed from the 56-commit 2026-04-20 push
that exposed the gap.

Also fixes: ApiError test compilation — add suggested_action: None to 4 sites

- Line ~8414: opaque_provider_wrapper_surfaces_failure_class_session_and_trace
- Line ~8436: retry_exhaustion_uses_retry_failure_class_for_generic_provider_wrapper
- Line ~8499: provider_context_window_errors_are_reframed_with_same_guidance
- Line ~8533: retry_wrapped_context_window_errors_keep_recovery_guidance
2026-04-20 14:35:07 +09:00
Yeachan-Heo
00d0eb61d4 US-024: Add token limit metadata for kimi models
Add ModelTokenLimit entries for kimi-k2.5 and kimi-k1.5 to enable
preflight context window validation. Per Moonshot AI documentation:
- Context window: 256,000 tokens
- Max output: 16,384 tokens

Includes 3 unit tests:
- returns_context_window_metadata_for_kimi_models
- kimi_alias_resolves_to_kimi_k25_token_limits
- preflight_blocks_oversized_requests_for_kimi_models

All tests pass, clippy clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-17 04:15:38 +00:00
Yeachan-Heo
d037f9faa8 Fix strip_routing_prefix to handle kimi provider prefix (US-023)
Add "kimi" to the strip_routing_prefix matches so that models like
"kimi/kimi-k2.5" have their prefix stripped before sending to the
DashScope API (consistent with qwen/openai/xai/grok handling).

Also add unit test strip_routing_prefix_strips_kimi_provider_prefix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 19:50:15 +00:00
Yeachan-Heo
cec8d17ca8 Implement US-023: Add automatic routing for kimi models to DashScope
Changes in rust/crates/api/src/providers/mod.rs:
- Add 'kimi' alias to MODEL_REGISTRY resolving to 'kimi-k2.5' with DashScope config
- Add kimi/kimi- prefix routing to DashScope endpoint in metadata_for_model()
- Add resolve_model_alias() handling for kimi -> kimi-k2.5
- Add unit tests: kimi_prefix_routes_to_dashscope, kimi_alias_resolves_to_kimi_k2_5

Users can now use:
- --model kimi (resolves to kimi-k2.5)
- --model kimi-k2.5 (auto-routes to DashScope)
- --model kimi/kimi-k2.5 (explicit provider prefix)

All 127 tests pass, clippy clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 19:44:21 +00:00
Yeachan-Heo
4cb1db9faa Implement US-022: Enhanced error context for API failures
Add structured error context to API failures:
- Request ID tracking across retries with full context in error messages
- Provider-specific error code mapping with actionable suggestions
- Suggested user actions for common error types (401, 403, 413, 429, 500, 502-504)
- Added suggested_action field to ApiError::Api variant
- Updated enrich_bearer_auth_error to preserve suggested_action

Files changed:
- rust/crates/api/src/error.rs: Add suggested_action field, update Display
- rust/crates/api/src/providers/openai_compat.rs: Add suggested_action_for_status()
- rust/crates/api/src/providers/anthropic.rs: Update error handling

All tests pass, clippy clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 19:15:00 +00:00
Yeachan-Heo
5e65b33042 US-021: Add request body size pre-flight check for OpenAI-compatible provider 2026-04-16 17:41:57 +00:00
Yeachan-Heo
87b982ece5 US-011: Performance optimization for API request serialization
Added criterion benchmarks and optimized flatten_tool_result_content:
- Added criterion dev-dependency and request_building benchmark suite
- Optimized flatten_tool_result_content to pre-allocate capacity and avoid
  intermediate Vec construction (was collecting to Vec then joining)
- Made key functions public for benchmarking: translate_message,
  build_chat_completion_request, flatten_tool_result_content,
  is_reasoning_model, model_rejects_is_error_field

Benchmark results:
- flatten_tool_result_content/single_text: ~17ns
- translate_message/text_only: ~200ns
- build_chat_completion_request/10 messages: ~16.4µs
- is_reasoning_model detection: ~26-42ns

All 119 unit tests and 29 integration tests pass.
cargo clippy passes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 11:11:45 +00:00
Yeachan-Heo
3e4e1585b5 US-009: Add comprehensive unit tests for kimi model compatibility fix
Added 4 unit tests to verify is_error field handling for kimi models:
- model_rejects_is_error_field_detects_kimi_models: Detects kimi-k2.5, kimi-k1.5, dashscope/kimi-k2.5 (case insensitive)
- translate_message_includes_is_error_for_non_kimi_models: Verifies gpt-4o, grok-3, claude include is_error
- translate_message_excludes_is_error_for_kimi_models: Verifies kimi models exclude is_error (prevents 400 Bad Request)
- build_chat_completion_request_kimi_vs_non_kimi_tool_results: Full integration test for request building

All 119 unit tests and 29 integration tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 10:54:48 +00:00
Yeachan-Heo
866ae7562c Fix formatting in task_packet.rs for CI 2026-04-16 09:35:18 +00:00
Yeachan-Heo
1d5748f71f US-005: Typed task packet format with TaskScope enum
- Add TaskScope enum with Workspace, Module, SingleFile, Custom variants
- Update TaskPacket struct with scope_path and worktree fields
- Add validation for scope-specific requirements
- Fix tests in task_packet.rs, task_registry.rs, and tools/src/lib.rs
- Export TaskScope from runtime crate

Closes US-005 (Phase 4)
2026-04-16 09:28:42 +00:00
Yeachan-Heo
77fb62a9f1 Implement LaneEvent schema extensions for event ordering, provenance, and dedupe (US-002)
Adds comprehensive metadata support to LaneEvent for the canonical lane event schema:

- EventProvenance enum: live_lane, test, healthcheck, replay, transport
- SessionIdentity: title, workspace, purpose, with placeholder support
- LaneOwnership: owner, workflow_scope, watcher_action (Act/Observe/Ignore)
- LaneEventMetadata: seq, provenance, session_identity, ownership, nudge_id,
  event_fingerprint, timestamp_ms
- LaneEventBuilder: fluent API for constructing events with full metadata
- is_terminal_event(): detects Finished, Failed, Superseded, Closed, Merged
- compute_event_fingerprint(): deterministic fingerprint for terminal events
- dedupe_terminal_events(): suppresses duplicate terminal events by fingerprint

Provides machine-readable event provenance, session identity at creation,
monotonic sequence ordering, nudge deduplication, and terminal event suppression.

Adds 10 regression tests covering:
- Monotonic sequence ordering
- Provenance serialization round-trip
- Session identity completeness
- Ownership and workflow scope binding
- Watcher action variants
- Terminal event detection
- Fingerprint determinism and uniqueness
- Terminal event deduplication
- Builder construction with metadata
- Metadata serialization round-trip

Closes Phase 2 (partial) from ROADMAP.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 09:12:31 +00:00
Yeachan-Heo
21909da0b5 Implement startup-no-evidence evidence bundle + classifier (US-001)
Adds typed worker.startup_no_evidence event with evidence bundle when worker
startup times out. The classifier attempts to down-rank the vague bucket into
specific failure classifications:
- trust_required
- prompt_misdelivery
- prompt_acceptance_timeout
- transport_dead
- worker_crashed
- unknown

Evidence bundle includes:
- Last known worker lifecycle state
- Pane/command being executed
- Prompt-send timestamp
- Prompt-acceptance state
- Trust-prompt detection result
- Transport health summary
- MCP health summary
- Elapsed seconds since worker creation

Includes 6 regression tests covering:
- Evidence bundle serialization
- Transport dead classification
- Trust required classification
- Prompt acceptance timeout
- Worker crashed detection
- Unknown fallback

Closes Phase 1.6 from ROADMAP.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 09:05:33 +00:00
Yeachan-Heo
ac45bbec15 Make ACP/Zed status obvious before users go source-diving
ROADMAP #21, #22, and #23 were already closed on current main, so the next real repo-local backlog item was the ACP/Zed discoverability gap. This adds a local `claw acp` status surface plus aliases, updates help/docs, and separates the shipped discoverability fix from the still-open daemon/protocol follow-up so editor-first users get a crisp answer immediately.

Constraint: No ACP/Zed daemon or protocol server exists in claw-code yet, so the new surface must be explicit status guidance rather than a fake implementation
Rejected: Add a pretend `acp serve` daemon path | would imply supported protocol behavior that does not exist
Rejected: Docs-only clarification | still leaves `claw --help` unable to answer the editor-launch question directly
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep ROADMAP discoverability fixes separate from future ACP daemon/protocol work so help text and backlog IDs stay unambiguous
Tested: cargo fmt --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; cargo run -q -p rusty-claude-cli -- acp; cargo run -q -p rusty-claude-cli -- --output-format json acp; architect review APPROVED
Not-tested: Real ACP/Zed daemon launch because no protocol-serving surface exists yet
2026-04-16 03:13:50 +00:00
Yeachan-Heo
e874bc6a44 Improve malformed hook failures so operators can diagnose broken JSON
Malformed hook stdout that looks like JSON was collapsing into low-signal failure text during hook execution. This change preserves plain-text hook feedback for normal text hooks, but upgrades malformed JSON-like output into an explicit hook_invalid_json diagnostic that includes phase, tool, command, and bounded stdout/stderr previews. It also adds a regression test for malformed-but-nonempty output.

Constraint: User scoped the implementation to rust/crates/runtime/src/hooks.rs and tests only
Constraint: Existing plain-text hook feedback must remain intact for non-JSON hook output
Rejected: Treat every non-JSON stdout payload as invalid JSON | would break legitimate plain-text hook feedback
Confidence: high
Scope-risk: narrow
Directive: Keep malformed-hook diagnostics bounded and preserve the plain-text fallback for hooks that intentionally emit text
Tested: cargo test --manifest-path rust/Cargo.toml -p runtime hooks::tests:: -- --nocapture
Tested: cargo test --manifest-path rust/Cargo.toml -p runtime -- --nocapture
Tested: cargo clippy --manifest-path rust/Cargo.toml -p runtime --all-targets -- -D warnings
Not-tested: Full workspace clippy/test sweep outside runtime crate
2026-04-13 12:44:52 +00:00
Yeachan-Heo
6a957560bd Make recovery handoffs explain why a lane resumed instead of leaking control prose
Recent OMX dogfooding kept surfacing raw `[OMX_TMUX_INJECT]`
messages as lane results, which told operators that tmux reinjection
happened but not why or what lane/state it applied to. The lane-finished
persistence path now recognizes that control prose, stores structured
recovery metadata, and emits a human-meaningful fallback summary instead
of preserving the raw marker as the primary result.

Constraint: Keep the fix in the existing lane-finished metadata surface rather than inventing a new runtime channel
Rejected: Treat all reinjection prose as ordinary quality-floor mush | loses the recovery cause and target lane operators actually need
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Recovery classification is heuristic; extend the parser only when new operator phrasing shows up in real dogfood evidence
Tested: cargo fmt --all --check
Tested: cargo clippy --workspace --all-targets -- -D warnings
Tested: cargo test --workspace
Tested: LSP diagnostics on rust/crates/tools/src/lib.rs (0 errors)
Tested: Architect review (APPROVE)
Not-tested: Additional reinjection phrasings beyond the currently observed `[OMX_TMUX_INJECT]` / current-mode-state variants
Related: ROADMAP #68
2026-04-12 15:50:39 +00:00
Yeachan-Heo
42bb6cdba6 Keep local clawhip artifacts from tripping routine repo work
Dogfooding kept reproducing OMX team merge conflicts on
`.clawhip/state/prompt-submit.json`, so the init bootstrap now
teaches repos to ignore `.clawhip/` alongside the existing local
`.claw/` artifacts. This also updates the current repo ignore list
so the fix helps immediately instead of only on future `claw init`
runs.

Constraint: Keep the fix narrow and centered on repo-local ignore hygiene
Rejected: Broader team merge-hygiene changes | unnecessary for the proven local root cause
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If more runtime-local artifact directories appear, extend the shared init gitignore list instead of patching repos ad hoc
Tested: cargo fmt --all --check
Tested: cargo clippy --workspace --all-targets -- -D warnings
Tested: cargo test --workspace
Tested: Architect review (APPROVE)
Not-tested: Existing clones with already-tracked `.clawhip` files still need manual cleanup
Related: ROADMAP #75
2026-04-12 14:47:40 +00:00
Yeachan-Heo
f91d156f85 Keep poisoned test locks from cascading across unrelated regressions
The repo-local backlog was effectively exhausted, so this sweep promoted the
newly observed test-lock poisoning pain point into ROADMAP #74 and fixed it in
place. Test-only env/cwd lock acquisition now recovers poisoned mutexes in the
remaining strict call sites, and each affected surface has a regression that
proves a panic no longer permanently poisons later tests.

Constraint: Keep the fix test-only and avoid widening runtime behavior changes
Rejected: Refactor shared helper signatures across broader call paths | unnecessary churn beyond the remaining strict test sites
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: These guards only recover the mutex; tests that mutate env or cwd still must restore process-global state explicitly
Tested: cargo fmt --all --check
Tested: cargo clippy --workspace --all-targets -- -D warnings
Tested: cargo test --workspace
Tested: Architect review (APPROVE)
Not-tested: Additional fault-injection around partially restored env/cwd state after panic
Related: ROADMAP #74
2026-04-12 13:52:41 +00:00
Yeachan-Heo
6b4bb4ac26 Keep finished lanes from leaving stale reminders armed
The next repo-local sweep target was ROADMAP #66: reminder/cron
state could stay enabled after the associated lane had already
finished, which left stale nudges firing into completed work. The
fix teaches successful lane persistence to disable matching enabled
cron entries and record which reminder ids were shut down on the
finished event.

Constraint: Preserve existing cron/task registries and add the shutdown behavior only on the successful lane-finished path
Rejected: Add a separate reminder-cleanup command that operators must remember to run | leaves the completion leak unfixed at the source
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If cron-matching heuristics change later, update `disable_matching_crons`, its regression, and the ROADMAP closeout together
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE
Not-tested: Cross-process cron/reminder persistence beyond the in-memory registry used in this repo
2026-04-12 12:52:27 +00:00
Yeachan-Heo
e75d67dfd3 Make successful lanes explain what artifacts they actually produced
The next repo-local sweep target was ROADMAP #64: downstream consumers
still had to infer artifact provenance from prose even though the repo
already emitted structured lane events. The fix extends `lane.finished`
metadata with structured artifact provenance so successful completions
can report roadmap ids, files, diff stat, verification state, and commit
sha without relying on narration alone.

Constraint: Preserve the existing commit-created event and lane-finished metadata paths while adding structured provenance to successful completions
Rejected: Introduce a separate artifact event type first | unnecessary for this focused closeout because `lane.finished` already carries structured data and existing consumers can read it there
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If artifact provenance extraction rules change later, update `extract_artifact_provenance`, its regression payload, and the ROADMAP closeout together
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE
Not-tested: Downstream consumers that ignore `lane.finished.data.artifactProvenance` and still parse only prose output
2026-04-12 11:56:00 +00:00
Yeachan-Heo
2e34949507 Keep latest-session timestamps increasing under tight loops
The next repo-local sweep target was ROADMAP #73: repeated backlog
sweeps exposed that session writes could share the same wall-clock
millisecond, which made semantic recency fragile and forced the
resume-latest regression to sleep between saves. The fix makes session
timestamps monotonic within the process and removes the timing hack
from the test so latest-session selection stays stable under tight
loops.

Constraint: Preserve the existing session file format while changing only the timestamp source semantics
Rejected: Keep the sleep-based test workaround | hides the real ordering hazard instead of fixing timestamp generation
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Any future session-recency logic must keep `current_time_millis`, ordering tests, and latest-session expectations aligned
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE
Not-tested: Cross-process monotonicity when multiple binaries write sessions concurrently
2026-04-12 10:51:19 +00:00
Yeachan-Heo
8f53524bd3 Make backlog-scan lanes say what they actually selected
The next repo-local sweep target was ROADMAP #65: backlog-scanning
lanes could stop with prose-only summaries naming roadmap items, but
there was no machine-readable record of which items were chosen,
which were skipped, or whether the lane intended to execute, review,
or no-op. The fix teaches completed lane persistence to extract a
structured selection outcome while preserving the existing quality-
floor and review-verdict behavior for other lanes.

Constraint: Keep selection-outcome extraction on the existing `lane.finished` metadata path instead of inventing a separate event stream
Rejected: Add a dedicated selection event type first | unnecessary for this focused closeout because `lane.finished` already persists structured data downstream can read
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If backlog-scan summary conventions change later, update `extract_selection_outcome`, its regression test, and the ROADMAP closeout wording together
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE after roadmap closeout update
Not-tested: Downstream consumers that may still ignore `lane.finished.data.selectionOutcome`
2026-04-12 09:54:37 +00:00
Yeachan-Heo
b5e30e2975 Make completed review lanes emit machine-readable verdicts
The next repo-local sweep target was ROADMAP #67: scoped review lanes
could stop with prose-only output, leaving downstream consumers to infer
approval or rejection from later chatter. The fix teaches completed lane
persistence to recognize review-style `APPROVE`/`REJECT`/`BLOCKED`
results, attach structured verdict metadata to `lane.finished`, and keep
ordinary non-review lanes on the existing quality-floor path.

Constraint: Preserve the existing non-review lane summary path while enriching only review-style completions
Rejected: Add a brand-new lane event type just for review results | unnecessary when `lane.finished` already carries structured metadata and downstream consumers can read it there
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If review verdict parsing changes later, update `extract_review_outcome`, the finished-event payload fields, and the review-lane regression together
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE
Not-tested: External consumers that may still ignore `lane.finished.data.reviewVerdict`
2026-04-12 08:49:40 +00:00
Yeachan-Heo
dbc2824a3e Keep latest session selection tied to real session recency
The next repo-local sweep target was ROADMAP #72: the `latest`
managed-session alias could depend on filesystem mtime before the
session's own persisted recency markers, which made the selection
path vulnerable to coarse or misleading file timestamps. The fix
promotes `updated_at_ms` into the summary/order path, keeps CLI
wrappers in sync, and locks the mtime-vs-session-recency case with
regression coverage.

Constraint: Preserve existing managed-session storage layout while changing only the ordering signal
Rejected: Keep sorting by filesystem mtime and just sleep longer in tests | hides the semantic ordering bug instead of fixing it
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Any future managed-session ordering change must keep runtime and CLI summary structs aligned on the same recency fields
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE
Not-tested: Cross-filesystem behavior where persisted session JSON cannot be read and fallback ordering uses mtime only
2026-04-12 07:49:32 +00:00
Yeachan-Heo
f309ff8642 Stop repo lanes from executing the wrong task payload
The next repo-local sweep target was ROADMAP #71: a claw-code lane
accepted an unrelated KakaoTalk/image-analysis prompt even though the
lane itself was supposed to be repo-scoped work. This extends the
existing prompt-misdelivery guardrail with an optional structured task
receipt so worker boot can reject visible wrong-task context before the
lane continues executing.

Constraint: Keep the fix inside the existing worker_boot / WorkerSendPrompt control surface instead of inventing a new external OMX-only protocol
Rejected: Treat wrong-task receipts as generic shell misdelivery | loses the expected-vs-observed task context needed to debug contaminated lanes
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If task-receipt fields change later, update the WorkerSendPrompt schema, worker payload serialization, and wrong-task regression together
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE
Not-tested: External orchestrators that have not yet started populating the optional task_receipt field
2026-04-12 07:00:07 +00:00
Yeachan-Heo
3b806702e7 Make the CLI point users at the real install source
The next repo-local backlog item was ROADMAP #70: users could
mistake third-party pages or the deprecated `cargo install
claw-code` path for the official install route. The CLI now
surfaces the source of truth directly in `claw doctor` and
`claw --help`, and the roadmap closeout records the change.

Constraint: Keep the fix inside repo-local Rust CLI surfaces instead of relying on docs alone
Rejected: Close #70 with README-only wording | the bug was user-facing CLI ambiguity, so the warning needed to appear in runtime help/doctor output
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If install guidance changes later, update both the doctor check payload and the help-text warning together
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE
Not-tested: Third-party websites outside this repo that may still present stale install instructions
2026-04-12 04:50:03 +00:00