2026-03-30 19:05:50 -06:00

168 lines
6.6 KiB
Markdown

<!--
name: 'Skill: Verify skill'
description: Skill for opinionated verification workflow for validating code changes.
ccVersion: 2.1.88
-->
---
name: verify
description: Verify that a code change actually does what it's supposed to by running the app and observing behavior. Use when asked to verify a PR, confirm a fix works, test a change manually, check that a feature works, or validate local changes before pushing.
---
**Verification is runtime observation.** You build the app, run it,
drive it to where the changed code executes, and capture what you
see. That capture is your evidence. Nothing else is.
**Don't run tests. Don't typecheck.** CI ran both before you got here
— green checks on the PR mean they passed. Running them again proves
you can run CI. Not as a warm-up, not "just to be sure," not as a
regression sweep after. The time goes to running the app instead.
**Don't import-and-call.** `import { foo } from './src/...'` then
`console.log(foo(x))` is a unit test you wrote. The function did what
the function does — you knew that from reading it. The app never ran.
Whatever calls `foo` in the real codebase ends at a CLI, a socket, or
a window. Go there.
## Find the change
Establish the full range first — a branch may be many commits:
```bash
git log --oneline @{u}.. # count commits
git diff @{u}.. --stat # full range, not HEAD~1
gh pr diff # if in a PR context
```
State the commit count in your report. Large diff truncating? Redirect:
`git diff @{u}.. > /tmp/d` then Read it. No diff at all → say so, stop.
**The diff is ground truth. The PR description is a claim about it.**
Read both. If they disagree, that's a finding.
## Surface
The surface is where a user — human or programmatic — meets the
change. That's where you observe.
| Change reaches | Surface | You |
|---|---|---|
| CLI / TUI | terminal | type the command, capture the pane — [example](examples/cli.md) |
| Server / API | socket | send the request, capture the response — [example](examples/server.md) |
| GUI | pixels | drive it under xvfb/Playwright, screenshot |
| Library | package boundary | sample code through the public export — `import pkg`, not `import ./src/...` |
| Prompt / agent config | the agent | run the agent, capture its behavior |
| CI workflow | Actions | dispatch it, read the run |
**Internal function? Not a surface.** Something in the repo calls it
and that caller ends at one of the rows above. Follow it there. A
bash security gate's surface isn't the function's return value — it's
the CLI prompting or auto-allowing when you type the command.
**No runtime surface at all** — docs-only, type declarations with no
emit, build config that produces no behavioral diff — report
**BLOCKED — no runtime surface: (reason).** Don't run tests to fill
the space.
## Get a handle
Check for existing knowledge before cold-starting:
- **`.claude/skills/*verifier*/`** — if one matches your surface (CLI
verifier for a CLI change, etc.), route to it. It knows readiness
signals and env gotchas you don't. Mismatched surface → skip that
one, try the next. Stale verifier (fails on mechanics unrelated to
the change) → ask the user whether to patch it; don't FAIL the
change for verifier rot.
- **`.claude/skills/run-*/`** — knows how to build and launch. Use its
primitives as your handle.
- **Neither** — cold start from README/package.json/Makefile. Timebox
~15min. Stuck → BLOCKED with exactly where, plus a filled-in
`/run-skill-generator` prompt. Got through → mention
`/init-verifiers` in your report so next time is faster.
## Drive it
Smallest path that makes the changed code execute:
- Changed a flag? Run with it.
- Changed a handler? Hit that route.
- Changed error handling? Trigger the error.
- Changed an internal function? Find the CLI command / request / render
that reaches it. Run that.
**Read your plan back before running.** If every step is build /
typecheck / run test file — you've planned a CI rerun, not a
verification. Find a step that reaches the surface or report BLOCKED.
Once the claim checks out, keep going: break it (empty input, huge
input, interrupt mid-op), combine it (new thing + old thing), wander
(what's adjacent? what looked off?). The PR description is what the
author intended. Your job includes what they didn't.
**End-to-end, through the real interface.** Pieces passing in
isolation doesn't mean the flow works — seams are where bugs hide.
If users click buttons, test by clicking buttons, not by curling the
API underneath.
## Capture
Stdout, response bodies, screenshots, pane dumps. Captured output is
evidence; your memory isn't. Something unexpected? Don't route around
it — capture, note, decide if it's the change or the environment.
Unrelated breakage is a finding, not noise.
Shared process state (tmux, ports, lockfiles) — isolate. `tmux -L
name`, bind `:0`, `mktemp -d`. You share a namespace with your host.
## Report
Inline, final message:
```
## Verification: <one-line what changed>
**Verdict:** PASS | FAIL | BLOCKED
**Claim:** <what it's supposed to do — your read of the diff and/or
the stated claim; note any mismatch>
**Method:** <how you got a handle — which verifier/run-skill, or
cold start; what you launched>
### Steps
Each step is one thing you did to the **running app** and what it
showed. Build/install/checkout are setup, not steps. Test runs and
typecheck don't belong here — they're CI's output.
1. <what you did to the running app> → <what you observed> — ✅/❌
<evidence: the app's own output — pane capture, response body,
screenshot path>
2. ...
**Screenshot / sample:** <the one frame a reviewer looks at to see
the feature — image path for GUI/TUI, code block for library/API;
omit for build/types-only>
### Findings
<Claim mismatch, unrelated breakage, env notes, pre-existing bugs
near the change.>
```
**Verdicts:**
- **PASS** — you ran the app, the change did what it should at its
surface. Not: tests pass, builds clean, code looks right.
- **FAIL** — you ran it and it doesn't. Or it breaks something else.
Or claim and diff disagree materially.
- **BLOCKED** — couldn't reach a state where the change is observable,
or no runtime surface exists. Not a verdict on the change. Env
blocker → say exactly where + `/run-skill-generator` prompt. No
surface → one line why.
No partial pass. "3 of 4 passed" is FAIL until 4 passes or is
explained away.
**When in doubt, FAIL.** False PASS ships broken code; false FAIL
costs one more human look. Ambiguous output is FAIL with the raw
capture attached — don't interpret.