mirror of
https://github.com/ultraworkers/claw-code.git
synced 2026-04-24 13:08:11 +08:00
Stops manual parity inspection from being a human-noticed concern. When
a developer adds a new subcommand to the claw-code CLI, this test suite
enforces explicit classification:
- CLAWABLE_SURFACES: MUST accept --output-format {text,json}
- OPT_OUT_SURFACES: explicitly exempt with documented rationale
A new command that forgets to opt into one of these two sets FAILS
loudly with TestCommandClassificationCoverage::test_every_registered_
command_is_classified. No silent drift possible.
Technique: argparse introspection at test time walks the _actions tree,
discovers every registered subcommand, and compares against the declared
classification sets. Contract is enforced machine-first instead of
depending on human review.
Three test classes covering three invariants:
TestClawableSurfaceParity (14 tests):
- test_all_clawable_surfaces_accept_output_format: every member of
CLAWABLE_SURFACES has --output-format flag registered
- test_clawable_surface_output_format_choices (parametrised over 13
commands): each must accept exactly {text, json} and default to 'text'
for backward compat
TestCommandClassificationCoverage (3 tests):
- test_every_registered_command_is_classified: any new subcommand
must be explicitly added to CLAWABLE_SURFACES or OPT_OUT_SURFACES
- test_no_command_in_both_sets: sanity check for classification conflicts
- test_all_classified_commands_actually_exist: no phantom commands
(catches stale entries after a command is removed)
TestJsonOutputContractEndToEnd (10 tests):
- test_command_emits_parseable_json (parametrised over 10 clawable
commands): actual subprocess invocation with --output-format json
produces valid parseable JSON on stdout
Classification:
CLAWABLE_SURFACES (13):
Session lifecycle: list-sessions, delete-session, load-session,
flush-transcript
Inspect: show-command, show-tool
Execution: exec-command, exec-tool, route, bootstrap
Diagnostic inventory: command-graph, tool-pool, bootstrap-graph
OPT_OUT_SURFACES (12):
Rich-Markdown reports (future JSON schema): summary, manifest,
parity-audit, setup-report
List filter commands: subsystems, commands, tools
Turn-loop: structured_output is future work
Simulation/debug: remote-mode, ssh-mode, teleport-mode,
direct-connect-mode, deep-link-mode
Full suite: 141 → 168 passing (+27), zero regression.
Closes ROADMAP #171.
Why this matters:
Before: parity was human-monitored; every new command was a drift
risk. The CLUSTER 3 sweep required manually auditing every
subcommand and landing fixes as separate pinpoints.
After: parity is machine-enforced. If a future developer adds a new
command without --output-format, the test suite blocks it
immediately with a concrete error message pointing at the
missing flag.
This is the first step in Gaebal-gajae's identified upper-level work:
operationalised parity instead of aspirational parity.
Related clusters:
- Clawability principle: machine-first protocol enforcement
- Test-first regression guard: extends TestTripletParityConsistency
(#160/#165) and TestFullFamilyParity (#166) from per-cluster
parity to cross-surface parity