Cycle #30 dogfood found a testing gap: OPT_OUT surfaces were classified in code, but their REJECTION behavior was never regression-tested.

## The Gap

OPT_OUT_AUDIT.md declares 12 surfaces as intentionally exempt from --output-format. The test suite had:

- ✅ test_clawable_surface_has_output_format (CLAWABLE must accept)
- ✅ test_every_registered_command_is_classified (no orphans)
- ❌ Nothing verifying OPT_OUT surfaces REJECT --output-format

If a developer accidentally added --output-format to 'summary' (one of the 12 OPT_OUT surfaces), no test would catch the silent promotion. The classification was governed, but the rejection behavior was not.

## What Changed

Added TestOptOutSurfaceRejection to test_cli_parity_audit.py with 14 tests:

1. **12 parametrized tests**: one per OPT_OUT surface, each verifying that the surface rejects --output-format with an argparse error.
2. **test_opt_out_set_matches_audit_document**: verifies the OPT_OUT_SURFACES constant matches the 12 surfaces declared in OPT_OUT_AUDIT.md.
3. **test_opt_out_count_matches_declared**: sanity check that the count stays at 12, as documented.

Hedged sketches of both test shapes appear at the end of this note.

## Symmetry Achieved

Before (only CLAWABLE acceptance tested):

- CLAWABLE accepts --output-format ✅
- OPT_OUT behavior: untested

After (full parity coverage):

- CLAWABLE accepts --output-format ✅
- OPT_OUT rejects --output-format ✅
- Audit doc ↔ constant kept in sync ✅

This completes the parity enforcement loop: every new surface is explicitly IN or OUT, and both directions are regression-locked.

## Promotion Path Preserved

When a real OPT_OUT surface gains genuine demand (per OPT_OUT_DEMAND_LOG.md):

1. Move it from OPT_OUT_SURFACES to CLAWABLE_SURFACES.
2. Update OPT_OUT_AUDIT.md with the promotion rationale.
3. Remove it from this test's expected rejections.
4. Tests pass (the rejection test no longer runs; the acceptance test is now required).

Graceful promotion; no accidental drift.

## Test Count

- 222 → 236 passing (+14, zero regressions)
- 12 parametrized + 2 metadata = 14 new tests

## Discipline Check

Per cycle #24 calibration:

- Red-state bug? ✗ (no broken behavior)
- Real friction? ✓ (testing gap discovered by dogfood)
- Evidence-backed? ✓ (systematic probe revealed missing coverage)

This extends the cycle #27 taxonomy (structural / quality / cross-channel / text-vs-JSON divergence) into classification: not just "is the envelope right?" but "is the OPPOSITE of the envelope right?" Future cycles can apply the same principle to other classifications: every governed non-goal deserves regression tests that lock in its non-goal-ness.

Classification:

- Real friction: ✓ (cycle #30 dogfood)
- Evidence-backed: ✓ (gap discovered by systematic surface audit)
- Same-cycle fix: ✓ (maintainership discipline)

Source: Jobdori cycle #30 proactive dogfood, which probed all 26 subcommands with --output-format json and noticed that the OPT_OUT rejection pattern was unverified by any dedicated test.
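
## Appendix: Test Sketches (Hedged)

First, a minimal pytest sketch of the parametrized rejection shape. The build_parser() factory and the three surface names are hypothetical stand-ins, not claw-code's actual parser wiring; only the TestOptOutSurfaceRejection and OPT_OUT_SURFACES names come from the change itself.

```python
import argparse

import pytest

# Illustrative stand-in: the real constant has 12 names and lives in
# the claw-code test suite; these three names are assumptions.
OPT_OUT_SURFACES = {"summary", "doctor", "version"}


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the real CLI parser factory."""
    parser = argparse.ArgumentParser(prog="claw")
    sub = parser.add_subparsers(dest="command")
    for name in sorted(OPT_OUT_SURFACES):
        sub.add_parser(name)  # deliberately no --output-format flag
    return parser


class TestOptOutSurfaceRejection:
    @pytest.mark.parametrize("surface", sorted(OPT_OUT_SURFACES))
    def test_rejects_output_format(self, surface):
        # argparse reports an unrecognized argument by exiting with
        # status 2, which surfaces as SystemExit under pytest.
        with pytest.raises(SystemExit) as excinfo:
            build_parser().parse_args([surface, "--output-format", "json"])
        assert excinfo.value.code == 2
```

Parametrizing over the constant means a surface promoted out of OPT_OUT_SURFACES automatically stops being rejection-tested, which is exactly the promotion path above.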
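Second, a sketch of the two metadata tests. It assumes the real 12-name OPT_OUT_SURFACES constant is in scope (e.g. defined earlier in test_cli_parity_audit.py), and it assumes OPT_OUT_AUDIT.md lists each exempt surface as a backticked bullet; both the constant's location and the doc format are guesses, not the repository's actual layout, and the illustrative 3-name set from the previous sketch would trip the count check.

```python
import re
from pathlib import Path

# OPT_OUT_SURFACES: the real 12-name constant, assumed in scope here.

AUDIT_DOC = Path(__file__).resolve().parent / "OPT_OUT_AUDIT.md"  # assumed location


def surfaces_declared_in_audit_doc() -> set[str]:
    """Parse surface names from the audit doc (assumed bullet format)."""
    text = AUDIT_DOC.read_text(encoding="utf-8")
    # Assumption: each exempt surface appears as "- `name`" on its own line.
    return set(re.findall(r"^- `([\w-]+)`", text, flags=re.MULTILINE))


def test_opt_out_set_matches_audit_document():
    # Drift in either direction (code vs. doc) fails the comparison.
    assert OPT_OUT_SURFACES == surfaces_declared_in_audit_doc()


def test_opt_out_count_matches_declared():
    # Sanity-locks the documented count of 12 intentionally exempt surfaces.
    assert len(OPT_OUT_SURFACES) == 12
```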