claw-code

elias/claw-code

Fork 0

mirror of https://github.com/ultraworkers/claw-code.git synced 2026-04-24 21:28:11 +08:00

Commit Graph

Author	SHA1	Message	Date
YeonGyu-Kim	11f9e8a5a2	feat: #164 Stage B CLOSURE — turn-loop JSON + cancel_observed coverage + CLAWABLE promotion Closes all three gaebal-gajae-identified closure criteria for #164 Stage B: 1. turn-loop runtime surface exposes cancel_observed consistently 2. cancellation path tests validate safe-to-reuse semantics 3. turn-loop promoted from OPT_OUT to CLAWABLE surface Changes: src/main.py: - turn-loop accepts --output-format {text,json} - JSON envelope includes per-turn cancel_observed + final_cancel_observed - All turn fields exposed: prompt, output, stop_reason, cancel_observed, matched_commands, matched_tools - Exit code 2 on final timeout preserved tests/test_cli_parity_audit.py: - CLAWABLE_SURFACES now contains 14 commands (was 13) - Removed 'turn-loop' from OPT_OUT_SURFACES - Parametrized --output-format test auto-validates turn-loop JSON tests/test_cancel_observed_field.py (new, 9 tests): - TestCancelObservedField (5 tests): field contract - default False - explicit True preserved - normal completion → False - bootstrap JSON exposes field - turn-loop JSON exposes per-turn field - TestCancelObservedSafeReuseSemantics (2 tests): reuse contract - timeout result has cancel_observed=True when signaled - engine.mutable_messages not corrupted after cancelled turn - engine accepts fresh message after cancellation - TestCancelObservedSchemaCompliance (2 tests): SCHEMAS.md contract - cancel_observed is always bool - final_cancel_observed convenience field present Closure criteria validated: - ✅ Field exposed in bootstrap JSON - ✅ Field exposed per-turn in turn-loop JSON - ✅ Field is always bool, never null - ✅ Safe-to-reuse: engine can accept fresh messages after cancellation - ✅ mutable_messages not corrupted by cancelled turn - ✅ turn-loop promoted from OPT_OUT (14 clawable commands now) Protocol now distinguishes at runtime: timeout + cancel_observed=false → infra/wedge (escalate) timeout + cancel_observed=true → cooperative cancellation (safe to retry) Test results: 182 → 192 passing, +10 tests, zero regression, 3 skipped unchanged. Closes #164 Stage B. Stage C (async-native preemption) remains future work.	2026-04-22 19:49:20 +09:00
YeonGyu-Kim	b048de8899	fix: #171 — automate cross-surface CLI parity audit via argparse introspection Stops manual parity inspection from being a human-noticed concern. When a developer adds a new subcommand to the claw-code CLI, this test suite enforces explicit classification: - CLAWABLE_SURFACES: MUST accept --output-format {text,json} - OPT_OUT_SURFACES: explicitly exempt with documented rationale A new command that forgets to opt into one of these two sets FAILS loudly with TestCommandClassificationCoverage::test_every_registered_ command_is_classified. No silent drift possible. Technique: argparse introspection at test time walks the _actions tree, discovers every registered subcommand, and compares against the declared classification sets. Contract is enforced machine-first instead of depending on human review. Three test classes covering three invariants: TestClawableSurfaceParity (14 tests): - test_all_clawable_surfaces_accept_output_format: every member of CLAWABLE_SURFACES has --output-format flag registered - test_clawable_surface_output_format_choices (parametrised over 13 commands): each must accept exactly {text, json} and default to 'text' for backward compat TestCommandClassificationCoverage (3 tests): - test_every_registered_command_is_classified: any new subcommand must be explicitly added to CLAWABLE_SURFACES or OPT_OUT_SURFACES - test_no_command_in_both_sets: sanity check for classification conflicts - test_all_classified_commands_actually_exist: no phantom commands (catches stale entries after a command is removed) TestJsonOutputContractEndToEnd (10 tests): - test_command_emits_parseable_json (parametrised over 10 clawable commands): actual subprocess invocation with --output-format json produces valid parseable JSON on stdout Classification: CLAWABLE_SURFACES (13): Session lifecycle: list-sessions, delete-session, load-session, flush-transcript Inspect: show-command, show-tool Execution: exec-command, exec-tool, route, bootstrap Diagnostic inventory: command-graph, tool-pool, bootstrap-graph OPT_OUT_SURFACES (12): Rich-Markdown reports (future JSON schema): summary, manifest, parity-audit, setup-report List filter commands: subsystems, commands, tools Turn-loop: structured_output is future work Simulation/debug: remote-mode, ssh-mode, teleport-mode, direct-connect-mode, deep-link-mode Full suite: 141 → 168 passing (+27), zero regression. Closes ROADMAP #171. Why this matters: Before: parity was human-monitored; every new command was a drift risk. The CLUSTER 3 sweep required manually auditing every subcommand and landing fixes as separate pinpoints. After: parity is machine-enforced. If a future developer adds a new command without --output-format, the test suite blocks it immediately with a concrete error message pointing at the missing flag. This is the first step in Gaebal-gajae's identified upper-level work: operationalised parity instead of aspirational parity. Related clusters: - Clawability principle: machine-first protocol enforcement - Test-first regression guard: extends TestTripletParityConsistency (#160/#165) and TestFullFamilyParity (#166) from per-cluster parity to cross-surface parity	2026-04-22 19:02:10 +09:00

Author

SHA1

Message

Date

YeonGyu-Kim

11f9e8a5a2

feat: #164 Stage B CLOSURE — turn-loop JSON + cancel_observed coverage + CLAWABLE promotion

Closes all three gaebal-gajae-identified closure criteria for #164 Stage B:

1. turn-loop runtime surface exposes cancel_observed consistently
2. cancellation path tests validate safe-to-reuse semantics
3. turn-loop promoted from OPT_OUT to CLAWABLE surface

Changes:

src/main.py:
- turn-loop accepts --output-format {text,json}
- JSON envelope includes per-turn cancel_observed + final_cancel_observed
- All turn fields exposed: prompt, output, stop_reason, cancel_observed,
  matched_commands, matched_tools
- Exit code 2 on final timeout preserved

tests/test_cli_parity_audit.py:
- CLAWABLE_SURFACES now contains 14 commands (was 13)
- Removed 'turn-loop' from OPT_OUT_SURFACES
- Parametrized --output-format test auto-validates turn-loop JSON

tests/test_cancel_observed_field.py (new, 9 tests):
- TestCancelObservedField (5 tests): field contract
  - default False
  - explicit True preserved
  - normal completion → False
  - bootstrap JSON exposes field
  - turn-loop JSON exposes per-turn field
- TestCancelObservedSafeReuseSemantics (2 tests): reuse contract
  - timeout result has cancel_observed=True when signaled
  - engine.mutable_messages not corrupted after cancelled turn
  - engine accepts fresh message after cancellation
- TestCancelObservedSchemaCompliance (2 tests): SCHEMAS.md contract
  - cancel_observed is always bool
  - final_cancel_observed convenience field present

Closure criteria validated:
- ✅ Field exposed in bootstrap JSON
- ✅ Field exposed per-turn in turn-loop JSON
- ✅ Field is always bool, never null
- ✅ Safe-to-reuse: engine can accept fresh messages after cancellation
- ✅ mutable_messages not corrupted by cancelled turn
- ✅ turn-loop promoted from OPT_OUT (14 clawable commands now)

Protocol now distinguishes at runtime:
  timeout + cancel_observed=false → infra/wedge (escalate)
  timeout + cancel_observed=true → cooperative cancellation (safe to retry)

Test results: 182 → 192 passing, +10 tests, zero regression, 3 skipped unchanged.

Closes #164 Stage B. Stage C (async-native preemption) remains future work.

2026-04-22 19:49:20 +09:00

YeonGyu-Kim

b048de8899

fix: #171 — automate cross-surface CLI parity audit via argparse introspection

Stops manual parity inspection from being a human-noticed concern. When
a developer adds a new subcommand to the claw-code CLI, this test suite
enforces explicit classification:
  - CLAWABLE_SURFACES: MUST accept --output-format {text,json}
  - OPT_OUT_SURFACES: explicitly exempt with documented rationale

A new command that forgets to opt into one of these two sets FAILS
loudly with TestCommandClassificationCoverage::test_every_registered_
command_is_classified. No silent drift possible.

Technique: argparse introspection at test time walks the _actions tree,
discovers every registered subcommand, and compares against the declared
classification sets. Contract is enforced machine-first instead of
depending on human review.

Three test classes covering three invariants:

TestClawableSurfaceParity (14 tests):
  - test_all_clawable_surfaces_accept_output_format: every member of
    CLAWABLE_SURFACES has --output-format flag registered
  - test_clawable_surface_output_format_choices (parametrised over 13
    commands): each must accept exactly {text, json} and default to 'text'
    for backward compat

TestCommandClassificationCoverage (3 tests):
  - test_every_registered_command_is_classified: any new subcommand
    must be explicitly added to CLAWABLE_SURFACES or OPT_OUT_SURFACES
  - test_no_command_in_both_sets: sanity check for classification conflicts
  - test_all_classified_commands_actually_exist: no phantom commands
    (catches stale entries after a command is removed)

TestJsonOutputContractEndToEnd (10 tests):
  - test_command_emits_parseable_json (parametrised over 10 clawable
    commands): actual subprocess invocation with --output-format json
    produces valid parseable JSON on stdout

Classification:
  CLAWABLE_SURFACES (13):
    Session lifecycle: list-sessions, delete-session, load-session,
                       flush-transcript
    Inspect: show-command, show-tool
    Execution: exec-command, exec-tool, route, bootstrap
    Diagnostic inventory: command-graph, tool-pool, bootstrap-graph

  OPT_OUT_SURFACES (12):
    Rich-Markdown reports (future JSON schema): summary, manifest,
                         parity-audit, setup-report
    List filter commands: subsystems, commands, tools
    Turn-loop: structured_output is future work
    Simulation/debug: remote-mode, ssh-mode, teleport-mode,
                      direct-connect-mode, deep-link-mode

Full suite: 141 → 168 passing (+27), zero regression.

Closes ROADMAP #171.

Why this matters:
  Before: parity was human-monitored; every new command was a drift
          risk. The CLUSTER 3 sweep required manually auditing every
          subcommand and landing fixes as separate pinpoints.
  After: parity is machine-enforced. If a future developer adds a new
         command without --output-format, the test suite blocks it
         immediately with a concrete error message pointing at the
         missing flag.

This is the first step in Gaebal-gajae's identified upper-level work:
operationalised parity instead of aspirational parity.

Related clusters:
  - Clawability principle: machine-first protocol enforcement
  - Test-first regression guard: extends TestTripletParityConsistency
    (#160/#165) and TestFullFamilyParity (#166) from per-cluster
    parity to cross-surface parity

2026-04-22 19:02:10 +09:00

2 Commits