Cycle #21 ships governance infrastructure, not implementation. Maintainership mode means sometimes the right deliverable is a decision framework, not code.

Problem context: OPT_OUT_AUDIT.md (cycle #18 bonus) established "demand-backed audit" as the next step. But without a structured way to record demand signals, "demand-backed" was just a slogan — the next audit cycle would have no evidence to work from. This commit creates the evidentiary base.

New file: OPT_OUT_DEMAND_LOG.md
- Per-surface entries for all 12 OPT_OUT commands (Groups A/B/C)
- Current state: 0 signals across all surfaces (consistent with the audit's prediction)
- Signal entry template with required fields:
  - Source (who/what)
  - Use case (concrete orchestration problem)
  - Markdown-alternative-checked (why the existing output is insufficient)
  - Date
- Promotion thresholds:
  - 2+ independent signals for the same surface → file a promotion pinpoint
  - 1 signal + an existing stable schema → file a pinpoint for discussion
  - 0 signals → stays OPT_OUT (rationale preserved)

Decision framework for cycle #22 (audit close):
- If 0 signals total: move to PERMANENTLY_OPT_OUT, close the audit
- If 1–2 signals: file individual promotion pinpoints with evidence
- If 3+ signals: reopen the audit and question the classification itself

Updated files:
- OPT_OUT_AUDIT.md: added a demand log reference in the Related section
- CLAUDE.md: added prerequisites for promotions (must have logged signals) and a "File a demand signal" workflow section

Philosophy: "Prevent speculative expansion" — schema-bloat protection discipline. Every new CLAWABLE surface is a maintenance tax. The evidence requirement keeps the protocol lean. OPT_OUT surfaces are intentionally not clawable until proven otherwise by external demand.

Operational impact: next cycles can now
1. Watch for real claws hitting OPT_OUT surface limits
2. Log signals in a structured format (no ad-hoc filing)
3. Run the audit at cycle #22 with actual data, not speculation

No code changes. No test changes. Pure governance infrastructure.

Related: cycle #18 (OPT_OUT_AUDIT.md), maintainership phase transition.
CLAUDE.md — Python Reference Implementation
This file guides work on src/ and tests/ — the Python reference harness for the claw-code protocol.
The production CLI lives in rust/; this directory (src/, tests/, .py files) is a protocol validation and dogfood surface.
What this Python harness does
Machine-first orchestration layer — proves that the claw-code JSON protocol is:
- Deterministic and recoverable (every output is reproducible)
- Self-describing (SCHEMAS.md documents every field)
- Clawable (external agents can build ONE error handler for all commands)
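Because every envelope carries the same common fields (timestamp, command, exit_code, output_format, schema_version), one handler can cover every command. A minimal sketch of such a claw-side handler; the dispatch logic is illustrative, not code from this repository:

```python
import json

def handle_envelope(raw: str) -> dict:
    """One error handler for every clawable command.

    Relies only on the common fields every envelope carries
    (timestamp, command, exit_code, output_format, schema_version).
    The actions taken per exit code are illustrative.
    """
    envelope = json.loads(raw)
    if envelope["exit_code"] == 0:
        return envelope  # success: hand the payload onward
    if envelope["exit_code"] == 2:
        raise TimeoutError(f"{envelope['command']} timed out at {envelope['timestamp']}")
    # exit_code 1 covers both error and not-found envelopes
    raise RuntimeError(f"{envelope['command']} failed (schema {envelope['schema_version']})")
```

This is the payoff of the shared envelope: the claw never needs per-command error parsing.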
Stack
- Language: Python 3.13+
- Dependencies: minimal (no frameworks; standard library plus attrs/dataclasses)
- Test runner: pytest
- Protocol contract: SCHEMAS.md (machine-readable JSON envelope)
Quick start
# 1. Install dependencies (if not already in venv)
python3 -m venv .venv && source .venv/bin/activate
# (dependencies are minimal; mostly standard library)
# 2. Run tests
python3 -m pytest tests/ -q
# 3. Try a command
python3 -m src.main bootstrap "hello" --output-format json | python3 -m json.tool
Verification workflow
# Unit tests (fast)
python3 -m pytest tests/ -q 2>&1 | tail -3
# Type checking (optional but recommended)
python3 -m mypy src/ --ignore-missing-imports 2>&1 | tail -5
Repository shape
- src/ — Python reference harness implementing the SCHEMAS.md protocol
  - main.py — CLI entry point; all 14 clawable commands
  - query_engine.py — core TurnResult / QueryEngineConfig
  - runtime.py — PortRuntime; turn loop + cancellation (#164 Stage A/B)
  - session_store.py — session persistence
  - transcript.py — turn transcript assembly
  - commands.py, tools.py — simulated command/tool trees
  - models.py — PermissionDenial, UsageSummary, etc.
- tests/ — comprehensive protocol validation (22 baseline → 192 passing as of 2026-04-22)
  - test_cli_parity_audit.py — proves all 14 clawable commands accept --output-format
  - test_json_envelope_field_consistency.py — validates the SCHEMAS.md contract
  - test_cancel_observed_field.py — #164 Stage B: cancellation observability + safe-to-reuse semantics
  - test_run_turn_loop_*.py — turn loop behavior (timeout, cancellation, continuation, permissions)
  - test_submit_message_*.py — budget, cancellation contracts
  - test_*_cli.py — command-specific JSON output validation
- SCHEMAS.md — canonical JSON contract
  - Common fields (all envelopes): timestamp, command, exit_code, output_format, schema_version
  - Error envelope shape
  - Not-found envelope shape
  - Per-command success schemas (14 commands documented)
  - TurnResult fields (including cancel_observed as of #164 Stage B)
- .gitignore — excludes .port_sessions/ (dogfood-run state)
Key concepts
Clawable surface (14 commands)
Every clawable command must:
- Accept --output-format {text,json}
- Return JSON envelopes matching SCHEMAS.md
- Use the common fields (timestamp, command, exit_code, output_format, schema_version)
- Exit 0 on success, 1 on error/not-found, 2 on timeout
Commands: list-sessions, delete-session, load-session, flush-transcript, show-command, show-tool, exec-command, exec-tool, route, bootstrap, command-graph, tool-pool, bootstrap-graph, turn-loop
Validation: test_cli_parity_audit.py auto-tests all 14 for --output-format acceptance.
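The common-field requirement can be checked mechanically. A sketch of the kind of check the parity audit implies (illustrative, not the actual test code from test_cli_parity_audit.py):

```python
# The five common fields every envelope must carry, per SCHEMAS.md.
COMMON_FIELDS = {"timestamp", "command", "exit_code", "output_format", "schema_version"}

def check_common_fields(envelope: dict) -> list[str]:
    """Return the missing common fields, sorted (empty list = compliant)."""
    return sorted(COMMON_FIELDS - envelope.keys())
```

A claw can run the same check on any envelope it receives before trusting the payload.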
OPT_OUT surfaces (12 commands)
Explicitly exempt from the --output-format requirement (for now):
- Rich-Markdown reports: summary, manifest, parity-audit, setup-report
- List commands with query filters: subsystems, commands, tools
- Simulation/debug: remote-mode, ssh-mode, teleport-mode, direct-connect-mode, deep-link-mode
Future work: audit OPT_OUT surfaces for JSON promotion (post-#164).
Protocol layers
- Coverage (#167–#170): all clawable commands emit JSON
- Enforcement (#171): parity CI prevents new commands from skipping JSON
- Documentation (#172): SCHEMAS.md locks the field contract
- Alignment (#173): the test framework validates docs ↔ code match
- Field evolution (#164 Stage B): cancel_observed proves protocol extensibility
Testing & coverage
Run full suite
python3 -m pytest tests/ -q
Run one test file
python3 -m pytest tests/test_cancel_observed_field.py -v
Run one test
python3 -m pytest tests/test_cancel_observed_field.py::TestCancelObservedField::test_default_value_is_false -v
Check coverage (optional)
python3 -m pip install coverage # if not already installed
python3 -m coverage run -m pytest tests/
python3 -m coverage report --skip-covered
Target: >90% line coverage for src/ (currently ~85%).
Common workflows
Add a new clawable command
- Add a parser in main.py (argparse)
- Add the --output-format flag
- Emit a JSON envelope using wrap_json_envelope(data, command_name)
- Add the command to CLAWABLE_SURFACES in test_cli_parity_audit.py
- Document in SCHEMAS.md (schema + example)
- Write test in tests/test_*_cli.py or tests/test_json_envelope_field_consistency.py
- Run full suite to confirm parity
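The wrap_json_envelope(data, command_name) call named in the steps above is taken from this file; its body below is a hypothetical sketch of what such a wrapper could look like, merging the command payload with the common fields:

```python
import datetime
import json

SCHEMA_VERSION = 1  # illustrative; the real constant lives in src/

def wrap_json_envelope(data: dict, command_name: str, exit_code: int = 0) -> str:
    """Hypothetical sketch of the envelope wrapper.

    Merges the command-specific payload with the common fields so that
    every command's JSON output shares one shape.
    """
    envelope = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "command": command_name,
        "exit_code": exit_code,
        "output_format": "json",
        "schema_version": SCHEMA_VERSION,
        **data,
    }
    return json.dumps(envelope, indent=2)
```

Keeping the common fields in one wrapper is what makes the parity audit cheap: a new command only has to call it.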
Modify TurnResult or protocol fields
- Update the dataclass in query_engine.py
- Update SCHEMAS.md with the new field + rationale
- Write a test in tests/test_json_envelope_field_consistency.py that validates field presence
- Update every place that constructs TurnResult (grep for "TurnResult(")
- Update the bootstrap/turn-loop JSON builders in main.py
- Run tests/ to ensure no regressions
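Forward-compatible field addition (the cancel_observed pattern from #164 Stage B) can be sketched like this; the dataclass shown is a stand-in, not the real TurnResult from query_engine.py:

```python
from dataclasses import dataclass

# Stand-in shape only: the real TurnResult in query_engine.py
# carries more fields than shown here.
@dataclass
class TurnResult:
    command: str
    exit_code: int = 0
    # New protocol fields get defaults so existing TurnResult(...) call
    # sites keep working and old clients can simply ignore the field.
    cancel_observed: bool = False
```

Giving every new field a default is what keeps step 4 (updating all construction sites) from becoming a breaking change.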
Promote an OPT_OUT surface to CLAWABLE
Prerequisite: Real demand signal logged in OPT_OUT_DEMAND_LOG.md (threshold: 2+ independent signals per surface). Speculative promotions are not allowed.
Once demand is evidenced:
- Add --output-format flag to argparse
- Emit wrap_json_envelope() output in JSON path
- Move command from OPT_OUT_SURFACES to CLAWABLE_SURFACES
- Document in SCHEMAS.md
- Write test for JSON output
- Run parity audit to confirm no regressions
- Update OPT_OUT_DEMAND_LOG.md to mark the signal as resolved
File a demand signal (when a claw actually needs JSON from an OPT_OUT surface)
- Open OPT_OUT_DEMAND_LOG.md
- Find the surface's entry under Group A/B/C
- Append a dated entry with Source, Use Case, and Markdown-alternative-checked explanation
- If this is the 2nd signal for the same surface, file a promotion pinpoint in ROADMAP.md
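A demand-signal entry built from the fields listed above might look like this; the surface, group, source, and use case are invented for illustration, and the real layout of OPT_OUT_DEMAND_LOG.md may differ:

```markdown
## summary (Group A)

Signals: 1

- Date: 2026-05-01
- Source: hypothetical external claw "deploy-gate"
- Use case: gate a deploy pipeline on the summary's pass/fail counts
- Markdown-alternative-checked: yes; parsing the Markdown report table proved brittle
```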
Dogfood principles
The Python harness is continuously dogfood-tested:
- Every cycle ships to main with detailed commit messages
- New tests are written before/alongside implementation
- Test suite must pass before pushing (zero-regression principle)
- Commits grouped by pinpoint (#159, #160, ..., #174)
- Failure modes classified per exit code: 0=success, 1=error, 2=timeout
Protocol governance
- SCHEMAS.md is the source of truth — any implementation must match field-for-field
- Tests enforce the contract — drift is caught by test suite
- Field additions are forward-compatible — new fields get defaults, old clients ignore them
- Exit codes are signals — claws use them for conditional logic (0→continue, 1→escalate, 2→timeout)
- Timestamps are audit trails — every envelope includes ISO 8601 UTC time for chronological ordering
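The exit-code dispatch described above can be sketched in a few lines; the action names are illustrative, not a contract from SCHEMAS.md:

```python
def next_action(envelope: dict) -> str:
    """Map an envelope's exit code to claw-side conditional logic
    (0 continue, 1 escalate, 2 timeout).

    Extra fields in the envelope are simply ignored, which is what
    makes field additions forward-compatible for old clients.
    """
    return {0: "continue", 1: "escalate", 2: "retry-after-timeout"}.get(
        envelope["exit_code"], "escalate"
    )
```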
Related docs
- SCHEMAS.md — JSON protocol specification (read before implementing)
- ROADMAP.md — macro roadmap and macro pain points
- PHILOSOPHY.md — system design intent
- PARITY.md — status of Python ↔ Rust protocol equivalence