Cycle #22 ships documentation that operationalizes cycles #178–#179.

Problem context: After #178 (parse-error envelope) and #179 (stderr hygiene + real error message), claws can now build a unified error handler for all 14 clawable commands. But there was no guide on how to actually do that. Operators had the pieces; they didn't have the pattern. This file changes that.

New file: ERROR_HANDLING.md
- Quick reference: exit codes + envelope shapes (0=success, 1=error, 2=timeout)
- One-handler pattern: ~80 lines of Python showing how to parse error.kind, check retryable, and decide recovery strategy
- Four practical recovery patterns:
  - Retry on transient errors (filesystem, timeout)
  - Reuse session after timeout (if cancel_observed=true)
  - Validate command syntax before dispatch (dry-run --help)
  - Log errors for observability
- Error kinds enumeration (parse, session_not_found, filesystem, runtime, timeout)
- Common mistakes to avoid (6 patterns with BAD vs GOOD examples)
- Testing your error handler (unit test examples)

Operational impact: Orchestration code now has a canonical pattern. Claws can:
- Copy-paste the run_claw_command() function (works for all commands)
- Classify errors uniformly (no special cases per command)
- Decide recovery deterministically (error.kind + retryable + cancel_observed)
- Log/monitor/escalate with confidence

Related cycles:
- #178: Parse-error envelope (commands now emit structured JSON on invalid argv)
- #179: Stderr hygiene + real message (JSON mode silences argparse, carries actual error)
- #164 Stage B: cancel_observed field (callers know if session is safe for reuse)

Updated CLAUDE.md:
- Added ERROR_HANDLING.md to 'Related docs' section
- Now documents the one-handler pattern as a guideline

No code changes. No test changes. Pure documentation.

This completes the documentation trail from protocol (SCHEMAS.md) → governance (OPT_OUT_AUDIT.md, OPT_OUT_DEMAND_LOG.md) → practice (ERROR_HANDLING.md).
CLAUDE.md — Python Reference Implementation
This file guides work on src/ and tests/ — the Python reference harness for the claw-code protocol.
The production CLI lives in rust/; this directory (src/, tests/, .py files) is a protocol validation and dogfood surface.
What this Python harness does
Machine-first orchestration layer — proves that the claw-code JSON protocol is:
- Deterministic and recoverable (every output is reproducible)
- Self-describing (SCHEMAS.md documents every field)
- Clawable (external agents can build ONE error handler for all commands)
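To make the "ONE error handler" idea concrete, here is a minimal sketch of a classifier that turns any command's exit code and stdout into one uniform outcome. The names `classify_result` and `run_clawable` are hypothetical, and the nested `error` object with `kind` and `retryable` keys is an assumed layout; SCHEMAS.md remains the authority on the real envelope shape.

```python
import json
import subprocess
import sys


def classify_result(exit_code: int, stdout: str) -> dict:
    """Map an exit code plus raw stdout to one uniform outcome dict."""
    try:
        envelope = json.loads(stdout)
    except json.JSONDecodeError:
        envelope = None
    if not isinstance(envelope, dict):
        # Output was not a JSON object; treat as a non-retryable failure.
        return {"ok": False, "kind": "unparseable_output",
                "retryable": False, "envelope": None}
    if exit_code == 0:
        return {"ok": True, "kind": None, "retryable": False,
                "envelope": envelope}
    # ASSUMPTION: errors are carried in an "error" object with
    # "kind" and "retryable" keys. Check SCHEMAS.md for the real layout.
    error = envelope.get("error")
    if not isinstance(error, dict):
        error = {}
    fallback_kind = "timeout" if exit_code == 2 else "unknown"
    return {"ok": False,
            "kind": error.get("kind", fallback_kind),
            "retryable": bool(error.get("retryable", False)),
            "envelope": envelope}


def run_clawable(args: list[str]) -> dict:
    """Run one command in JSON mode and classify whatever comes back."""
    proc = subprocess.run(
        [sys.executable, "-m", "src.main", *args, "--output-format", "json"],
        capture_output=True, text=True,
    )
    return classify_result(proc.returncode, proc.stdout)
```

Keeping the classification in a pure function, separate from the subprocess call, means the handler can be unit-tested without spawning the CLI.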
Stack
- Language: Python 3.13+
- Dependencies: minimal (no frameworks; standard library plus attrs/dataclasses)
- Test runner: pytest
- Protocol contract: SCHEMAS.md (machine-readable JSON envelope)
Quick start
# 1. Install dependencies (if not already in venv)
python3 -m venv .venv && source .venv/bin/activate
# (dependencies minimal; standard library mostly)
# 2. Run tests
python3 -m pytest tests/ -q
# 3. Try a command
python3 -m src.main bootstrap "hello" --output-format json | python3 -m json.tool
Verification workflow
# Unit tests (fast)
python3 -m pytest tests/ -q 2>&1 | tail -3
# Type checking (optional but recommended)
python3 -m mypy src/ --ignore-missing-imports 2>&1 | tail -5
Repository shape
- src/ — Python reference harness implementing SCHEMAS.md protocol
  - main.py — CLI entry point; all 14 clawable commands
  - query_engine.py — core TurnResult / QueryEngineConfig
  - runtime.py — PortRuntime; turn loop + cancellation (#164 Stage A/B)
  - session_store.py — session persistence
  - transcript.py — turn transcript assembly
  - commands.py, tools.py — simulated command/tool trees
  - models.py — PermissionDenial, UsageSummary, etc.
- tests/ — comprehensive protocol validation (22 baseline → 192 passing as of 2026-04-22)
  - test_cli_parity_audit.py — proves all 14 clawable commands accept --output-format
  - test_json_envelope_field_consistency.py — validates SCHEMAS.md contract
  - test_cancel_observed_field.py — #164 Stage B: cancellation observability + safe-to-reuse semantics
  - test_run_turn_loop_*.py — turn loop behavior (timeout, cancellation, continuation, permissions)
  - test_submit_message_*.py — budget, cancellation contracts
  - test_*_cli.py — command-specific JSON output validation
- SCHEMAS.md — canonical JSON contract
  - Common fields (all envelopes): timestamp, command, exit_code, output_format, schema_version
  - Error envelope shape
  - Not-found envelope shape
  - Per-command success schemas (14 commands documented)
  - Turn Result fields (including cancel_observed as of #164 Stage B)
- .gitignore — excludes .port_sessions/ (dogfood-run state)
Key concepts
Clawable surface (14 commands)
Every clawable command must:
- Accept --output-format {text,json}
- Return JSON envelopes matching SCHEMAS.md
- Use common fields (timestamp, command, exit_code, output_format, schema_version)
- Exit 0 on success, 1 on error/not-found, 2 on timeout
Commands: list-sessions, delete-session, load-session, flush-transcript, show-command, show-tool, exec-command, exec-tool, route, bootstrap, command-graph, tool-pool, bootstrap-graph, turn-loop
Validation: test_cli_parity_audit.py auto-tests all 14 for --output-format acceptance.
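The requirements above can be checked mechanically on the consumer side. The following is a small sketch, not the harness's own validation code: the helper names are hypothetical, and the sample values (including the `schema_version` string) are placeholders rather than values taken from SCHEMAS.md.

```python
# Common fields every envelope is expected to carry (listed in this file).
REQUIRED_COMMON_FIELDS = (
    "timestamp", "command", "exit_code", "output_format", "schema_version",
)

# Exit codes documented for clawable commands: success, error/not-found, timeout.
KNOWN_EXIT_CODES = (0, 1, 2)


def missing_common_fields(envelope: dict) -> list[str]:
    """Return the required common fields absent from an envelope, in order."""
    return [name for name in REQUIRED_COMMON_FIELDS if name not in envelope]


def has_known_exit_code(envelope: dict) -> bool:
    """True if the envelope's exit_code is an int in the documented set."""
    code = envelope.get("exit_code")
    # bool is a subclass of int in Python, so exclude it explicitly.
    return (isinstance(code, int) and not isinstance(code, bool)
            and code in KNOWN_EXIT_CODES)


# Placeholder envelope for illustration only; real values come from the CLI.
sample = {
    "timestamp": "2026-04-22T00:00:00+00:00",
    "command": "bootstrap",
    "exit_code": 0,
    "output_format": "json",
    "schema_version": "1.0",  # hypothetical version string
}
print(missing_common_fields(sample), has_known_exit_code(sample))
```

A caller could run both checks before trusting any command-specific fields, so malformed output fails early and uniformly.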
OPT_OUT surfaces (12 commands)
Explicitly exempt from --output-format requirement (for now):
- Rich-Markdown reports: summary, manifest, parity-audit, setup-report
- List commands with query filters: subsystems, commands, tools
- Simulation/debug: remote-mode, ssh-mode, teleport-mode, direct-connect-mode, deep-link-mode
Future work: audit OPT_OUT surfaces for JSON promotion (post-#164).
Protocol layers
- Coverage (#167–#170): All clawable commands emit JSON
- Enforcement (#171): Parity CI prevents new commands skipping JSON
- Documentation (#172): SCHEMAS.md locks field contract
- Alignment (#173): Test framework validates docs ↔ code match
- Field evolution (#164 Stage B): cancel_observed proves protocol extensibility
Testing & coverage
Run full suite
python3 -m pytest tests/ -q
Run one test file
python3 -m pytest tests/test_cancel_observed_field.py -v
Run one test
python3 -m pytest tests/test_cancel_observed_field.py::TestCancelObservedField::test_default_value_is_false -v
Check coverage (optional)
python3 -m pip install coverage # if not already installed
python3 -m coverage run -m pytest tests/
python3 -m coverage report --skip-covered
Target: >90% line coverage for src/ (currently ~85%).
Common workflows
Add a new clawable command
- Add parser in main.py (argparse)
- Add --output-format flag
- Emit JSON envelope using wrap_json_envelope(data, command_name)
- Add command to CLAWABLE_SURFACES in test_cli_parity_audit.py
- Document in SCHEMAS.md (schema + example)
- Write test in tests/test_*_cli.py or tests/test_json_envelope_field_consistency.py
- Run full suite to confirm parity
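The first three steps might look roughly like the sketch below. It is a standalone illustration under stated assumptions: `wrap_json_envelope_sketch` is a stand-in, not the harness's real `wrap_json_envelope`; the `hello-demo` command, the flat envelope layout, the extra `exit_code` parameter, and the `SCHEMA_VERSION` value are all hypothetical.

```python
import argparse
import json
from datetime import datetime, timezone

SCHEMA_VERSION = "1.0"  # hypothetical; the real value is defined by SCHEMAS.md


def wrap_json_envelope_sketch(data: dict, command_name: str,
                              exit_code: int = 0) -> dict:
    """Stand-in envelope builder: payload plus the five common fields."""
    # Common fields come last so a payload key can never overwrite them.
    return {
        **data,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "command": command_name,
        "exit_code": exit_code,
        "output_format": "json",
        "schema_version": SCHEMA_VERSION,
    }


def build_parser() -> argparse.ArgumentParser:
    """Parser with one hypothetical subcommand that accepts --output-format."""
    parser = argparse.ArgumentParser(prog="demo")
    sub = parser.add_subparsers(dest="command", required=True)
    hello = sub.add_parser("hello-demo")
    hello.add_argument("name")
    hello.add_argument("--output-format", choices=["text", "json"],
                       default="text")
    return parser


def main(argv: list[str]) -> tuple[int, str]:
    """Return (exit_code, rendered_output) instead of printing, for testability."""
    args = build_parser().parse_args(argv)
    data = {"greeting": f"hello, {args.name}"}
    if args.output_format == "json":
        return 0, json.dumps(wrap_json_envelope_sketch(data, args.command))
    return 0, data["greeting"]


code, out = main(["hello-demo", "claw", "--output-format", "json"])
print(code, out)
```

Returning the exit code and output from `main` rather than printing and calling `sys.exit` keeps the command easy to cover in a test file.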
Modify TurnResult or protocol fields
- Update dataclass in query_engine.py
- Update SCHEMAS.md with new field + rationale
- Write test in tests/test_json_envelope_field_consistency.py that validates field presence
- Update all places that construct TurnResult (grep for "TurnResult(")
- Update bootstrap/turn-loop JSON builders in main.py
- Run tests/ to ensure no regressions
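As an illustration of adding a field without breaking existing callers, the sketch below uses a simplified stand-in dataclass. `TurnResultSketch` is not the real TurnResult: `output` and `stop_reason` are hypothetical fields, and the `False` default for `cancel_observed` is inferred from the test name `test_default_value_is_false` rather than from the implementation.

```python
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class TurnResultSketch:
    """Simplified stand-in for TurnResult, for illustration only."""
    output: str                    # hypothetical field
    stop_reason: str               # hypothetical field
    # New fields get a default so older construction sites keep working.
    cancel_observed: bool = False  # default assumed, see lead-in


def from_payload(payload: dict) -> TurnResultSketch:
    """Build a result from a dict, ignoring keys this version does not know."""
    known = {f.name for f in fields(TurnResultSketch)}
    return TurnResultSketch(**{k: v for k, v in payload.items() if k in known})


# An older payload without the new field still loads, using the default.
old = from_payload({"output": "hi", "stop_reason": "completed"})

# A newer payload with an unknown extra key is accepted; the extra is dropped.
newer = from_payload({"output": "hi", "stop_reason": "completed",
                      "cancel_observed": True, "future_field": 1})
print(asdict(old), asdict(newer))
```

The default covers producers that have not been updated yet, and the key filter covers consumers that are older than the producer.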
Promote an OPT_OUT surface to CLAWABLE
Prerequisite: Real demand signal logged in OPT_OUT_DEMAND_LOG.md (threshold: 2+ independent signals per surface). Speculative promotions are not allowed.
Once demand is evidenced:
- Add --output-format flag to argparse
- Emit wrap_json_envelope() output in JSON path
- Move command from OPT_OUT_SURFACES to CLAWABLE_SURFACES
- Document in SCHEMAS.md
- Write test for JSON output
- Run parity audit to confirm no regressions
- Update OPT_OUT_DEMAND_LOG.md to mark signal as resolved
File a demand signal (when a claw actually needs JSON from an OPT_OUT surface)
- Open OPT_OUT_DEMAND_LOG.md
- Find the surface's entry under Group A/B/C
- Append a dated entry with Source, Use Case, and Markdown-alternative-checked explanation
- If this is the 2nd signal for the same surface, file a promotion pinpoint in ROADMAP.md
Dogfood principles
The Python harness is continuously dogfood-tested:
- Every cycle ships to main with detailed commit messages
- New tests are written before/alongside implementation
- Test suite must pass before pushing (zero-regression principle)
- Commits grouped by pinpoint (#159, #160, ..., #174)
- Failure modes classified per exit code: 0=success, 1=error, 2=timeout
Protocol governance
- SCHEMAS.md is the source of truth — any implementation must match field-for-field
- Tests enforce the contract — drift is caught by test suite
- Field additions are forward-compatible — new fields get defaults, old clients ignore them
- Exit codes are signals — claws use them for conditional logic (0→continue, 1→escalate, 2→timeout)
- Timestamps are audit trails — every envelope includes ISO 8601 UTC time for chronological ordering
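The last two rules lend themselves to a short sketch: dispatching on the exit code and ordering envelopes by timestamp. The `ExitCode` enum, the function names, and the choice to escalate on unrecognized codes are assumptions for illustration, and the sample timestamps are made up.

```python
from datetime import datetime
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0
    ERROR = 1
    TIMEOUT = 2


# Mapping taken from the rule above: 0→continue, 1→escalate, 2→timeout.
NEXT_ACTION = {
    ExitCode.SUCCESS: "continue",
    ExitCode.ERROR: "escalate",
    ExitCode.TIMEOUT: "timeout",
}


def next_action(exit_code: int) -> str:
    """Translate a process exit code into the caller's next step."""
    try:
        return NEXT_ACTION[ExitCode(exit_code)]
    except ValueError:
        # ASSUMPTION: codes outside the documented set are escalated.
        return "escalate"


def sort_chronologically(envelopes: list[dict]) -> list[dict]:
    """Order envelopes by their ISO 8601 timestamp field, oldest first."""
    return sorted(envelopes,
                  key=lambda e: datetime.fromisoformat(e["timestamp"]))


# Made-up envelopes, deliberately out of order.
log = [
    {"command": "turn-loop", "timestamp": "2026-04-22T10:05:00+00:00"},
    {"command": "bootstrap", "timestamp": "2026-04-22T10:00:00+00:00"},
]
print([e["command"] for e in sort_chronologically(log)])
```

Parsing the timestamps before sorting, instead of comparing raw strings, keeps the ordering correct even if two envelopes use different but equivalent UTC offset spellings.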
Related docs
- ERROR_HANDLING.md — Unified error-handling pattern for claws (one handler for all 14 clawable commands)
- SCHEMAS.md — JSON protocol specification (read before implementing)
- OPT_OUT_AUDIT.md — Governance for the 12 non-clawable surfaces
- OPT_OUT_DEMAND_LOG.md — Active survey recording real demand signals (evidence base for decisions)
- ROADMAP.md — macro roadmap and macro pain points
- PHILOSOPHY.md — system design intent
- PARITY.md — status of Python ↔ Rust protocol equivalence