YuhaoLin2005 51bced9a1f
Stop hook: verify thinking quality at session end — task completeness, assumptions, stale logs, disk space (delivery-gate) (#2378)
* Restore delivery-gate: Stop hook with learning capture enforcement (auto-closed by fork sync, now on clean branch)

* Fix bot findings: log level→INFO (DISK_REMIND dead code), count_edits full transcript (not truncated), memory-dir-absent warning (not silent pass), SKILL.md description accuracy

* Fix CodeRabbit feedback: treat missing memory-dir as all-stale on complex tasks (fail-close instead of fail-open)

* Trigger bot re-review (no logic changes)

* Fix: handle both stdin formats — raw transcript AND JSON with transcript_path (Greptile feedback)

* Add debug log for memory-dir lookup path

* Fix path encoding: replace colon with dash (not strip), matching Claude Code actual encoding on Windows

* Fix SKILL.md: update How It Works for JSON+transcript_path, add English translation to CLAUDE.md block (Greptile feedback)

* Fix: memory-dir absent → warn but don't block (prevents deadlock for new users per Greptile feedback)

* fix: restore daltino-approved voice (thinking quality/"iron rule of wrapping up") with technical patches

Reverts 'session hygiene' rebranding. Preserves original approved framing
while keeping technical improvements:
- JSON transcript_path parsing documentation
- filesystem mtime staleness check
- 'skip tests for now' rationalization pattern
- disk critically low explicit block condition

* fix: remove stdout JSON echo — Stop hooks write feedback to stderr, not stdout

Previously sys.stdout.write(raw) echoed the raw hook JSON payload to stdout,
which Claude Code displays as the hook's response message. When the hook
blocked (exit 2), Claude saw {"transcript_path":"...","session_id":"..."}
instead of the actual blocking reason from stderr.

This made the gate functionally silent from Claude's perspective — it could
not guide Claude to the corrective action (update growth-log / free disk).

Fix per Greptile feedback: stop echo, let stderr messages reach Claude.

* fix: remove duplicate disk-critical log line

* docs(delivery-gate): v1.1.0 — accurate scope (deterministic checks, not reasoning), warning vs block table, CI/CD analogy, limitations section, self-audit pairing

* fix(delivery-gate): expand rationalization regex coverage (R3/R4) — match "we can fix" and "integration tests" variants

* chore: bump version to 1.1.1 to re-trigger CI checks
2026-06-29 19:22:55 -07:00


name: delivery-gate
description: Stop hook that blocks Claude from finishing until quality checks pass. Detects rationalization patterns (surface text heuristics), stale learning logs (filesystem mtime), and low disk space. Complements self-audit by mechanically enforcing learning capture habits.
version: 1.1.1
metadata:
  origin: ECC

Delivery Gate — Mechanical Quality Gate for Claude Code

A Stop hook that checks three things before Claude can finish a session, using only deterministic checks — file modification timestamps, disk usage, and regex patterns on the transcript text. No AI inference.

This is distinct from reasoning gates (like self-audit): delivery-gate checks machine-verifiable facts; self-audit checks output quality across four reasoning dimensions. Together they form defense in depth:

  • delivery-gate: "Was the learning library touched today? Is disk space safe?"
  • self-audit: "Is the file content correct, complete, and honest?"

This is the same pattern as CI pipeline gates — automated, deterministic checks that verify machine-readable facts rather than trusting self-reported status.

What It Checks

Check                    | Mechanism                     | On Hit
-------------------------|-------------------------------|-------------------------------------------
Rationalization patterns | Regex on transcript tail      | Warning only (never blocks)
Stale learning libraries | mtime on 5 configurable paths | Warning if some stale; Block if >=3 stale OR growth-log stale + complex task
Disk space < 50GB        | shutil.disk_usage             | Warning
Disk space < 15GB        | shutil.disk_usage             | Block (exit 2)

Rationalization detection warns on phrases such as "skip tests for now" and "pre-existing bug", which are surface signals that thinking may have been cut short. It never blocks on its own, because regex heuristics can produce false positives. The blocking conditions are: disk space critically low, >=3 learning libraries stale, or the growth-log specifically stale (all require a complex task, i.e. >=3 edits).
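The warning-only regex check described above can be sketched roughly as follows. This is a minimal illustration, not the shipped code: the three patterns, the `TAIL_CHARS` value, and the function name `find_rationalizations` are assumptions; the real `RATIONALIZE` list lives in quality-gate.py.

```python
from __future__ import annotations

import re

# Hypothetical patterns for illustration only; the actual RATIONALIZE
# list in quality-gate.py may differ.
RATIONALIZE = [
    re.compile(r"skip(?:ping)?\s+(?:the\s+)?(?:\w+\s+)?tests?\s+for\s+now", re.IGNORECASE),
    re.compile(r"pre-?existing\s+(?:bug|issue|failure)", re.IGNORECASE),
    re.compile(r"we\s+can\s+fix\s+(?:it|this|that)\s+later", re.IGNORECASE),
]

# Assumed size of the "transcript tail" that gets scanned.
TAIL_CHARS = 4000


def find_rationalizations(transcript: str) -> list[str]:
    """Return the matched phrases found in the tail of the transcript."""
    tail = transcript[-TAIL_CHARS:]
    hits = []
    for pattern in RATIONALIZE:
        match = pattern.search(tail)
        if match:
            hits.append(match.group(0))
    return hits
```

A caller would print each hit to stderr as a warning and continue, since a match alone never changes the exit code.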

Why

Claude Code's built-in checks cover code quality (build → type → lint → test). But there is a different failure mode: the agent produces working code while session hygiene is neglected (learning not captured, shortcuts rationalized, disk space silently running out).

After many sessions of "ship and forget," the human has not grown. This hook enforces the habit: complex task → must touch learning libraries.

Install

mkdir -p ~/.claude/scripts
cp quality-gate.py ~/.claude/scripts/

Add to ~/.claude/settings.json:

{
  "hooks": {
    "Stop": [{
      "hooks": [{
        "type": "command",
        "command": "python3 ~/.claude/scripts/quality-gate.py",
        "timeout": 5
      }]
    }]
  }
}
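When the Stop hook runs, the script receives its input on stdin. Per the commit notes, two shapes are handled: a JSON payload carrying `transcript_path`, or the raw transcript text itself. A minimal sketch of that dispatch, assuming a hypothetical helper name `load_transcript` and assuming an unreadable transcript file is treated as empty:

```python
from __future__ import annotations

import json
from pathlib import Path


def load_transcript(raw: str) -> str:
    """Return transcript text from either supported stdin format."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        # Not a single JSON document: treat stdin as the transcript itself.
        return raw
    if isinstance(payload, dict) and "transcript_path" in payload:
        path = Path(str(payload["transcript_path"])).expanduser()
        try:
            return path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            # Assumption: a missing or unreadable file yields an empty transcript.
            return ""
    # Valid JSON without transcript_path: fall back to the raw text.
    return raw
```

In the hook this would be called as `load_transcript(sys.stdin.read())`, with nothing echoed back to stdout.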

Learning Libraries

Create these files and directories in your project's memory directory. The hook checks whether at least one was updated today:

memory/
├── growth-log/          # Daily learning entries (directory)
├── decisions/log.md     # Decision log
├── output-index.md      # Index of session outputs
├── ratings-tracker.md   # Skill ratings over time
└── tooling_capabilities.md  # Known tools inventory

Customize the LIBS dict to match your own file structure.
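As a rough sketch of how an mtime-based staleness check over these paths might look: the `LIBS` keys, the helper names `updated_today` and `stale_libs`, and the rule that a directory counts as fresh when it or any file inside it was modified today are all assumptions made for illustration, not the script's confirmed behavior.

```python
from __future__ import annotations

from datetime import date
from pathlib import Path

# Assumed shape of LIBS: label -> path relative to the memory directory.
LIBS = {
    "growth-log": "growth-log",
    "decisions": "decisions/log.md",
    "output-index": "output-index.md",
    "ratings-tracker": "ratings-tracker.md",
    "tooling": "tooling_capabilities.md",
}


def updated_today(path: Path, today: date | None = None) -> bool:
    """True if the path (or, for a directory, any file inside it) has today's mtime."""
    today = today or date.today()
    if not path.exists():
        return False
    candidates = [path]
    if path.is_dir():
        candidates.extend(p for p in path.rglob("*") if p.is_file())
    return any(date.fromtimestamp(p.stat().st_mtime) == today for p in candidates)


def stale_libs(memory_dir: Path, today: date | None = None) -> list[str]:
    """Return the labels of libraries that were not touched today."""
    return [
        label
        for label, rel in LIBS.items()
        if not updated_today(memory_dir / rel, today)
    ]
```

Missing paths are reported as stale here; how the real hook treats an absent memory directory is described in the commit notes and may differ from this sketch.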

Configuration

Edit quality-gate.py:

Variable          | Default     | Purpose
------------------|-------------|---------------------------------------------
RATIONALIZE       | 4 patterns  | Regex patterns for rationalization detection
LIBS              | 5 libraries | Files/dirs to check for today's updates
COMPLEX_THRESHOLD | 3           | Edit/Write calls to classify as complex
DISK_WARN_GB      | 50          | Warn below this
DISK_CRIT_GB      | 15          | Block below this
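To illustrate how `COMPLEX_THRESHOLD` might be applied, here is a hedged sketch of counting Edit/Write calls. It assumes the transcript is JSON Lines and that tool calls appear as objects with `"type": "tool_use"` and a `"name"` field; that layout, and the names `EDIT_TOOLS`, `count_edits`, and `is_complex`, are assumptions rather than details confirmed by this document.

```python
from __future__ import annotations

import json

COMPLEX_THRESHOLD = 3
EDIT_TOOLS = {"Edit", "Write"}  # assumed tool names that count as edits


def _count_tool_uses(node) -> int:
    """Recursively count tool_use objects whose name is an edit tool."""
    if isinstance(node, dict):
        name = node.get("name")
        own = 1 if (
            node.get("type") == "tool_use"
            and isinstance(name, str)
            and name in EDIT_TOOLS
        ) else 0
        return own + sum(_count_tool_uses(v) for v in node.values())
    if isinstance(node, list):
        return sum(_count_tool_uses(v) for v in node)
    return 0


def count_edits(transcript: str) -> int:
    """Count edit-type tool calls across the full transcript (not a truncated tail)."""
    total = 0
    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            total += _count_tool_uses(json.loads(line))
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
    return total


def is_complex(transcript: str) -> bool:
    return count_edits(transcript) >= COMPLEX_THRESHOLD
```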

Examples

Simple session — allowed:

edit_count=1 (< 3, not complex) → exit 0

Complex task, learning captured — allowed:

edit_count=5 (complex) → checks LIBS → growth-log updated today → exit 0

Complex task, no learning — BLOCKED:

edit_count=4 (complex) → checks LIBS → all 5 stale → exit 2
stderr: "Blocked: complex task completed but no learning captured today."

Low disk space — BLOCKED:

disk_free=12GB < 15GB critical → exit 2
stderr: "Blocked: disk space at 12GB (threshold: 15GB)."
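The low-disk example above can be reproduced with a small sketch that writes the reason to stderr and returns the exit code. The function names `free_gb` and `check_disk`, the choice of checking the current directory's volume, and the wording of the warning line are assumptions; only the thresholds, `shutil.disk_usage`, exit code 2, and the "Blocked" message come from this document.

```python
from __future__ import annotations

import shutil
import sys

DISK_WARN_GB = 50
DISK_CRIT_GB = 15


def free_gb(path: str = ".") -> float:
    """Free space in GiB on the volume holding `path` (assumed check location)."""
    return shutil.disk_usage(path).free / 1024**3


def check_disk(free: float, stream=sys.stderr) -> int:
    """Write feedback to stderr and return the hook's exit code (2 blocks)."""
    if free < DISK_CRIT_GB:
        stream.write(
            f"Blocked: disk space at {free:.0f}GB (threshold: {DISK_CRIT_GB}GB).\n"
        )
        return 2
    if free < DISK_WARN_GB:
        # Hypothetical warning wording; warnings do not block.
        stream.write(f"Warning: disk space at {free:.0f}GB (below {DISK_WARN_GB}GB).\n")
    return 0

# Typical use at the end of the hook script:
#     sys.exit(check_disk(free_gb()))
```

Nothing is written to stdout, so the blocking reason on stderr is what reaches Claude.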

Limitations

The hook enforces the habit of touching learning libraries, not the quality of what was recorded. If output-index.md is updated but growth-log is skipped, the hook passes (1 of 5 libraries touched). This is by design: mechanical gates check machine-verifiable facts. For content quality verification, pair with self-audit.

Compatibility

  • Python 3.8+ (uses from __future__ import annotations)
  • Cross-platform: Windows, macOS, Linux
  • Zero dependencies beyond stdlib

Quality

This code went through 4 rounds of automated code review (CodeRabbit + Greptile) with 9 real bugs found and fixed.

See Also

  • self-audit — Reasoning quality gate (completeness/consistency/groundedness/honesty)
  • verification-loop — Code quality checks (build/type/lint/test)
  • gateguard — PreToolUse safety gate