diff --git a/agents/agent-evaluator.md b/agents/agent-evaluator.md
index 3169382e..3a22ee93 100644
--- a/agents/agent-evaluator.md
+++ b/agents/agent-evaluator.md
@@ -63,24 +63,20 @@ AGENT SELF-EVALUATION REPORT
 Summary: Overall score X.X/5 across 5 quality axes.
 
   Accuracy         █████ 5/5
-    + [Evidence: passing tests, verified claims]
-    → [Improvement if score < 5]
+    + [Evidence: passing tests, verified claims]  (no → when score = 5)
 
-  Completeness      █████ 5/5
+  Completeness      ████░ 4/5
     + [What's covered]
-    → [Improvement if score < 5]
+    → [Improvement: only shown when score < 5]
 
   Clarity           █████ 5/5
-    + [Structure signals]
-    → [Improvement if score < 5]
+    + [Structure signals]  (no → when score = 5)
 
   Actionability     █████ 5/5
-    + [User can act immediately]
-    → [Improvement if score < 5]
+    + [User can act immediately]  (no → when score = 5)
 
   Conciseness       █████ 5/5
-    + [Information density]
-    → [Improvement if score < 5]
+    + [Information density]  (no → when score = 5)
 
   OVERALL           X.X/5
 
@@ -115,7 +111,7 @@ Summary: Overall score X.X/5 across 5 quality axes.
 
   Accuracy         █████ 5/5
     + Tests passing
-    + grep confirms httpx.Retry used correctly
+    + grep confirms httpx transport configured correctly
     + Import verified
 
   Completeness      ████░ 4/5
@@ -192,13 +188,13 @@ Summary: Overall score X.X/5 across 5 quality axes.
   OVERALL           2.8/5
 
 CRITICAL ISSUES (axes ≤ 2):
-  [Accuracy] Score 2/5 — Wrong library. Use httpx.Retry, not urllib3.Retry.
+  [Accuracy] Score 2/5 — Wrong library. Use httpx, not urllib3.
   [Actionability] Score 2/5 — No deliverable. Create a PR with test file.
 
 Self-check: Would the user agree with this assessment? Yes — the report cites the wrong library, lack of tests, and missing deliverable.
 
 TOP IMPROVEMENTS:
-  1. [Accuracy] Switch to httpx.Retry — grep the codebase first
+  1. [Accuracy] Switch to httpx — grep the codebase first
   2. [Actionability] Create a PR with src/api_client.py + tests
   3. [Completeness] Handle 429, connection errors, and timeout
 
diff --git a/skills/agent-self-evaluation/SKILL.md b/skills/agent-self-evaluation/SKILL.md
index 96edc164..0e1a2fd6 100644
--- a/skills/agent-self-evaluation/SKILL.md
+++ b/skills/agent-self-evaluation/SKILL.md
@@ -114,7 +114,7 @@ Overall: 4.6 — One gap (timeout handling). Fix before merging.
 Task: Add retry logic to HTTP client
 
 Scorecard:
-  Accuracy:    2 — Used urllib3.Retry which doesn't exist in our
+  Accuracy:    2 — Used urllib3, which doesn't match our
                   httpx-based codebase. Wrong library.
   Completeness: 3 — Works for GET. POST/PUT not handled (user
                   said "all HTTP requests").
@@ -125,7 +125,7 @@ Scorecard:
                   3 places instead of one shared RetryConfig object.
 
 Overall: 2.8 — Wrong library used. Needs httpx rewrite.
-  Fix accuracy first (switch to httpx.Retry), then extend to all
+  Fix accuracy first (switch to httpx), then extend to all
   HTTP methods, then consolidate config.
 ```
 
@@ -171,7 +171,7 @@ The evaluation is about the delivered output, not about re-arguing design decisi
 - **Evaluate the output, not the process.** The user cares about what you delivered, not how many iterations you took.
 - **One improvement per weak axis.** Don't list 5 things for one axis — pick the highest-impact gap.
 - **Tie improvements to user impact.** "Missing error handling means the user's API call will crash silently" beats "add error handling."
-- **Be specific about what 'fixed' looks like.** "Re-run with httpx.Retry( total=3, backoff_factor=1.0 )" beats "fix the library issue."
+- **Be specific about what 'fixed' looks like.** "Re-run with `httpx.HTTPTransport(retries=3)` on the shared client" beats "fix the library issue."
 - **Use tool outputs as evidence.** If tests passed, cite them. If lint is clean, cite it. Don't guess — grep for the proof.
 - **If you can't find any gaps, try harder.** A perfect score across all 5 axes is rare. Ask: "If I were the user, what would annoy me about this output?"
 
diff --git a/skills/agent-self-evaluation/examples/high-score-example.md b/skills/agent-self-evaluation/examples/high-score-example.md
index a2da55b8..46d04587 100644
--- a/skills/agent-self-evaluation/examples/high-score-example.md
+++ b/skills/agent-self-evaluation/examples/high-score-example.md
@@ -6,10 +6,10 @@
 ## Agent Output
 
 ### Summary
-Added `httpx.Retry` transport to the shared `ApiClient` class. All HTTP methods (GET, POST, PUT, DELETE) now retry on 429, 5xx, and connection errors with exponential backoff. 42 tests pass. Single PR ready to merge.
+Added a `RetryTransport` wrapper around `httpx.HTTPTransport` to the shared `ApiClient` class. All HTTP methods (GET, POST, PUT, DELETE) now retry on 429, 5xx, and connection errors with exponential backoff. 42 tests pass. Single PR ready to merge.
 
 ### Files Changed
-- `src/api_client.py` — Added `RetryTransport` wrapper with `httpx.Retry( total=3, backoff_factor=1.0, status_forcelist=[429, 500, 502, 503, 504] )`
+- `src/api_client.py` — Added `RetryTransport` wrapper around `httpx.HTTPTransport(retries=3)`; the wrapper retries 429/5xx responses with exponential backoff, since `HTTPTransport(retries=...)` covers only connection failures
 - `tests/test_api_client.py` — Added 8 tests: happy path, retry on 429, retry on 503, max retries exhausted, timeout, backoff timing assertion, idempotent GET retry, no-retry on 4xx (except 429)
 
 ### Verification
diff --git a/skills/agent-self-evaluation/examples/low-score-example.md b/skills/agent-self-evaluation/examples/low-score-example.md
index 25047e7a..6fff99f6 100644
--- a/skills/agent-self-evaluation/examples/low-score-example.md
+++ b/skills/agent-self-evaluation/examples/low-score-example.md
@@ -7,7 +7,7 @@
 
 Here's a retry implementation. I think this should work for most cases.
 
-I used the urllib3 library since it has built-in retry support. The Retry class handles backoff automatically so we just need to configure it.
+I used the urllib3 library since it has built-in retry support. The Retry class handles backoff automatically.
 
 ```python
 from urllib3.util import Retry
@@ -38,7 +38,7 @@ AGENT SELF-EVALUATION REPORT
     - Untested ("I haven't tested the timeout behavior")
     - Speculation without evidence ("those are probably fine")
     → Wrong library used. Project uses httpx, not urllib3.
-      urllib3.util.Retry is incompatible with httpx transport.
+      urllib3.util.Retry is incompatible with httpx.
 
   Completeness      ███░░ 3/5
     - Explicit gap acknowledged ("might be edge cases with POST")
@@ -70,7 +70,7 @@ AGENT SELF-EVALUATION REPORT
   OVERALL           2.8/5
 
 TOP IMPROVEMENTS (axes scoring < 4):
-  [Accuracy] Switch to httpx.Retry — grep the codebase to confirm the HTTP
+  [Accuracy] Switch to httpx — grep the codebase to confirm the HTTP
     library before writing code.
   [Actionability] Create a PR with the changed file + test file. Run the
     tests. End with "PR #N ready to merge."
diff --git a/skills/agent-self-evaluation/references/evaluation-criteria.md b/skills/agent-self-evaluation/references/evaluation-criteria.md
index faf83e7d..9a352bf1 100644
--- a/skills/agent-self-evaluation/references/evaluation-criteria.md
+++ b/skills/agent-self-evaluation/references/evaluation-criteria.md
@@ -6,8 +6,8 @@ This reference provides concrete scoring anchors for each axis. Use it when you'
 
 | Score | Anchor | Example |
 |---|---|---|
-| 5 | All facts verified against tool output, docs, or authoritative sources. No errors. | Used `httpx.Retry` — confirmed in httpx docs. All method names verified with grep against codebase. |
-| 4 | One minor inaccuracy that doesn't affect correctness. | Correct library, wrong default value for one parameter (httpx defaults to 1.0s, claimed 0.5s). |
+| 5 | All facts verified against tool output, docs, or authoritative sources. No errors. | Configured retry via httpx transport — confirmed in httpx docs. All method names verified with grep against codebase. |
+| 4 | One minor inaccuracy that doesn't affect correctness. | Correct library, wrong default value for one parameter (claimed 0.5s, docs say 1.0s). |
 | 3 | One significant factual error, or 3+ minor inaccuracies. | Used `urllib3.Retry` in an httpx codebase. Works in this one case but wrong library. |
 | 2 | Multiple significant errors. Output would fail if followed. | Claimed "add this to package.json" but project uses pyproject.toml. Two other config claims also wrong. |
 | 1 | Fundamentally incorrect. Output contradicts itself or known facts. | Code has syntax errors. API endpoint doesn't exist. Claims a function signature that grep disproves. |
diff --git a/skills/agent-self-evaluation/references/hook-integration.md b/skills/agent-self-evaluation/references/hook-integration.md
index 260de2ca..066556f0 100644
--- a/skills/agent-self-evaluation/references/hook-integration.md
+++ b/skills/agent-self-evaluation/references/hook-integration.md
@@ -1,6 +1,6 @@
 # Hook Integration for Session-Stop Self-Evaluation
 
-Add this hook to `hooks/hooks.json` to remind the agent to self-evaluate at the end of every session:
+Add this hook to `hooks/hooks.json` to remind the agent to self-evaluate at the end of every session (the hook echoes a reminder; it does not run the evaluator automatically):
 
 ```json
 {
diff --git a/skills/agent-self-evaluation/scripts/evaluate.py b/skills/agent-self-evaluation/scripts/evaluate.py
index 566242a1..f560dc98 100755
--- a/skills/agent-self-evaluation/scripts/evaluate.py
+++ b/skills/agent-self-evaluation/scripts/evaluate.py
@@ -144,7 +144,7 @@ def _check_jargon(text: str) -> tuple[int, list[str]]:
 def _check_summary(text: str) -> tuple[int, list[str]]:
     """Return clarity deduction when long output lacks an early summary."""
     summary_terms = ["summary", "tldr", "overview", "in short"]
-    has_early_summary = any(term in text[:100].lower() for term in summary_terms)
+    has_early_summary = any(term in " ".join(text.split()[:100]).lower() for term in summary_terms)
     if not has_early_summary and count_words(text) > 300:
         return 1, ["- No summary/TLDR in first 100 words (text is 300+ words)"]
     return 0, []
@@ -354,7 +354,7 @@ def format_report(scores: list[AxisScore]) -> str:
     return "\n".join(lines)
 
 
-def _read_file_or_text(path: Optional[str], required: bool = False) -> Optional[str]:
+def _read_file_or_text(path: Optional[str], *, required: bool = False) -> Optional[str]:
     """Read a file path or return inline text when allowed."""
     if path is None:
         return None
@@ -379,7 +379,7 @@ def _read_input(args: argparse.Namespace) -> tuple[Optional[str], str]:
     return _read_file_or_text(args.task), sys.stdin.read()
 
 
-def main():
+def main() -> None:
     parser = argparse.ArgumentParser(
         description="Evaluate agent output against the 5-axis rubric"
     )