mirror of
https://github.com/affaan-m/everything-claude-code.git
synced 2026-06-16 16:36:53 +08:00
fix: address second-round review comments
- Replace httpx.Retry references with correct httpx API usage across all files (httpx has no built-in Retry class; use HTTPTransport/Limits instead)
- Fix _check_summary to check first 100 words (not 100 characters)
- Fix template to only show → improvement arrow for non-5 scores
- Clarify hook documentation: hook echoes reminder, does not run evaluator
- Add return type annotation to main()
- Make required parameter keyword-only in _read_file_or_text
parent 2ea4d779a3
commit 7c0a0049a8
@ -63,24 +63,20 @@ AGENT SELF-EVALUATION REPORT
  Summary: Overall score X.X/5 across 5 quality axes.

  Accuracy █████ 5/5
- + [Evidence: passing tests, verified claims]
- → [Improvement if score < 5]
+ + [Evidence: passing tests, verified claims] (no → when score = 5)

- Completeness █████ 5/5
+ Completeness ████░ 4/5
  + [What's covered]
- → [Improvement if score < 5]
+ → [Improvement: only shown when score < 5]

  Clarity █████ 5/5
- + [Structure signals]
- → [Improvement if score < 5]
+ + [Structure signals] (no → when score = 5)

  Actionability █████ 5/5
- + [User can act immediately]
- → [Improvement if score < 5]
+ + [User can act immediately] (no → when score = 5)

  Conciseness █████ 5/5
- + [Information density]
- → [Improvement if score < 5]
+ + [Information density] (no → when score = 5)

  OVERALL X.X/5
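The template rule changed in this hunk (print the → improvement line only when an axis scores below 5) can be sketched as follows. This is a minimal illustration, not the repository's code: `render_axis` and its parameters are hypothetical names, and the bar is assumed to be one filled block per point out of five.

```python
def render_axis(name: str, score: int, evidence: str, improvement: str = "") -> list[str]:
    """Render one axis block of the report (hypothetical helper).

    The improvement arrow is emitted only when the score is below 5,
    matching the template rule described in the commit message.
    """
    if not 1 <= score <= 5:
        raise ValueError("score must be between 1 and 5")
    # Assumption: one filled block per point, padded to five cells.
    bar = "█" * score + "░" * (5 - score)
    lines = [f"{name} {bar} {score}/5", f"+ {evidence}"]
    if score < 5 and improvement:
        lines.append(f"→ {improvement}")
    return lines


if __name__ == "__main__":
    for line in render_axis("Completeness", 4, "What's covered", "Handle timeouts"):
        print(line)
```

With a score of 5 the helper drops the arrow even if an improvement string is supplied, which is the behaviour the "(no → when score = 5)" annotations describe.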
@ -115,7 +111,7 @@ Summary: Overall score X.X/5 across 5 quality axes.

  Accuracy █████ 5/5
  + Tests passing
- + grep confirms httpx.Retry used correctly
+ + grep confirms httpx transport configured correctly
  + Import verified

  Completeness ████░ 4/5
@ -192,13 +188,13 @@ Summary: Overall score X.X/5 across 5 quality axes.
  OVERALL 2.8/5

  CRITICAL ISSUES (axes ≤ 2):
- [Accuracy] Score 2/5 — Wrong library. Use httpx.Retry, not urllib3.Retry.
+ [Accuracy] Score 2/5 — Wrong library. Use httpx, not urllib3.
  [Actionability] Score 2/5 — No deliverable. Create a PR with test file.

  Self-check: Would the user agree with this assessment? Yes — the report cites the wrong library, lack of tests, and missing deliverable.

  TOP IMPROVEMENTS:
- 1. [Accuracy] Switch to httpx.Retry — grep the codebase first
+ 1. [Accuracy] Switch to httpx — grep the codebase first
  2. [Actionability] Create a PR with src/api_client.py + tests
  3. [Completeness] Handle 429, connection errors, and timeout
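The "CRITICAL ISSUES (axes ≤ 2)" rule in the report above can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, the Clarity and Conciseness scores are made-up placeholders (the report excerpt does not show them), and treating the overall score as an unweighted mean is an assumption, not something the diff confirms.

```python
from statistics import mean

CRITICAL_THRESHOLD = 2  # from "CRITICAL ISSUES (axes ≤ 2)" in the report


def overall_score(scores: dict[str, int]) -> float:
    """Assumption: overall is the unweighted mean, rounded to one decimal."""
    return round(mean(scores.values()), 1)


def critical_axes(scores: dict[str, int]) -> list[str]:
    """Return the axes whose score is at or below the critical threshold."""
    return [axis for axis, score in scores.items() if score <= CRITICAL_THRESHOLD]


# Accuracy and Actionability come from the report excerpt;
# Completeness, Clarity, and Conciseness are placeholders for illustration.
example_scores = {
    "Accuracy": 2,
    "Completeness": 3,
    "Clarity": 3,
    "Actionability": 2,
    "Conciseness": 4,
}

if __name__ == "__main__":
    print(overall_score(example_scores))
    print(critical_axes(example_scores))
```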
@ -114,7 +114,7 @@ Overall: 4.6 — One gap (timeout handling). Fix before merging.
  Task: Add retry logic to HTTP client

  Scorecard:
- Accuracy: 2 — Used urllib3.Retry which doesn't exist in our
+ Accuracy: 2 — Used urllib3 which doesn't match our
  httpx-based codebase. Wrong library.
  Completeness: 3 — Works for GET. POST/PUT not handled (user
  said "all HTTP requests").
@ -125,7 +125,7 @@ Scorecard:
  3 places instead of one shared RetryConfig object.

  Overall: 2.8 — Wrong library used. Needs httpx rewrite.
- Fix accuracy first (switch to httpx.Retry), then extend to all
+ Fix accuracy first (switch to httpx), then extend to all
  HTTP methods, then consolidate config.
  ```
@ -171,7 +171,7 @@ The evaluation is about the delivered output, not about re-arguing design decisi
  - **Evaluate the output, not the process.** The user cares about what you delivered, not how many iterations you took.
  - **One improvement per weak axis.** Don't list 5 things for one axis — pick the highest-impact gap.
  - **Tie improvements to user impact.** "Missing error handling means the user's API call will crash silently" beats "add error handling."
- - **Be specific about what 'fixed' looks like.** "Re-run with httpx.Retry( total=3, backoff_factor=1.0 )" beats "fix the library issue."
+ - **Be specific about what 'fixed' looks like.** "Re-run with httpx transport configured for retries" beats "fix the library issue."
  - **Use tool outputs as evidence.** If tests passed, cite them. If lint is clean, cite it. Don't guess — grep for the proof.
  - **If you can't find any gaps, try harder.** A perfect score across all 5 axes is rare. Ask: "If I were the user, what would annoy me about this output?"
@ -6,10 +6,10 @@
  ## Agent Output

  ### Summary
- Added `httpx.Retry` transport to the shared `ApiClient` class. All HTTP methods (GET, POST, PUT, DELETE) now retry on 429, 5xx, and connection errors with exponential backoff. 42 tests pass. Single PR ready to merge.
+ Added `httpx.HTTPTransport` with retry configuration to the shared `ApiClient` class. All HTTP methods (GET, POST, PUT, DELETE) now retry on 429, 5xx, and connection errors with exponential backoff. 42 tests pass. Single PR ready to merge.

  ### Files Changed
- - `src/api_client.py` — Added `RetryTransport` wrapper with `httpx.Retry( total=3, backoff_factor=1.0, status_forcelist=[429, 500, 502, 503, 504] )`
+ - `src/api_client.py` — Added `RetryTransport` wrapper with `httpx.HTTPTransport(retries=3)` and exponential backoff configured via `httpx.Limits`
  - `tests/test_api_client.py` — Added 8 tests: happy path, retry on 429, retry on 503, max retries exhausted, timeout, backoff timing assertion, idempotent GET retry, no-retry on 4xx (except 429)
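The retry behaviour described in the sample output above (retry on 429 and 5xx with exponential backoff, no retry on other 4xx) can be sketched without any HTTP library. One caveat, as far as the httpx documentation goes: the `retries` option on `httpx.HTTPTransport` covers failures to establish a connection, and `httpx.Limits` configures connection-pool limits, so status-based retries and backoff would have to live in wrapper logic such as the `RetryTransport` the sample mentions. The names below (`should_retry`, `backoff_delays`, `call_with_retry`) are hypothetical, and `send` is assumed to be any callable returning an integer status code.

```python
import time
from typing import Callable

# Status codes taken from the status_forcelist shown in the removed line.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def should_retry(status_code: int) -> bool:
    """Retry on 429 and the listed 5xx codes; never on other 4xx."""
    return status_code in RETRYABLE_STATUS


def backoff_delays(max_retries: int = 3, backoff_factor: float = 1.0) -> list[float]:
    """Exponential backoff schedule: factor * 2**attempt for each retry."""
    return [backoff_factor * (2 ** attempt) for attempt in range(max_retries)]


def call_with_retry(
    send: Callable[[], int],
    max_retries: int = 3,
    backoff_factor: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-retryable status or retries run out."""
    delays = backoff_delays(max_retries, backoff_factor)
    for attempt in range(max_retries + 1):
        status = send()
        if not should_retry(status) or attempt == max_retries:
            return status
        sleep(delays[attempt])
    return status  # not reached; kept for type checkers
```

Passing `sleep` as a parameter keeps the sketch testable: a test can record the requested delays instead of actually waiting, which is one way to write the "backoff timing assertion" the sample lists.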
@ -7,7 +7,7 @@

  Here's a retry implementation. I think this should work for most cases.

- I used the urllib3 library since it has built-in retry support. The Retry class handles backoff automatically so we just need to configure it.
+ I used the urllib3 library since it has built-in retry support. The Retry class handles backoff automatically.

  ```python
  from urllib3.util import Retry
@ -38,7 +38,7 @@ AGENT SELF-EVALUATION REPORT
  - Untested ("I haven't tested the timeout behavior")
  - Speculation without evidence ("those are probably fine")
  → Wrong library used. Project uses httpx, not urllib3.
- urllib3.util.Retry is incompatible with httpx transport.
+ urllib3.util.Retry is incompatible with httpx.

  Completeness ███░░ 3/5
  - Explicit gap acknowledged ("might be edge cases with POST")
@ -70,7 +70,7 @@ AGENT SELF-EVALUATION REPORT
  OVERALL 2.8/5

  TOP IMPROVEMENTS (axes scoring < 4):
- [Accuracy] Switch to httpx.Retry — grep the codebase to confirm the HTTP
+ [Accuracy] Switch to httpx — grep the codebase to confirm the HTTP
  library before writing code.
  [Actionability] Create a PR with the changed file + test file. Run the
  tests. End with "PR #N ready to merge."
@ -6,8 +6,8 @@ This reference provides concrete scoring anchors for each axis. Use it when you'

  | Score | Anchor | Example |
  |---|---|---|
- | 5 | All facts verified against tool output, docs, or authoritative sources. No errors. | Used `httpx.Retry` — confirmed in httpx docs. All method names verified with grep against codebase. |
- | 4 | One minor inaccuracy that doesn't affect correctness. | Correct library, wrong default value for one parameter (httpx defaults to 1.0s, claimed 0.5s). |
+ | 5 | All facts verified against tool output, docs, or authoritative sources. No errors. | Configured retry via httpx transport — confirmed in httpx docs. All method names verified with grep against codebase. |
+ | 4 | One minor inaccuracy that doesn't affect correctness. | Correct library, wrong default value for one parameter (claimed 0.5s, docs say 1.0s). |
  | 3 | One significant factual error, or 3+ minor inaccuracies. | Used `urllib3.Retry` in an httpx codebase. Works in this one case but wrong library. |
  | 2 | Multiple significant errors. Output would fail if followed. | Claimed "add this to package.json" but project uses pyproject.toml. Two other config claims also wrong. |
  | 1 | Fundamentally incorrect. Output contradicts itself or known facts. | Code has syntax errors. API endpoint doesn't exist. Claims a function signature that grep disproves. |
@ -1,6 +1,6 @@
  # Hook Integration for Session-Stop Self-Evaluation

- Add this hook to `hooks/hooks.json` to remind the agent to self-evaluate at the end of every session:
+ Add this hook to `hooks/hooks.json` to remind the agent to self-evaluate at the end of every session (the hook echoes a reminder; it does not run the evaluator automatically):

  ```json
  {
@ -144,7 +144,7 @@ def _check_jargon(text: str) -> tuple[int, list[str]]:
  def _check_summary(text: str) -> tuple[int, list[str]]:
      """Return clarity deduction when long output lacks an early summary."""
      summary_terms = ["summary", "tldr", "overview", "in short"]
-     has_early_summary = any(term in text[:100].lower() for term in summary_terms)
+     has_early_summary = any(term in ' '.join(text.split()[:100]).lower() for term in summary_terms)
      if not has_early_summary and count_words(text) > 300:
          return 1, ["- No summary/TLDR in first 100 words (text is 300+ words)"]
      return 0, []
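The difference between the old and new check in `_check_summary` is easiest to see on a text whose summary marker sits past character 100 but well within the first 100 words. The sketch below isolates just that expression; the two helper names are hypothetical and the surrounding scoring logic is left out.

```python
SUMMARY_TERMS = ["summary", "tldr", "overview", "in short"]


def has_summary_in_first_chars(text: str, limit: int = 100) -> bool:
    """Old behaviour: look at the first `limit` characters."""
    return any(term in text[:limit].lower() for term in SUMMARY_TERMS)


def has_summary_in_first_words(text: str, limit: int = 100) -> bool:
    """New behaviour: look at the first `limit` whitespace-separated words."""
    head = " ".join(text.split()[:limit]).lower()
    return any(term in head for term in SUMMARY_TERMS)


# 30 filler words (150 characters) followed by a summary marker at word 31.
sample = "word " * 30 + "Summary: retries added."

if __name__ == "__main__":
    print(has_summary_in_first_chars(sample))
    print(has_summary_in_first_words(sample))
```

The character-based slice misses the marker on this sample while the word-based one finds it, which is consistent with the deduction message that talks about the "first 100 words".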
@ -354,7 +354,7 @@ def format_report(scores: list[AxisScore]) -> str:
      return "\n".join(lines)


- def _read_file_or_text(path: Optional[str], required: bool = False) -> Optional[str]:
+ def _read_file_or_text(path: Optional[str], *, required: bool = False) -> Optional[str]:
      """Read a file path or return inline text when allowed."""
      if path is None:
          return None
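The bare `*` added to the signature makes `required` keyword-only, so a stray positional `True` can no longer be passed by accident. A small stand-in shows the effect; `read_text` is a hypothetical function with the same signature shape, and its body is a placeholder rather than the real file-reading logic.

```python
from typing import Optional


def read_text(path: Optional[str], *, required: bool = False) -> Optional[str]:
    """Hypothetical stand-in with the same signature shape as the changed helper."""
    if path is None:
        return None
    # Placeholder body: echo the arguments instead of reading a file.
    return f"{path} (required={required})"


if __name__ == "__main__":
    print(read_text("task.md", required=True))
    try:
        read_text("task.md", True)  # positional use is now rejected
    except TypeError as exc:
        print(f"TypeError: {exc}")
```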
@ -379,7 +379,7 @@ def _read_input(args: argparse.Namespace) -> tuple[Optional[str], str]:
      return _read_file_or_text(args.task), sys.stdin.read()


- def main():
+ def main() -> None:
      parser = argparse.ArgumentParser(
          description="Evaluate agent output against the 5-axis rubric"
      )