mirror of
https://github.com/affaan-m/everything-claude-code.git
synced 2026-06-16 16:36:53 +08:00
- Replace httpx.Retry references with correct httpx API usage across all files (httpx has no built-in Retry class; use HTTPTransport/Limits instead) - Fix _check_summary to check first 100 words (not 100 characters) - Fix template to only show → improvement arrow for non-5 scores - Clarify hook documentation: hook echoes reminder, does not run evaluator - Add return type annotation to main() - Make required parameter keyword-only in _read_file_or_text
87 lines
3.6 KiB
Markdown
87 lines
3.6 KiB
Markdown
# Low-Score Example: Adding Retry Logic
|
|
|
|
## Task
|
|
> Add retry logic with exponential backoff to all HTTP requests in the API client. Max 3 retries, initial delay 1s, max delay 30s.
|
|
|
|
## Agent Output
|
|
|
|
Here's a retry implementation. I think this should work for most cases.
|
|
|
|
I used the urllib3 library since it has built-in retry support. The Retry class handles backoff automatically.
|
|
|
|
```python
|
|
from urllib3.util import Retry
|
|
|
|
# Add this somewhere in your HTTP client
|
|
retry = Retry(
|
|
total=3,
|
|
backoff_factor=1.0,
|
|
status_forcelist=[500, 502, 503, 504]
|
|
)
|
|
```
|
|
|
|
You'll want to test this before deploying. There might be edge cases with POST requests but those are probably fine. I haven't tested the timeout behavior but it should work.
|
|
|
|
Let me know if you need anything else!
|
|
|
|
---
|
|
|
|
## Evaluation Report
|
|
|
|
```
|
|
============================================================
|
|
AGENT SELF-EVALUATION REPORT
|
|
============================================================
|
|
|
|
Accuracy ██░░░ 2/5
|
|
- Hedged claim without verification ("I think this should work")
|
|
- Untested ("I haven't tested the timeout behavior")
|
|
- Speculation without evidence ("those are probably fine")
|
|
→ Wrong library used. Project uses httpx, not urllib3.
|
|
urllib3.util.Retry is incompatible with httpx.
|
|
|
|
Completeness ███░░ 3/5
|
|
- Explicit gap acknowledged ("might be edge cases with POST")
|
|
- Limited scope noted (only mentioned 5xx, not 429 or connection errors)
|
|
→ User asked for "all HTTP requests." Only partial coverage:
|
|
missing 429 handling, connection errors, timeout handling.
|
|
|
|
Clarity ████░ 4/5
|
|
+ Uses code blocks
|
|
→ Code is readable but no explanation of where to add it
|
|
("somewhere in your HTTP client" is vague).
|
|
|
|
Actionability ██░░░ 2/5
|
|
- Defers work to user ("you'll want to test this")
|
|
- Vague suggestion without specifics
|
|
→ No PR, no file created, no test written. User has to:
|
|
1. Figure out where to add the code
|
|
2. Fix the library mismatch (httpx not urllib3)
|
|
3. Write tests
|
|
4. Handle POST idempotency
|
|
5. Test timeout behavior
|
|
|
|
Conciseness ███░░ 3/5
|
|
- Meta-commentary adds words without information
|
|
("Let me know if you need anything else!")
|
|
→ 120 words. Low word count but low information density.
|
|
Half the text is hedging and disclaimers, not substance.
|
|
|
|
OVERALL 2.8/5
|
|
|
|
TOP IMPROVEMENTS (axes scoring < 4):
|
|
[Accuracy] Switch to httpx — grep the codebase to confirm the HTTP
|
|
library before writing code.
|
|
[Actionability] Create a PR with the changed file + test file. Run the
|
|
tests. End with "PR #N ready to merge."
|
|
[Completeness] List what's covered AND what's not. If POST retry is
|
|
unsafe, say so explicitly with reasoning.
|
|
```
|
|
|
|
### Why This Scores Poorly
|
|
|
|
1. **Accuracy fails at the most basic level** — wrong library. One `grep httpx src/` would have caught this. The hedging language ("I think", "probably", "should work") signals the agent knows it's guessing.
|
|
2. **Not actionable.** The user received a code snippet and a list of things they need to do. The agent did the easy part (suggesting a library) and deferred the hard parts (testing, integration, edge cases) to the user.
|
|
3. **Completeness gaps are acknowledged but not fixed.** "Might be edge cases" is worse than not mentioning them — it shows awareness of the gap and a choice not to address it.
|
|
4. **Information density is low.** 120 words, of which ~60 are hedging/disclaimers/politeness. The actual substance (3 lines of code) could have been delivered in 40 words with verification.
|