mirror of https://github.com/affaan-m/everything-claude-code.git synced 2026-06-16 16:36:53 +08:00

Hawthorn bd45947941 feat(skills,agents): add agent-self-evaluation skill and agent-evaluator persona

Add structured 5-axis self-evaluation framework for agent output quality:
- Accuracy, Completeness, Clarity, Actionability, Conciseness
- Evidence-based scoring with concrete improvement suggestions
- Standalone Python evaluator script with keyword heuristics
- Detailed scoring anchors reference guide
- High-score and low-score annotated examples
- Reusable evaluation report template
- Optional hook integration for session-stop evaluation

Agent persona (agent-evaluator) provides a dedicated subagent
for applying the rubric to agent output with tool-backed verification.

All files tested: Python script runs, examples score correctly
(high 4.2, low 3.4), frontmatter parses clean, 183 lines (under 500).

2026-06-10 16:56:18 +05:30

1.8 KiB

Raw Blame History

Hook Integration for Session-Stop Self-Evaluation

Add this hook to hooks/hooks.json to automatically trigger self-evaluation at the end of every session:

{
  "hooks": {
    "Stop": [
      {
        "matcher": "true",
        "hooks": [
          {
            "type": "command",
            "command": "echo '[Self-Eval] Session complete. Consider running agent-self-evaluation to rate your output.'"
          }
        ],
        "description": "Remind agent to self-evaluate at session end"
      }
    ]
  }
}

Integration with the Python Evaluator

The scripts/evaluate.py script can be used as a standalone tool:

# Pipe agent output directly
echo "Your agent response here" | python3 skills/agent-self-evaluation/scripts/evaluate.py

# From files
python3 skills/agent-self-evaluation/scripts/evaluate.py --task task.txt --output response.txt

To integrate it into hooks, capture the last agent output to a file first, then run the evaluator:

{
  "PostToolUse": [
    {
      "matcher": "tool == \"Bash\" && tool_input.command matches \"(test|pytest|npm test|go test)\"",
      "hooks": [
        {
          "type": "command",
          "command": "echo '[Self-Eval] Tests completed. Consider running agent-self-evaluation.'"
        }
      ],
      "description": "Remind agent to self-evaluate after test runs"
    }
  ]
}

These hooks are opt-in. Add them to your local hooks/hooks.json if you want automated evaluation prompts.

Manual Usage (Recommended)

The most reliable approach is manual invocation — the agent runs self-evaluation as part of its workflow when the agent-self-evaluation skill is active, without requiring hook configuration. The skill's "When to Activate" section already covers trigger conditions (multi-file changes, debugging sessions, design documents).

1.8 KiB Raw Blame History

Hook Integration for Session-Stop Self-Evaluation

Integration with the Python Evaluator

Manual Usage (Recommended)

1.8 KiB

Raw Blame History