eval-harness
Formal evaluation framework for Claude Code sessions implementing eval-driven development (EDD) principles
Content Preview
---
name: eval-harness
description: Formal evaluation framework for Claude Code sessions implementing eval-driven development (EDD) principles
origin: ECC
tools: Read, Write, Edit, Bash, Grep, Glob
---

# Eval Harness Skill

A formal evaluation framework for Claude Code sessions, implementing eval-driven development (EDD) principles.

## When to Activate

- Setting up eval-driven development (EDD) for AI-assisted workflows
- Defining pass/fail criteria for Claude Code task completion
- Measuring
How to Use
Recommended: Install to project (local)
mkdir -p .claude/skills
curl -o .claude/skills/eval-harness.md \
https://raw.githubusercontent.com/affaan-m/everything-claude-code/main/.agents/skills/eval-harness/SKILL.md
The skill is scoped to this project only. Add .claude/skills/ to your .gitignore if you don't want to commit it.
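If you choose not to commit the skill, the ignore rule can be added from the command line. The following is a minimal sketch, assuming you run it from the project root and that the project is a Git repository; it is one way to do it, not a step prescribed by the skill itself.

```shell
# Assumption: the current directory is the project root.
# Create .gitignore if it does not exist yet.
touch .gitignore

# Append the rule only when an identical line is not already present,
# so running this more than once does not create duplicates.
grep -qxF '.claude/skills/' .gitignore || echo '.claude/skills/' >> .gitignore
```

The `grep -qxF` check matches the whole line as a fixed string, which keeps the command safe to re-run.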
Alternative: Clone full repo
git clone https://github.com/affaan-m/everything-claude-code
Then reference the skill at .agents/skills/eval-harness/SKILL.md
Related Skills
agent-eval
Head-to-head comparison of coding agents (Claude Code, Aider, Codex, etc.) on custom tasks with pass rate, cost, time, and consistency metrics
agent-eval, agent, eval
by affaan-m · everything-claude-code
ai-engineering-toolkit
6 production-ready AI engineering workflows: prompt evaluation (8-dimension scoring), context budget planning, RAG pipeline design, agent security audit (65-point checklist), eval harness building, and product sense coaching.
security, prompt-engineering, rag
by sickn33 (Antigravity) · antigravity-awesome-skills
hugging-face-evaluation
Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom model evaluations with vLLM/lighteval. Works with the model-index metadata format.
development, huggingface, evaluation
by sickn33 (Antigravity) · antigravity-awesome-skills
supply-chain-risk-auditor
Identifies dependencies at heightened risk of exploitation or takeover. Use when assessing supply chain attack surface, evaluating dependency health, or scoping security engagements.
security, supplychain, risk
by sickn33 (Antigravity) · antigravity-awesome-skills