agent-eval
Head-to-head comparison of coding agents (Claude Code, Aider, Codex, etc.) on custom tasks with pass rate, cost, time, and consistency metrics
Content Preview
---
name: agent-eval
description: Head-to-head comparison of coding agents (Claude Code, Aider, Codex, etc.) on custom tasks with pass rate, cost, time, and consistency metrics
origin: ECC
tools: Read, Write, Edit, Bash, Grep, Glob
---

# Agent Eval Skill

A lightweight CLI tool for comparing coding agents head-to-head on reproducible tasks. Every "which coding agent is best?" comparison runs on vibes; this tool systematizes it.

## When to Activate

- Comparing coding agents (Claude Code, Aide
How to Use
Recommended: Install to project (local)
mkdir -p .claude/skills
curl -o .claude/skills/agent-eval.md \
https://raw.githubusercontent.com/affaan-m/everything-claude-code/main/skills/agent-eval/SKILL.md
Skill is scoped to this project only. Add .claude/skills/ to your .gitignore if you don't want to commit it.
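The .gitignore step can also be done from the shell. This is a minimal sketch, assuming it is run from the project root and that the project-level .gitignore is the file you want to change (this page states neither); it appends the entry only if it is not already present.

```shell
# Assumption: the current directory is the project root.
# Create .gitignore if it does not exist yet.
touch .gitignore

# If the file is non-empty and does not end with a newline, add one
# so the new entry does not fuse with the last existing line.
if [ -s .gitignore ] && [ -n "$(tail -c1 .gitignore)" ]; then
  echo >> .gitignore
fi

# Append the entry only when no identical line exists
# (-q quiet, -x match the whole line, -F treat the pattern as a literal string).
grep -qxF '.claude/skills/' .gitignore || echo '.claude/skills/' >> .gitignore
```

Running the snippet a second time leaves a single entry, so it should be safe to keep in a setup script.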
Alternative: Clone full repo
git clone https://github.com/affaan-m/everything-claude-code
Then reference at skills/agent-eval/SKILL.md
Related Skills
agent-evaluation
You're a quality engineer who has seen agents that aced benchmarks fail spectacularly in production. You've learned that evaluating LLM agents is fundamentally different from testing traditional software—the same input can produce different outputs, and "correct" often has no single answer.
by sickn33 (Antigravity) · antigravity-awesome-skills
agent-sdk-dev/agent-sdk-verifier-py
Use this agent to verify that a Python Agent SDK application is properly configured, follows SDK best practices and documentation recommendations, and is ready for deployment or testing. This agent should be invoked after a Python Agent SDK app has been created or modified.
by Anthropic · anthropic-official-plugins
agent-sdk-dev/agent-sdk-verifier-ts
Use this agent to verify that a TypeScript Agent SDK application is properly configured, follows SDK best practices and documentation recommendations, and is ready for deployment or testing. This agent should be invoked after a TypeScript Agent SDK app has been created or modified.
by Anthropic · anthropic-official-plugins
hookify/conversation-analyzer
Use this agent when analyzing conversation transcripts to find behaviors worth preventing with hooks. Examples: <example>Context: User is running /hookify command without arguments\nuser: "/hookify"\nassistant: "I'll analyze the conversation to find behaviors you want to prevent"\n<commentary>The /hookify command without arguments triggers conversation analysis to find unwanted behaviors.</commentary></example><example>Context: User wants to create hooks from recent frustrations\nuser: "Can you look back at this conversation and help me create hooks for the mistakes you made?"\nassistant: "I'll use the conversation-analyzer agent to identify the issues and suggest hooks."\n<commentary>User explicitly asks to analyze conversation for mistakes that should be prevented.</commentary></example>
by Anthropic · anthropic-official-plugins