agent-eval

Head-to-head comparison of coding agents (Claude Code, Aider, Codex, etc.) on custom tasks with pass rate, cost, time, and consistency metrics

Content Preview
---
name: agent-eval
description: Head-to-head comparison of coding agents (Claude Code, Aider, Codex, etc.) on custom tasks with pass rate, cost, time, and consistency metrics
origin: ECC
tools: Read, Write, Edit, Bash, Grep, Glob
---

# Agent Eval Skill

A lightweight CLI tool for comparing coding agents head-to-head on reproducible tasks. Every "which coding agent is best?" comparison runs on vibes — this tool systematizes it.

## When to Activate

- Comparing coding agents (Claude Code, Aide
How to Use

Recommended: Install to project (local)

mkdir -p .claude/skills
curl -o .claude/skills/agent-eval.md \
  https://raw.githubusercontent.com/affaan-m/everything-claude-code/main/skills/agent-eval/SKILL.md

Skill is scoped to this project only. Add .claude/skills/ to your .gitignoreif you don't want to commit it.

Alternative: Clone full repo

git clone https://github.com/affaan-m/everything-claude-code

Then reference at skills/agent-eval/SKILL.md

Related Skills

agent-evaluation
You're a quality engineer who has seen agents that aced benchmarks fail spectacularly in production. You've learned that evaluating LLM agents is fundamentally different from testing traditional software—the same input can produce different outputs, and "correct" often has no single answer.
data-aiagentevaluation

by sickn33 (Antigravity) · antigravity-awesome-skills

agent-sdk-dev/agent-sdk-verifier-py
Use this agent to verify that a Python Agent SDK application is properly configured, follows SDK best practices and documentation recommendations, and is ready for deployment or testing. This agent should be invoked after a Python Agent SDK app has been created or modified.
agentagentagent-sdk-devagent-sdk-verifier-py

by Anthropic · anthropic-official-plugins

agent-sdk-dev/agent-sdk-verifier-ts
Use this agent to verify that a TypeScript Agent SDK application is properly configured, follows SDK best practices and documentation recommendations, and is ready for deployment or testing. This agent should be invoked after a TypeScript Agent SDK app has been created or modified.
agentagentagent-sdk-devagent-sdk-verifier-ts

by Anthropic · anthropic-official-plugins

hookify/conversation-analyzer
Use this agent when analyzing conversation transcripts to find behaviors worth preventing with hooks. Examples: <example>Context: User is running /hookify command without arguments\nuser: "/hookify"\nassistant: "I'll analyze the conversation to find behaviors you want to prevent"\n<commentary>The /hookify command without arguments triggers conversation analysis to find unwanted behaviors.</commentary></example><example>Context: User wants to create hooks from recent frustrations\nuser: "Can you look back at this conversation and help me create hooks for the mistakes you made?"\nassistant: "I'll use the conversation-analyzer agent to identify the issues and suggest hooks."\n<commentary>User explicitly asks to analyze conversation for mistakes that should be prevented.</commentary></example>
agentagenthookifyconversation-analyzer

by Anthropic · anthropic-official-plugins