evaluating-machine-learning-models
This skill allows the AI assistant to evaluate machine learning models using a comprehensive suite of metrics. It should be used when the user requests model performance analysis, validation, or testing. The AI assistant can use this skill to assess model accuracy, p... Use when appropriate context detected.
Content Preview
---
name: evaluating-machine-learning-models
description: |
  This skill allows the AI assistant to evaluate machine learning models using a comprehensive suite of metrics. It should be used when the user requests model performance analysis, validation, or testing. The AI assistant can use this skill to assess model accuracy, p... Use when appropriate context detected. Trigger with relevant phrases based on skill purpose.
allowed-tools: Read, Write, Edit, Grep, Glob, Bash(cmd:*)
version: 1.0.0
aut
How to Use
Recommended: Install to project (local)
mkdir -p .claude/skills
curl -o .claude/skills/evaluating-machine-learning-models.md \
https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/main/plugins/ai-ml/model-evaluation-suite/skills/evaluating-machine-learning-models/SKILL.md

The skill is scoped to this project only. Add .claude/skills/ to your .gitignore if you don't want to commit it.
Alternative: Clone full repo
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills

Then reference it at plugins/ai-ml/model-evaluation-suite/skills/evaluating-machine-learning-models/SKILL.md
Related Skills
Evaluating Machine Learning Models
This skill allows Claude to evaluate machine learning models using a comprehensive suite of metrics. It should be used when the user requests model performance analysis, validation, or testing. Claude can use this skill to assess model accuracy, precision, recall, F1-score, and other relevant metrics.
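As a rough illustration of the metrics this skill reports, here is a minimal, dependency-free sketch of computing accuracy, precision, recall, and F1 for a binary classifier. The labels are made up for the example and are not from any real model; the skill itself may well use a library such as scikit-learn instead.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def evaluate(y_true, y_pred):
    """Return the standard binary-classification metrics as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # illustrative ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # illustrative model predictions
print(evaluate(y_true, y_pred))      # each metric happens to be 0.75 here
```

Precision and recall matter most when classes are imbalanced, where accuracy alone can be misleading.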
by jeremylongshore · plugins-plus-skills
eval-harness
Formal evaluation framework for Claude Code sessions implementing eval-driven development (EDD) principles
by affaan-m · everything-claude-code
agent-eval
Head-to-head comparison of coding agents (Claude Code, Aider, Codex, etc.) on custom tasks with pass rate, cost, time, and consistency metrics
by affaan-m · everything-claude-code
agent-evaluation
You're a quality engineer who has seen agents that aced benchmarks fail spectacularly in production. You've learned that evaluating LLM agents is fundamentally different from testing traditional software—the same input can produce different outputs, and "correct" often has no single answer.
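Because the same prompt can yield different agent outputs, evaluations of this kind typically run each task several times and aggregate. The sketch below shows one common pattern under assumed, illustrative data: a pass rate over repeated runs and a simple agreement-based consistency score (function names here are hypothetical, not from any of the listed skills).

```python
from collections import Counter

def pass_rate(results):
    """Fraction of runs that passed; results is a list of booleans."""
    return sum(results) / len(results)

def consistency(outputs):
    """Fraction of runs agreeing with the most common output."""
    most_common_count = Counter(outputs).most_common(1)[0][1]
    return most_common_count / len(outputs)

# Hypothetical results from five runs of the same task.
runs = [True, True, False, True, True]
answers = ["42", "42", "41", "42", "42"]
print(pass_rate(runs))        # 0.8
print(consistency(answers))   # 0.8
```

Reporting both numbers separates "how often it works" from "how stable the answer is", which single-run benchmarks conflate.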
by sickn33 (Antigravity) · antigravity-awesome-skills