llm_evaluation_frameworks
Concrete metrics, scoring methods, comparison tables, and A/B testing frameworks.
Content Preview
# LLM Evaluation Frameworks
Concrete metrics, scoring methods, comparison tables, and A/B testing frameworks.
## Frameworks Index
1. [Evaluation Metrics Overview](#1-evaluation-metrics-overview)
2. [Text Generation Metrics](#2-text-generation-metrics)
3. [RAG-Specific Metrics](#3-rag-specific-metrics)
4. [Human Evaluation Frameworks](#4-human-evaluation-frameworks)
5. [A/B Testing for Prompts](#5-ab-testing-for-prompts)
6. [Benchmark Datasets](#6-benchmark-datasets)
7. [Evaluation Pipeline De
How to Use
Recommended: Install to project (local)
mkdir -p .claude/skills
curl -o .claude/skills/llm_evaluation_frameworks.md \
https://raw.githubusercontent.com/alirezarezvani/claude-skills/main/engineering-team/senior-prompt-engineer/references/llm_evaluation_frameworks.md
The skill is scoped to this project only. Add .claude/skills/ to your .gitignore if you don't want to commit it.
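The .gitignore step can be sketched as below. This is a minimal example, assuming it runs from the project root and that the ignore rule should be appended only once; the exact pattern `.claude/skills/` comes from the note above, and the rest is illustrative.

```shell
# Assumption: current directory is the project root.
# Create .gitignore if it does not exist yet.
touch .gitignore

# Append the rule only when an identical line is not already present,
# so re-running this snippet does not create duplicates.
grep -qxF '.claude/skills/' .gitignore || echo '.claude/skills/' >> .gitignore
```

Running it a second time leaves .gitignore unchanged, which makes it safe to include in a setup script.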
Alternative: Clone full repo
git clone https://github.com/alirezarezvani/claude-skills
Then reference the file at engineering-team/senior-prompt-engineer/references/llm_evaluation_frameworks.md
Related Skills
prompt-engineer
A master prompt engineer who architects and optimizes sophisticated LLM interactions. Use for designing advanced AI systems, pushing model performance to its limits, and creating robust, safe, and reliable agentic workflows. Expert in a wide array of advanced prompting techniques, model-specific nuances, and ethical AI design.
agents · prompt · engineer · llm
by qdhenry · claude-command-suite
cs-senior-engineer
Senior Engineer agent for architecture decisions, code review, DevOps, and API design. Orchestrates engineering and engineering-team skills for technical implementation work. Spawn when users need system design, code quality review, CI/CD pipeline setup, or infrastructure decisions.
agents · senior · engineer · agent
by alirezarezvani · alirezarezvani-claude-skills
cs-engineering-lead
Engineering Team Lead agent for coordinating QA, security, data engineering, ML, and frontend/backend teams. Orchestrates engineering-team skills for team-level technical decisions. Spawn when users need team coordination, tech stack evaluation, incident response, or cross-functional engineering work.
agents · engineering · lead · agent
by alirezarezvani · alirezarezvani-claude-skills
feature_engineering_patterns
World-class feature engineering patterns for senior data scientists.
engineering-team · feature · engineering · patterns
by alirezarezvani · alirezarezvani-claude-skills