eval-advisor is a skill for advising on, brainstorming, designing, reviewing, and improving AI evaluation systems for LLM applications.
It guides teams through practical evaluation workflows:
- Running error analysis before writing evals
- Choosing the right evaluator type (code assertions, LLM-as-judge, guardrails)
- Validating LLM judges with human labels using TPR/TNR
- Sampling and analyzing traces effectively
- Generating structured synthetic test data when production traces are limited
- Avoiding common eval anti-patterns (generic metrics, Likert scales, unvalidated judges, 100% pass-rate suites)
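To make the TPR/TNR validation step concrete, here is a minimal, hypothetical sketch of comparing an LLM judge's verdicts against human labels. The label convention (True = pass, False = fail) and the example data are assumptions, not part of the skill itself:

```python
# Hypothetical sketch: validating an LLM judge against human labels
# using true-positive rate (TPR) and true-negative rate (TNR).
# Convention (an assumption): True = "pass", False = "fail".

def judge_agreement(human_labels, judge_labels):
    """Compute TPR and TNR of a judge relative to human ground truth."""
    pairs = list(zip(human_labels, judge_labels))
    tp = sum(1 for h, j in pairs if h and j)          # judge agrees on "pass"
    tn = sum(1 for h, j in pairs if not h and not j)  # judge agrees on "fail"
    pos = sum(1 for h, _ in pairs if h)               # human "pass" count
    neg = len(pairs) - pos                            # human "fail" count
    tpr = tp / pos if pos else float("nan")
    tnr = tn / neg if neg else float("nan")
    return tpr, tnr

# Illustrative data: 6 human-labeled traces vs. judge verdicts
human = [True, True, True, False, False, True]
judge = [True, True, False, False, True, True]
tpr, tnr = judge_agreement(human, judge)
print(f"TPR={tpr:.2f}, TNR={tnr:.2f}")  # TPR=0.75, TNR=0.50
```

Reporting TPR and TNR separately (rather than a single accuracy number) matters because pass/fail classes are usually imbalanced in real traces.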
Key files and directories:
- eval-advisor/SKILL.md: Main skill instructions and trigger guidance
- EVAL_MASTER.md: Canonical 12-workflow routing and file-loading index
- eval-advisor/references/: Deep-dive reference docs (error analysis, evaluator types, judge validation, sampling, synthetic data, anti-patterns)
- eval-advisor/workflows/: Actionable checklists and decision trees
- eval-advisor/templates/: Reusable templates for failure taxonomies, judge prompts, and synthetic data prompts
Use this skill when designing new evals, auditing existing eval suites, selecting evaluation strategies, or diagnosing quality failures in AI systems.
The core philosophy is:
- Look at real failures first (error analysis)
- Use application-specific, binary pass/fail criteria
- Prefer the cheapest reliable evaluator
- Validate any LLM judge rigorously before relying on it
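As one illustration of "application-specific, binary pass/fail criteria" at the cheapest evaluator tier, a plain code assertion can replace an LLM judge entirely. The scenario, function name, and checks below are hypothetical:

```python
import re

# Hypothetical sketch of an application-specific code assertion.
# Scenario (assumed): a support bot's refund reply must cite the
# refund policy and must not promise a specific dollar amount.

def eval_refund_response(output: str) -> bool:
    """Binary pass/fail check for one failure mode found in error analysis."""
    mentions_policy = "refund policy" in output.lower()
    promises_amount = bool(re.search(r"\$\d", output))  # e.g. "$50"
    return mentions_policy and not promises_amount

assert eval_refund_response("Per our refund policy, we'll review your case.")
assert not eval_refund_response("Sure, we'll refund you $50 today.")
```

A check like this is deterministic, free to run on every trace, and needs no validation against human labels, which is why it is preferred wherever it is reliable enough.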
This skill was created from public material released by Hamel Husain.