Opus 4.6 vs Sonnet 4.6 vs Haiku 4.5 across five common task types. Find the best cost-to-quality ratio for each kind of work.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Relative Cost |
|---|---|---|---|
| Opus 4.6 | $5.00 | $25.00 | 1x (baseline) |
| Sonnet 4.6 | $3.00 | $15.00 | ~1.7x cheaper |
| Haiku 4.5 | $1.00 | $5.00 | 5x cheaper |
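The per-task cost estimates in this comparison follow directly from these prices. A minimal sketch, assuming cost is simply input tokens times the input price plus output tokens times the output price (the `PRICES` table and `estimateCost` helper are illustrative names, not part of any SDK):

```typescript
// Prices in USD per 1M tokens, copied from the table above.
type ModelName = "opus-4.6" | "sonnet-4.6" | "haiku-4.5";

const PRICES: Record<ModelName, { input: number; output: number }> = {
  "opus-4.6": { input: 5.0, output: 25.0 },
  "sonnet-4.6": { input: 3.0, output: 15.0 },
  "haiku-4.5": { input: 1.0, output: 5.0 },
};

// Hypothetical helper: estimated USD cost of one task from its total token counts.
function estimateCost(model: ModelName, inputTokens: number, outputTokens: number): number {
  const price = PRICES[model];
  return (inputTokens / 1e6) * price.input + (outputTokens / 1e6) * price.output;
}

// Example: a task that uses 24,200 input and 3,800 output tokens on Opus 4.6.
const opusCost = estimateCost("opus-4.6", 24200, 3800);
console.log(opusCost.toFixed(2)); // "0.22"
```

Because output tokens cost five times as much as input tokens on every model, a verbose answer moves the total more than a large context does.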
Each task type was performed with all three models under the same conditions:
- Same codebase (React + TypeScript, ~50 files)
- Same CLAUDE.md (Standard template, ~100 lines, ~700 tokens)
- Same prompt wording
- Quality rated 1-5 (5 = perfect, no follow-ups needed; 1 = unusable or required complete redo)
Recommendation key:
- Best Value — the model with the best quality-per-dollar ratio for this task type
- Best Quality — the model that produces the best output regardless of cost
- Avoid — the model is not a good fit for this task type
Task: Rename the variable userData to userProfile across 8 files, update all imports, and fix a minor indentation inconsistency in 3 files.
| Metric | Opus 4.6 | Sonnet 4.6 | Haiku 4.5 |
|---|---|---|---|
| Turns to complete | 2 | 2 | 2 |
| Input tokens (total) | 24,200 | 24,200 | 24,200 |
| Output tokens (total) | 3,800 | 3,900 | 4,100 |
| Estimated cost | $0.22 | $0.13 | $0.05 |
| Quality (1-5) | 5 | 5 | 5 |
| Follow-ups needed | 0 | 0 | 0 |
All three models handle mechanical refactoring equally well. Token counts are nearly identical because the task is straightforward — there is no complex reasoning to differentiate the models.
| Recommendation | |
|---|---|
| Best Value | Haiku 4.5 — 5x cheaper than Opus, identical quality |
| Best Quality | Tie — all models score 5/5 |
| Avoid | Opus 4.6 — paying $0.22 for a task Haiku does for $0.05 is wasteful |
Rule of thumb: If a task can be described as "find X, replace with Y" or "apply this mechanical change," Haiku is the right default: it matched the larger models' quality here at a fraction of the cost.
Task: Create a new DataTable component with sortable columns, pagination, loading skeleton, empty state, and TypeScript generics for row data. Include CSS modules and prop documentation.
| Metric | Opus 4.6 | Sonnet 4.6 | Haiku 4.5 |
|---|---|---|---|
| Turns to complete | 3 | 4 | 7 |
| Input tokens (total) | 48,600 | 52,100 | 71,400 |
| Output tokens (total) | 12,800 | 11,400 | 14,200 |
| Estimated cost | $0.56 | $0.33 | $0.14 |
| Quality (1-5) | 5 | 4 | 3 |
| Follow-ups needed | 0 | 1 (minor type fix) | 3 (generic types wrong, pagination logic off, missing loading state) |
This task involves moderate complexity — TypeScript generics, multiple interactive states, and integration with project patterns. The models diverge clearly.
- Opus gets generics right on the first try and produces well-structured code with thoughtful edge case handling. Zero follow-ups.
- Sonnet produces good code but makes a minor TypeScript error in the generic constraint (`extends object` instead of the project's `extends Record<string, unknown>` convention). One quick follow-up.
- Haiku struggles with TypeScript generics, produces a simpler pagination implementation that does not handle edge cases (page count of 0, single-page data sets), and omits the loading skeleton. Three follow-up turns inflate its total token count, eroding some of the per-token cost advantage.
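The generic constraint and the pagination edge cases mentioned above can be illustrated in a few lines. This is a sketch, not the benchmark project's actual code: `Row`, `pageCount`, and `getPage` are hypothetical names, and clamping the page count to at least 1 is one possible way to handle an empty data set.

```typescript
// Project convention (per the text): constrain row data to string-keyed records.
// `extends object` is looser: it also admits arrays and functions.
type Row = Record<string, unknown>;

// Number of pages, never less than 1 so an empty table can still show "page 1 of 1".
function pageCount(totalRows: number, pageSize: number): number {
  if (pageSize <= 0) throw new RangeError("pageSize must be positive");
  return Math.max(1, Math.ceil(totalRows / pageSize));
}

// Returns the rows for a 1-based page, clamping out-of-range page numbers.
function getPage<T extends Row>(rows: readonly T[], page: number, pageSize: number): T[] {
  const pages = pageCount(rows.length, pageSize);
  const safePage = Math.min(Math.max(1, Math.floor(page)), pages);
  const start = (safePage - 1) * pageSize;
  return rows.slice(start, start + pageSize);
}

const users = [{ name: "Ada" }, { name: "Grace" }, { name: "Linus" }];
console.log(pageCount(0, 10));              // 1 (empty data set)
console.log(getPage(users, 1, 10).length);  // 3 (single-page data set)
console.log(getPage(users, 99, 2).length);  // 1 (clamped to the last page)
```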
| Recommendation | |
|---|---|
| Best Value | Sonnet 4.6 — 1.7x cheaper than Opus, nearly equal quality (4/5), one minor fix |
| Best Quality | Opus 4.6 — perfect output, zero iterations needed |
| Avoid | Haiku 4.5 for complex components — follow-up turns erode cost savings and quality suffers |
Rule of thumb: For components with complex types, state logic, or multiple interaction patterns, Sonnet is the sweet spot. Use Opus when the cost of getting it wrong (rework time) exceeds the $0.23 premium.
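To put the $0.23 premium in perspective, it helps to convert it into developer time. A rough sketch, assuming a hypothetical developer cost of $60 per hour (substitute your own figure; `breakEvenSeconds` is an illustrative helper):

```typescript
// Hypothetical helper: seconds of developer time that cost the same as a given model premium.
function breakEvenSeconds(premiumUsd: number, hourlyRateUsd: number): number {
  return (premiumUsd / hourlyRateUsd) * 3600;
}

// Opus ($0.56) vs Sonnet ($0.33) on the component task: a $0.23 premium.
const seconds = breakEvenSeconds(0.56 - 0.33, 60); // $60/hour is an assumed rate
console.log(seconds.toFixed(1)); // "13.8"
```

Under that assumption, any follow-up that takes more than about 14 seconds of developer attention already costs more than the premium.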
Task: Users report that the dashboard loads slowly after login. The developer suspects an N+1 query or unnecessary re-renders. Investigate, identify the root cause, and propose a fix.
| Metric | Opus 4.6 | Sonnet 4.6 | Haiku 4.5 |
|---|---|---|---|
| Turns to complete | 4 | 6 | 9 |
| Input tokens (total) | 68,200 | 86,400 | 112,500 |
| Output tokens (total) | 9,600 | 11,800 | 15,200 |
| Estimated cost | $0.58 | $0.44 | $0.19 |
| Quality (1-5) | 5 | 4 | 2 |
| Follow-ups needed | 0 | 1 (missed secondary cause) | 4 (chased wrong hypothesis twice) |
Bug investigation is where model intelligence matters most. The task requires reading code, forming hypotheses, and narrowing down from multiple potential causes.
- Opus reads the dashboard component, the data fetching hook, and the API route, then correctly identifies both causes: (1) an N+1 query in the API route where user preferences are fetched per-item, and (2) an unnecessary `useEffect` re-render triggered by an unstable object reference. Four focused turns, no wasted investigation.
- Sonnet identifies the N+1 query but misses the re-render issue initially. After the developer confirms the N+1 fix helps but does not fully resolve the slowness, Sonnet investigates further and finds the re-render problem. Good result, slightly more turns.
- Haiku fixates on the wrong hypothesis twice (first suspects a missing `useMemo`, then suspects a slow CSS animation) before the developer manually steers it toward the data fetching layer. Even then, it identifies only the N+1 query and misses the re-render issue entirely. Low quality output for a debugging task.
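The two root causes described above are easy to reproduce in isolation. The sketch below is not the benchmark code: the in-memory `prefsTable`, the query counter, and the `depsChanged` helper are hypothetical stand-ins. The first half contrasts per-item preference lookups (N+1) with one batched lookup; the second half shows why an object literal recreated on every render looks "changed" to a dependency check based on `Object.is`, the comparison React uses for hook dependencies.

```typescript
// --- Cause 1: N+1 query (simulated with an in-memory table and a query counter) ---
const prefsTable = new Map<number, string>([[1, "dark"], [2, "light"], [3, "dark"]]);
let queryCount = 0;

function fetchPref(id: number): string | undefined {
  queryCount++; // one round trip per item
  return prefsTable.get(id);
}

function fetchPrefsBatch(ids: number[]): Map<number, string> {
  queryCount++; // one round trip for all items
  const out = new Map<number, string>();
  for (const id of ids) {
    const pref = prefsTable.get(id);
    if (pref !== undefined) out.set(id, pref);
  }
  return out;
}

const itemIds = [1, 2, 3];
itemIds.forEach((id) => fetchPref(id));
const perItemQueries = queryCount; // 3: grows with the number of items

queryCount = 0;
fetchPrefsBatch(itemIds);
const batchedQueries = queryCount; // 1: constant

// --- Cause 2: unstable object reference in a dependency list ---
// Simplified stand-in for a hook dependency check.
function depsChanged(prev: readonly unknown[], next: readonly unknown[]): boolean {
  return prev.length !== next.length || prev.some((dep, i) => !Object.is(dep, next[i]));
}

const makeOptions = () => ({ userId: 42 }); // new object on every "render"
const unstable = depsChanged([makeOptions()], [makeOptions()]); // true: the effect would re-run
const stableOptions = makeOptions(); // created once, as a memoized value would be
const stable = depsChanged([stableOptions], [stableOptions]); // false: the effect is skipped

console.log({ perItemQueries, batchedQueries, unstable, stable });
```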
| Recommendation | |
|---|---|
| Best Value | Sonnet 4.6 — finds the primary issue quickly, reasonable cost, acceptable that it needs a nudge for the secondary issue |
| Best Quality | Opus 4.6 — finds both causes in a single investigation pass |
| Avoid | Haiku 4.5 for debugging — low accuracy means more developer time spent steering, which defeats the purpose |
Rule of thumb: For bug investigation, model intelligence directly translates to fewer turns and better hypotheses. Opus pays for itself if the bug is complex. Sonnet is good for straightforward bugs. Haiku is only suitable for "the error message tells you exactly what's wrong" situations.
Task: Design the data model and API structure for a new "Teams" feature — team creation, member management, role-based permissions, and invitation flow. Output a plan with entity schemas, API endpoints, and migration strategy.
| Metric | Opus 4.6 | Sonnet 4.6 | Haiku 4.5 |
|---|---|---|---|
| Turns to complete | 3 | 4 | 6 |
| Input tokens (total) | 52,400 | 58,100 | 74,800 |
| Output tokens (total) | 14,800 | 12,600 | 10,200 |
| Estimated cost | $0.63 | $0.36 | $0.13 |
| Quality (1-5) | 5 | 4 | 2 |
| Follow-ups needed | 0 | 1 (edge case in permission model) | 3 (missing invitation flow, weak permission model, no migration strategy) |
Architecture planning requires the model to reason about system design, anticipate edge cases, and produce a coherent plan that holds together across multiple concerns.
- Opus produces a comprehensive plan covering: entity schemas with proper foreign keys and indexes, RESTful API endpoints with authentication middleware, a role-permission matrix with inheritance, the full invitation flow (create, send email, accept, expire), and a 3-step migration strategy that handles existing users. The plan addresses edge cases like "what happens if a user is invited to a team they already belong to" and "how to handle the last admin leaving a team."
- Sonnet covers all major areas but uses a flat permission model (role string) instead of a flexible permission matrix. After one follow-up asking about granular permissions, it revises to a proper RBAC design. Migration strategy is solid. Missing the "last admin" edge case.
- Haiku provides a minimal entity schema and basic CRUD endpoints. Missing: invitation flow (just says "add invitation endpoint"), no permission inheritance, no migration strategy, no edge case handling. Requires significant developer augmentation to be usable.
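As a concrete reference point for the edge cases mentioned above, here is a minimal sketch of what the core entities and three guard checks might look like. Every name here (`Team`, `Membership`, `Invitation`, the role names, `canRemoveMember`, `canInvite`, `isInvitationUsable`) is hypothetical; none of the models' actual output is reproduced.

```typescript
type Role = "admin" | "member" | "viewer"; // assumed role set

interface Team { id: string; name: string; }
interface Membership { teamId: string; userId: string; role: Role; }
interface Invitation {
  teamId: string;
  email: string;
  role: Role;
  expiresAt: number; // epoch milliseconds
  status: "pending" | "accepted" | "expired";
}

// Edge case: the last admin must not be able to leave or be removed.
function canRemoveMember(members: Membership[], teamId: string, userId: string): boolean {
  const target = members.find((m) => m.teamId === teamId && m.userId === userId);
  if (!target) return false;
  if (target.role !== "admin") return true;
  const admins = members.filter((m) => m.teamId === teamId && m.role === "admin");
  return admins.length > 1;
}

// Edge case: do not invite a user who already belongs to the team.
function canInvite(members: Membership[], teamId: string, userId: string): boolean {
  return !members.some((m) => m.teamId === teamId && m.userId === userId);
}

// Invitation flow: only pending, unexpired invitations can be accepted.
function isInvitationUsable(inv: Invitation, nowMs: number): boolean {
  return inv.status === "pending" && nowMs < inv.expiresAt;
}

const team: Team = { id: "t1", name: "Platform" };
const members: Membership[] = [
  { teamId: team.id, userId: "u1", role: "admin" },
  { teamId: team.id, userId: "u2", role: "member" },
];
const invite: Invitation = {
  teamId: team.id, email: "new@example.com", role: "member", expiresAt: 1000, status: "pending",
};

console.log(canRemoveMember(members, "t1", "u1")); // false: u1 is the only admin
console.log(canInvite(members, "t1", "u2"));       // false: u2 is already a member
console.log(isInvitationUsable(invite, 2000));     // false: past its expiry time
```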
| Recommendation | |
|---|---|
| Best Value | Sonnet 4.6 — covers 90% of what Opus produces at 57% of the cost |
| Best Quality | Opus 4.6 — the plan is production-ready without modification |
| Avoid | Haiku 4.5 — output requires so much developer augmentation that it saves neither time nor money |
Rule of thumb: For architecture and design tasks, Opus produces plans you can hand directly to a developer (or to Claude for implementation). Sonnet produces good first drafts that need minor refinement. Haiku produces outlines, not plans.
Task: Review a 200-line pull request that adds a caching layer to the API client. Check for correctness, performance issues, security concerns, test coverage, and adherence to project patterns.
| Metric | Opus 4.6 | Sonnet 4.6 | Haiku 4.5 |
|---|---|---|---|
| Turns to complete | 2 | 2 | 3 |
| Input tokens (total) | 36,800 | 36,800 | 42,100 |
| Output tokens (total) | 6,200 | 5,400 | 4,800 |
| Estimated cost | $0.34 | $0.19 | $0.07 |
| Quality (1-5) | 5 | 4 | 3 |
| Issues found | 7 (3 critical, 2 moderate, 2 minor) | 5 (2 critical, 2 moderate, 1 minor) | 3 (1 critical, 1 moderate, 1 minor) |
| Finding | Severity | Opus 4.6 | Sonnet 4.6 | Haiku 4.5 |
|---|---|---|---|---|
| Cache key collision risk (objects with same JSON but different key order) | Critical | Found | Found | Missed |
| Missing cache invalidation on write operations | Critical | Found | Found | Found |
| No TTL expiration — stale data served indefinitely | Critical | Found | Missed | Missed |
| Cache size unbounded — potential memory leak | Moderate | Found | Found | Found |
| Error in cache miss path — swallows network errors | Moderate | Found | Found | Missed |
| Inconsistent naming (`cacheData` vs `cachedData`) | Minor | Found | Missed | Missed |
| Missing JSDoc on public API methods | Minor | Found | Found | Found |
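Three of the findings in this table (no TTL, unbounded size, and missing invalidation on writes) become concrete with a small amount of code. The sketch below is a hypothetical `TtlCache`, not the code from the reviewed pull request; it takes an injectable clock so expiry can be demonstrated deterministically, and it evicts the oldest inserted entry when full.

```typescript
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private maxEntries: number;
  private now: () => number;

  constructor(ttlMs: number, maxEntries: number, now: () => number = () => Date.now()) {
    if (maxEntries < 1) throw new RangeError("maxEntries must be at least 1");
    this.ttlMs = ttlMs;
    this.maxEntries = maxEntries;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: never serve stale data
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key); // re-inserting moves the key to the end
    if (this.entries.size >= this.maxEntries) {
      // Map iterates in insertion order, so the first key is the oldest.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Call from write paths so updates never leave stale entries behind.
  invalidate(key: string): void {
    this.entries.delete(key);
  }

  get size(): number {
    return this.entries.size;
  }
}

let clock = 0;
const cache = new TtlCache<string>(1000, 2, () => clock);
cache.set("a", "1");
cache.set("b", "2");
cache.set("c", "3");
console.log(cache.get("a")); // undefined: evicted as the oldest entry
clock = 1000;
console.log(cache.get("c")); // undefined: expired once the TTL elapsed
```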
- Opus catches all 7 issues including the subtle TTL problem and the cache key collision risk (which requires understanding that `JSON.stringify({a:1, b:2})` and `JSON.stringify({b:2, a:1})` produce different strings). High-quality review that a senior developer would produce.
- Sonnet catches 5 of 7 issues. Misses the TTL concern and the naming inconsistency. Still a strong review that catches the most impactful problems.
- Haiku catches only 3 issues and misses both critical cache correctness problems (key collision and TTL). The review reads more like a surface-level scan — catches obvious issues but misses subtle logic bugs.
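The cache key finding comes down to property order: two equivalent parameter objects serialize to different strings. A short sketch of the problem and one possible fix, assuming cache keys are derived from plain JSON-like parameter objects (`stableKey` is a hypothetical helper, not part of the reviewed code):

```typescript
// Serializes plain JSON-like values with object keys sorted, so property order does not matter.
function stableKey(value: unknown): string {
  if (Array.isArray(value)) {
    // Array order is meaningful, so it is preserved.
    return "[" + value.map((item) => stableKey(item)).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((k) => JSON.stringify(k) + ":" + stableKey(obj[k]));
    return "{" + parts.join(",") + "}";
  }
  // Primitives: undefined becomes the text "undefined", which is fine for a key but is not JSON.
  return String(JSON.stringify(value));
}

console.log(JSON.stringify({ a: 1, b: 2 }) === JSON.stringify({ b: 2, a: 1 })); // false
console.log(stableKey({ a: 1, b: 2 }) === stableKey({ b: 2, a: 1 }));           // true
```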
| Recommendation | |
|---|---|
| Best Value | Sonnet 4.6 — catches the high-impact issues at 56% of Opus's cost |
| Best Quality | Opus 4.6 — catches subtle correctness issues that could become production bugs |
| Avoid | Haiku 4.5 for security/correctness review — missing critical issues makes it counterproductive |
Rule of thumb: Use Opus for code reviews involving security, caching, concurrency, or financial logic. Use Sonnet for general code review. Haiku can review formatting and style compliance, but not logic.
| Task Type | Haiku 4.5 | Sonnet 4.6 | Opus 4.6 | Recommendation |
|---|---|---|---|---|
| Formatting / renaming | 5/5 — $0.05 | 5/5 — $0.13 | 5/5 — $0.22 | Use Haiku |
| Component creation | 3/5 — $0.14 | 4/5 — $0.33 | 5/5 — $0.56 | Use Sonnet (Opus for complex types) |
| Bug investigation | 2/5 — $0.19 | 4/5 — $0.44 | 5/5 — $0.58 | Use Sonnet (Opus for hard bugs) |
| Architecture planning | 2/5 — $0.13 | 4/5 — $0.36 | 5/5 — $0.63 | Use Sonnet (Opus for critical design) |
| Code review | 3/5 — $0.07 | 4/5 — $0.19 | 5/5 — $0.34 | Use Sonnet (Opus for security review) |
Another way to look at it — quality points per dollar:
| Task Type | Haiku (quality/$) | Sonnet (quality/$) | Opus (quality/$) |
|---|---|---|---|
| Formatting / renaming | 100.0 | 38.5 | 22.7 |
| Component creation | 21.4 | 12.1 | 8.9 |
| Bug investigation | 10.5 | 9.1 | 8.6 |
| Architecture planning | 15.4 | 11.1 | 7.9 |
| Code review | 42.9 | 21.1 | 14.7 |
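Higher is better. On raw quality per dollar, Haiku leads every row because its cost is so low, but the ratio ignores that a 2/5 or 3/5 result needs developer rework before it is usable. Haiku is the clear pick only for simple tasks, where it also scores 5/5; among the models that score at least 4/5, Sonnet wins for everything else. Opus is now much closer in cost to Sonnet, making it a more viable choice when quality matters: its price premium over Sonnet is only ~1.7x (down from 5x previously).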
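The ratios in the table are simply the quality score divided by the estimated cost. A small sketch of that calculation for the bug investigation row (the `qualityPerDollar` and `bestValue` helpers, and the 4/5 quality bar, are illustrative assumptions):

```typescript
interface Result { model: string; quality: number; costUsd: number; }

function qualityPerDollar(r: Result): number {
  return r.quality / r.costUsd;
}

// Bug investigation: quality scores and estimated costs from the benchmark.
const bugInvestigation: Result[] = [
  { model: "Haiku 4.5", quality: 2, costUsd: 0.19 },
  { model: "Sonnet 4.6", quality: 4, costUsd: 0.44 },
  { model: "Opus 4.6", quality: 5, costUsd: 0.58 },
];

for (const r of bugInvestigation) {
  console.log(r.model, qualityPerDollar(r).toFixed(1)); // 10.5, then 9.1, then 8.6
}

// Best ratio among results that meet a minimum quality bar.
function bestValue(results: Result[], minQuality: number): Result | undefined {
  return results
    .filter((r) => r.quality >= minQuality)
    .sort((a, b) => qualityPerDollar(b) - qualityPerDollar(a))[0];
}

console.log(bestValue(bugInvestigation, 4)?.model); // "Sonnet 4.6"
```

Adding a quality bar is what turns the raw ratio into a recommendation: without it, the cheapest model wins even when its output is not usable.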
```
Is the task mechanical (rename, format, simple find-and-replace)?
├── Yes → Use Haiku 4.5
└── No
    ├── Does the task involve security, financial logic, or complex architecture?
    │   ├── Yes → Use Opus 4.6
    │   └── No
    │       ├── Is the task straightforward with clear requirements?
    │       │   ├── Yes → Use Haiku 4.5
    │       │   └── No → Use Sonnet 4.6
    │       └── Are you on a tight budget?
    │           ├── Yes → Use Sonnet 4.6 (with plan mode to reduce iterations)
    │           └── No → Use Opus 4.6 (now only ~1.7x more than Sonnet)
```
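The decision tree can also be written as a small routing function. This is one possible reading, not a definitive encoding: the tree lists the "straightforward" and "tight budget" questions side by side, and the sketch assumes the budget question acts as the tie-breaker for tasks that are not straightforward. `TaskTraits` and `chooseModel` are hypothetical names; the return values mirror the short model names used with Claude Code.

```typescript
type Model = "haiku" | "sonnet" | "opus";

interface TaskTraits {
  mechanical: boolean;        // rename, format, simple find-and-replace
  highStakes: boolean;        // security, financial logic, or complex architecture
  clearRequirements: boolean; // straightforward with clear requirements
  tightBudget: boolean;
}

function chooseModel(t: TaskTraits): Model {
  if (t.mechanical) return "haiku";
  if (t.highStakes) return "opus";
  if (t.clearRequirements) return "haiku";
  // Assumed reading: budget decides between Sonnet and Opus for everything else.
  return t.tightBudget ? "sonnet" : "opus";
}

// Example: an ambiguous feature request on a tight budget.
console.log(chooseModel({
  mechanical: false, highStakes: false, clearRequirements: false, tightBudget: true,
})); // "sonnet"
```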
In Claude Code, switch models per-task:
```bash
# Start a session with Haiku for quick tasks
claude --model haiku

# Switch mid-session for a harder task (use /model command)
/model sonnet

# Or start separate sessions per model
claude --model opus   # for architecture planning
claude --model haiku  # for formatting cleanup afterward
```
Assuming 80 tasks per month with the distribution: 30% simple, 25% component/feature, 20% bugs, 15% planning, 10% review:
| Strategy | Monthly Cost | Quality |
|---|---|---|
| Always Opus | $36.00 | Excellent |
| Always Sonnet | $22.60 | Good |
| Always Haiku | $9.16 | Poor (for complex tasks) |
| Smart routing (Haiku for simple, Sonnet for medium, Opus for critical) | $23.90 | Good to Excellent |
Smart model routing saves 34% versus always-Opus while maintaining high quality where it matters. It costs slightly more than always-Sonnet ($23.90 vs $22.60); that difference pays for Opus-level output on the critical tasks. Note that with current pricing, Opus is only ~1.7x more expensive than Sonnet, so the cost gap between strategies is narrower than it used to be. The main benefit of smart routing is using Haiku for simple tasks where all models perform equally.
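The monthly figures can be reproduced from the per-task costs and the task mix. The sketch below makes one assumption the text does not state: which task types count as "critical". Routing only architecture planning to Opus lands within a few cents of the table's smart-routing figure, so that is what `smartRoute` does; all names here are illustrative.

```typescript
type Model = "haiku" | "sonnet" | "opus";
type TaskType = "simple" | "component" | "bug" | "planning" | "review";

// Estimated cost per task (USD) for each task type and model, from the benchmark.
const COST: Record<TaskType, Record<Model, number>> = {
  simple:    { haiku: 0.05, sonnet: 0.13, opus: 0.22 },
  component: { haiku: 0.14, sonnet: 0.33, opus: 0.56 },
  bug:       { haiku: 0.19, sonnet: 0.44, opus: 0.58 },
  planning:  { haiku: 0.13, sonnet: 0.36, opus: 0.63 },
  review:    { haiku: 0.07, sonnet: 0.19, opus: 0.34 },
};

// 80 tasks per month split 30% / 25% / 20% / 15% / 10%.
const TASKS_PER_MONTH: Record<TaskType, number> = {
  simple: 24, component: 20, bug: 16, planning: 12, review: 8,
};

function monthlyCost(route: (task: TaskType) => Model): number {
  let total = 0;
  for (const task of Object.keys(TASKS_PER_MONTH) as TaskType[]) {
    total += TASKS_PER_MONTH[task] * COST[task][route(task)];
  }
  return total;
}

// Assumed routing: Haiku for simple tasks, Opus for planning, Sonnet for the rest.
const smartRoute = (task: TaskType): Model =>
  task === "simple" ? "haiku" : task === "planning" ? "opus" : "sonnet";

const alwaysOpus = monthlyCost(() => "opus");
const smart = monthlyCost(smartRoute);
console.log(alwaysOpus.toFixed(2), smart.toFixed(2));            // "36.04" "23.92"
console.log(((1 - smart / alwaysOpus) * 100).toFixed(0) + "%");  // "34%"
```

The results differ from the table by a few cents because the table's figures are rounded.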