One-page quick reference. Print it, bookmark it, pin it. Every strategy links to a detailed guide.
| Model | Input / 1M tokens | Output / 1M tokens | Cache Hit / 1M | Relative Cost |
|---|---|---|---|---|
| Opus 4.6 | $5.00 | $25.00 | $0.50 | 1x (baseline) |
| Sonnet 4.6 | $3.00 | $15.00 | $0.30 | ~1.7x cheaper |
| Haiku 4.5 | $1.00 | $5.00 | $0.10 | 5x cheaper |
| Opus 4.6 (1M context) | $10.00 (2x) | $37.50 (1.5x) | $1.00 | 2x baseline |
| Sonnet 4.6 (1M context) | $6.00 (2x) | $22.50 (1.5x) | $0.60 | ~1.2x baseline |
| Opus 4.6 (Fast Mode) | $30.00 (6x) | $150.00 (6x) | N/A | 6x baseline |
Output tokens cost 5x more than input tokens across all models. Reducing Claude's verbosity is high-leverage.
1M context: Applies when input exceeds 200K tokens. ALL tokens are billed at the premium rate (not just those over 200K). Haiku 4.5 does not support 1M context.
Fast Mode: Opus 4.6 only (research preview). 6x standard rates but includes 1M context at no extra long-context charge. Not available with Batch API.
Plans: Pro $20/mo, Max 5x $100/mo, Max 20x $200/mo. Batch API: 50% discount. Cache write: 1.25x (5-min TTL), 2x (1-hour TTL).
Off-Peak 2x Usage: Anthropic periodically runs promotional events that double usage limits outside peak hours (typically 8 AM - 2 PM ET) and on all weekends. If you're outside the US, your entire workday likely falls in the 2x window. Watch the Anthropic blog for announcements.
CLI Cost Controls:
--max-budget-usd <amount>caps spending per session.--fallback-model <model>auto-switches to a cheaper model when the primary is overloaded.
| # | Strategy | Savings | Effort | Explanation | Guide |
|---|---|---|---|---|---|
| 1 | Use cheaper models for simple tasks | 20-40% | 1 min | Run claude --model haiku for formatting, simple fixes, file lookups, and boilerplate - Haiku handles ~70% of routine work at 1/5th the cost of Opus |
Model Selection |
| 2 | Delegate work to subagents | 20-40% | 5 min | Subagent tool calls get their own isolated context; large file searches and multi-file reads happen outside your main conversation, keeping your primary context small | Workflow Patterns |
| 3 | Use Plan Mode before coding | 15-25% | 0 min | Press Shift+Tab to toggle Plan Mode - Claude thinks through the approach before writing code, preventing expensive trial-and-error cycles that waste output tokens |
Workflow Patterns |
| 4 | Trim CLAUDE.md to under 4,000 characters | 10-20% | 15 min | Every line loads as input tokens on every turn. Content beyond 4,000 chars/file is silently truncated. Total budget across all instruction files: 12,000 chars. Cut ruthlessly | Context Optimization |
| 5 | Preserve prompt cache | 10-25% | 5 min | Cached input tokens cost 90% less; keep CLAUDE.md and system context stable between turns - avoid editing CLAUDE.md mid-session, and keep conversation flow linear | Understanding Costs |
| # | Strategy | Savings | Effort | Explanation | Guide |
|---|---|---|---|---|---|
| 6 | Configure .claudeignore |
5-15% | 2 min | Prevent Claude from indexing node_modules/, dist/, .git/, lock files, and build artifacts - these add thousands of tokens when Claude searches your project |
Context Optimization |
| 7 | Use /compact regularly |
10-20% | 0 min | Run /compact when conversation gets long (20+ turns) to summarize history and reset context window - prevents the exponential cost growth of long sessions |
Context Optimization |
| 8 | Set budget caps | 0%* | 1 min | Use claude --max-budget-usd 5 or configure in settings to prevent runaway sessions - does not save tokens directly but prevents surprise bills |
Understanding Costs |
| 9 | Create custom slash commands | 10-15% | 10 min | Define reusable commands in .claude/commands/ for repeated workflows - avoids re-explaining the same instructions across sessions, saving input tokens each time |
Workflow Patterns |
| 10 | Use batch operations | 15-30% | 5 min | Group related changes into single prompts instead of one-at-a-time requests - "rename X in all 12 files" beats 12 individual "rename X in this file" turns | Workflow Patterns |
| # | Strategy | Savings | Effort | Explanation | Guide |
|---|---|---|---|---|---|
| 11 | Write concise prompts | 5-10% | Ongoing | Be specific and direct - "Add null check to processOrder in src/orders.ts line 47" beats "Can you look at the orders file and maybe add some error handling?" |
Context Optimization |
| 12 | Avoid reading entire large files | 5-15% | Ongoing | Point Claude to specific line ranges or functions instead of letting it Read a 2000-line file - use "read lines 100-150 of X" or reference functions by name |
Context Optimization |
| 13 | Start new sessions for new tasks | 10-20% | Ongoing | Fresh sessions have minimal context; a 50-turn session carries all prior history as input - start clean when switching tasks to avoid paying for irrelevant context | Understanding Costs |
| 14 | Use memory files over inline repeats | 5-10% | 5 min | Put project conventions in CLAUDE.md once rather than repeating "use single quotes" or "always add tests" in every prompt - say it once, reference forever | Context Optimization |
| 15 | Monitor with /usage |
Awareness | 0 min | Run /usage periodically to see token consumption in your current session - knowing where tokens go is the first step to reducing them |
Understanding Costs |
Is the task...
├── Complex architecture, debugging, or multi-file refactor? → Opus 4.6
├── Standard feature work, code review, writing tests? → Sonnet 4.6
├── Simple fix, formatting, boilerplate, file lookup? → Haiku 4.5
└── Not sure? → Start with Sonnet 4.6
Switch models mid-session: Type /model and select, or start with claude --model sonnet.
Note: Opus 4.6 is now priced at $5/$25 - the same price Sonnet used to be. The gap between models is smaller, so switching down to Haiku ($1/$5) provides a 5x savings, not 19x as it was historically.
| Feature | Anthropic API | AWS Bedrock | Google Vertex AI | Claude Code |
|---|---|---|---|---|
| Standard pricing | Base rates | Same (global) / +10% (regional) | Same (global) / +10% (regional) | Included in plan |
| 1M context | Yes (Opus, Sonnet) | Yes | Yes | Yes |
| Fast Mode | Yes (Opus only) | Check availability | Check availability | Yes (/fast) |
| Batch API (50% off) | Yes | Yes | Yes | N/A |
| Prompt caching | Yes | Yes | Yes | Automatic |
| Max context | 1M | 1M | 1M | 1M |
Bedrock / Vertex: Same models, same capabilities. Global (cross-region) inference matches API pricing. Regional inference profiles add ~10%. Choose based on your cloud provider, compliance needs, and existing infrastructure.
Turn Cost = (Input Tokens x Input Price) + (Output Tokens x Output Price)
Where Input Tokens =
System Prompt (~3,500 tokens, fixed)
+ CLAUDE.md (~7 tokens/line x number of lines)
+ Conversation History (grows each turn)
+ Tool Results (file contents, search results, command output)
+ MCP Responses (if using MCP servers)
Session Cost = Sum of all turns
- Prompt Cache Savings (up to 90% on repeated input)
node_modules/
dist/
build/
.next/
coverage/
*.lock
package-lock.json
yarn.lock
pnpm-lock.yaml
*.min.js
*.min.css
*.map
.git/
*.pyc
__pycache__/
.env
.env.*
*.log
# Daily development with budget cap
claude --model sonnet --max-budget-usd 5
# Quick fixes with cheapest model
claude --model haiku --max-budget-usd 1
# Complex work with Opus but capped
claude --model opus --max-budget-usd 20# Project: MyApp
Tech: TypeScript, React 19, Node 22, PostgreSQL
Style: ESLint + Prettier (run `npm run lint` before committing)
Tests: Vitest - run `npm test` for unit, `npm run e2e` for Playwright
Build: `npm run build` - must pass before PR
## Key Rules
- Prefer editing existing files over creating new ones
- Always add tests for new functions
- Use existing patterns from nearby files as referenceThat is 10 lines. It gives Claude everything it needs. Every extra line costs you tokens on every turn.
1. Start session → claude --model sonnet --max-budget-usd 5
2. Complex problem? → /model opus (switch up temporarily)
3. Plan first → Shift+Tab to toggle Plan Mode
4. Be specific → Reference exact files, line numbers, function names
5. Batch changes → Group related edits into one prompt
6. Monitor → /usage (check token consumption)
7. Getting long? → /compact (summarize and reset context)
8. Simple task? → /model haiku (switch down temporarily)
9. New topic? → Start a fresh session
10. Done → Check /usage - learn your patterns
| Fact | Number |
|---|---|
| Opus 4.6 output is ___ per 1M tokens | $25 |
| Haiku 4.5 is ___ cheaper than Opus 4.6 on input | 5x |
| Output tokens cost ___ more than input | 5x |
| Prompt cache discount | 90% |
| CLAUDE.md loads on every ___ | turn |
| CLAUDE.md max size per file | 4,000 characters (truncated beyond) |
| Total instruction file budget | 12,000 characters (across all CLAUDE.md files) |
| 1 line of code is roughly ___ tokens | ~10 |
| Token estimation rule of thumb | ~1 token per 4 bytes of text |
| 150-line CLAUDE.md per turn is roughly | ~1,050 tokens |
| 50-turn session CLAUDE.md cost (Sonnet 4.6) | ~$0.16 |
| 50-turn session CLAUDE.md cost (Opus 4.6) | ~$0.26 |
| Average tool result size | 500-5,000 tokens |
| Compaction trigger threshold | ~10,000 tokens of compactable content |
| Messages preserved after /compact | 4 most recent |
| Opus 4.6 max output per turn | 32K tokens |
| Sonnet/Haiku max output per turn | 64K tokens |
Already spending too much? Do these right now:
- Switch to Haiku for the rest of the session:
/model haiku - Run
/compactto shrink conversation history - Start a new session if context is bloated beyond recovery
- Set a hard cap:
claude --max-budget-usd 2for the next session - Audit your CLAUDE.md - delete anything Claude does not need on every turn
| Resource | Link |
|---|---|
| Understanding Costs (deep dive) | guides/01-understanding-costs.md |
| Context Optimization | guides/02-context-optimization.md |
| Model Selection Guide | guides/03-model-selection.md |
| Workflow Patterns | guides/04-workflow-patterns.md |
| Team Budgeting | guides/05-team-budgeting.md |
| CLAUDE.md Templates | templates/CLAUDE.md/ |
| Token Estimator Tool | tools/token-estimator/ |
| Usage Analyzer Tool | tools/usage-analyzer/ |
This cheatsheet covers the strategies. For the reasoning and benchmarks behind each one, read the full guides.