Claude Code Cost Optimization Cheatsheet

One-page quick reference. Print it, bookmark it, pin it. Every strategy links to a detailed guide.

Token Pricing At a Glance

Model	Input / 1M tokens	Output / 1M tokens	Cache Hit / 1M	Relative Cost
Opus 4.6	$5.00	$25.00	$0.50	1x (baseline)
Sonnet 4.6	$3.00	$15.00	$0.30	~1.7x cheaper
Haiku 4.5	$1.00	$5.00	$0.10	5x cheaper
Opus 4.6 (1M context)	$10.00 (2x)	$37.50 (1.5x)	$1.00	2x baseline
Sonnet 4.6 (1M context)	$6.00 (2x)	$22.50 (1.5x)	$0.60	~1.2x baseline
Opus 4.6 (Fast Mode)	$30.00 (6x)	$150.00 (6x)	N/A	6x baseline

Output tokens cost 5x more than input tokens across all models. Reducing Claude's verbosity is high-leverage.

1M context: Applies when input exceeds 200K tokens. ALL tokens are billed at the premium rate (not just those over 200K). Haiku 4.5 does not support 1M context.

Fast Mode: Opus 4.6 only (research preview). 6x standard rates but includes 1M context at no extra long-context charge. Not available with Batch API.

Plans: Pro $20/mo, Max 5x $100/mo, Max 20x $200/mo. Batch API: 50% discount. Cache write: 1.25x (5-min TTL), 2x (1-hour TTL).

Off-Peak 2x Usage: Anthropic periodically runs promotional events that double usage limits outside peak hours (typically 8 AM - 2 PM ET) and on all weekends. If you're outside the US, your entire workday likely falls in the 2x window. Watch the Anthropic blog for announcements.

CLI Cost Controls: --max-budget-usd <amount> caps spending per session. --fallback-model <model> auto-switches to a cheaper model when the primary is overloaded.

All Strategies - Ranked by Impact

Tier 1: High Impact (Do These First)

#	Strategy	Savings	Effort	Explanation	Guide
1	Use cheaper models for simple tasks	20-40%	1 min	Run `claude --model haiku` for formatting, simple fixes, file lookups, and boilerplate - Haiku handles ~70% of routine work at 1/5th the cost of Opus	Model Selection
2	Delegate work to subagents	20-40%	5 min	Subagent tool calls get their own isolated context; large file searches and multi-file reads happen outside your main conversation, keeping your primary context small	Workflow Patterns
3	Use Plan Mode before coding	15-25%	0 min	Press `Shift+Tab` to toggle Plan Mode - Claude thinks through the approach before writing code, preventing expensive trial-and-error cycles that waste output tokens	Workflow Patterns
4	Trim CLAUDE.md to under 4,000 characters	10-20%	15 min	Every line loads as input tokens on every turn. Content beyond 4,000 chars/file is silently truncated. Total budget across all instruction files: 12,000 chars. Cut ruthlessly	Context Optimization
5	Preserve prompt cache	10-25%	5 min	Cached input tokens cost 90% less; keep CLAUDE.md and system context stable between turns - avoid editing CLAUDE.md mid-session, and keep conversation flow linear	Understanding Costs

Tier 2: Medium Impact (Set Up Once)

#	Strategy	Savings	Effort	Explanation	Guide
6	Configure `.claudeignore`	5-15%	2 min	Prevent Claude from indexing `node_modules/`, `dist/`, `.git/`, lock files, and build artifacts - these add thousands of tokens when Claude searches your project	Context Optimization
7	Use `/compact` regularly	10-20%	0 min	Run `/compact` when conversation gets long (20+ turns) to summarize history and reset context window - prevents the exponential cost growth of long sessions	Context Optimization
8	Set budget caps	0%*	1 min	Use `claude --max-budget-usd 5` or configure in settings to prevent runaway sessions - does not save tokens directly but prevents surprise bills	Understanding Costs
9	Create custom slash commands	10-15%	10 min	Define reusable commands in `.claude/commands/` for repeated workflows - avoids re-explaining the same instructions across sessions, saving input tokens each time	Workflow Patterns
10	Use batch operations	15-30%	5 min	Group related changes into single prompts instead of one-at-a-time requests - "rename X in all 12 files" beats 12 individual "rename X in this file" turns	Workflow Patterns

Tier 3: Ongoing Habits (Compound Over Time)

#	Strategy	Savings	Effort	Explanation	Guide
11	Write concise prompts	5-10%	Ongoing	Be specific and direct - "Add null check to `processOrder` in `src/orders.ts` line 47" beats "Can you look at the orders file and maybe add some error handling?"	Context Optimization
12	Avoid reading entire large files	5-15%	Ongoing	Point Claude to specific line ranges or functions instead of letting it `Read` a 2000-line file - use "read lines 100-150 of X" or reference functions by name	Context Optimization
13	Start new sessions for new tasks	10-20%	Ongoing	Fresh sessions have minimal context; a 50-turn session carries all prior history as input - start clean when switching tasks to avoid paying for irrelevant context	Understanding Costs
14	Use memory files over inline repeats	5-10%	5 min	Put project conventions in CLAUDE.md once rather than repeating "use single quotes" or "always add tests" in every prompt - say it once, reference forever	Context Optimization
15	Monitor with `/usage`	Awareness	0 min	Run `/usage` periodically to see token consumption in your current session - knowing where tokens go is the first step to reducing them	Understanding Costs

Model Selection Quick Decision

Is the task...
├── Complex architecture, debugging, or multi-file refactor? → Opus 4.6
├── Standard feature work, code review, writing tests?      → Sonnet 4.6
├── Simple fix, formatting, boilerplate, file lookup?       → Haiku 4.5
└── Not sure?                                               → Start with Sonnet 4.6

Switch models mid-session: Type /model and select, or start with claude --model sonnet.

Note: Opus 4.6 is now priced at $5/$25 - the same price Sonnet used to be. The gap between models is smaller, so switching down to Haiku ($1/$5) provides a 5x savings, not 19x as it was historically.

Platform Comparison

Feature	Anthropic API	AWS Bedrock	Google Vertex AI	Claude Code
Standard pricing	Base rates	Same (global) / +10% (regional)	Same (global) / +10% (regional)	Included in plan
1M context	Yes (Opus, Sonnet)	Yes	Yes	Yes
Fast Mode	Yes (Opus only)	Check availability	Check availability	Yes (`/fast`)
Batch API (50% off)	Yes	Yes	Yes	N/A
Prompt caching	Yes	Yes	Yes	Automatic
Max context	1M	1M	1M	1M

Bedrock / Vertex: Same models, same capabilities. Global (cross-region) inference matches API pricing. Regional inference profiles add ~10%. Choose based on your cloud provider, compliance needs, and existing infrastructure.

Cost Formula

Turn Cost = (Input Tokens x Input Price) + (Output Tokens x Output Price)

Where Input Tokens =
    System Prompt (~3,500 tokens, fixed)
  + CLAUDE.md (~7 tokens/line x number of lines)
  + Conversation History (grows each turn)
  + Tool Results (file contents, search results, command output)
  + MCP Responses (if using MCP servers)

Session Cost = Sum of all turns
             - Prompt Cache Savings (up to 90% on repeated input)

Quick Copy-Paste Configs

Minimal `.claudeignore`

node_modules/
dist/
build/
.next/
coverage/
*.lock
package-lock.json
yarn.lock
pnpm-lock.yaml
*.min.js
*.min.css
*.map
.git/
*.pyc
__pycache__/
.env
.env.*
*.log

Budget-Conscious Launch Command

# Daily development with budget cap
claude --model sonnet --max-budget-usd 5

# Quick fixes with cheapest model
claude --model haiku --max-budget-usd 1

# Complex work with Opus but capped
claude --model opus --max-budget-usd 20

Cost-Saving CLAUDE.md Header

# Project: MyApp

Tech: TypeScript, React 19, Node 22, PostgreSQL
Style: ESLint + Prettier (run `npm run lint` before committing)
Tests: Vitest - run `npm test` for unit, `npm run e2e` for Playwright
Build: `npm run build` - must pass before PR

## Key Rules
- Prefer editing existing files over creating new ones
- Always add tests for new functions
- Use existing patterns from nearby files as reference

That is 10 lines. It gives Claude everything it needs. Every extra line costs you tokens on every turn.

Session Workflow for Minimum Cost

1. Start session       → claude --model sonnet --max-budget-usd 5
2. Complex problem?    → /model opus (switch up temporarily)
3. Plan first          → Shift+Tab to toggle Plan Mode
4. Be specific         → Reference exact files, line numbers, function names
5. Batch changes       → Group related edits into one prompt
6. Monitor             → /usage (check token consumption)
7. Getting long?       → /compact (summarize and reset context)
8. Simple task?        → /model haiku (switch down temporarily)
9. New topic?          → Start a fresh session
10. Done               → Check /usage - learn your patterns

Numbers Worth Memorizing

Fact	Number
Opus 4.6 output is ___ per 1M tokens	$25
Haiku 4.5 is ___ cheaper than Opus 4.6 on input	5x
Output tokens cost ___ more than input	5x
Prompt cache discount	90%
CLAUDE.md loads on every ___	turn
CLAUDE.md max size per file	4,000 characters (truncated beyond)
Total instruction file budget	12,000 characters (across all CLAUDE.md files)
1 line of code is roughly ___ tokens	~10
Token estimation rule of thumb	~1 token per 4 bytes of text
150-line CLAUDE.md per turn is roughly	~1,050 tokens
50-turn session CLAUDE.md cost (Sonnet 4.6)	~$0.16
50-turn session CLAUDE.md cost (Opus 4.6)	~$0.26
Average tool result size	500-5,000 tokens
Compaction trigger threshold	~10,000 tokens of compactable content
Messages preserved after /compact	4 most recent
Opus 4.6 max output per turn	32K tokens
Sonnet/Haiku max output per turn	64K tokens

Emergency Cost Reduction

Already spending too much? Do these right now:

Switch to Haiku for the rest of the session: /model haiku
Run /compact to shrink conversation history
Start a new session if context is bloated beyond recovery
Set a hard cap: claude --max-budget-usd 2 for the next session
Audit your CLAUDE.md - delete anything Claude does not need on every turn

Links

Resource	Link
Understanding Costs (deep dive)	guides/01-understanding-costs.md
Context Optimization	guides/02-context-optimization.md
Model Selection Guide	guides/03-model-selection.md
Workflow Patterns	guides/04-workflow-patterns.md
Team Budgeting	guides/05-team-budgeting.md
CLAUDE.md Templates	templates/CLAUDE.md/
Token Estimator Tool	tools/token-estimator/
Usage Analyzer Tool	tools/usage-analyzer/

This cheatsheet covers the strategies. For the reasoning and benchmarks behind each one, read the full guides.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Claude Code Cost Optimization Cheatsheet

Token Pricing At a Glance

All Strategies - Ranked by Impact

Tier 1: High Impact (Do These First)

Tier 2: Medium Impact (Set Up Once)

Tier 3: Ongoing Habits (Compound Over Time)

Model Selection Quick Decision

Platform Comparison

Cost Formula

Quick Copy-Paste Configs

Minimal `.claudeignore`

Budget-Conscious Launch Command

Cost-Saving CLAUDE.md Header

Session Workflow for Minimum Cost

Numbers Worth Memorizing

Emergency Cost Reduction

Links

FilesExpand file tree

cheatsheet.md

Latest commit

History

cheatsheet.md

File metadata and controls

Claude Code Cost Optimization Cheatsheet

Token Pricing At a Glance

All Strategies - Ranked by Impact

Tier 1: High Impact (Do These First)

Tier 2: Medium Impact (Set Up Once)

Tier 3: Ongoing Habits (Compound Over Time)

Model Selection Quick Decision

Platform Comparison

Cost Formula

Quick Copy-Paste Configs

Minimal .claudeignore

Budget-Conscious Launch Command

Cost-Saving CLAUDE.md Header

Session Workflow for Minimum Cost

Numbers Worth Memorizing

Emergency Cost Reduction

Links

Minimal `.claudeignore`