
Token Optimization Guide

Heavy Claude Code usage can get expensive fast. These strategies can typically cut token spend by 40–60% without degrading output quality.

Why Token Optimization Matters

Claude usage is billed per token, for both input and output. A naive agent setup loads every file, every agent definition, and every memory entry into every session. On a busy day that can add up to millions of tokens across dozens of tasks.

40–60%

Typical cost reduction from optimization

~15x

Approximate per-token price gap between Haiku and Opus

85%

Of tokens in an unoptimized session can be context the task never uses

The core insight: most tokens in a Claude Code session are input tokens (files you read, memory you load, prompts you send). Output tokens are usually a small fraction of the total, even though they cost more per token. Optimizing input context is where the biggest savings come from.
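To see why input dominates the bill, it helps to price a session from its token counts. The sketch below is a minimal illustration; the per-million-token prices are placeholder assumptions, not quoted rates, so substitute current figures from Anthropic's pricing page.

```python
from dataclasses import dataclass

# Placeholder prices in USD per million tokens (assumptions; check current pricing).
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int


def session_cost(usage: Usage) -> tuple[float, float]:
    """Return (input_cost, output_cost) in USD for one session."""
    input_cost = usage.input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = usage.output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    return input_cost, output_cost


if __name__ == "__main__":
    # Hypothetical busy day: lots of context read in, comparatively little written out.
    inp, out = session_cost(Usage(input_tokens=2_000_000, output_tokens=100_000))
    print(f"input ${inp:.2f}, output ${out:.2f}, input share {inp / (inp + out):.0%}")
```

With these assumed numbers the script prints `input $6.00, output $1.50, input share 80%`. Output costs five times more per token here, yet input is still most of the spend because twenty times as many input tokens flow through the session.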

Optimization Strategies

Context Pruning

Up to 40%

Prune stale context before it piles up. Run /sync mid-session to compress memory, and drop files from context once they are no longer relevant.

Tip: Use /safe-clear to flush context and reload only what is needed for the current task.
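Context pruning can be modelled as a budget problem: drop whatever has gone idle, then keep the most recently used items that still fit. This is a toy sketch to illustrate the idea; it is not how Claude Code, /sync, or /safe-clear work internally, and `ContextItem` and `prune_context` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str            # file path or memory entry (hypothetical example data)
    tokens: int          # approximate size of the item in tokens
    last_used_turn: int  # the conversation turn in which it was last referenced


def prune_context(items: list[ContextItem], current_turn: int,
                  max_idle_turns: int, token_budget: int) -> list[ContextItem]:
    """Drop items idle for more than max_idle_turns, then greedily keep the
    most recently used remaining items that fit within token_budget."""
    # Step 1: discard anything that has not been referenced recently.
    fresh = [i for i in items if current_turn - i.last_used_turn <= max_idle_turns]
    # Step 2: walk from most to least recently used, keeping what still fits.
    fresh.sort(key=lambda i: i.last_used_turn, reverse=True)
    kept: list[ContextItem] = []
    total = 0
    for item in fresh:
        if total + item.tokens <= token_budget:
            kept.append(item)
            total += item.tokens
    return kept


if __name__ == "__main__":
    items = [
        ContextItem("src/lib/db.ts", 6000, last_used_turn=2),
        ContextItem("memory.md", 1500, last_used_turn=10),
        ContextItem("src/api/routes.ts", 4000, last_used_turn=9),
        ContextItem("old-notes.md", 3000, last_used_turn=1),
    ]
    kept = prune_context(items, current_turn=10, max_idle_turns=5, token_budget=6000)
    print([i.name for i in kept])  # ['memory.md', 'src/api/routes.ts']
```

Idleness is checked first because a stale file costs tokens on every turn even when it would fit the budget.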

Selective File Reading

Up to 30%

Read only the specific functions or sections you need, not entire files. Prefer grep and targeted reads over full file dumps.

Tip: Ask Claude to read lines 40-80 of a file rather than the whole thing.
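The same grep-then-read pattern can be written as a small helper for your own tooling: find the first line matching a pattern and return a window of about 20 lines around it, with line numbers. `read_around` is a hypothetical helper, not a Claude Code feature.

```python
import re
from pathlib import Path


def read_around(path: str, pattern: str, context: int = 10) -> str:
    """Return the lines around the first match of `pattern`, prefixed with
    1-based line numbers, instead of the whole file."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    regex = re.compile(pattern)
    for idx, line in enumerate(lines):
        if regex.search(line):
            # Clamp the window to the start and end of the file.
            start = max(0, idx - context)
            end = min(len(lines), idx + context + 1)
            return "\n".join(f"{n + 1}: {lines[n]}" for n in range(start, end))
    return ""  # no match: nothing to add to the context
```

For example, `read_around("src/lib/db.ts", r"function getUser\b")` would return roughly 21 lines instead of all 400; `getUser` is a made-up function name.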

Compact Prompts

Up to 20%

Eliminate filler phrases. "Please could you kindly help me with..." → "Fix:". Task descriptions should be dense, not polite.

Tip: Command files in .claude/commands/ are loaded into context each time they run, so keep each one under 200 words.
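A quick way to enforce the 200-word guideline is to scan the commands directory. This sketch assumes commands are Markdown files sitting directly in `.claude/commands/` (subdirectories are not scanned) and uses a whitespace word count, which is only a rough proxy for tokens.

```python
from pathlib import Path

WORD_LIMIT = 200  # the guideline from the tip above


def oversized_commands(commands_dir: str = ".claude/commands",
                       limit: int = WORD_LIMIT) -> dict[str, int]:
    """Map each Markdown command file whose word count exceeds `limit` to that count."""
    report: dict[str, int] = {}
    for path in sorted(Path(commands_dir).glob("*.md")):
        words = len(path.read_text(encoding="utf-8").split())
        if words > limit:
            report[path.name] = words
    return report
```

Running `oversized_commands()` from the project root returns an empty dict when every command is within the limit, which makes it easy to wire into a pre-commit check.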

Model Routing

Up to 60%

Use cheaper, faster models for simple tasks (formatting, linting, summarising) and reserve the most capable model for reasoning-heavy work.

Tip: Haiku for retrieval tasks, Sonnet for coding, Opus for architecture decisions.
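Routing can start as a plain lookup table. This sketch uses tier names rather than model IDs so it stays valid as model versions change; the task categories and the fallback to Sonnet for unknown tasks are assumptions, not a prescribed scheme.

```python
from enum import Enum


class Tier(Enum):
    HAIKU = "haiku"    # cheapest: retrieval, formatting, summarising
    SONNET = "sonnet"  # default: coding, debugging, review
    OPUS = "opus"      # most capable: architecture, planning


# Hypothetical task categories mapped to tiers, following the tip above.
ROUTES: dict[str, Tier] = {
    "retrieval": Tier.HAIKU,
    "formatting": Tier.HAIKU,
    "linting": Tier.HAIKU,
    "summarising": Tier.HAIKU,
    "coding": Tier.SONNET,
    "debugging": Tier.SONNET,
    "code_review": Tier.SONNET,
    "architecture": Tier.OPUS,
    "planning": Tier.OPUS,
}


def route(task_type: str) -> Tier:
    """Pick a model tier for a task; unknown task types fall back to Sonnet."""
    return ROUTES.get(task_type, Tier.SONNET)
```

Falling back to the middle tier keeps unclassified work from silently landing on the most expensive model.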

Model Routing Reference

Not every task needs the most capable (and expensive) model. Routing tasks to the right model is the single highest-leverage optimization.

Model	Best for	Relative cost (approx., per token)
`claude-3-5-haiku-latest`	Formatting, summarising, retrieval, classification	1×
`claude-sonnet-4-5`	Coding, debugging, code review, analysis	5×
`claude-opus-4-0`	Architecture decisions, complex reasoning, planning	15×
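The relative costs in the table make the payoff of routing easy to estimate. This sketch compares a hypothetical workload (200k Haiku, 600k Sonnet and 200k Opus tokens) with sending the same volume to Opus; the mix is invented for illustration, not measured.

```python
# Relative per-token costs taken from the table above (Haiku = 1).
RELATIVE_COST = {"haiku": 1, "sonnet": 5, "opus": 15}


def relative_spend(tokens_by_tier: dict[str, int]) -> int:
    """Spend in 'Haiku units': tokens times relative cost, summed over tiers."""
    return sum(tokens * RELATIVE_COST[tier] for tier, tokens in tokens_by_tier.items())


# Hypothetical workload, in thousands of tokens.
routed = {"haiku": 200, "sonnet": 600, "opus": 200}
all_opus = {"opus": sum(routed.values())}

savings = 1 - relative_spend(routed) / relative_spend(all_opus)
print(f"routed={relative_spend(routed)}, all-opus={relative_spend(all_opus)}, savings={savings:.1%}")
# routed=6200, all-opus=15000, savings=58.7%
```

Against an all-Sonnet baseline (5,000 units) this particular mix would actually cost more, because a fifth of the tokens go to Opus. The saving depends on which model you were defaulting to.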

Before / After Comparisons

Session start (~90% reduction)

Before: load 12 agent definitions + full codebase context = 85k tokens

After: load memory.md + the relevant agent only = 8k tokens
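The session-start saving comes from loading lazily: memory plus the one agent the task needs. This sketch assumes a hypothetical layout (`memory.md` and an `agents/` folder of Markdown definitions under one root) and estimates tokens at roughly four characters each, a common rule of thumb rather than an exact count.

```python
from pathlib import Path


def estimate_tokens(text: str) -> int:
    """Rough heuristic: about four characters per token (not an exact tokenizer)."""
    return len(text) // 4


def eager_context(root: str) -> str:
    """The 'before' approach: memory.md plus every agent definition."""
    base = Path(root)
    parts = [(base / "memory.md").read_text(encoding="utf-8")]
    parts += [p.read_text(encoding="utf-8") for p in sorted((base / "agents").glob("*.md"))]
    return "\n\n".join(parts)


def lazy_context(root: str, agent: str) -> str:
    """The 'after' approach: memory.md plus only the agent the task needs."""
    base = Path(root)
    parts = [
        (base / "memory.md").read_text(encoding="utf-8"),
        (base / "agents" / f"{agent}.md").read_text(encoding="utf-8"),
    ]
    return "\n\n".join(parts)
```

Comparing `estimate_tokens(eager_context(root))` with `estimate_tokens(lazy_context(root, "reviewer"))` gives a quick before/after figure for your own setup; `"reviewer"` is a placeholder agent name.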

File reading (~95% reduction)

Before: read entire src/lib/db.ts (400 lines) to find one function

After: grep for the function name, read 20 lines of context

Code review (~60% reduction)

Before: pass full PR diff + full file context to reviewer agent

After: pass diff only; agent fetches context lines on demand

Measuring Token Spend

Use the /cco commands to monitor token usage across sessions:

# Show token usage for the current session
/cco budget

# Show token breakdown by task
/cco report

# Export usage data for a date range
/cco export --from 2026-03-01 --to 2026-03-27

Run /cco report weekly. Look for tasks with disproportionately high input token counts; those are the first candidates for optimization.
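The weekly check can be scripted once usage is exported. The format of `/cco export` output is not described here, so this sketch assumes a CSV with `task`, `input_tokens` and `output_tokens` columns; the two-times-median threshold is an arbitrary starting point, and the sample rows are made up.

```python
import csv
import io
from statistics import median


def high_input_tasks(csv_text: str, factor: float = 2.0) -> list[str]:
    """Return tasks whose input_tokens exceed `factor` times the median.

    Assumes hypothetical CSV columns: task,input_tokens,output_tokens.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return []
    typical = median(int(r["input_tokens"]) for r in rows)
    return [r["task"] for r in rows if int(r["input_tokens"]) > factor * typical]


if __name__ == "__main__":
    sample = (
        "task,input_tokens,output_tokens\n"
        "fix-bug,12000,800\n"
        "rename,9000,300\n"
        "review-pr,95000,2500\n"
        "summarise,10000,400\n"
    )
    print(high_input_tasks(sample))  # ['review-pr']
```

The median is used instead of the mean so that a single very expensive task does not raise the threshold enough to hide itself.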

Get Token-Optimized by Default

The AI Starter Package is built with token efficiency as a core design principle. Context pruning, model routing, and compact command definitions are pre-configured.

