Heavy Claude Code usage can get expensive fast. These strategies can cut token spend by 40–60% without degrading output quality.
Claude usage is billed per token, for both input and output. A naive agent setup loads every file, every agent definition, and every memory entry into every session. On a busy day that can add up to millions of tokens across dozens of tasks.
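To make the arithmetic concrete, here is a rough sketch of how per-token billing adds up over a day. The prices and token counts are illustrative assumptions, not figures from any price list or real session; substitute your own numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Hypothetical USD price per million tokens (assumed, not from a price list)."""
    input_per_mtok: float
    output_per_mtok: float


def session_cost(input_tokens: int, output_tokens: int, price: Price) -> float:
    """Cost of one task: input and output tokens are billed separately."""
    total = input_tokens * price.input_per_mtok + output_tokens * price.output_per_mtok
    return total / 1_000_000


def daily_cost(tasks: list[tuple[int, int]], price: Price) -> float:
    """Sum the cost of (input_tokens, output_tokens) pairs for a day of tasks."""
    return sum(session_cost(i, o, price) for i, o in tasks)


# Assumed numbers: 30 tasks a day, same output size, different context size.
price = Price(input_per_mtok=3.0, output_per_mtok=15.0)
naive = [(150_000, 4_000)] * 30   # everything loaded into every session
pruned = [(60_000, 4_000)] * 30   # only the relevant context loaded

savings = 1 - daily_cost(pruned, price) / daily_cost(naive, price)
print(f"naive:  ${daily_cost(naive, price):.2f}")
print(f"pruned: ${daily_cost(pruned, price):.2f}")
print(f"saved:  {savings:.0%}")
```

In this made-up scenario the output tokens are identical in both cases; the whole difference comes from the size of the input context.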
Remove stale context before it accumulates. Run /sync mid-session to compress memory. Remove files from context that are no longer relevant.
Read only the specific functions or sections you need, not entire files. Prefer grep and targeted reads over full file dumps.
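As one illustration of a targeted read, the sketch below pulls a single function out of a Python source string with the standard-library `ast` module, so that only that function needs to go into context. The helper name `extract_function` and the sample source are hypothetical; this is not how Claude Code itself reads files.

```python
import ast
from typing import Optional


def extract_function(source: str, name: str) -> Optional[str]:
    """Return the source text of the function called `name`, or None if absent.

    Note: ast.get_source_segment does not include decorators above the def.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_func = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_func and node.name == name:
            return ast.get_source_segment(source, node)
    return None


# Hypothetical file contents used only for demonstration.
SAMPLE = '''
def unrelated_helper(values):
    total = 0
    for v in values:
        total += v
    return total


def parse_price(text):
    return float(text.strip().lstrip("$"))
'''

snippet = extract_function(SAMPLE, "parse_price")
print(snippet)
print(f"{len(snippet)} of {len(SAMPLE)} characters needed")
```

The same idea applies to any language: locate the relevant span first, then pass only that span along.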
Eliminate filler phrases. "Please could you kindly help me with..." → "Fix:". Task descriptions should be dense, not polite.
Use cheaper, faster models for simple tasks (formatting, linting, summarising) and reserve the full model for reasoning-heavy work.
Not every task needs the most capable (and expensive) model. Routing tasks to the right model is the single highest-leverage optimization.
| Model | Best for | Relative cost |
|---|---|---|
| claude-haiku-3-5 | Formatting, summarising, retrieval, classification | 1× |
| claude-sonnet-4-5 | Coding, debugging, code review, analysis | 5× |
| claude-opus-4 | Architecture decisions, complex reasoning, planning | 15× |
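The routing table above can be expressed as a small lookup. This is a minimal sketch: the model labels and relative costs are copied from the table, while the task-type strings, the `pick_model` helper, and the choice of fallback model are assumptions made for illustration.

```python
# Task types per model, taken from the "Best for" column.
# "linting" is added to the cheap tier because it is named as a simple task.
ROUTES = {
    "claude-haiku-3-5": {"formatting", "linting", "summarising", "retrieval", "classification"},
    "claude-sonnet-4-5": {"coding", "debugging", "code review", "analysis"},
    "claude-opus-4": {"architecture", "complex reasoning", "planning"},
}

# Relative cost column from the table (1x, 5x, 15x).
RELATIVE_COST = {
    "claude-haiku-3-5": 1,
    "claude-sonnet-4-5": 5,
    "claude-opus-4": 15,
}

# Assumption: unknown task types fall back to the mid-tier model.
DEFAULT_MODEL = "claude-sonnet-4-5"


def pick_model(task_type: str) -> str:
    """Return the model label whose task set contains `task_type`."""
    key = task_type.strip().lower()
    for model, task_types in ROUTES.items():
        if key in task_types:
            return model
    return DEFAULT_MODEL


def relative_cost(task_types: list[str]) -> int:
    """Total relative cost of a batch when each task is routed individually."""
    return sum(RELATIVE_COST[pick_model(t)] for t in task_types)


batch = ["formatting", "summarising", "debugging", "planning"]
routed = relative_cost(batch)
all_opus = RELATIVE_COST["claude-opus-4"] * len(batch)
print(f"routed: {routed} units, everything on the largest model: {all_opus} units")
```

Falling back to the mid-tier model for unrecognised tasks is a design choice: it avoids sending unknown work to the cheapest model while still not defaulting to the most expensive one.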
Use the /cco commands to monitor token usage across sessions:

    # Show token usage for the current session
    /cco budget

    # Show token breakdown by task
    /cco report

    # Export usage data for a date range
    /cco export --from 2026-03-01 --to 2026-03-27

Run /cco report weekly. Look for tasks with disproportionately high input token counts. Those are the first candidates for optimization.
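One way to spot disproportionately heavy tasks is to compare each task's input tokens against the median. The sketch below assumes the usage data is available as a list of records with hypothetical `task` and `input_tokens` fields; the actual export format of /cco is not described here, so adapt the field names to whatever it produces.

```python
from statistics import median


def flag_heavy_tasks(records: list[dict], factor: float = 3.0) -> list[dict]:
    """Return tasks whose input tokens exceed `factor` times the median.

    Results are sorted with the heaviest task first.
    """
    if not records:
        return []
    typical = median(r["input_tokens"] for r in records)
    heavy = [r for r in records if r["input_tokens"] > factor * typical]
    return sorted(heavy, key=lambda r: r["input_tokens"], reverse=True)


# Made-up usage records for demonstration only.
usage = [
    {"task": "format-changelog", "input_tokens": 9_000},
    {"task": "fix-login-bug", "input_tokens": 15_000},
    {"task": "review-pr", "input_tokens": 14_000},
    {"task": "refactor-billing", "input_tokens": 180_000},
    {"task": "summarise-notes", "input_tokens": 12_000},
]

for record in flag_heavy_tasks(usage):
    print(record["task"], record["input_tokens"])
```

The median is used rather than the mean so that one very large task does not raise the threshold and hide itself.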
The AI Starter Package is built with token efficiency as a core design principle. Context pruning, model routing, and compact command definitions are pre-configured.