Loading...
Loading...
Every character Claude reads and writes consumes tokens. Input tokens cost money when Claude reads your files, and output tokens cost more when Claude generates responses. A single session that reads 20 large files and generates detailed analysis can consume millions of tokens — the equivalent of several dollars in API costs.
The goal is not to use fewer tokens blindly but to use them efficiently. Read only what you need, generate only what is useful, and route tasks to the cheapest model that can handle them. Token optimization is about getting the same quality output at a fraction of the cost.
The context window is finite. Once it fills up, Claude loses the ability to reference earlier information. Proactive context management means being deliberate about what enters the window:
memory.md under 100 lines. Archive stale information to daily notesA well-managed context window means Claude has access to relevant information for the entire session instead of forgetting critical details halfway through.
Prompt compression reduces the token count of your instructions without losing meaning. The techniques are straightforward:
Not every task needs the most powerful model. Smart routing matches task complexity to model capability:
A team that routes 70% of tasks to Sonnet and only escalates the hardest 10% to Opus will spend a fraction of what a team that uses Opus for everything spends — with nearly identical results.
Prompt caching lets you avoid re-sending the same context on every request. When your CLAUDE.md, type definitions, and memory files are cached, Claude only processes new content in each turn. This can reduce input token costs by 50-90% for long sessions.
Structure your files for maximum cache hits: keep stable content (rules, architecture decisions) in files that rarely change, and volatile content (current task, progress) in separate files that update frequently. The stable files stay cached while only the volatile files consume fresh tokens.
Measure your current usage and apply three optimizations:
Want automated token tracking? The AI Brain Pro package includes a PostToolUse hook that logs token consumption per session and alerts you when costs spike. View pricing →