Context Engineering
Prompt engineering optimizes a single message. Context engineering optimizes the entire information pipeline — what gets loaded, when, in what form, and at what cost.
This is a preview — the full guide is in our Ebook Bundle.
The bundle includes the complete Context Engineering playbook, implementation templates, worked examples with Redis and Pinecone, and all 1,730+ agent skills.
What Context Engineering Is
An AI model's output is bounded by its context window — everything it can "see" at once. Context engineering is the discipline of managing that window deliberately: deciding what information to include, how to compress it, how to retrieve it on demand, and how to cache it efficiently across calls.
Where prompt engineering asks "how do I phrase this request?", context engineering asks "what should the model know right now, and how did that information get here?" It spans memory architecture, retrieval systems, caching layers, and token budget allocation.
The 4 Pillars of Context Engineering
A framework for thinking about information management in AI systems.
Context Windows & Token Management
- Budget allocation — assign token budgets per task type, not per session
- Selective inclusion — load only what the current task actually requires
- Compression — summarize stale context instead of discarding it or keeping it verbatim
- Model routing — Haiku for retrieval, Sonnet for coding, Opus for architecture
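The budgeting and routing ideas above can be sketched in a few lines. This is a minimal illustration, not the package's implementation; the task types, token limits, and model labels are assumptions chosen to match the examples in the list.

```python
# Sketch of per-task token budgeting and model routing.
# Task types, budgets, and model labels are illustrative assumptions.

TASK_BUDGETS = {
    "retrieval": {"model": "haiku", "max_context_tokens": 4_000},
    "coding": {"model": "sonnet", "max_context_tokens": 32_000},
    "architecture": {"model": "opus", "max_context_tokens": 100_000},
}

def route(task_type: str) -> dict:
    """Return the model and token budget assigned to a task type."""
    try:
        return TASK_BUDGETS[task_type]
    except KeyError:
        raise ValueError(f"unknown task type: {task_type}")

def fits_budget(task_type: str, estimated_tokens: int) -> bool:
    """True if the planned context fits this task type's budget."""
    return estimated_tokens <= route(task_type)["max_context_tokens"]
```

The key design choice is that the budget follows the task, not the session: a cheap retrieval step never inherits the large window a coding step was granted.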
Memory Systems
- Working memory — active session context; fast but bounded (our Tier 1: memory.md)
- Episodic memory — chronological history of past events (our Tier 6: daily notes)
- Semantic memory — facts, rules, and generalised knowledge (our Tier 3/5: knowledge base)
- Procedural memory — how to do things; agent-specific patterns (our Tier 2/4: agent memory)
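A file-based version of these four memory types can be sketched as a small loader. The tier numbers and filenames mirror the ones named in the list above; the loader itself is a hypothetical illustration, not the package's code.

```python
# Sketch: map the four memory types to file-based tiers and load one on demand.
from pathlib import Path

MEMORY_TIERS = {
    "working": ["memory.md"],           # Tier 1: active session context
    "episodic": ["daily-notes"],        # Tier 6: chronological history
    "semantic": ["knowledge-base.md"],  # Tier 3/5: facts and rules
    "procedural": ["agents"],           # Tier 2/4: agent-specific patterns
}

def load_memory(memory_type: str, root: Path) -> str:
    """Concatenate every file in the requested memory tier."""
    chunks = []
    for entry in MEMORY_TIERS[memory_type]:
        path = root / entry
        files = sorted(path.glob("*.md")) if path.is_dir() else [path]
        chunks.extend(f.read_text() for f in files if f.is_file())
    return "\n\n".join(chunks)
```

Loading one tier at a time is what keeps each memory type bounded: a session can pull in working memory without paying for the full episodic history.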
Retrieval-Augmented Generation (RAG)
- Document chunking — split content at semantic boundaries, not arbitrary character counts
- Vector embeddings — encode meaning, not keywords; use tools like Pinecone or ChromaDB
- Similarity search — retrieve the top-k most relevant chunks at query time
- Re-ranking — apply a cross-encoder pass to reorder results before passing to the model
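The similarity-search step above reduces to: embed the query, score it against each chunk, keep the top k. The sketch below uses a toy bag-of-words "embedding" so it runs with no dependencies; a real pipeline would use a learned embedding model and a vector store such as Pinecone or ChromaDB.

```python
# Minimal sketch of top-k similarity search. The bag-of-words embed()
# is a stand-in for a real embedding model, which encodes meaning.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': word counts (real embeddings are dense vectors)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

A cross-encoder re-ranking pass would then rescore just these k survivors, which is why it can afford a more expensive model than the first-stage search.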
Caching Strategies
- Prompt caching — cache static prefixes (system prompts, docs) to avoid re-encoding them
- Response caching — store outputs for deterministic queries; skip redundant inference calls
- Semantic caching — match near-duplicate queries using embedding similarity, not exact strings
- Multi-tier hierarchies — L1 in-memory cache → L2 Redis → L3 vector store fallback
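The multi-tier hierarchy above can be sketched as a lookup with promotion. Here an in-process dict stands in for L1, any dict-like store stands in for the L2 Redis layer, and `miss_fn` stands in for the L3 fallback (vector store lookup or a fresh model call); all three are illustrative stand-ins.

```python
# Sketch of a tiered cache: check L1, then L2, then compute at L3,
# backfilling the faster tiers on the way out.
class TieredCache:
    def __init__(self, l2, miss_fn):
        self.l1 = {}            # L1: in-memory, fastest
        self.l2 = l2            # L2: e.g. a Redis client (dict-like here)
        self.miss_fn = miss_fn  # L3: expensive fallback

    def get(self, key):
        if key in self.l1:               # L1 hit
            return self.l1[key]
        value = self.l2.get(key)         # L2 hit: promote to L1 below
        if value is None:
            value = self.miss_fn(key)    # L3 miss path: compute once
            self.l2[key] = value         # backfill L2
        self.l1[key] = value             # backfill L1
        return value
```

A semantic cache drops into the same shape by replacing the exact-key L2 lookup with a nearest-neighbour search over query embeddings.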
How AI Starter Package Implements This
Context engineering is baked into the AI Starter Package at every layer — not bolted on after the fact.
7-Tier Memory Architecture
Our memory system maps the four cognitive memory types to concrete file-based tiers. Working memory (Tier 1–2), episodic memory (Tier 6), semantic memory (Tier 3 + 5), and procedural memory (Tier 4) are all present, bounded, and auto-maintained.
Read the Memory System Guide →

Auditor-Gated Knowledge Promotion
The Auditor agent reviews candidate learnings in knowledge-nominations.md before promoting them to knowledge-base.md. Every entry requires a [Source:] tag. The knowledge base stays high-signal because noise is blocked at the gate, not cleaned up after the fact.
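The gate described above is simple to state: no `[Source:]` tag, no promotion. A minimal sketch of that filter, assuming entries are plain-text lines (the exact entry format in the package may differ):

```python
# Sketch of the promotion gate: only candidate learnings that carry
# a [Source:] tag are eligible for knowledge-base.md.
import re

def promotable(entries: list[str]) -> list[str]:
    """Filter candidate learnings down to entries with a [Source:] tag."""
    return [e for e in entries if re.search(r"\[Source:[^\]]+\]", e)]
```

Blocking untagged entries at this point is what keeps cleanup out of the knowledge base itself: noise never gets written, so it never has to be removed.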
Token Optimization by Default
Context pruning, selective file reads, and model routing are pre-configured. Sessions load only the relevant agent and current memory — not the full agent library. Typical cost reduction: 40–60%.
Read the Token Optimization Guide →

Context Health Monitoring via Hooks
12 automated hooks fire on git, file, and session events. The PreCompact hook saves state before auto-compaction. The SessionStart hook restores context and surfaces degradation signals. Use /safe-clear proactively when tool call count exceeds ~30 or output quality degrades.
Anti-Patterns to Avoid
These patterns are common, easy to fall into, and quietly expensive.
- Context Bloat — high impact
- Naive Caching — high impact
- Synchronous Bottlenecks — medium impact
- Inadequate Versioning — medium impact

Multi-Agent Context Routing
In an agent swarm, context does not live in one place — it flows between agents as tasks are handed off. Each agent should receive only the context slice it needs, not a full dump of everything every parent agent knows.
The orchestrator is the only agent that holds the full task context. Subagents receive purpose-built slices. This keeps each agent's window lean, reduces cost, and prevents cross-contamination of concerns between agents.
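Orchestrator-side slicing can be sketched as a role-to-keys map. The context keys and role names below are illustrative assumptions; the point is the shape: the orchestrator holds the full dict, and each subagent receives only a projection of it.

```python
# Sketch of multi-agent context routing: each subagent role gets only
# the slice of the full task context it needs.
FULL_CONTEXT = {
    "goal": "ship the billing refactor",
    "codebase_map": "src/ layout and module owners",
    "test_results": "3 failing integration tests",
    "customer_emails": "raw support threads",  # irrelevant to most roles
}

SLICES = {
    "coder": ["goal", "codebase_map", "test_results"],
    "researcher": ["goal"],
}

def slice_for(role: str, context: dict) -> dict:
    """Project the full context down to the keys this role may see."""
    return {k: context[k] for k in SLICES[role] if k in context}
```

Because the projection is explicit, adding a new subagent means deciding up front which keys it needs, which is exactly the discipline that prevents cross-contamination between agents.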
Referenced Tools
| Tool | Role in context engineering | Category |
|---|---|---|
| Redis | Semantic cache + multi-tier L2 store | Caching |
| Pinecone | Managed vector database for embeddings at scale | RAG |
| ChromaDB | Local-first vector store, zero-infra for dev | RAG |
| LangChain | Chunking pipelines, retrieval chains, document loaders | Orchestration |
| Claude Code MCP | Structured entity + relation memory via knowledge graph | Memory |
Get Context Engineering Pre-Built
The AI Starter Package ships with the full 4-pillar context engineering framework implemented: 7-tier memory, auditor-gated knowledge, token-optimised loading, health monitoring hooks, and multi-agent context routing out of the box.
View Pricing