AI Starter Package
Proprietary Framework

Context Engineering

Prompt engineering optimizes a single message. Context engineering optimizes the entire information pipeline — what gets loaded, when, in what form, and at what cost.

This is a preview — the full guide is in our Ebook Bundle.

The bundle includes the complete Context Engineering playbook, implementation templates, worked examples with Redis and Pinecone, and all 1,730+ agent skills.

Get the Bundle

What Context Engineering Is

An AI model's output is bounded by its context window — everything it can "see" at once. Context engineering is the discipline of managing that window deliberately: deciding what information to include, how to compress it, how to retrieve it on demand, and how to cache it efficiently across calls.

Where prompt engineering asks "how do I phrase this request?", context engineering asks "what should the model know right now, and how did that information get here?" It spans memory architecture, retrieval systems, caching layers, and token budget allocation.

4 pillars: Windows, memory, RAG, caching
40–60%: Typical cost reduction when applied
7 tiers: Our memory architecture covering all four cognitive types

The 4 Pillars of Context Engineering

A framework for thinking about information management in AI systems.

01

Context Windows & Token Management

  • Budget allocation — assign token budgets per task type, not per session
  • Selective inclusion — load only what the current task actually requires
  • Compression — summarize stale context instead of discarding it or keeping it verbatim
  • Model routing — Haiku for retrieval, Sonnet for coding, Opus for architecture
Key insight: Up to 85% of tokens in a naive agent session are wasted context. Budget-aware loading cuts costs 40–60% without touching output quality.
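Budget-aware loading can be sketched as a lookup table plus greedy packing. This is a minimal illustration, not the package's implementation; the budget numbers, model labels, and the `route` helper are all hypothetical:

```python
# Hypothetical per-task token budgets and model routing table.
# The numbers and model labels are illustrative, not prescriptive.
TASK_BUDGETS = {
    "retrieval":    {"model": "haiku",  "max_context_tokens": 4_000},
    "coding":       {"model": "sonnet", "max_context_tokens": 16_000},
    "architecture": {"model": "opus",   "max_context_tokens": 32_000},
}

def route(task_type, candidate_chunks):
    """Pick a model for the task and greedily pack context chunks
    until the token budget is spent.

    candidate_chunks: (text, token_count) pairs, highest priority first.
    """
    cfg = TASK_BUDGETS[task_type]
    budget = cfg["max_context_tokens"]
    included, used = [], 0
    for text, tokens in candidate_chunks:
        if used + tokens > budget:
            continue  # selective inclusion: skip what doesn't fit
        included.append(text)
        used += tokens
    return {"model": cfg["model"], "context": included, "tokens": used}

plan = route("retrieval", [
    ("task spec", 1_000),
    ("full repo dump", 50_000),  # never fits a retrieval budget
    ("memory.md", 2_000),
])
```

Note that the oversized chunk is skipped rather than truncated: the budget forces an explicit decision about what the task actually needs.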
02

Memory Systems

  • Working memory — active session context; fast but bounded (our Tier 1: memory.md)
  • Episodic memory — chronological history of past events (our Tier 6: daily notes)
  • Semantic memory — facts, rules, and generalized knowledge (our Tier 3/5: knowledge base)
  • Procedural memory — how to do things; agent-specific patterns (our Tier 2/4: agent memory)
Key insight: Our 7-tier architecture maps directly onto all four cognitive memory types — implemented entirely in plain-text files, version-controlled alongside your codebase.
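The mapping from cognitive memory types to file tiers can be expressed as plain data. A rough sketch follows; the tier numbers mirror the list above, but the filenames other than memory.md and knowledge-base.md are hypothetical placeholders:

```python
# Illustrative mapping of the four cognitive memory types to file-based
# tiers. Filenames other than memory.md and knowledge-base.md are
# hypothetical placeholders, not the package's actual layout.
MEMORY_TIERS = {
    "working":    {"tiers": [1],    "files": ["memory.md"]},
    "procedural": {"tiers": [2, 4], "files": ["agents/<agent>-memory.md"]},
    "semantic":   {"tiers": [3, 5], "files": ["knowledge-base.md"]},
    "episodic":   {"tiers": [6],    "files": ["notes/<date>.md"]},
}

def files_for(task_needs):
    """Return only the memory files a task actually needs,
    instead of loading every tier every session."""
    out = []
    for mem_type in task_needs:
        out.extend(MEMORY_TIERS[mem_type]["files"])
    return out
```

Because the tiers are plain-text files, the same selection logic works whether memory lives in a repo, a container volume, or a developer's laptop.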
03

Retrieval-Augmented Generation (RAG)

  • Document chunking — split content at semantic boundaries, not arbitrary character counts
  • Vector embeddings — encode meaning, not keywords; use tools like Pinecone or ChromaDB
  • Similarity search — retrieve the top-k most relevant chunks at query time
  • Re-ranking — apply a cross-encoder pass to reorder results before passing to the model
Key insight: RAG keeps the context window lean while giving the model access to arbitrarily large knowledge bases. The quality of chunking determines the ceiling of retrieval accuracy.
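The retrieval loop above can be sketched end to end with a toy embedding. This is a self-contained illustration: the bag-of-words `embed` stands in for a real embedding model, and blank-line splitting stands in for real semantic-boundary chunking:

```python
import math

def chunk_by_paragraph(doc):
    """Split at blank lines (a crude semantic boundary) rather than
    at fixed character counts."""
    return [p.strip() for p in doc.split("\n\n") if p.strip()]

def embed(text):
    """Toy bag-of-words 'embedding'; a real system would call an
    embedding model instead."""
    vec = {}
    for w in text.lower().split():
        w = w.strip(".,?!")
        vec[w] = vec.get(w, 0.0) + 1.0
    return vec

def cosine(a, b):
    dot = sum(a.get(k, 0.0) * v for k, v in b.items())
    na = math.sqrt(sum(v * v for v in a.values())) or 1.0
    nb = math.sqrt(sum(v * v for v in b.values())) or 1.0
    return dot / (na * nb)

def top_k(query, chunks, k=2):
    """Similarity search: return the k chunks closest to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

doc = "Redis handles caching.\n\nPinecone stores embeddings at scale.\n\nToday's standup notes."
chunks = chunk_by_paragraph(doc)
best = top_k("caching with Redis", chunks, k=1)
```

A production pipeline would swap in real embeddings, a vector store such as Pinecone or ChromaDB, and a cross-encoder re-ranking pass, but the shape of the loop is the same.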
04

Caching Strategies

  • Prompt caching — cache static prefixes (system prompts, docs) to avoid re-encoding them
  • Response caching — store outputs for deterministic queries; skip redundant inference calls
  • Semantic caching — match near-duplicate queries using embedding similarity, not exact strings
  • Multi-tier hierarchies — L1 in-memory cache → L2 Redis → L3 vector store fallback
Key insight: Redis-backed semantic caches can reduce inference calls by 30–70% on knowledge-heavy workloads. The key is pairing a similarity threshold with a TTL — cache hits degrade silently without both.
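The threshold-plus-TTL pairing can be made concrete with a small in-process cache. This is a sketch only: the toy `embed` stands in for a real embedding model, the class stands in for a Redis-backed store, and the threshold and TTL values are illustrative:

```python
import math
import time

def embed(text):
    """Toy bag-of-words 'embedding'; a real cache would use an
    embedding model."""
    vec = {}
    for w in text.lower().split():
        vec[w] = vec.get(w, 0.0) + 1.0
    return vec

def cosine(a, b):
    dot = sum(a.get(k, 0.0) * v for k, v in b.items())
    na = math.sqrt(sum(v * v for v in a.values())) or 1.0
    nb = math.sqrt(sum(v * v for v in b.values())) or 1.0
    return dot / (na * nb)

class SemanticCache:
    """In-process stand-in for a Redis-backed semantic cache.
    Every lookup applies BOTH a similarity threshold and a TTL."""
    def __init__(self, threshold=0.9, ttl_seconds=3600.0):
        self.threshold = threshold
        self.ttl = ttl_seconds
        self.entries = []  # (embedding, response, stored_at)

    def put(self, query, response, now=None):
        now = time.time() if now is None else now
        self.entries.append((embed(query), response, now))

    def get(self, query, now=None):
        now = time.time() if now is None else now
        # TTL first: expired entries never produce hits.
        self.entries = [e for e in self.entries if now - e[2] < self.ttl]
        if not self.entries:
            return None
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]))
        if cosine(q, best[0]) >= self.threshold:
            return best[1]  # cache hit: inference call skipped
        return None

cache = SemanticCache(threshold=0.9, ttl_seconds=60.0)
cache.put("what is redis", "an in-memory data store", now=0.0)
hit = cache.get("what is redis", now=10.0)      # within TTL, above threshold
expired = cache.get("what is redis", now=120.0) # past TTL: entry dropped
```

Dropping either guard reproduces the failure mode in the text: without the threshold, loosely related queries return the wrong cached answer; without the TTL, answers go stale silently.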

How AI Starter Package Implements This

Context engineering is baked into the AI Starter Package at every layer — not bolted on after the fact.

7-Tier Memory Architecture

Our memory system maps the four cognitive memory types to concrete file-based tiers. Working memory (Tier 1–2), episodic memory (Tier 6), semantic memory (Tier 3 + 5), and procedural memory (Tier 4) are all present, bounded, and auto-maintained.

Read the Memory System Guide →

Auditor-Gated Knowledge Promotion

The Auditor agent reviews candidate learnings in knowledge-nominations.md before promoting them to knowledge-base.md. Every entry requires a [Source:] tag. The knowledge base stays high-signal because noise is blocked at the gate, not cleaned up after the fact.
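The gate itself reduces to a provenance check before promotion. A minimal sketch, assuming entries are plain-text lines (the sample entries are invented for illustration):

```python
def gate(nomination):
    """Minimal sketch of the promotion gate: an entry without a
    [Source:] tag never reaches knowledge-base.md."""
    return "[Source:" in nomination

# Hypothetical candidate learnings from knowledge-nominations.md.
candidates = [
    "Prefer batch embedding calls for bulk indexing. [Source: perf review]",
    "Redis is probably fast.",  # no provenance: blocked at the gate
]
promoted = [c for c in candidates if gate(c)]
```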

Token Optimization by Default

Context pruning, selective file reads, and model routing are pre-configured. Sessions load only the relevant agent and current memory — not the full agent library. Typical cost reduction: 40–60%.

Read the Token Optimization Guide →

Context Health Monitoring via Hooks

12 automated hooks fire on git, file, and session events. The PreCompact hook saves state before auto-compaction. The SessionStart hook restores context and surfaces degradation signals. Use /safe-clear proactively when tool call count exceeds ~30 or output quality degrades.

Anti-Patterns to Avoid

These patterns are common, easy to fall into, and quietly expensive.

Context Bloat

High impact
Symptom: Loading all files, all agents, all memory every session
Fix: Load only what the current task requires. Use /safe-clear to flush and reload minimal context.

Naive Caching

High impact
Symptom: Caching responses without relevance scoring or TTL
Fix: Pair every cache entry with a similarity threshold and an expiry. Stale cache hits are worse than cache misses.

Synchronous Bottlenecks

Medium impact
Symptom: Blocking on vector search or memory reads before generating output
Fix: Pre-fetch context in parallel. Retrieval and generation can overlap in a well-designed pipeline.
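The pre-fetch fix can be sketched with asyncio; the `vector_search` and `read_memory` coroutines below are hypothetical stand-ins for real network and disk I/O:

```python
import asyncio

async def vector_search(query):
    await asyncio.sleep(0.05)  # stands in for a network round trip
    return f"chunks for {query!r}"

async def read_memory():
    await asyncio.sleep(0.05)  # stands in for a disk read
    return "memory.md contents"

async def build_context(query):
    # Fetch retrieval results and memory concurrently instead of
    # blocking on each in sequence: total latency is the slower of
    # the two, not their sum.
    chunks, memory = await asyncio.gather(vector_search(query), read_memory())
    return {"chunks": chunks, "memory": memory}

ctx = asyncio.run(build_context("caching strategy"))
```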

Inadequate Versioning

Medium impact
Symptom: Mutating memory files without tracking changes or provenance
Fix: Every knowledge-base entry needs a [Source:] tag. Use git to version memory files. Auditor-gate promotions.

Multi-Agent Context Routing

In an agent swarm, context does not live in one place — it flows between agents as tasks are handed off. Each agent should receive only the context slice it needs, not a full dump of everything every parent agent knows.

Orchestrator
├── loads: memory.md + task brief (2k tokens)
├── spawns: Code Agent
│   └── receives: diff + relevant types + task spec (8k tokens)
├── spawns: Review Agent
│   └── receives: diff + style rules + prior review notes (5k tokens)
└── spawns: Docs Agent
    └── receives: changed functions + doc templates (3k tokens)

Total context per agent: ~5k avg vs. naive full-context pass: ~40k per agent

The orchestrator is the only agent that holds the full task context. Subagents receive purpose-built slices. This keeps each agent's window lean, reduces cost, and prevents cross-contamination of concerns between agents.
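The slicing pattern is simple to express in code. A rough sketch, with invented context keys and agent names standing in for a real handoff protocol:

```python
# Hypothetical full task context, held only by the orchestrator.
FULL_CONTEXT = {
    "memory": "memory.md contents",
    "task_brief": "refactor the auth module",
    "diff": "diff --git a/auth.py b/auth.py ...",
    "types": "relevant type stubs",
    "style_rules": "style guide excerpt",
    "review_notes": "prior review notes",
    "doc_templates": "doc templates",
}

# Each subagent is declared with the slice it needs, never the full dump.
SLICES = {
    "code_agent":   ["task_brief", "diff", "types"],
    "review_agent": ["diff", "style_rules", "review_notes"],
    "docs_agent":   ["diff", "doc_templates"],
}

def slice_for(agent):
    """Build the purpose-built context slice for one subagent."""
    return {k: FULL_CONTEXT[k] for k in SLICES[agent]}
```

Because slices are declared up front, a reviewer can audit exactly what each agent can see, which is also what prevents concerns from leaking between agents.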

Referenced Tools

Tool            | Role in context engineering                             | Category
Redis           | Semantic cache + multi-tier L2 store                    | Caching
Pinecone        | Managed vector database for embeddings at scale         | RAG
ChromaDB        | Local-first vector store, zero-infra for dev            | RAG
LangChain       | Chunking pipelines, retrieval chains, document loaders  | Orchestration
Claude Code MCP | Structured entity + relation memory via knowledge graph | Memory

Get Context Engineering Pre-Built

The AI Starter Package ships with the full 4-pillar context engineering framework implemented: 7-tier memory, auditor-gated knowledge, token-optimized loading, health monitoring hooks, and multi-agent context routing out of the box.

View Pricing