Context Engineering: AI Agent Optimization Guide

Anthropic reveals advanced strategies for managing context in AI agents, from token optimization to long-horizon task handling and multi-agent architectures.

by HowAIWorks Team
Tags: ai-agents, context-engineering, anthropic, claude, optimization, ai-development, prompt-engineering, memory

Introduction

As AI development matures, the focus is shifting from crafting perfect prompts to a more fundamental challenge: context engineering. In a comprehensive engineering blog post published on September 29, 2025, Anthropic's engineering team revealed that building effective AI agents is less about finding the right words and more about answering a critical question: "What configuration of context is most likely to generate our model's desired behavior?"

Released alongside Claude Sonnet 4.5, this guidance represents years of experience building sophisticated agents like Claude Code and marks a significant evolution in AI development methodology. Context represents the set of tokens included when sampling from a large language model (LLM), and the engineering challenge is optimizing the utility of those tokens against inherent LLM constraints to consistently achieve desired outcomes.

This methodology complements Anthropic's recent guidance on writing effective tools for agents and the Claude Agent SDK, forming a comprehensive framework for building production-ready AI agents.

From Prompt Engineering to Context Engineering

The Evolution of AI Development

Prompt Engineering emerged as the primary focus in early LLM applications, centered on writing and organizing instructions for optimal outcomes. This approach worked well for one-shot classification and simple text generation tasks.

Context Engineering represents the natural progression, encompassing strategies for curating and maintaining the optimal set of tokens during inference. This includes:

  • System instructions
  • Tool definitions
  • Model Context Protocol (MCP) integrations
  • External data sources
  • Message history
  • Memory systems

As agents operate over multiple turns and longer time horizons, they generate exponentially more data that could be relevant for the next inference step. Context engineering addresses the challenge of cyclically refining this information to maintain only what's necessary.
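As a rough illustration of how these components come together, the context at each step can be thought of as an assembly of the parts listed above into a single token budget. The function below is a hypothetical sketch (names and priority order are illustrative, not a prescribed API):

```python
# Hypothetical sketch: assembling context components into one token budget
# at each inference step; empty components are simply skipped
def build_context(system_prompt, tool_definitions, memory_notes, message_history):
    """Concatenate context components in priority order, skipping empty ones."""
    parts = [
        system_prompt,                  # stable instructions
        "\n".join(tool_definitions),    # tool contracts
        memory_notes,                   # persistent memory, if any
        "\n".join(message_history),     # conversational state
    ]
    return "\n\n".join(p for p in parts if p)

context = build_context(
    "You are a coding agent.",
    ["read_file(path) -> str", "grep(pattern, path) -> list[str]"],
    "",  # no memory notes yet on this task
    ["user: fix the failing test"],
)
```

Curation then becomes the question of what goes into each of these slots, and how much of it, at every step.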

Why This Shift Matters

The difference is fundamental:

  • Prompt engineering: Discrete task of writing a single prompt
  • Context engineering: Iterative process where curation happens at each inference step

This shift reflects the move from simple, stateless interactions to sophisticated agents that maintain state, use tools, and operate over extended periods.

The Attention Budget: Why Context is Finite

Understanding Context Rot

Research on "needle-in-a-haystack" benchmarking has uncovered a critical phenomenon: context rot. As the number of tokens in the context window increases, the model's ability to accurately recall information from that context decreases.

This degradation occurs across all models, though some exhibit a more gradual decline than others. The implication is clear: context must be treated as a finite resource with diminishing marginal returns.

Architectural Constraints

The attention scarcity stems from the transformer architecture itself:

N² Complexity: Every token in the context can attend to every other token, creating n² pairwise relationships for n tokens. As context length increases, the model's ability to capture these relationships gets stretched thin.

Training Distribution: Models develop attention patterns from training data where shorter sequences are more common than longer ones. This means models have less experience with, and fewer specialized parameters for, long-range dependencies.

Position Encoding: Techniques like position encoding interpolation let models trained on shorter contexts handle longer sequences, though with some degradation in the model's understanding of token positions.

These factors create a performance gradient rather than a hard cliff - models remain capable at longer contexts but may show reduced precision for information retrieval and long-range reasoning.
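The quadratic scaling described above is easy to make concrete: doubling the context quadruples the number of pairwise attention relationships.

```python
# Illustration of the n^2 attention scaling: every token can attend to
# every token, so relationships grow quadratically with context length
def attention_pairs(n_tokens):
    return n_tokens * n_tokens

pairs_short = attention_pairs(1_000)   # 1,000,000 relationships
pairs_long = attention_pairs(2_000)    # 4,000,000 relationships
```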

Anatomy of Effective Context

The Guiding Principle

Good context engineering means finding the smallest possible set of high-signal tokens that maximize the likelihood of the desired outcome. Here's how this applies to each component:

1. System Prompts: The Right Altitude

System prompts should use simple, direct language at the "right altitude" - a balance between two extremes:

Too Low (Brittle):

  • Hardcoded complex logic
  • Excessive if-else instructions
  • Fragile, difficult to maintain
  • Over-specification

Too High (Vague):

  • Overly general guidance
  • False assumptions about shared context
  • Unclear expectations
  • Under-specification

Optimal Approach:

  • Specific enough to guide behavior effectively
  • Flexible enough to provide strong heuristics
  • Organized into distinct sections (background, instructions, tool guidance, output description)
  • Use XML tags or Markdown headers for clear delineation
  • Start minimal, add based on observed failure modes
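A system prompt organized along these lines might look like the following sketch. The section names and tag choices are illustrative, not prescribed; the point is the clear delineation:

```python
# Illustrative system prompt organized into distinct XML-tagged sections
system_prompt = """\
<background>
You are a coding agent working in a Python repository.
</background>

<instructions>
Make minimal, focused changes. Run the tests after each edit.
</instructions>

<tool_guidance>
Prefer grep to locate symbols before reading whole files.
</tool_guidance>

<output_description>
Reply with a short summary of changes and the test results.
</output_description>
"""
```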

2. Tools: Promoting Efficiency

Tools define the contract between agents and their information/action space. Effective tools:

Design Principles:

  • Minimal overlap in functionality
  • Self-contained and robust to errors
  • Crystal clear intended use
  • Descriptive, unambiguous parameters
  • Token-efficient return values
  • Encourage efficient agent behaviors

Common Failure Modes:

  • Bloated tool sets covering too much functionality
  • Ambiguous decision points about which tool to use
  • Unclear boundaries between similar tools
  • Verbose or low-signal outputs

Best Practice: If a human engineer can't definitively say which tool should be used in a given situation, an AI agent can't be expected to do better.
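Concretely, a token-efficient tool definition might look like the sketch below. The schema shape loosely follows common function-calling formats; the exact structure depends on your platform, and the tool itself is invented for illustration:

```python
# Hypothetical tool definition: one clear purpose, unambiguous parameters,
# and a description that tells the agent exactly when (not) to use it
search_code_tool = {
    "name": "search_code",
    "description": (
        "Search the repository for a regex pattern. "
        "Use this to locate symbols before reading files; "
        "do not use it to read entire files."
    ),
    "parameters": {
        "pattern": {
            "type": "string",
            "description": "Regex to search for",
        },
        "max_results": {
            "type": "integer",
            "description": "Cap on matches, to keep output token-efficient",
            "default": 20,
        },
    },
}
```

Note how the description draws a boundary against a neighboring tool (reading files), and how the default result cap bakes token efficiency into the contract.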

3. Examples: Quality Over Quantity

Few-shot prompting remains a best practice, but implementation matters:

Avoid:

  • Laundry lists of edge cases
  • Attempting to cover every possible scenario
  • Redundant or similar examples

Prefer:

  • Diverse, canonical examples
  • Clear portrayal of expected behavior
  • Representative edge cases
  • "A picture is worth a thousand words" - well-chosen examples are often more powerful than lengthy descriptions
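A sketch of this principle: a handful of diverse, canonical examples rather than an exhaustive edge-case list. The classification task here is invented for illustration:

```python
# Illustrative few-shot set: three diverse, canonical examples
# instead of a laundry list of near-duplicate edge cases
FEW_SHOT_EXAMPLES = [
    {"input": "Refund my order #1234", "label": "billing"},
    {"input": "The app crashes when I upload a photo", "label": "bug_report"},
    {"input": "Do you support single sign-on?", "label": "product_question"},
]

def format_examples(examples):
    """Render examples in a consistent input/label format for the prompt."""
    return "\n".join(f"Input: {e['input']}\nLabel: {e['label']}" for e in examples)
```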

Agentic Search: Just-in-Time Context Retrieval

The Traditional Approach

Many AI applications employ embedding-based pre-inference retrieval to surface important context. This involves:

  • Creating vector embeddings of information
  • Storing in vector databases
  • Retrieving similar content based on query embeddings

Limitations:

  • Requires maintaining indices
  • Can become stale
  • Limited by pre-defined retrieval strategies
  • Doesn't adapt to agent's actual needs

The Agentic Alternative

Modern agents can explore their environment autonomously through tools, discovering context just-in-time:

How It Works:

  • Agents use tools like glob, grep, list_directory, read_file
  • Each interaction yields context that informs the next decision
  • File sizes suggest complexity
  • Naming conventions hint at purpose
  • Timestamps proxy for relevance
  • Layer-by-layer understanding
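A minimal sketch of such just-in-time exploration, using only the Python standard library. The function names mirror the primitives mentioned above; real agent tools would add guardrails and richer output:

```python
import fnmatch
import os
import re

# Minimal just-in-time exploration primitives an agent might call as tools
def glob_files(root, pattern):
    """Return paths under root whose file names match a glob pattern."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return matches

def grep_file(path, pattern, max_lines=20):
    """Return up to max_lines matching lines, keeping output token-efficient."""
    regex = re.compile(pattern)
    hits = []
    with open(path, encoding="utf-8", errors="ignore") as f:
        for lineno, line in enumerate(f, 1):
            if regex.search(line):
                hits.append(f"{lineno}: {line.rstrip()}")
                if len(hits) >= max_lines:
                    break
    return hits
```

Each call returns a small, high-signal slice of the environment, and the agent decides from those results where to look next.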

Advantages:

  • No stale indexing
  • Self-managed context window
  • Focused on relevant subsets
  • Adapts to actual task requirements
  • Bypasses complex syntax trees

Trade-offs:

  • Runtime exploration is slower than pre-computed retrieval
  • Requires opinionated engineering to ensure proper navigation
  • Without guidance, agents can waste context on dead-ends

The Hybrid Strategy

The most effective approach often combines both methods:

Example: Claude Code:

  • CLAUDE.md files dropped into context up front (speed)
  • Primitives like glob and grep for just-in-time navigation (flexibility)
  • Balances speed with adaptability

Decision Factors:

  • Task characteristics
  • Content dynamism
  • Performance requirements
  • Agent capabilities

As model capabilities improve, the trend moves toward letting intelligent models act intelligently with progressively less human curation.

Long-Horizon Task Management

The Challenge

Long-horizon tasks require agents to maintain coherence and goal-directed behavior over sequences where token count exceeds the context window. Examples include:

  • Large codebase migrations (tens of minutes)
  • Comprehensive research projects (hours)
  • Extended software development tasks (30+ hours)

Waiting for larger context windows isn't a complete solution - context pollution and information relevance concerns persist at all scales.

Technique 1: Compaction

What It Is: Summarizing conversation history and reinitiating with a compressed version.

Implementation:

  • Pass message history to the model for summarization
  • Preserve critical details: architectural decisions, unresolved bugs, implementation details
  • Discard redundant content: tool outputs, repeated messages
  • Continue with compressed context plus recently accessed resources

Example from Claude Code:

  • Summarize and compress critical details
  • Keep five most recently accessed files
  • Maintain continuity without hitting limits

Best Practices:

  • Maximize recall first (capture everything relevant)
  • Then improve precision (eliminate superfluous content)
  • Tune on complex agent traces
  • Clear tool calls and results from deep history

Tool Result Clearing: One of the safest forms of compaction - once a tool result has been used, why keep the raw output deep in history? This feature launched on the Claude Developer Platform in September 2025.
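The idea behind tool result clearing can be sketched as follows. This assumes a simple message list where tool results carry a role of 'tool'; the platform implements this natively, so the code is purely illustrative:

```python
# Sketch: replace old tool results with placeholders, keeping recent ones intact
def clear_old_tool_results(messages, keep_recent=3):
    """Stub out all but the most recent tool results in a message list."""
    tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    to_clear = set(tool_indices[:-keep_recent]) if keep_recent else set(tool_indices)
    cleared = []
    for i, m in enumerate(messages):
        if i in to_clear:
            cleared.append({"role": "tool", "content": "[tool result cleared]"})
        else:
            cleared.append(m)
    return cleared
```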

Example Compaction Approach:

# Simplified compaction strategy (count_tokens and summarize_messages
# are assumed helpers, defined elsewhere)
def compact_context(message_history, max_tokens=100000):
    if count_tokens(message_history) < max_tokens:
        return message_history
    
    # Preserve system prompts and the most recent exchanges verbatim
    system_messages = [m for m in message_history if m.role == 'system']
    non_system = [m for m in message_history if m.role != 'system']
    recent_messages = non_system[-10:]  # Keep last 10 non-system messages
    
    # Summarize everything in between into one compressed message
    middle_section = non_system[:-10]
    summary = {
        'role': 'system',
        'content': f'Previous context summary: {summarize_messages(middle_section)}'
    }
    
    return system_messages + [summary] + recent_messages

Technique 2: Structured Note-Taking

What It Is: Agents regularly write notes to persistent memory outside the context window.

Implementation:

  • Agent maintains notes files (e.g., NOTES.md, TODO.md)
  • Tracks progress across complex tasks
  • Maintains critical context and dependencies
  • Reads notes after context resets

Example Note Structure:

# Agent Memory - Project Alpha

## Current Objective
Implement user authentication system with OAuth2 support

## Progress
- ✅ Set up database schema (users, sessions tables)
- ✅ Implemented password hashing with bcrypt
- 🔄 Working on OAuth2 provider integration
- ⏳ Pending: Email verification flow

## Key Decisions
- Using JWT tokens with 24-hour expiration
- Refresh tokens stored in secure HTTP-only cookies
- Rate limiting: 5 login attempts per 15 minutes

## Next Steps
1. Complete Google OAuth2 integration
2. Add email verification endpoint
3. Write integration tests for auth flow
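A minimal sketch of the read/write cycle behind such a notes file. The file name and helper names are illustrative:

```python
import os

NOTES_PATH = "NOTES.md"  # illustrative location for the agent's persistent memory

def write_note(section, lines, path=NOTES_PATH):
    """Append a titled section of bullet points to the notes file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"\n## {section}\n")
        for line in lines:
            f.write(f"- {line}\n")

def read_notes(path=NOTES_PATH):
    """Reload notes after a context reset; empty string if none exist yet."""
    if not os.path.exists(path):
        return ""
    with open(path, encoding="utf-8") as f:
        return f.read()
```

Because the file lives outside the context window, its size doesn't count against the attention budget; the agent pays only for what it reads back in.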

Example: Claude Playing Pokémon:

  • Maintains precise tallies across thousands of game steps
  • "For the last 1,234 steps I've been training in Route 1, Pikachu gained 8 levels toward target of 10"
  • Develops maps of explored regions
  • Remembers key achievements and strategic patterns
  • Continues multi-hour sequences after resets

Anthropic Memory Tool:

  • Public beta on Claude Developer Platform
  • File-based system for storing information outside context
  • Build knowledge bases over time
  • Maintain project state across sessions
  • Reference previous work without keeping everything in context
  • Released alongside Claude Sonnet 4.5 in September 2025

Technique 3: Multi-Agent Architectures

What It Is: Specialized sub-agents handle focused tasks with clean context windows, coordinated by a main agent.

How It Works:

  • Main agent maintains high-level plan and coordination
  • Sub-agents perform deep technical work
  • Each sub-agent explores extensively (tens of thousands of tokens)
  • Returns condensed summary (1,000-2,000 tokens)
  • Clear separation of concerns
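The coordination pattern above can be sketched roughly as follows. Here call_model is a hypothetical stand-in for an LLM call; in practice each sub-agent would run its own tool loop in an isolated context:

```python
# Sketch: a lead agent delegates focused subtasks to sub-agents with clean
# context windows, receiving only condensed summaries back
def run_subagent(task, call_model):
    """Run one subtask in an isolated context; return a condensed summary."""
    transcript = call_model(f"Explore and solve: {task}")  # may be very long
    return call_model(f"Summarize key findings in under 2000 tokens:\n{transcript}")

def lead_agent(subtasks, call_model):
    """Coordinate sub-agents; only their summaries enter the lead's context."""
    summaries = [run_subagent(t, call_model) for t in subtasks]
    return call_model("Synthesize a final answer from:\n" + "\n".join(summaries))
```

The tens of thousands of exploration tokens stay inside each sub-agent; the lead agent sees only the 1,000-2,000 token summaries.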

Advantages:

  • Detailed search context isolated within sub-agents
  • Lead agent focuses on synthesis and analysis
  • Parallel exploration where beneficial
  • Reduced context pollution in main agent

Use Case: Research Systems: Anthropic's multi-agent research system showed substantial improvements over single-agent systems on complex research tasks.

Choosing the Right Technique

Match the technique to task characteristics:

  • Compaction: Maintains conversational flow for extensive back-and-forth
  • Note-taking: Excels for iterative development with clear milestones
  • Multi-agent: Handles complex research/analysis requiring parallel exploration

Practical Implementation Strategies

Start Simple

Anthropic's recurring advice: "Do the simplest thing that works."

  1. Test minimal prompts with the best available model
  2. Observe performance on your specific tasks
  3. Add clear instructions based on failure modes
  4. Iterate based on real-world usage

Monitor Key Metrics

Track context usage and efficiency:

  • Token consumption per task
  • Success rates at different context lengths
  • Tool usage patterns
  • Time to task completion
  • Error rates and types
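A minimal sketch of how an agent harness might record these metrics per task (the class and field names are illustrative):

```python
from collections import Counter

# Sketch: minimal per-task metrics an agent harness might record
class AgentMetrics:
    def __init__(self):
        self.tokens_used = 0
        self.tool_calls = Counter()   # tool name -> call count
        self.errors = Counter()       # error type -> occurrence count
        self.succeeded = None         # set by the harness when the task ends

    def record_turn(self, tokens, tool_name=None, error=None):
        """Accumulate token usage and optional tool/error events for one turn."""
        self.tokens_used += tokens
        if tool_name:
            self.tool_calls[tool_name] += 1
        if error:
            self.errors[error] += 1
```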

Iterative Refinement

Context engineering is an ongoing process:

  1. Deploy with minimal viable context
  2. Measure performance and behavior
  3. Identify bottlenecks and failures
  4. Optimize based on data
  5. Repeat continuously

Balance Autonomy and Guidance

As models improve:

  • Give more autonomy to capable models
  • Reduce prescriptive engineering
  • Let intelligent models act intelligently
  • Maintain safety boundaries
  • Iterate based on capability advances

Industry Applications

Software Development

Context engineering enables:

  • Extended coding sessions (30+ hours of focused work)
  • Large codebase navigation and modification
  • Multi-file refactoring with coherence
  • Bug tracking across long debugging sessions
  • Documentation generation from scattered sources

Research and Analysis

Long-horizon capabilities support:

  • Comprehensive literature reviews
  • Multi-source information synthesis
  • Extended hypothesis exploration
  • Complex data analysis workflows
  • Report generation from extensive research

Customer Support

Context management improves:

  • Multi-turn problem resolution
  • Account history maintenance
  • Cross-department information coordination
  • Escalation context preservation
  • Long-term customer relationship tracking

Future Implications

Evolving Best Practices

As models become more capable:

  • Less prescriptive prompting needed
  • More autonomous exploration possible
  • Adaptive context strategies emerge
  • Dynamic tool selection improves
  • Self-optimizing systems develop

Remaining Challenges

Even with improving models:

  • Context will remain a precious resource
  • Attention budget management stays critical
  • Token efficiency remains important
  • Information relevance continues to matter
  • Strategic curation provides value

The Path Forward

The field is converging on a simple agent definition: LLMs autonomously using tools in a loop.

As underlying models improve:

  • Autonomy can scale proportionally
  • Agents navigate nuanced problem spaces
  • Error recovery becomes more robust
  • Complex multi-step tasks become feasible
  • Human oversight requirements decrease

Key Takeaways

For Developers

  1. Think holistically about entire context state, not just prompts
  2. Treat context as finite resource with diminishing returns
  3. Start minimal and add based on observed failures
  4. Enable agent exploration through well-designed tools
  5. Choose appropriate techniques for long-horizon tasks
  6. Iterate continuously based on real-world performance

For Organizations

  1. Invest in context engineering as core AI competency
  2. Monitor token efficiency as key performance metric
  3. Design for long-horizon tasks from the start
  4. Balance speed with adaptability in retrieval strategies
  5. Prepare for increasing autonomy as models improve
  6. Maintain safety boundaries while enabling exploration

Core Principles

  • Smallest possible set of high-signal tokens
  • Right altitude for instructions - specific yet flexible
  • Efficient tools that promote good agent behaviors
  • Quality examples over exhaustive edge cases
  • Just-in-time retrieval when feasible
  • Persistent memory for extended tasks
  • Clear separation of concerns in multi-agent systems

Conclusion

Context engineering represents a fundamental shift in building with LLMs. As we move from simple prompt optimization to sophisticated agent development, success depends on thoughtfully curating what information enters the model's limited attention budget at each step.

Whether implementing compaction for long-horizon tasks, designing token-efficient tools, or enabling just-in-time exploration, the guiding principle remains constant: find the smallest set of high-signal tokens that maximize the likelihood of your desired outcome.

The techniques outlined by Anthropic will continue evolving as models improve. While smarter models require less prescriptive engineering and operate with more autonomy, treating context as a precious, finite resource will remain central to building reliable, effective agents.

As the field progresses, the most successful AI systems will be those that master the art of context engineering - understanding not just what to tell an AI agent, but what information to provide, when to provide it, and how to maintain coherence across extended interactions.


Want to deepen your understanding of AI agent development? Explore our AI fundamentals courses, check out our glossary of AI terms, or discover AI development tools in our comprehensive catalog. For more on Anthropic's AI models, visit our Claude model pages.

Frequently Asked Questions

What is context engineering, and how does it differ from prompt engineering?
Context engineering is the evolution of prompt engineering, focusing on curating and maintaining the optimal set of tokens during LLM inference. While prompt engineering focuses on writing effective prompts, context engineering manages the entire context state, including system instructions, tools, external data, and message history.

Why do long contexts degrade performance despite larger context windows?
Despite increasing context windows, LLMs experience "context rot": as token count increases, the model's ability to accurately recall information decreases. This stems from architectural constraints in which every token attends to every other token, creating n² relationships that stretch the model's attention budget thin.

What techniques help agents handle long-horizon tasks?
Three main techniques are used: compaction (summarizing conversation history), structured note-taking (maintaining external memory files), and multi-agent architectures (specialized sub-agents handling focused tasks with clean context windows).

What is agentic search?
Agentic search allows agents to explore their environment just-in-time through tools like glob and grep, rather than relying on pre-computed embeddings. This avoids issues like stale indexing and gives agents autonomy to discover context as needed.

Continue Your AI Journey

Explore our lessons and glossary to deepen your understanding.