Cross-cutting patterns from an analysis of 44 frameworks. The individual analyses cover each framework in isolation; this is what emerges when you look at all of them together. The architecture patterns section is useful as a mental model for categorizing frameworks. The context engineering section is, I think, the most valuable part of this entire project: it maps specific gaps against specific patterns and explains why the ecosystem systematically fails at context management. The consolidation section is prediction, which means it'll age badly, but the evidence is real as of February 2026.
Last researched: 2026-02-16
Five patterns have emerged across the ecosystem, and most frameworks fit cleanly into one of them.
Representative Frameworks: LangGraph, Mastra, Google ADK, Flowise
Graph-based architectures treat agents as nodes in a directed graph, with edges representing possible transitions between states. This pattern has become the dominant approach for production systems because it provides explicit control over execution flow.
Key Strengths:
- Deterministic execution - The graph structure makes agent behavior predictable and testable
- Visual debugging - Graphs can be rendered and inspected, making complex flows understandable
- Checkpointing - State can be persisted at any node, enabling recovery from failures
- Conditional branching - Decision logic is explicit rather than emergent
Key Tradeoffs:
- Rigidity - Complex conversational agents can feel constrained by the graph structure
- Boilerplate - Even simple agents require defining nodes and edges explicitly
- Learning curve - Developers must think in terms of state machines
When to Choose: Production agents requiring audit trails, compliance documentation, or complex multi-step workflows with many decision points. LangGraph is the mature choice here; Mastra offers a more modern TypeScript-native alternative.
When to Avoid: Rapid prototyping, simple conversational agents, or use cases requiring highly emergent behavior.
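The node/edge model is framework-agnostic and can be sketched in a few lines of plain Python. This is not LangGraph's API; the node names and routing logic below are invented for illustration:

```python
# Minimal graph-orchestration sketch (no framework; all names hypothetical).
# Nodes are functions that take and return a state dict; EDGES maps each
# node to a router that picks the next node, with None as the END sentinel.

def classify(state):
    # Toy classifier: route based on whether the input looks like a question.
    state["route"] = "answer" if state["input"].endswith("?") else "chat"
    return state

def answer(state):
    state["output"] = f"Answering: {state['input']}"
    return state

def chat(state):
    state["output"] = f"Chatting about: {state['input']}"
    return state

NODES = {"classify": classify, "answer": answer, "chat": chat}
EDGES = {
    "classify": lambda s: s["route"],  # conditional branch
    "answer": lambda s: None,
    "chat": lambda s: None,
}

def run(start, state):
    node = start
    while node is not None:
        state = NODES[node](state)   # a real framework would checkpoint here
        node = EDGES[node](state)
    return state

result = run("classify", {"input": "What is a graph?"})
print(result["output"])
```

Because every transition is an explicit entry in `EDGES`, the path through the graph is inspectable and unit-testable, which is exactly the determinism property production teams are buying with the extra boilerplate.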
Representative Frameworks: CrewAI, OpenAI Agents SDK, AutoGen, AG2
Multi-agent systems treat complex problems as collaborations between specialized agents. Rather than building one generalist agent, you build a team where each member has specific expertise.
Key Strengths:
- Specialization - Each agent can be optimized for a specific task
- Parallelization - Multiple agents can work simultaneously on different aspects
- Modularity - Agents can be developed, tested, and deployed independently
- Natural modeling - Mirrors how human teams solve complex problems
Key Tradeoffs:
- Coordination overhead - Agents need mechanisms to share information and avoid conflicts
- Debugging complexity - Failures can cascade across agent boundaries
- Context management - Shared context vs. isolated context becomes a critical design decision
Pattern Variations:
- Handoffs (OpenAI Agents SDK): Explicit transfer of control between agents. Simple and predictable but requires the developer to define all handoff points.
- Crews (CrewAI): Agents collaborate through a shared task queue. More emergent but can be harder to debug.
- Hierarchical (AutoGen): Manager agents delegate to worker agents. Provides structure, but the manager can become a bottleneck.
When to Choose: Complex tasks requiring diverse expertise (research, code review, content creation), problems that decompose naturally into subtasks, or systems mimicking human team workflows.
When to Avoid: Simple tasks where a single agent suffices, latency-sensitive applications (coordination adds overhead), or problems without clear decomposition.
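The handoff variation is the simplest to sketch. A minimal plain-Python version follows, with hypothetical triage/coder/writer agents standing in for LLM-backed ones; the routing rule is invented for illustration:

```python
# Handoff sketch: each agent either finishes the task or names a successor.
# All agents and routing rules here are hypothetical stand-ins for LLM calls.

def triage_agent(task):
    # Explicit handoff points, as in the OpenAI Agents SDK's handoff pattern.
    if "code" in task:
        return ("handoff", "coder")
    return ("handoff", "writer")

def coder_agent(task):
    return ("done", f"[coder] patch for: {task}")

def writer_agent(task):
    return ("done", f"[writer] draft for: {task}")

AGENTS = {"triage": triage_agent, "coder": coder_agent, "writer": writer_agent}

def run(task, start="triage", max_hops=5):
    agent = start
    for _ in range(max_hops):      # guard against handoff loops
        status, payload = AGENTS[agent](task)
        if status == "done":
            return payload
        agent = payload            # transfer control to the named agent
    raise RuntimeError("handoff loop exceeded max_hops")

print(run("review this code change"))
```

The simplicity is the point: control flow is explicit and traceable, at the cost of the developer enumerating every handoff point up front.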
Representative Frameworks: Letta (MemGPT), LangChain with custom memory
Memory-first architectures treat persistent state as a core primitive rather than an add-on. Letta pioneered this approach with its tiered memory system inspired by operating system memory management.
Key Strengths:
- Long-horizon agents - Agents can maintain context across days, weeks, or longer
- Self-improvement - Agents can learn from past interactions and improve over time
- Resource optimization - Automatic archival and retrieval keeps context windows efficient
- Persistence - Agent state survives restarts and can be migrated between sessions
Key Tradeoffs:
- Complexity - Memory management adds significant architectural complexity
- Retrieval failures - If the wrong memories are retrieved, agent behavior degrades
- Storage costs - Persistent memory requires database/storage infrastructure
Technical Deep Dive: Letta's three-tier memory system (core memory, recall memory, archival memory) is the most sophisticated implementation. Core memory stays in the context window at all times, recall memory stores the full searchable conversation history, and archival memory holds long-term facts and documents retrieved on demand. This allows agents to maintain coherence across thousands of turns while keeping the immediate context window focused.
When to Choose: Personal assistants, long-running customer service agents, educational tutors, or any application where conversation continuity matters.
When to Avoid: Stateless interactions, simple question-answering, or applications where storage costs are prohibitive.
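The tiered idea can be sketched without Letta's machinery. Below is a toy version with a bounded core tier and an unbounded recall tier; the tier size and drop-oldest eviction policy are invented for illustration (Letta summarizes and archives rather than simply dropping):

```python
# Tiered-memory sketch loosely modeled on the core/recall split described
# above. Sizes and the eviction policy are illustrative, not Letta's.

class TieredMemory:
    def __init__(self, core_limit=4):
        self.core = []            # always included in the prompt
        self.recall = []          # full history, searchable on demand
        self.core_limit = core_limit

    def add(self, message):
        self.recall.append(message)
        self.core.append(message)
        # Evict the oldest core messages once over budget; a real system
        # would summarize or archive here instead of dropping.
        while len(self.core) > self.core_limit:
            self.core.pop(0)

    def search_recall(self, term):
        return [m for m in self.recall if term in m]

    def prompt_context(self):
        return self.core

mem = TieredMemory(core_limit=3)
for i in range(5):
    mem.add(f"turn {i}")
print(mem.prompt_context())         # only the 3 most recent turns
print(mem.search_recall("turn 0"))  # older turns remain retrievable
```

Even this toy shows the key property: the prompt stays bounded while nothing is permanently lost, which is what makes long-horizon agents feasible.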
Representative Frameworks: Haystack, LlamaIndex, LangChain RAG templates
Pipeline architectures focus on data flow: ingest, process, index, retrieve, generate. These frameworks excel at knowledge-intensive applications.
Key Strengths:
- Data integration - Hundreds of connectors for data sources
- Processing flexibility - Documents can be chunked, embedded, and enriched
- Retrieval optimization - Multiple retrieval strategies (dense, sparse, hybrid)
- Proven patterns - RAG is well-understood with established best practices
Key Tradeoffs:
- Not agent-native - Agents feel like an afterthought rather than a core primitive
- Static pipelines - Dynamic agent behavior is harder to implement
- Complexity at scale - Managing indices, embeddings, and retrieval becomes operational overhead
When to Choose: Question-answering systems, knowledge bases, document analysis, or any application where retrieval from a corpus is central.
When to Avoid: Dynamic agent interactions, tool use-heavy applications, or multi-agent systems.
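The ingest → chunk → index → retrieve → generate flow can be sketched end to end in a few functions. Word overlap stands in for embedding similarity here, and nothing below is any framework's real API:

```python
# RAG pipeline sketch: chunking, a toy overlap-based retriever, and a stub
# generation step. All functions are illustrative stand-ins.

def chunk(doc, size=6):
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(query, passage):
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p)   # word overlap stands in for vector similarity

def retrieve(query, index, k=2):
    return sorted(index, key=lambda c: score(query, c), reverse=True)[:k]

def generate(query, passages):
    # A real pipeline would call an LLM here with the retrieved context.
    return f"Q: {query} | context: {' / '.join(passages)}"

corpus = ("Haystack and LlamaIndex focus on retrieval pipelines "
          "for knowledge intensive applications")
index = chunk(corpus)
answer = generate("retrieval pipelines", retrieve("retrieval pipelines", index))
print(answer)
```

Every production concern in this category (chunk sizing, hybrid retrieval, reranking) is a refinement of one of these four stages, which is why the pipeline mental model holds up even as the components get sophisticated.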
Representative Frameworks: Smolagents, Claude Code, E2B-based agents
These frameworks prioritize agents that can take actions in the world: execute code, browse the web, interact with APIs. The agent is defined by what it can do, not how it reasons.
Key Strengths:
- Action-oriented - Direct integration with real-world systems
- Code generation - Agents can write and execute code safely
- Verification - Code execution produces verifiable results
- Flexibility - Arbitrary Python/TypeScript code execution
Key Tradeoffs:
- Security concerns - Executing agent-generated code requires sandboxing
- Debugging difficulty - Failures in external systems can be hard to trace
- Non-determinism - Tool outputs vary, making testing challenging
When to Choose: Coding assistants, automation agents, data processing pipelines, or applications requiring integration with external systems.
When to Avoid: Simple conversational agents, high-security environments where code execution is restricted, or applications requiring deterministic outputs.
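The sandboxing tradeoff can be made concrete with a subprocess-based sketch. To be clear, this is only a sketch: a child interpreter with a timeout limits runaway execution but provides no filesystem or network isolation, which is why production systems reach for containers or hosted sandboxes like E2B:

```python
import subprocess
import sys

# Run agent-generated code in a child interpreter with a timeout.
# NOT a real sandbox: the child still has filesystem and network access.

def run_untrusted(code, timeout_s=2):
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, no site dirs
            capture_output=True, text=True, timeout=timeout_s,
        )
        return proc.returncode, proc.stdout.strip()
    except subprocess.TimeoutExpired:
        return None, "timed out"

rc, out = run_untrusted("print(2 + 2)")
print(rc, out)
```

The upside this pattern buys, as noted above, is verification: the agent's claim ("this code computes X") is checked by actually executing it and inspecting the result.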
The analysis reveals a systemic failure across the ecosystem: context engineering is treated as an implementation detail rather than a core concern. This section examines how each framework handles the 8 contextpatterns.com patterns and identifies specific gaps.
| Pattern | Strong Support | Partial Support | Missing | Analysis |
|---|---|---|---|---|
| Select, Don't Dump | LlamaIndex, Haystack | LangChain (retrievers) | Most frameworks | Only RAG-centric frameworks treat context curation as primary. Others append everything. |
| Write Outside the Window | Letta, Phidata | LangChain (vector stores) | Vercel AI SDK, Instructor | Persistent storage is available but requires manual integration. No framework makes it automatic. |
| Progressive Disclosure | LlamaIndex, Haystack | LangChain (retrievers) | Most frameworks | On-demand loading exists but requires explicit configuration. |
| The Pyramid | All (manual) | None automatic | None automatic | All frameworks allow prompt structuring, but none enforce or guide it. |
| Context Rot Awareness | None | None | All | Major gap. No framework monitors context quality degradation. |
| Compress & Restart | Letta | LangChain (manual) | Most frameworks | Only Letta provides automatic summarization. Others require developers to implement. |
| Isolate | None | Sub-agents (partial) | Fine-grained controls | Sub-agents typically share parent context; no framework offers fine-grained isolation controls. |
| Recursive Delegation | ADK, CrewAI | LangGraph | Most frameworks | Spawning child agents with controlled context is possible but not easy. |
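To make the Compress & Restart row concrete, here is a minimal sketch of the trigger logic that most frameworks currently leave to developers. The token counter and summarizer are stubs; a real implementation would use the model's tokenizer and an LLM summarization call:

```python
# Compress & Restart sketch: when the transcript exceeds a token budget,
# replace older turns with a summary and keep only the recent tail.

def count_tokens(text):
    return len(text.split())   # crude stand-in for a real tokenizer

def summarize(turns):
    # Stub; a real implementation would call an LLM here.
    return f"[summary of {len(turns)} earlier turns]"

def compress_if_needed(history, budget=20, keep_recent=2):
    total = sum(count_tokens(t) for t in history)
    if total <= budget:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent

history = [f"turn {i}: some fairly long message body here" for i in range(6)]
compressed = compress_if_needed(history)
print(compressed)
```

The logic is trivial, which is the point of the table row: the gap is not that compression is hard, it's that only Letta wires a trigger like this into the default loop.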
Historical Baggage: Most frameworks evolved from LLM wrappers. LangChain started as a utility for chaining LLM calls; CrewAI began as a multi-agent experiment. Context engineering wasn't a design priority from the start.
Abstraction Mismatch: Frameworks provide memory abstractions (ConversationBufferMemory, ChatMessageHistory) but these are storage primitives, not context management strategies. Developers get a database, not a context strategy.
No Quality Metrics: Frameworks measure tokens, not context quality. A conversation with 50 turns might have 80% noise, but the framework happily passes all 50 turns to the model. There's no mechanism to evaluate whether context is helping or hurting.
One-Size-Fits-All Defaults: LangChain's default memory appends every message. At turn 50 in a customer service conversation, you're spending context budget on the greeting from 45 minutes ago. The framework doesn't warn you, doesn't suggest compression, doesn't track relevance decay.
LangChain: Memory is fragmented across modules (langchain.memory, langchain_community.chat_message_histories). There's no unified context strategy. The ConversationBufferWindowMemory requires manual window sizing with no guidance.
CrewAI: Tasks carry context forward automatically, but there's no mechanism to trim or prioritize. A crew of 5 agents working on a 20-step task accumulates massive context with no quality control.
Vercel AI SDK: Being frontend-focused, it punts on context management entirely. Developers must implement their own storage and retrieval strategies.
OpenAI Agents SDK: Handoffs can carry context, but there's no built-in mechanism to summarize or compress context when passing between agents.
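As one example of what guidance could look like: a window keyed to a token budget rather than a fixed turn count is a small amount of code. The whitespace tokenizer below is a stand-in, and nothing here is LangChain API:

```python
# Budget-based window sketch: keep as many recent turns as fit a token
# budget, instead of a fixed turn count chosen by guesswork.

def tokens(text):
    return len(text.split())   # stand-in for a real tokenizer

def budget_window(history, budget=15):
    kept, used = [], 0
    for turn in reversed(history):   # walk newest-first
        cost = tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))      # restore chronological order

history = [
    "hello there friend",
    "long turn " + "x " * 10,
    "short reply",
    "final question here",
]
print(budget_window(history))
```

A fixed `k` of turns spends wildly different token budgets depending on message length; budgeting by tokens makes the context cost explicit and predictable.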
This analysis reveals context engineering as the most significant underserved area in the agent framework ecosystem. While LangChain provides building blocks and Letta offers sophisticated memory management, no framework provides:
- Automatic relevance scoring - Identifying which context items are actually being used
- Quality degradation warnings - Alerting when context is becoming counterproductive
- Intelligent compression - Automatic summarization when context grows too large
- Sub-agent isolation - Controlled context sharing between parent and child agents
- Cost-aware strategies - Optimizing context size against inference costs
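A sketch of what the first missing piece, automatic relevance scoring, could look like. Word overlap is a crude stand-in for embedding similarity, and the threshold is arbitrary:

```python
# Relevance-scoring sketch: score each history item against the current
# query and drop low scorers before the model call.

def relevance(query, item):
    q, i = set(query.lower().split()), set(item.lower().split())
    return len(q & i) / max(len(q), 1)

def select_context(query, history, threshold=0.15):
    kept = [h for h in history if relevance(query, h) >= threshold]
    dropped = len(history) - len(kept)
    return kept, dropped

history = [
    "user asked about refund policy details",
    "greeting and small talk about weather",
    "agent explained refund processing time",
]
kept, dropped = select_context("what is the refund policy", history)
print(kept, dropped)
```

Even a heuristic this crude would let a framework surface the quality signal that no framework currently tracks: how much of the context being paid for is actually related to the current turn.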
For contextpatterns.com: This represents a major content opportunity. The frameworks provide the building blocks, but practitioners need guidance on how to use them effectively. Specific implementation patterns ("How to implement Context Rot detection in LangChain") would be highly valuable.
The agent framework ecosystem is consolidating. This isn't speculation; it's visible in GitHub activity, corporate strategy, and community migration patterns.
LangChain/LangGraph:
- ~90K GitHub stars (combined)
- Weekly releases with consistent cadence
- Enterprise traction visible in case studies (SAP, Moody's, Replit)
- LangSmith generating significant revenue
- Verdict: Dominant and consolidating position
CrewAI:
- ~25K GitHub stars
- Strongest growth trajectory in multi-agent space
- Raised venture funding, building standalone (not dependent on LangChain)
- Verdict: Becoming the default for multi-agent systems
Pydantic AI:
- ~12K GitHub stars in <1 year
- Rapid development (frequent releases)
- Strong type safety story resonates with Python developers
- Logfire integration provides differentiated observability
- Verdict: Rising star, especially for Python type-safety advocates
Vercel AI SDK:
- ~30K GitHub stars
- Dominant in TypeScript/React ecosystem
- Deep integration with Next.js deployment
- Verdict: Default choice for web developers
AutoGen:
- Microsoft's focus has shifted to Semantic Kernel and a rumored "Microsoft Agent Framework"
- AutoGen Studio (the UI) is officially in maintenance mode
- Community fragmentation with AG2 fork
- Last significant release: 3 months ago
- Verdict: Uncertain future; Microsoft may deprecate in favor of new framework
Haystack:
- Strong in RAG but limited agent traction
- deepset (the company) focusing on enterprise consulting
- Community growth slowing relative to LlamaIndex
- Verdict: Likely to remain a RAG tool rather than agent platform
Semantic Kernel:
- Strong Microsoft backing but limited non-Microsoft adoption
- .NET-first design limits Python/TypeScript community
- Verdict: Will survive as Microsoft ecosystem tool, unlikely to achieve broader adoption
All major frameworks are converging on common patterns:
- Graph orchestration: Even CrewAI (originally pure multi-agent) added Flows (graph-based). OpenAI's Agents SDK supports graph patterns.
- MCP support: Model Context Protocol adoption is universal. Frameworks recognize that tool standardization benefits everyone.
- Multi-agent patterns: Single-agent frameworks (Vercel AI SDK, Pydantic AI) are adding multi-agent capabilities.
- Observability: Tracing and debugging are now expected features. Competition among LangSmith, Logfire, and Braintrust is driving rapid improvement.
- Provider agnosticism: Frameworks that started provider-specific (OpenAI Agents SDK, Google ADK) are adding multi-provider support.
By late 2026, expect:
Tier 1 (Dominant):
- LangChain/LangGraph: The safe enterprise choice
- CrewAI: The multi-agent default
- Vercel AI SDK: The TypeScript/web default
Tier 2 (Strong Niches):
- Pydantic AI: Python type-safety advocates
- LlamaIndex: RAG-heavy applications
- Letta: Long-running/persistent agents
- OpenAI Agents SDK: OpenAI ecosystem
Tier 3 (Survivors):
- Semantic Kernel: Microsoft shops
- Haystack: RAG specialists
Likely Departures:
- AutoGen (replaced by Microsoft Agent Framework)
- AG2 (failed to achieve escape velocity)
- Phidata (overlaps too much with Pydantic AI)