Context engineering, not model capability, is the primary bottleneck for AI coding productivity. The same agent with different context produces dramatically different results.
A coding-session-level tool can capture approximately 60-70% of context that meaningfully impacts coding outcomes. This context falls into six pillars: Intent, Structural, Convention, Historical, Operational, and Verification.
Every piece of context an agent needs ultimately answers one of these core questions:
| Question | Context Type | Traditional SDLC Source |
|---|---|---|
| WHAT should I build? | Intent Context | Requirements, user stories, tickets |
| WHY should I build it this way? | Historical Context | ADRs, design docs, team discussions |
| HOW do we do things here? | Convention Context | Style guides, code review, tribal knowledge |
| WHAT exists already? | Structural Context | Codebase, dependencies, architecture |
| WHAT happened before? | Historical Context | Git history, PR discussions, postmortems |
| WHAT could go wrong? | Operational Context | Past bugs, production incidents |
| HOW do I know it's right? | Verification Context | Test strategies, acceptance criteria, QA |
The revelation: All this context already exists in every software organization. It is scattered across people's heads (tribal knowledge), documents that drift from reality, implicit patterns in code, conversations that disappear, and tickets that become obsolete. The opportunity is to materialize it, version it, and make it queryable.
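As a concrete illustration of what "materialize it, version it, and make it queryable" could mean, here is a minimal sketch of a captured context record in TypeScript. The pillar and sub-type names follow the taxonomy below; the `ContextRecord` shape, its field names, and the example values are illustrative assumptions, not a defined schema.

```typescript
// Illustrative sketch of a materialized context record (not a defined schema).
// Pillars and sub-types mirror the taxonomy described in this section.
type ContextPillar =
  | "intent"
  | "structural"
  | "convention"
  | "historical"
  | "operational"
  | "verification";

interface ContextRecord {
  pillar: ContextPillar;
  subType: string;           // e.g. "goal", "boundary", "decision-rejected"
  statement: string;         // the captured knowledge, stated in plain language
  source: "cold-start" | "session-capture" | "manual";
  capturedAt: string;        // ISO timestamp, so records can be versioned over time
  relatedPaths?: string[];   // code the record applies to, for later retrieval
}

// Example: a convention that today lives only as an implicit pattern in the code.
const componentNaming: ContextRecord = {
  pillar: "convention",
  subType: "code-style",
  statement: "PascalCase for components, camelCase for functions.",
  source: "cold-start",
  capturedAt: new Date().toISOString(),
  relatedPaths: ["src/components/"],
};
```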
Intent context answers: WHAT should I build and WHY (at the task level)?
| Sub-type | Description | Example | Coding Impact |
|---|---|---|---|
| Goal | The desired end state | Users can export reports as PDF | HIGH - Directs implementation |
| Acceptance Criteria | Measurable success conditions | Export completes in <5s for 1000 rows | HIGH - Defines done |
| Non-goals | Explicit scope boundaries | Not supporting CSV in this iteration | MEDIUM - Prevents scope creep |
| Motivation | Business/user need behind goal | Enterprise compliance requirements | LOW - Rarely impacts code |
Capturability: Goal and acceptance criteria are often stated at session start (~70% capturable). Motivation lives outside coding sessions (~20% capturable).
Key Insight: By the time you are in a coding session, motivation context has minimal impact on implementation. The scope should already be defined.
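To show why goal and acceptance criteria are the capturable part of intent, here is a sketch of what a session-start capture might record, reusing the `ContextRecord` shape sketched earlier. The task and values are taken from the examples in the table above; everything else is illustrative.

```typescript
// Illustrative: intent context captured at the start of a coding session,
// using the ContextRecord sketch from earlier. Motivation is deliberately
// absent: by this point it should already be baked into the goal and scope.
const sessionIntent: ContextRecord[] = [
  {
    pillar: "intent",
    subType: "goal",
    statement: "Users can export reports as PDF.",
    source: "session-capture",
    capturedAt: new Date().toISOString(),
  },
  {
    pillar: "intent",
    subType: "acceptance-criteria",
    statement: "Export completes in under 5 seconds for 1000 rows.",
    source: "session-capture",
    capturedAt: new Date().toISOString(),
  },
  {
    pillar: "intent",
    subType: "non-goal",
    statement: "CSV export is out of scope for this iteration.",
    source: "session-capture",
    capturedAt: new Date().toISOString(),
  },
];
```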
Structural context answers: WHAT exists and HOW is it organized?
| Sub-type | Description | Example | Coding Impact |
|---|---|---|---|
| Architecture | High-level system organization | Microservices with event-driven comms | HIGH - Where code goes |
| Patterns | Recurring solutions in codebase | Repository pattern for data access | HIGH - Implementation approach |
| Dependencies | What relies on what | ReportService depends on PdfGenerator | HIGH - Change impact |
| Boundaries | Module/domain separations | Billing never imports from Reporting | HIGH - What not to cross |
| Data Flow | How information moves | Events flow through Kafka | MEDIUM - Integration impl |
| API Contracts | Interface agreements | v2 endpoints return pagination metadata | HIGH - Compatibility |
Capturability: Highly extractable from codebase analysis (~75-95% depending on sub-type).
Cold Start Value: Structural context is the primary payoff of cold-start extraction. The agent immediately knows where to put new code and what exists to build on.
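Boundary rules are a good example of structural context that can be extracted once and then checked mechanically. A minimal sketch follows, assuming boundaries are expressed as "module X must not import from module Y" pairs; the module names come from the example in the table, and the check itself is illustrative rather than tied to any particular tool.

```typescript
// Illustrative sketch: structural boundaries expressed as data and checked
// against a module's imports. "Billing never imports from Reporting" comes
// from the table above; the rest is an assumption made for the sketch.
interface BoundaryRule {
  from: string; // module that must not depend on...
  to: string;   // ...this module
}

const boundaries: BoundaryRule[] = [{ from: "billing", to: "reporting" }];

function violatesBoundary(
  modulePath: string,
  importPath: string,
  rules: BoundaryRule[],
): boolean {
  return rules.some(
    (rule) =>
      modulePath.includes(`/${rule.from}/`) &&
      importPath.includes(`/${rule.to}/`),
  );
}

// With this context available, an agent can check a proposed import before
// writing it, instead of discovering the violation in code review.
console.log(
  violatesBoundary("src/billing/invoice.ts", "src/reporting/export.ts", boundaries),
); // true
```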
Convention context answers: HOW do we do things here?
| Sub-type | Description | Example | Coding Impact |
|---|---|---|---|
| Code Style | Formatting, naming conventions | PascalCase for components, camelCase for functions | HIGH - Syntax choices |
| Design Patterns | Preferred implementation approaches | Use hooks for state, never class components | HIGH - Implementation |
| Error Handling | How failures are managed | All API errors return ErrorResponse type | HIGH - Error code structure |
| Testing Approach | What and how to test | Unit test business logic, integration test APIs | HIGH - Test authoring |
| Documentation | What gets documented where | All public APIs need JSDoc with examples | MEDIUM - Doc writing |
| Commit Conventions | Message format, granularity | Conventional commits; one logical change | HIGH - Git workflow |
Capturability: Highly extractable from existing code patterns and linting configuration (~70-90%).
Key Insight: Convention context ensures code looks like it belongs. Without it, agent output is obviously AI-generated.
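As a small illustration of convention context shaping output, here is what the error-handling convention from the table ("All API errors return ErrorResponse type") might look like once made explicit. The `ErrorResponse` fields and the handler are assumptions for the sketch; the point is that a stated convention lets the agent follow the team's error format instead of inventing its own.

```typescript
// Illustrative: an error-handling convention made explicit. The ErrorResponse
// name comes from the example in the table; the fields are assumptions.
interface ErrorResponse {
  code: string;      // stable, machine-readable error code
  message: string;   // human-readable summary
  details?: unknown; // optional structured context
}

// Convention applied: every API failure is normalized into ErrorResponse
// instead of leaking raw exceptions to callers.
function toErrorResponse(err: unknown): ErrorResponse {
  if (err instanceof Error) {
    return { code: "internal_error", message: err.message };
  }
  return { code: "unknown_error", message: String(err) };
}
```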
Historical context answers: WHY are things the way they are?
| Sub-type | Description | Example | Coding Impact |
|---|---|---|---|
| Decisions Made | Past architectural choices | Chose Postgres for ACID compliance | MEDIUM - Understanding state |
| Decisions Rejected | What was considered but not chosen | GraphQL rejected for caching complexity | HIGH - Prevents re-litigation |
| Evolution | How things changed over time | Auth moved from JWT to session-based in v2.3 | MEDIUM - Code archaeology |
| Bug Patterns | Past failures and resolutions | Race condition in payment fixed by mutex | HIGH - Avoid repeats |
| Workarounds | Intentional technical debt | setTimeout for library X timing bug | MEDIUM - Don't fix quirks |
Capturability: This is the WHY context that accumulates over time through session capture. Rejected decisions are particularly valuable (~75% capturable during sessions).
Key Insight: Historical context is rarely needed for daily coding but becomes critical for refactoring, migrations, and hard debugging. This is where accumulated context delivers visible magic.
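A sketch of how accumulated historical context could become queryable at the moment it matters, reusing the `ContextRecord` shape from earlier: when a session touches a given path, the relevant decisions and workarounds are pulled back into the agent's context. The filtering logic is illustrative.

```typescript
// Illustrative: surfacing accumulated historical context for the files a
// session is touching. Reuses the ContextRecord sketch from earlier.
function relevantHistory(
  records: ContextRecord[],
  touchedPaths: string[],
): ContextRecord[] {
  return records.filter(
    (record) =>
      record.pillar === "historical" &&
      (record.relatedPaths ?? []).some((prefix) =>
        touchedPaths.some((path) => path.startsWith(prefix)),
      ),
  );
}

// A refactor touching src/api/ would surface, for example, the record that
// GraphQL was rejected for caching complexity, before the agent proposes it again.
```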
Operational context answers: WHAT happens in production?
| Sub-type | Description | Example | Coding Impact |
|---|---|---|---|
| Constraints | Hard limits | Max 5MB payload; 30s timeout; 10k concurrent | LOW - Usually implicit |
| Failure Modes | How things break | Redis failover takes 30s; cache locally | LOW - Rarely in IDE |
| Performance Baselines | Expected behavior | P95 latency should stay under 200ms | LOW - Optimization targets |
| Security Boundaries | Trust zones | User input never reaches eval() | MEDIUM - Security patterns |
Capturability: Lives outside coding sessions (~20-40% capturable). Production reality is not in the IDE.
This context type is largely out of scope for coding-session tools; integrating an operational context layer is future territory.
Verification context answers: HOW do I know it's right?
| Sub-type | Description | Example | Coding Impact |
|---|---|---|---|
| Quality Criteria | What makes code good here | No PR without tests for public methods | HIGH - Self-review |
| Test Strategy | How to prove correctness | Unit + integration + contract tests | HIGH - Test authoring |
| Review Checklist | What reviewers look for | Error handling, edge cases, types | HIGH - Compliance |
| Acceptance Tests | How to validate features | Cucumber scenarios for user journeys | MEDIUM - Feature validation |
| Compliance Rules | What must be followed | GDPR data handling, SOC2 logging | MEDIUM - Regulatory |
| Known Risks | What tends to break | Watch for N+1 queries in UserRepository | HIGH - Proactive avoidance |
Capturability: Partially extractable from test structure and CI config (~55-80% depending on sub-type).
Critical Insight: Verification context is the feedback-loop decision maker. Without it, the agent can write code but cannot evaluate whether the code is good; it can only check that it compiles. That gap is what separates a coding agent from a competent developer.
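To make the feedback-loop role concrete, here is a minimal sketch of verification context as executable checks: each criterion becomes a predicate an agent can run against its own change before declaring the work done. The `ChangeSummary` shape and the specific checks are assumptions, loosely based on the examples in the table.

```typescript
// Illustrative: verification context as a self-review gate. The criteria echo
// examples from the table; the ChangeSummary shape is an assumption.
interface ChangeSummary {
  addedPublicMethods: string[];
  addedTests: string[];
  touchedFiles: string[];
}

interface VerificationCheck {
  criterion: string;
  passes: (change: ChangeSummary) => boolean;
}

const checks: VerificationCheck[] = [
  {
    criterion: "No PR without tests for public methods",
    passes: (c) => c.addedPublicMethods.length === 0 || c.addedTests.length > 0,
  },
  {
    // A real check would inspect the queries; flagging the touched file is the
    // simplest stand-in for "this known risk needs a closer look".
    criterion: "Watch for N+1 queries in UserRepository",
    passes: (c) => !c.touchedFiles.some((f) => f.includes("UserRepository")),
  },
];

function selfReview(change: ChangeSummary): string[] {
  // Returns the criteria the change fails to meet. An empty result does not
  // mean the code is good; it means no captured criterion is violated, which
  // is still the difference between self-approving and checking against the
  // team's actual bar.
  return checks.filter((check) => !check.passes(change)).map((c) => c.criterion);
}
```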
Not all context types have equal impact on coding session quality:
| Impact Level | Context Types | Why |
|---|---|---|
| CRITICAL | Structural, Convention, Intent (Goal), Verification | Core loop enablers - agent cannot function well without these |
| SIGNIFICANT | Historical (Rejected), Historical (Decisions), Intent (Criteria) | Quality multipliers - improve accuracy and reduce iterations |
| MARGINAL | Operational (Constraints), Historical (Bug patterns) | Edge case handlers - matter in specific scenarios |
| MINIMAL | Intent (Motivation), Intent (Stakeholder), Operational (Runtime) | Outside coding scope - rarely change implementation |
Different context types influence different quality dimensions:
| Quality Dimension | Primary Context Types | Outcome |
|---|---|---|
| Correctness (does it work?) | Intent, Structural | Agent builds the right thing in the right place |
| Compliance (does it fit?) | Convention, Verification | Code follows patterns, meets review criteria |
| Safety (won't break things?) | Historical, Verification | Avoids known pitfalls, respects boundaries |
| Efficiency (is it optimal?) | Structural, Operational | Knows constraints, uses existing solutions |
| Maintainability (can evolve?) | Convention, Historical | Code matches team expectations |
| Verifiability (can prove it?) | Verification, Intent | Knows what done looks like, how to test |
Current agents fail most dramatically at verification. This maps to a specific context deficit:
| Verification Level | Context Required | Without Context | With Context |
|---|---|---|---|
| Syntactic | Language rules | Linter passes | Linter passes |
| Semantic | Intent + Acceptance | Builds something | Builds the right thing |
| Stylistic | Convention | Generic patterns | Code that belongs |
| Architectural | Structural + Boundaries | May violate boundaries | Respects boundaries |
| Behavioral | Historical + Verification | Hopes it works | Avoids known failures |
| Compliance | Verification criteria | Self-approves | Meets team standards |
A tool operating at the coding session level has a specific observability window:
Directly Observable: the codebase itself (available for cold start extraction), the intent stated at session start, and everything that happens during the session, including navigation, edits, test runs, and review discussion.
Outside the Window: upstream product and business context (meetings, customer conversations, strategy documents) and downstream production reality (monitoring, incidents, runbooks).
| Context Type | Cold Start | Session Capture | Combined |
|---|---|---|---|
| STRUCTURAL | ~75% | +10% | ~85% |
| CONVENTION | ~70% | +15% | ~85% |
| VERIFICATION | ~55% | +25% | ~80% |
| HISTORICAL | ~25% | +45% | ~70% |
| INTENT (Goal/Criteria) | ~10% | +60% | ~70% |
| OPERATIONAL | ~20% | +15% | ~35% |
Key Insight: When weighted by coding impact, capturable context covers approximately 60-70% of what actually matters for coding sessions. Structural, Convention, and Verification are both high-impact and highly capturable.
Two context types remain fundamentally outside coding session scope:
Gap 1 - Upstream Intent: The business motivation, stakeholder priorities, and full product context that explains why something should be built. This lives in product meetings, customer conversations, and strategy documents.
Gap 2 - Downstream Reality: The operational truth about what happens when code runs in production. This lives in monitoring dashboards, incident reports, and SRE runbooks.
With comprehensive context capture in place, an agent's effectiveness compounds over time. Cold start provides ~15-20% improvement on day one by giving the agent structural awareness and convention compliance. As historical context accumulates—particularly the WHY behind decisions and rejected alternatives—improvement grows to roughly 35-50% over the following months, with the critical unlock occurring when the agent can handle refactoring, migrations, and complex debugging by drawing on accumulated institutional knowledge.
| Context | Day 1 Source | Accumulation Source | Impact | Improvement Contribution |
|---|---|---|---|---|
| STRUCTURAL | Cold start extraction | Session navigation patterns | Critical | ~5-8% |
| CONVENTION | Cold start extraction | Code review discussions | Critical | ~5-8% |
| VERIFICATION | Test/CI analysis | Review feedback, test sessions | Critical | ~4-6% |
| INTENT (Goal) | Session start | Stated intent patterns | Critical | ~3-5% |
| HISTORICAL | Git history (partial) | Session capture | Significant | ~8-15% |
| OPERATIONAL | Config analysis | Limited (out of scope) | Marginal | ~2-3% |
Total Expected Improvement Range: 15-20% (Day 1) → 25-35% (Month 1-3) → 35-50% (Month 6+)
The path from baseline to mature context is not about capturing more context types. It is about accumulating depth in the context types that matter most: Historical (WHY) and Verification (HOW TO KNOW).
Modern coding agents are remarkably capable at reverse-engineering context from codebases. Through agentic search, AST analysis, dependency graphs, and pattern recognition, they can discover what exists, how it's organized, and often infer conventions from code alone. This capability is real and continues to improve.
This taxonomy does not aim to replace that discovery process or invent a new paradigm. Instead, it addresses a fundamental limitation: agents can reverse-engineer the WHAT, but they cannot reverse-engineer the WHY.
No amount of codebase analysis will reveal why GraphQL was rejected, what production incident led to that defensive timeout, or which testing patterns the team actually values versus tolerates. This knowledge exists only in human heads, disappearing conversations, and tribal memory that erodes with every team change.
The opportunity is not to build something agents can't already do. It is to capture and persist the context that cannot be discovered—the decisions, rejections, rationale, and verification expectations that shape whether code is merely functional or truly belongs.
This context doesn't need to be exhaustive or perfect. It needs to be captured when it surfaces, versioned alongside the work, and queryable at the moment the agent needs it.
When an agent combines its native ability to search and analyze code with accumulated context about why things are the way they are, the result is not a different kind of agent. It is the same agent, operating with the institutional knowledge that previously existed only in senior developers' heads.
A common assumption is that better models will eventually make explicit context capture obsolete—that sufficiently advanced agents will simply figure everything out. The opposite is true: better models amplify the value of good context rather than replacing it.
More capable models are better at capturing context: they recognize what's significant in a conversation, extract cleaner decision rationale, and identify patterns worth preserving. They're also better at utilizing context: they can reason over larger context windows, synthesize information from multiple sources, and apply historical knowledge more precisely to current tasks. The bottleneck was never the model's ability to use context—it was getting the right context to the model in the first place.
This creates a compounding relationship. As models improve, the same context infrastructure delivers progressively better outcomes. The investment in capturing WHY context and verification expectations becomes more valuable over time, not less. Teams that accumulate this context gain increasing advantage as the agents consuming it grow more sophisticated.
The goal is simple: help agents produce code that a thoughtful human teammate would produce—code that fits, that respects history, that anticipates problems, and that meets the team's actual quality bar. Not through more sophisticated models alone, but through the combination of better models with the context those models have always needed.