[{"data":1,"prerenderedAt":404},["ShallowReactive",2],{"navigation_docs":3,"blog-working-with-agents-in-real-codebases":4},[],{"id":5,"title":6,"author":7,"body":10,"date":389,"description":390,"extension":391,"image":392,"meta":393,"navigation":394,"path":395,"seo":396,"stem":397,"tags":398,"__hash__":403},"blog/blog/working-with-agents-in-real-codebases.md","Working With Agents in Real Codebases",{"name":8,"avatar":9},"Buildforce Team","/logo-small.png",{"type":11,"value":12,"toc":369},"minimark",[13,18,22,25,29,32,35,38,42,45,48,51,54,57,60,64,67,70,73,77,80,83,87,90,93,96,99,110,113,116,125,128,163,174,178,181,188,193,199,208,211,214,222,225,229,234,241,244,247,254,257,260,264,269,276,283,286,289,293,298,305,312,315,319,322,325,328,332,335,341,347,356,359,363,366],[14,15,17],"h2",{"id":16},"the-current-state","The current state",[19,20,21],"p",{},"Almost every engineer uses coding agents now. Cursor, Claude Code, Copilot, one of the others. Recent surveys put adoption at around 80-85% of developers - Stack Overflow's 2025 Developer Survey reports about 80%, JetBrains finds 85%. The trust numbers are a different story. Developers who report that they trust what the agent produces sit at around 30%, and that figure has been falling, not rising. That gap is the interesting part. It isn't that the models are bad. Recent models are remarkable at the kind of tasks they're asked to do in isolation.",[19,23,24],{},"The gap shows up when the work stops being isolated.",[14,26,28],{"id":27},"how-an-agent-approaches-a-task","How an agent approaches a task",[19,30,31],{},"When you hand an agent a non-trivial change, it doesn't read your entire codebase. It does something closer to what a new engineer does on their second day. It searches for entry points, follows imports, opens neighboring files, and tries to build a working model of what's going on. Agentic search, agentic exploration - the terminology varies but the process is the same. 
The agent reads until it thinks it has enough, makes a plan, writes code.",[19,33,34],{},"This works remarkably well when the information the agent needs lives inside the files it can read. It falls apart when the information it needs isn't there.",[19,36,37],{},"And a lot of the time, it isn't there.",[14,39,41],{"id":40},"what-code-cant-tell-you","What code can't tell you",[19,43,44],{},"Think about what actually shapes a mature codebase. Some of it is the code. But a lot of it is reasoning that happened somewhere else and left only a trace behind.",[19,46,47],{},"A caching layer with an unusual TTL, because the team discovered a subtle consistency issue during an incident. The incident lives in a postmortem. The TTL lives in the code. The reasoning that connects them lives in neither.",[19,49,50],{},"A service that still uses REST while everything around it has moved to gRPC. Looking at the codebase, REST looks like the outlier. Looking at the team's trajectory, REST is the thing being phased out. The code can't tell you which of those readings is correct.",[19,52,53],{},"A retry function with a backoff curve that looks wrong. It is not wrong. It is shaped around a third-party provider's undocumented rate-limit behavior, discovered at 3am during an outage.",[19,55,56],{},"An error handling pattern that every senior reviewer will flag if missed - log before returning - which nobody wrote a linter for because it came out of an incident and was implemented by convention. No tool catches it. Every PR gets caught on it.",[19,58,59],{},"These are not edge cases. This is most of what a senior engineer carries. The mental model of \"why this codebase is the way it is\" is maybe 70% code and 30% reasoning that lives elsewhere: in postmortems, Slack threads, PR discussions, conversations at standup, things the team said once and never wrote down. Seniors know this layer. Juniors don't. New hires take months to build it up. 
Agents don't build it up at all.",[14,61,63],{"id":62},"where-trust-actually-breaks","Where trust actually breaks",[19,65,66],{},"The agent that works brilliantly on a fresh repo becomes frustrating on a real one, and the frustration has a specific shape.",[19,68,69],{},"It's not that the agent is slow. It's not that the agent is confused. It's that the agent is confident, and the thing it's confident about is wrong in a way only someone with context could have caught. The code compiles. Tests pass. The PR gets submitted. A senior reviewer reads it and says \"we tried this six months ago, here's why we stopped.\" Three hours of work becomes a lesson the codebase didn't know how to teach.",[19,71,72],{},"This is why the adoption-trust gap exists in the first place. Around 80% of developers use these tools; only 30% trust what they produce. The shortfall isn't capability. It's that agents are trusted to write the parts that don't require hidden context - autocomplete, scaffolding, well-scoped functions - and not trusted with the parts that do. The work where the reasoning matters most is the work where the agent is least equipped, and everyone using these tools has felt it.",[14,74,76],{"id":75},"the-friction-compounds-with-size","The friction compounds with size",[19,78,79],{},"A 5,000-line project has little hidden reasoning. A 500,000-line project has years of it. The friction scales with how much of the team's history the agent is missing, which means agents work best exactly where teams need them least - small, young, well-documented projects - and worst exactly where teams would benefit most.",[19,81,82],{},"The common response is to give the agent more files. Bigger context windows, retrieval over wikis, documentation injection. These help at the margins. They don't solve the core problem, which is that the reasoning was never captured in the first place. 
You can't retrieve what doesn't exist.",[14,84,86],{"id":85},"what-engraph-is","What Engraph is",[19,88,89],{},"Engraph is a set of concepts before it is a tool. The core idea is that a codebase has a reasoning layer - the decisions, constraints, conventions, and rejected alternatives that shaped it - and that this layer can be materialized in a form the codebase itself stores, versions, and makes available to any agent that reads it.",[19,91,92],{},"Practically, Engraph is a CLI. It provides the deterministic parts of this - parsing the code, building the map, checking the layer's integrity - along with a set of skills that make the concepts usable during everyday work with a coding agent.",[19,94,95],{},"The persistent layer - what actually lives in an engraphed repository - has three pieces.",[19,97,98],{},"The codegraph is a spatial map of the repository, built deterministically from the code. Every module, package, function, and their relationships. What matters about it is not just that it exists: every element has a stable identifier that any tool can reference. When the agent needs to work on the payment module, it doesn't do keyword search across files. It resolves the scope through the codegraph, gets the precise set of modules in play, and uses those identifiers to scope everything downstream - including which contextual commits to read. This is what makes retrieval precise instead of fuzzy, and it is what lets every other piece of Engraph hang off a reliable anchor.",[19,100,101,102,109],{},"Contextual commits are structured reasoning embedded in git commit bodies. When a decision gets made during development, the rationale, the alternatives considered, the constraints discovered, and the rejected approaches are captured in the commit body itself. Permanent, versioned alongside the code, scoped to the modules the codegraph knows about, readable by any agent that can read git log. 
The convention itself is open and already shipped, aimed at becoming an industry standard: ",[103,104,108],"a",{"href":105,"rel":106},"https://github.com/berserkdisruptors/contextual-commits",[107],"nofollow","github.com/berserkdisruptors/contextual-commits",". Any tool that reads or writes git commits can participate, regardless of whether Engraph is in the picture at all.",[19,111,112],{},"Context files - conventions and verification procedures - are custom per repository. Small, targeted instruction packages that tell the agent how things are done here and how to verify a change is correct here. Today they are created two ways: through the cold-start mechanism, where Engraph scans the repo and proposes candidates for the user to review and accept; and through ongoing interactive work, where the user and agent surface a convention during a session and add it explicitly. They are not documentation, they are not a wiki, and they are not yet generated from accumulated commit history - that generation pipeline is on the roadmap. Today, context files are accumulated with the user in the loop.",[19,114,115],{},"These three pieces together are what an experienced engineer on the team would know. Not everything they know - Engraph isn't going to replace a senior's intuition - but enough that the agent stops being a clever contractor and starts being a competent team member.",[19,117,118,119,124],{},"On top of the persistent layer, Engraph ships a set of skills that make the three pieces actionable inside a coding agent. The skills are agent-agnostic: they follow the open Agent Skills standard (",[103,120,123],{"href":121,"rel":122},"https://agentskills.io",[107],"agentskills.io","), which means any coding agent that supports the standard can use them. Claude Code and OpenCode are what we test against directly. 
Anything else that implements the standard should work - teams can keep using the agent they're already comfortable with.",[19,126,127],{},"The set shipping today:",[129,130,131,139,145,151,157],"ul",{},[132,133,134,138],"li",{},[135,136,137],"code",{},"context-extract"," - the cold-start skill, though not limited to cold start. Scans the repo and proposes conventions and verification procedures for the user to review and accept. Run it again at any time to surface new patterns that emerged since the last pass; it skips what's already captured.",[132,140,141,144],{},[135,142,143],{},"context-search"," - breaks the task down, decides what to retrieve from where, and returns the relevant contextual commits, context files, and codegraph slices, scoped to the modules in play.",[132,146,147,150],{},[135,148,149],{},"context-add"," - interactively creates or updates a convention or verification procedure during a session, when one surfaces from the work.",[132,152,153,156],{},[135,154,155],{},"context-verify"," - checks a change against the conventions and verification procedures for the modules it touches, before push.",[132,158,159,162],{},[135,160,161],{},"context-commit"," - captures a session's reasoning into a contextual commit body.",[19,164,165,166,169,170,173],{},"And two deterministic maintenance commands worth naming. ",[135,167,168],{},"engraph graph"," rebuilds the codegraph from the current code; the skills use it internally whenever something might have changed, so developers don't have to think about keeping the map in sync. ",[135,171,172],{},"engraph validate"," checks the persistent layer's integrity against the code - context files that reference modules no longer in the codegraph, scopes that broke during a refactor, conventions that have drifted from what the code actually does. 
Small things, but together they keep the layer honest as the codebase evolves.",[14,175,177],{"id":176},"what-becomes-possible","What becomes possible",[19,179,180],{},"What follows are four use cases where the difference is concrete and immediate. These are a small selection - not the full picture of what an engraphed repo enables - but they cover the shapes of problem where the absence of captured reasoning hurts the most. Each is described as a scenario, with what an agent does today and what an agent does against an engraphed repo. These are not features Engraph adds to the agent. They are things the agent can now do because the reasoning layer exists.",[19,182,183,187],{},[184,185,186],"strong",{},"A note before the examples:"," every use case below assumes the repository has been engraphed - codegraph built, contextual commits accumulated, context files in place. Engraph doesn't retroactively add reasoning that was never captured. It makes captured reasoning available. A repository that has been using contextual commits for a week has less to work with than one that's been using them for six months. The investment is real. The payoff compounds.",[189,190,192],"h3",{"id":191},"_1-impact-prediction-before-a-change","1. Impact prediction before a change",[19,194,195,198],{},[184,196,197],{},"The scenario."," An engineer is about to add a caching layer to a hot path. The path serves user preferences, and it's read-heavy. Redis is already in the stack. The fix seems straightforward: add a cache, 60-second TTL, done.",[19,200,201,207],{},[202,203,204],"em",{},[184,205,206],{},"Without Engraph."," The agent reads the handler, sees the database call, notes that Redis is available elsewhere, writes a sensible caching wrapper. The implementation is clean. The tests pass. The PR goes up.",[19,209,210],{},"A senior engineer reviews it and asks the question the agent couldn't: \"Didn't we rip out caching on this path six months ago?\" Because the team did. 
An incident where preference updates didn't propagate fast enough to a downstream service led to a postmortem, a decision to remove caching from this path, and a note that the consistency requirement would need to be solved upstream before caching could be reintroduced.",[19,212,213],{},"The agent had no way to know this. The code shows a path without caching. It doesn't show a path that used to have caching and had it removed for a specific reason.",[19,215,216,221],{},[202,217,218],{},[184,219,220],{},"With Engraph."," Before writing code, the agent queries the contextual commit history for this module. It finds the decision: caching was removed, here is the incident, here is the constraint, here is what would need to change first. Instead of writing the code, the agent surfaces the constraint and asks whether the upstream consistency issue has been resolved. If it has, it proceeds, and the new implementation cites the prior decision and explains why the constraint is no longer binding. If it hasn't, the PR never gets written.",[19,223,224],{},"The difference isn't quality of code. The difference is whether the agent reintroduces the exact failure mode the team already paid to fix.",[189,226,228],{"id":227},"_2-self-code-review-before-pushing","2. Self code review before pushing",[19,230,231,233],{},[184,232,197],{}," An engineer has finished implementing a new endpoint. Error paths are handled, types are clean, tests pass locally. Time to push.",[19,235,236,240],{},[202,237,238],{},[184,239,206],{}," The agent runs the tests, checks lint, maybe checks types. Everything green. It's ready.",[19,242,243],{},"The senior reviewer flags three things the automated checks didn't catch. The error response doesn't get logged before being returned, which has been a rule since a production incident where a silent 500 caused a week-long debugging mystery. 
The endpoint uses a direct database call when the team standardized on going through the service layer for cross-cutting concerns like audit logging. The integration test doesn't mock the third-party provider the way other tests in this area do, which makes it flaky in CI.",[19,245,246],{},"None of these are in a linter. All of them are in a senior's head.",[19,248,249,253],{},[202,250,251],{},[184,252,220],{}," The verification skill runs before push. It checks the change against the captured conventions for this area of the code - conventions that were extracted from the commit history, from past reviews, from the patterns that emerged after incidents. The missing log gets flagged. The direct DB call gets flagged. The test pattern mismatch gets flagged. The engineer fixes them before pushing.",[19,255,256],{},"The reviewer has nothing to catch. The PR merges on the first pass.",[19,258,259],{},"For an agency with multiple client codebases, this is the use case that pays for itself fastest. Every codebase has its own set of conventions nobody wrote down. Engraph surfaces them. Juniors stop making the same class of mistake. Reviewers stop being a bottleneck for things that aren't really review - they're re-enforcing tribal knowledge that should have been captured the first time.",[189,261,263],{"id":262},"_3-the-branch-brief","3. The branch brief",[19,265,266,268],{},[184,267,197],{}," An engineer comes back to a feature branch on Monday after a week of working on something else. Two teammates have pushed to the branch in the meantime. What happened?",[19,270,271,275],{},[202,272,273],{},[184,274,206],{}," The engineer reads the diff. If there are a few commits, that's fast. If there are thirty commits, some with vague messages like \"fix review comments\" or \"wip,\" reading the diff isn't enough. 
The engineer ends up opening files, piecing together what changed, and often asking a teammate directly.",[19,277,278,282],{},[202,279,280],{},[184,281,220],{}," The agent reads the contextual commits on the branch since the last checkout. The commit bodies contain the reasoning for each change - not just \"added a queue consumer\" but \"added a queue consumer to handle the race condition we hit on Friday, chose a dead-letter-queue pattern over retry-with-jitter because of the rate-limit constraint, verification added in the integration test file.\" The agent synthesizes this into a summary: what was done, why it was done, what to look out for, what's still open.",[19,284,285],{},"The engineer reads the summary, knows the state of the branch in two minutes, and picks up where they left off.",[19,287,288],{},"This gets more valuable the more teammates are involved, the more async the team is, and the more clients an agency is balancing. The friction of context-switching between projects drops substantially.",[189,290,292],{"id":291},"_4-code-archaeology","4. Code archaeology",[19,294,295,297],{},[184,296,197],{}," An engineer is reading an unfamiliar module and encounters something that looks off. A retry function with a strange backoff curve. A config value that's hardcoded to what seems like an arbitrary number. An error handler that catches a specific exception type and re-raises it with a different message.",[19,299,300,304],{},[202,301,302],{},[184,303,206],{}," The engineer asks the agent what the code does. The agent reads the function, explains the mechanics. The engineer asks why. The agent looks for nearby comments, finds none, and answers with what amounts to plausible speculation. \"This might be for...\" \"It looks like it was designed to...\" The answer is sometimes right, often generic, occasionally wrong, and never certain.",[19,306,307,311],{},[202,308,309],{},[184,310,220],{}," The engineer asks why. 
The agent queries the contextual commits that touched this function. It finds the commit that introduced the retry curve, with a commit body that references the incident with the third-party provider, the undocumented rate-limit behavior discovered during the outage, the specific timing constraints that shaped the curve, and a link back to the postmortem. The answer isn't speculation. It's the actual reasoning.",[19,313,314],{},"This is the use case that shifts how engineers read unfamiliar code. Instead of guessing at intent, they ask. The codebase answers.",[14,316,318],{"id":317},"your-repo-your-data","Your repo, your data",[19,320,321],{},"A practical note, because this comes up: Engraph does not call home. No API, no telemetry, and nothing leaves the repository. Everything Engraph adds - the codegraph, the contextual commits, the context files - lives inside the repo itself, as files under version control, in a folder the team owns. They're committed and pushed on your schedule, through your git provider, under your access controls.",[19,323,324],{},"Agent inference still goes through whatever coding agent you already use - Claude Code, OpenCode, or anything else that supports the Agent Skills standard. Your subscription, your keys, your terms. Engraph doesn't sit between the agent and its provider, and it has no visibility into what you're asking the agent to do. It ships the skills and the data; the agent runs them inside the session you were going to run anyway.",[19,326,327],{},"If you stopped using Engraph tomorrow, the commit bodies and the context files would still be in the repository, still readable by any agent that can read git. There is no lock-in because there is nothing to lock in to. It's your commit history, your repo, your data.",[14,329,331],{"id":330},"what-an-engraphed-repo-means","What an engraphed repo means",[19,333,334],{},"Engraph isn't automatic yet. 
For these use cases to work, a repository needs three things in place.",[19,336,337,338,340],{},"The codegraph has to be built and kept current. This part is deterministic: a tool scans the repo and produces the graph. Minimal ongoing work. The ",[135,339,172],{}," command keeps the rest of the layer honest against it as the code evolves, flagging context files that reference modules that no longer exist or scopes that broke during a refactor.",[19,342,343,344,346],{},"Contextual commits have to be captured as work happens. The ",[135,345,161],{}," skill handles this during agent-assisted sessions: the agent, prompted correctly, embeds the reasoning in commit bodies as part of its workflow. No separate files. No extra process. The commit message becomes the capture mechanism. Automating the generation of commit bodies end-to-end, so the agent writes them without explicit prompting, is on the roadmap.",[19,348,349,350,352,353,355],{},"Conventions and verification procedures have to be accumulated. Two mechanisms are available today. The cold-start skill (",[135,351,137],{},") scans a repo's existing state - build configs, README, CI files, the reports from the codegraph scan - and proposes candidate conventions and verifications for the user to review and accept. ",[135,354,149],{}," creates them interactively during work, when a pattern or a rule surfaces during a session. Going forward, a generation pipeline will synthesize conventions directly from accumulated contextual commit history, with provenance back to the commits that produced them. That is not what's running today. Today, accumulation is user-in-the-loop.",[19,357,358],{},"The investment is lightest for a team that's already writing well-structured PRs and commit messages, and heaviest for a team that isn't. For most teams we've talked to, the capture discipline settles into the workflow within about two weeks. 
The payoff is continuous from day one and compounds from there.",[14,360,362],{"id":361},"try-it","Try it",[19,364,365],{},"The fastest way to decide whether any of this is real is to see it against a codebase. Engraph has been used on its own codebase throughout development, and there's enough accumulated history now to demo each of the use cases above on real commits, not fabricated examples.",[19,367,368],{},"If any of what's described above sounds like work you do daily, reach out and schedule a demo.",{"title":370,"searchDepth":371,"depth":371,"links":372},"",2,[373,374,375,376,377,378,379,386,387,388],{"id":16,"depth":371,"text":17},{"id":27,"depth":371,"text":28},{"id":40,"depth":371,"text":41},{"id":62,"depth":371,"text":63},{"id":75,"depth":371,"text":76},{"id":85,"depth":371,"text":86},{"id":176,"depth":371,"text":177,"children":380},[381,383,384,385],{"id":191,"depth":382,"text":192},3,{"id":227,"depth":382,"text":228},{"id":262,"depth":382,"text":263},{"id":291,"depth":382,"text":292},{"id":317,"depth":371,"text":318},{"id":330,"depth":371,"text":331},{"id":361,"depth":371,"text":362},"2026-04-24","Why agents still fail on the work that matters, and what changes when a codebase remembers","md",null,{},true,"/blog/working-with-agents-in-real-codebases",{"title":6,"description":390},"blog/working-with-agents-in-real-codebases",[399,400,401,402],"context-engineering","agentic-coding","ai-engineering","productivity","hCS6OuuSzs7y0EpmLeaN0rrZ6XdxbfKVo8OIuKlVyHY",1777259530887]