Avi Cavale of AI development platform Quarterback argues that autonomous agents fail because they have amnesia

I’ve been watching the autonomous coding agent demos with a mixture of fascination and frustration.
The demos are impressive. The agent picks up a GitHub issue, reads the codebase, writes the code, runs the tests, opens a PR. Twenty minutes of work automated. The future of software engineering.
Then you try it on your codebase. And it goes off the rails. Not because the model is stupid — it’s clearly capable. But it makes decisions that are technically valid and organisationally wrong. It uses offset pagination when your team standardised on cursor-based. It introduces a dependency that caused a cascading failure last quarter. It takes the same approach that someone already tried and abandoned because it broke the mobile API.
The industry’s diagnosis: models aren’t capable enough yet. Wait for the next generation. More reasoning. More capability. More intelligence.
I think the diagnosis is wrong.
The talented contractor on day one
Here’s how I think about it: an autonomous agent is a talented contractor who shows up on their first day with no context about your organisation. They can code. They’re smart. They can reason about complex problems.
But they don’t know your conventions. They don’t know your prior decisions. They don’t know the bugs you’ve already found and fixed. They don’t know the approaches that have been tried and failed. They’re starting from the same zero that a fresh Claude Code session starts from — which is to say, they know nothing except what they can read from the files.
A human contractor in this situation would ask questions. "What pagination approach do you use?" "Has anyone worked on this before?" "Are there any known issues with this module?"
The autonomous agent can’t ask these questions because it doesn’t know what it doesn’t know. And more importantly, even if it could ask, there’s no organisational knowledge base for it to consult. The knowledge exists only in people’s heads and stale documents.
More reasoning doesn’t fix this
The industry’s response is to give models more reasoning capability. Chain of thought. Planning steps. Self-reflection. These help the model think more carefully about the code it can see.
But they don’t help it know things it has never been told.
No amount of reasoning will help an agent discover that your team has a specific convention for retry logic. No chain of thought will reveal that a previous attempt at this fix was abandoned for a specific reason. No self-reflection will surface the known error pattern that makes one approach dangerous.
Reasoning operates on available information. If the information isn’t in context, better reasoning just generates more confident wrong answers.
What agents really need
After spending a lot of time on this, I’ve landed on a pretty simple conclusion: autonomous agents need memory. Not "memory" in the LLM sense of a bigger context window. Actual organisational memory:
Rules that are enforced, not suggested. Not a config file someone might forget to update. Mandatory constraints injected into every autonomous session at the infrastructure level. "All API changes must be backwards-compatible." "Tests must cover the error path." These should be there whether or not the person who set up the automation remembered to include them.
Decisions that are respected. The team has already made choices. Cursor-based pagination. Event sourcing for audit. Idempotency keys for payments. An agent that doesn’t know these decisions will re-decide them — probably differently — and create inconsistency across the codebase.
Error patterns that are avoided. Your codebase has known pitfalls. The N+1 query problem in the ORM. The race condition in the queue consumer. An agent with this knowledge avoids them proactively. One without it stumbles into them and wastes its execution budget debugging something that was already understood and solved.
History of prior work. What happened last time someone worked on this? Did a similar task succeed? Fail? Get abandoned? The agent should know before it starts, not discover halfway through that it’s repeating a failed approach.
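To make the four kinds of memory concrete, here is a minimal sketch of what such a knowledge layer might look like. Everything in it is hypothetical — the `OrgMemory` class, its fields, and the `context_for` method are illustrations, not any real framework's API. The idea is simply that rules, decisions, error patterns, and history are stored as structured data and assembled into mandatory context for every agent session.

```python
from dataclasses import dataclass, field

@dataclass
class OrgMemory:
    """Hypothetical store for the four kinds of organisational knowledge."""
    rules: list = field(default_factory=list)           # enforced constraints
    decisions: dict = field(default_factory=dict)       # settled choices, by topic
    error_patterns: list = field(default_factory=list)  # known pitfalls
    task_history: list = field(default_factory=list)    # prior attempts and outcomes

    def context_for(self, task: str) -> str:
        """Assemble the context to inject into an autonomous agent session."""
        lines = [f"Task: {task}", "", "Mandatory rules:"]
        lines += [f"- {rule}" for rule in self.rules]
        lines.append("Settled decisions (do not re-decide):")
        lines += [f"- {topic}: {choice}" for topic, choice in self.decisions.items()]
        lines.append("Known pitfalls to avoid:")
        lines += [f"- {pattern}" for pattern in self.error_patterns]
        lines.append("Relevant prior work:")
        lines += [f"- {entry}" for entry in self.task_history]
        return "\n".join(lines)

# Example contents, drawn from the scenarios described above.
memory = OrgMemory(
    rules=["All API changes must be backwards-compatible",
           "Tests must cover the error path"],
    decisions={"pagination": "cursor-based (offset pagination rejected)"},
    error_patterns=["N+1 queries in the ORM when eager loading is skipped"],
    task_history=["Earlier pagination fix abandoned: it broke the mobile API"],
)

prompt_context = memory.context_for("Add pagination to the orders endpoint")
print(prompt_context)
```

The point of the sketch is the injection step: this context is assembled by the infrastructure and prepended to every session, so the agent sees the settled pagination decision and the abandoned prior attempt before it writes a line of code, whether or not anyone remembered to mention them.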
The reliability gap isn’t about intelligence
This is the thing that keeps nagging me. The demos work because the demo tasks are self-contained. "Build a to-do app." "Add a dark mode toggle." These require no organisational context. The code is the entire story.
Production tasks are different. They exist in a web of prior decisions, team conventions, known constraints, and historical context. The code is maybe 30% of what you need to know. The other 70% is why the code is the way it is.
That’s why autonomous agent reliability drops off a cliff when you move from demos to real codebases. The model is capable of writing the code. It’s incapable of knowing the context that makes the code correct for your organisation.
And every model improvement — more parameters, better reasoning, higher capability — improves the 30% (code quality) without touching the 70% (organisational context). The gap between demo performance and production performance persists, regardless of how good the model gets.
The infrastructure nobody’s building
The autonomous agent market is focused on model capability, tool integration, and orchestration. Better models. Better tools. Better chains.
Almost nobody is building the knowledge layer that makes all three effective. And I think that’s the bottleneck. A capable model without organisational knowledge is a talented contractor on their first day. They can write code. They can’t write the right code for your organisation without the context that makes "right" meaningful.
The agent frameworks that will win aren’t the ones with the most tools or the most sophisticated reasoning. They’re the ones where the autonomous agent, on its hundredth task, is dramatically more reliable than on its first — because it has accumulated the organisational knowledge that makes its decisions correct.
That’s not a model problem. It’s a memory problem.
Avi Cavale is the founder of Quarterback, the AI development platform that learns how your team builds
Main image courtesy of iStockPhoto.com and napong rattanaraktiya

© 2025, Lyonsdown Limited. Business Reporter® is a registered trademark of Lyonsdown Ltd. VAT registration number: 830519543