The Complacency Gap: What Switching AI Coding Tools Taught Me About Vigilance
Switching from Claude Code to Codex felt like cognitive load returning — slower, less comfortable, more effortful. Then I realized that load was not a cost. It was the work. A reflection on automation complacency, context decay, and what the right relationship with AI coding tools actually looks like.
A few weeks ago I switched from Claude Code to Codex for a project. Not because either was broken. Just to try something different.
The first thing I noticed was that I was more careful. I read outputs more closely. I questioned what the model was doing instead of approving it. I felt the friction of not knowing what to expect next.
My first instinct was to frame this as overhead — cognitive load I would eventually shed as I got comfortable with the new tool. Then I caught myself and thought: no. That load is exactly what I should have been carrying all along.
Automation Complacency Is a Real Thing
There is a well-documented phenomenon in aviation and surgical robotics called automation complacency. The more reliable a system, the less actively a human monitors it. The cognitive load does not disappear — it stops being applied. The operator is present but not watching.
This is not carelessness. It is biology. Sustained vigilance is expensive. When a system rarely fails, the brain down-regulates the monitoring effort. This is normal. It is also dangerous in high-stakes contexts.
AI coding tools are reliable enough, and pleasant enough to work with, that the same effect kicks in. You stop reviewing outputs and start approving them. The distinction sounds minor. It is not.
“The cognitive load did not feel like overhead. It felt like the work coming back.”
What Actually Gets Lost
When I examined what complacency had cost me with Claude Code, it was not code quality — at least not obviously. The outputs were still mostly correct. What degraded was something subtler:
First: triage. I stopped knowing which parts of an output deserved scrutiny. Security-sensitive logic, architectural decisions, third-party integrations — these got the same shallow pass as boilerplate. The calibration between high-stakes and low-stakes disappeared.
Second: context. Code was accumulating without a trail of why. Decisions the model made were not documented. Tradeoffs were not surfaced. Six months later, neither I nor anyone else could reconstruct the reasoning behind a non-obvious choice.
Third: vigilance itself. Review flattened into a uniform rubber stamp. No triage, no questions, just approval.
All three collapsed together, and each made the others worse.
Competence Becomes the Enemy of Oversight
Here is the uncomfortable part: the complacency gets worse as the tool gets better. A junior developer working with a weak model stays alert because the output requires correction. A senior developer working with a capable model gets complacent precisely because the output rarely needs correction. The tool earns your trust and that trust becomes the problem.
The Codex switch was a temporary reset. I reinstated a junior-developer posture — not because the model was less capable, but because I did not yet know what it would do. The unfamiliarity was doing real epistemic work.
But that vigilance was contingent on novelty. Once Codex becomes as familiar as Claude Code, the complacency returns. The tool does not sustain the discipline. The unfamiliarity does. And unfamiliarity is a depletable resource.
“Competence becomes the enemy of oversight. The better the tool, the harder it is to keep watching.”
The Shared Gap Neither Side Honors
The deeper issue is that context preservation is a shared responsibility — and neither side currently honors it.
The developer trusts the model to produce good output, so they stop documenting the decisions behind it. The model is optimized to produce output that looks decided, so it does not flag uncertainty or narrate the tradeoffs it made. Neither side is lying. But the context that would make the codebase understandable six months from now falls through the gap between them.
This gap widens as the tool improves. The better the model, the more the developer is incentivized to trust it. The more trusted it is, the less the context gets captured. The code is correct. The why is gone.
What the Ideal Tool Would Do Differently
If you designed an AI coding tool with this in mind, it would look different from what exists today. Not in the model, but in the interface and the contract.
It would surface decisions, not just code — narrating what tradeoffs it made and what it assumed, for every non-trivial output. Not as a log no one reads, but as a first-class artifact alongside the code.
It would flag its own uncertainty explicitly. Not confidence scores — those are too abstract. Specific markers: "I was not sure about this authorization logic. You should verify it." Force the developer to own the uncertain parts rather than silently inheriting them.
It would require intent before generation. Not always — boilerplate should stay fast. But for architectural decisions, it would push back: what are you trying to accomplish, and why this way? Context capture at the entry point, not as an afterthought.
It would run periodic audits: here is what I think we have built and why. Is this still accurate? A forced changelog, surfaced automatically.
None of these exist as designed behaviors in the tools I have used. They are all on the developer to impose externally. Most developers do not.
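To make the contract concrete, here is a minimal sketch of what a first-class decision artifact might look like, in Python. Every name here is hypothetical — no current tool emits records like this; it is one possible shape for the "surface decisions, flag uncertainty" behaviors described above.

```python
from dataclasses import dataclass, field

@dataclass
class DecisionRecord:
    """Hypothetical per-output artifact, stored alongside the code it explains."""
    summary: str                      # what was built
    rationale: str                    # why this way
    alternatives_rejected: list[str] = field(default_factory=list)
    assumptions: list[str] = field(default_factory=list)
    # Explicit uncertainty markers the developer must resolve before merge.
    needs_verification: list[str] = field(default_factory=list)

    def requires_review(self) -> bool:
        """True if the model flagged anything it was not sure about."""
        return bool(self.needs_verification)

# Example: the kind of record a tool might attach to a generated change.
record = DecisionRecord(
    summary="Token refresh moved into middleware",
    rationale="Keeps handlers stateless; refresh logic lives in one place",
    alternatives_rejected=["per-handler refresh", "client-side refresh"],
    assumptions=["all routes share one auth scheme"],
    needs_verification=["authorization check on the admin routes"],
)
```

The point of `requires_review` is the forcing function: an uncertain output is not mergeable until a human claims the uncertain parts.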
What You Can Do Right Now
The honest answer is that no workflow change fully solves the biology. Any system you engineer can be overridden. Discipline fades. Tool rotation helps temporarily but you adapt. There is no clean solution.
What there is: practices cheap enough to start, and conspicuous enough that you notice when you skip them.
Three practices that hold up in daily use
The stranger question. Before merging a diff, ask: "What would confuse a new developer in this diff?" If you cannot name anything, you have probably stopped seeing the codebase — not that there is nothing to see. This is a thinking prompt, not a form. It forces active observation rather than passive approval.
The living context document. Keep one document with four sections: a structural diagram of what exists, a log of why key decisions were made and what was rejected, a list of known unknowns, and a short narrative of how to think about the system. Update one section before every PR merge. Not a ceremony. A destination for the stranger question.
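The four-section shape is easy to check mechanically. A minimal sketch in Python — the section names and the idea of machine-checking them are my additions, not part of the practice as stated; rename them to whatever your document actually uses.

```python
# Hypothetical pre-merge check: does the context document still have
# all four sections? The heading names below are assumptions.
REQUIRED_SECTIONS = [
    "Structure",                       # structural diagram of what exists
    "Decisions",                       # why key choices were made, what was rejected
    "Known unknowns",                  # open questions
    "How to think about this system",  # short narrative
]

def missing_sections(doc_text: str) -> list[str]:
    """Return the required section headings absent from the document text."""
    return [s for s in REQUIRED_SECTIONS if s not in doc_text]
```

Wired into CI, a non-empty return value blocks the merge until the document is restored — not to enforce ceremony, but to make its decay visible.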
The uncertainty marker. When an AI handles something you did not fully follow, drop a // ? comment inline — not a TODO, just a marker. Before any PR, grep for // ? and decide: understand it or remove it. Do not let the uncertainty become invisible.
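The pre-PR sweep is easy to script. A minimal sketch in Python — the set of file extensions scanned is an assumption; adapt it to your repository:

```python
from pathlib import Path

MARKER = "// ?"  # the inline uncertainty marker from the practice above

def find_markers(root: str = ".") -> list[tuple[str, int, str]]:
    """Return (path, line number, line text) for every marker under root."""
    # Extensions are illustrative; extend for your languages.
    extensions = {".js", ".ts", ".go", ".java", ".c", ".cpp", ".rs"}
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.suffix not in extensions:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable entries
        for lineno, line in enumerate(text.splitlines(), start=1):
            if MARKER in line:
                hits.append((str(path), lineno, line.strip()))
    return hits

# Example CI wiring: fail the check while any markers remain, e.g.
#   sys.exit(1 if find_markers() else 0)
```

A failing exit code turns "understand it or remove it" from a resolution into a gate.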
To restate the longer-term ask, the ideal tool would:
- Surface decisions, not just code — narrate tradeoffs and assumptions for every non-trivial output
- Flag its own uncertainty explicitly, not as confidence scores but as specific markers ("I was not sure about this authorization logic")
- Require intent before generation for architectural decisions — capture context at the entry point
- Run periodic context audits: "Here is what I think we have built and why. Is this still accurate?"
- Make the shared contract legible — both sides visible, both sides accountable