Working with AI to generate code is extremely satisfying. In a matter of minutes, you get something that looks great and, in most cases, does what you wanted, and sometimes more. But what looks ready for production is often far from production-safe.

A large-scale study by two researchers at FernUniversität in Hagen analyzed 7,703 files from public GitHub repositories explicitly attributed to AI tools. Using CodeQL, the researchers identified 4,241 CWE instances across 77 different vulnerability types. While 87.9% of the analyzed files contained no identifiable CWE-mapped vulnerabilities, the remaining 12.1% carried flaws in code that appeared to work fine. It compiled, it solved the visible task, but it still harbored hidden assumptions, unsafe patterns, and security debt.

Security is only one example. The deeper issue shows up even when the task is not obviously about security. In mature systems, experienced engineers are not just converting requirements into code. They are constantly asking whether the change fits the system.

METR, a non-profit organization that evaluates AI models to help understand AI capabilities and the risks they pose, conducted a randomized controlled trial with 16 experienced open-source developers on mature repositories they knew well. Across 246 tasks, developers expected AI tools to make them 24% faster. After the study, they still believed AI had made them about 20% faster. But the measured result went the other way: when AI tools were allowed, developers took 19% longer.

AI suggestions were often directionally useful, but developers spent extra time reviewing and correcting them. That is exactly the senior engineer’s reality. AI can reduce the friction of producing code while increasing the work required to inspect, correct, integrate, and trust it.

Senior engineers know the system’s history

AI sees the repository. Senior engineers remember the outage, the failed migration, the customer-specific edge case, the dependency that looked harmless and became a nightmare.

This is not nostalgia. It is risk memory.

Senior engineers know where the bodies are buried

Every real system has unofficial rules written in blood:

  • Never release on the last day of the week.
  • Do not touch this service without talking to payments.
  • This functionality exists because of one enterprise customer.
  • This table looks redundant, but it feeds billing reconciliation.
  • This retry logic looks ugly, but it prevents duplicate transactions.
  • This “temporary” workaround is now part of the contract.

AI may infer some of this from code. Senior engineers know which parts are accidental and which parts are load-bearing.

Senior engineers understand the blast radius

How many times have you seen a small, seemingly insignificant change in microservice A cause a catastrophe in a completely different system a week later? AI can make the local change. A senior engineer has watched that failure mode play out more than once, which is why they know to ask:

  • What else depends on this?
  • What happens under load?
  • What happens during rollback?
  • What happens when the input is malformed?
  • What happens when this runs for three years?
  • What happens when a customer, attacker, or junior engineer uses it differently than expected?

That is judgment. Not coding speed.

Senior engineers know when not to build

An AI's goal is to please the user, even if you tell it to question you. Give it a task and it tries to satisfy it. It will never tell you that the requirement is wrong, that it will create operational debt, or that the problem can be solved with configuration. An experienced developer will, and will be blunt enough to tell you the truth even when you won't like it.

Senior engineers know what quality means in context

Quality has layers; it is never one single thing. Sometimes quality means latency. Sometimes reliability. Sometimes auditability. Sometimes reversibility. Sometimes it means boring code that everyone can maintain.

AI does not naturally know which quality dimension matters most unless humans make that judgment explicit.

How senior engineers should work with AI

Senior engineers should not just “use AI.” They should design how AI works: what it can do, what it should do, and what it must not do.

That breaks into three layers that build on each other. Context shapes what AI sees. Skills shape how AI works. Gates shape where humans intervene before anything ships.

Context packs: shape the environment, not just the prompt

The senior engineer’s job is no longer only to review AI output. It is to shape the environment the AI works in. A context pack encodes the system judgment that an AI cannot infer from the repository alone: the architecture principles, the service boundaries, the known dangerous areas, the rollback requirements, the “do not touch without review” zones.

The test of a good context pack is simple. If a new senior engineer joined the team tomorrow, would they read this and understand what the code alone would never tell them? If yes, the AI has it too. If no, you are still operating on tribal knowledge that only exists in the heads of the people who were there.
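To make that concrete, here is a minimal sketch of what a context pack might look like, assuming the team keeps it as a versioned Python module (it could just as well be YAML or markdown) that the AI tooling loads alongside the repository. Every service name, path, and rule below is illustrative, not taken from a real system.

    # context_pack.py -- hypothetical, versioned alongside the repo.
    # Encodes the judgment an AI cannot infer from the code alone.
    CONTEXT_PACK = {
        "architecture_principles": [
            "Services communicate through events, never shared tables",
            "All external calls go through the gateway layer",
        ],
        "dangerous_areas": {
            # Path -> why it is load-bearing, in plain language.
            "billing/reconciliation.py": "Looks redundant; feeds billing reconciliation",
            "payments/retry.py": "Ugly retry logic prevents duplicate transactions",
        },
        "do_not_touch_without_review": ["payments-service", "auth-service"],
        "rollback_requirement": "Every schema change ships with a tested down-migration",
    }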

AI skills: reusable judgment, not per-prompt instruction

Prompts are disposable. Skills are infrastructure. A skill is reusable judgment scaffolding: a named, versioned set of checks the AI runs every time it touches a certain kind of change.

Two examples worth building first:

A production readiness skill that refuses to mark a change complete until logging, metrics, alerts, timeouts, retries, rollback path, and feature flag coverage are all accounted for. Not “the AI should think about observability.” The AI cannot ship the change without answering those questions.
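As a sketch of what "named and versioned" can mean in practice, here is one hypothetical way to encode that skill so a change is mechanically blocked until every check carries evidence. The class and field names are assumptions for illustration, not an existing framework's API.

    # production_readiness.py -- a hypothetical skill encoding, not a real framework.
    from __future__ import annotations
    from dataclasses import dataclass, field

    @dataclass
    class Check:
        name: str
        satisfied: bool = False
        evidence: str = ""  # link to a dashboard, config, or diff line

    @dataclass
    class ProductionReadinessSkill:
        version: str = "1.0.0"  # skills are versioned, like any infrastructure
        checks: list[Check] = field(default_factory=lambda: [
            Check("logging"), Check("metrics"), Check("alerts"),
            Check("timeouts"), Check("retries"),
            Check("rollback_path"), Check("feature_flag_coverage"),
        ])

        def can_mark_complete(self) -> bool:
            # Blocked until every check is both satisfied and evidenced;
            # "the AI thought about it" does not count.
            return all(c.satisfied and c.evidence for c in self.checks)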

A migration review skill that checks compatibility, staged rollout, data migration risk, and fallback paths. Every time. Because migrations are where systems die, and the senior who has been burned by a migration knows what the junior and the AI both miss.

Skills like these turn seniority into something the team can reuse instead of something that walks out the door at 6 pm.

Human-in-the-loop gates: defined decisions, named owners

“A human glances at the diff” is not a gate. A gate is a defined decision point with a named owner and a clear bar the change has to clear.

Gates should cluster around the changes where rollback is hard or the blast radius is large:

  • Data integrity: schema changes, data model migrations, deletions
  • Security posture: auth changes, secrets, permissions boundaries
  • Customer-facing contracts: API changes, SLA-affecting changes, billing, privacy
  • Hard-to-reverse changes: infrastructure, cross-service contracts, anything in a poorly understood legacy area

Everything else, AI and the standard review process can handle. But these categories cannot move forward without accountable human judgment. That is not process overhead. That is the organization deciding, on purpose, where it wants its seniors spending their attention.
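One hedged sketch of how gates can become policy rather than habit, assuming diffs are routed through a bot that matches changed paths against gate rules. The owners, paths, and bars below are placeholders.

    # gates.py -- illustrative gate policy; paths and owners are made up.
    GATES = [
        {"name": "data-integrity", "paths": ("migrations/", "schema/"),
         "owner": "@data-platform-lead",
         "bar": "Staged rollout plan and tested rollback attached to the PR"},
        {"name": "security-posture", "paths": ("auth/", "secrets/", "iam/"),
         "owner": "@security-lead",
         "bar": "Threat review note linked in the PR"},
        {"name": "customer-contract", "paths": ("api/public/", "billing/"),
         "owner": "@product-architect",
         "bar": "Versioning and deprecation plan documented"},
    ]

    def gates_triggered(changed_paths: list[str]) -> list[dict]:
        """Return the gates this diff must clear; anything unmatched
        flows to the standard review process."""
        return [g for g in GATES
                if any(p.startswith(prefix)
                       for p in changed_paths for prefix in g["paths"])]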

The risk nobody is talking about

When execution gets cheaper, judgment becomes the bottleneck. Organizations that figure this out will build the context, the skills, and the gates that let AI produce at speed without producing risk at speed. Organizations that do not will ship faster for a while, and then discover that the seniors who used to catch this stuff have either left or stopped trying, that the juniors never learned to catch it because AI did the work they would have learned from, and that code quality is silently degrading because nobody is shaping the environment anymore.

The senior engineer is not becoming obsolete. The senior engineer is becoming the person who decides what “good” means, where the system can fail, and who gets to say no.

That role is more consequential now, not less.