Delivery Pressure Is the Oldest Threat to Engineering Quality. AI Just Made It Faster.

In a post I wrote last December, I discussed the production triangle: the constraint that governs every production system, including software. You can optimize for two of three dimensions, time, quantity, and quality, but never all three. Push velocity while adding features, and quality absorbs the cost. Every time, without exception.

This constraint is not new. Kahneman mapped the cognitive mechanism in Thinking, Fast and Slow: under time pressure, we disengage System 2, the slow, deliberate reasoning that catches the things that will hurt us later, and default to fast, intuitive pattern-matching. Kuutila et al.’s 2020 systematic review of the software engineering literature confirmed the outcome: increased throughput, decreased quality, and a root cause that almost always traces back to the pressure itself being a product of bad estimation. The triangle is not a theory. It is physics.

What AI changes is the speed at which the physics hits, and how well it hides the damage while it does.

What AI changes is the speed and the concealment

AI coding tools have accelerated output at a scale no previous abstraction matched. The throughput gains are real. So is what comes with them.

Nathen Harvey, one of DORA’s leads, described the dynamic precisely in the 2025 DORA report: “AI is an amplifier. It magnifies the strengths of high-performing organizations and the dysfunctions of struggling ones.”

The Faros AI Engineering Report 2026 calls it Acceleration Whiplash. Apiiro’s enterprise telemetry found that by June 2025, AI-generated code was introducing over 10,000 new security findings per month, a 10x spike in six months. Itay Nussbaum, Apiiro’s product manager, put it simply: AI is fixing the typos but creating the timebombs.

Faros telemetry · 22,000 developers · 4,000+ teams

What AI accelerated

ThroughputPRs per author +20%

Syntax errors in AI codeApiiro telemetry −76%

Logic bugsApiiro telemetry −60%

What AI hides

Incidents per PRFaros telemetry +242%

PR review timeFaros telemetry +441%

PRs with no reviewFaros telemetry +31%

Privilege escalation pathsApiiro telemetry +322%

Architectural design flawsApiiro telemetry +153%

More code is shipping. Fewer decisions are being made about it.

The friction you removed was doing work

Armin Ronacher, creator of Flask, now building professionally with agentic coding tools, gave a talk with Cristina Poncela at the AI Engineer Europe conference in London in April 2026 worth reading in full. The title says everything: “The Friction Is Your Judgment.”

His observation about what happened when agentic tools went mainstream: “The tools didn’t buy us slack, but they raised the baseline expectation.” Early adopters got breathing room. Then the expectation shifted to shipping faster rather than reclaiming time. The slack was consumed before teams could use it for the judgment work the tools were supposed to support.

His framing of the structural problem is blunter: “Now every engineer has an agent amplifying their output. But responsibility still rests on the human, and it hasn’t been amplified. You are outnumbered by entities that cannot carry responsibility. Code reviews are increasingly skipped or rubber-stamped.”

The moments when people feel most confident skipping the judgment work, when the AI is fast, the deadline is real, and the friction feels like obstruction, are exactly the moments when judgment matters most. Ronacher: “The moments where you want to skip thinking are exactly the moments where thinking matters most.”

This is not a discipline problem. It is a systems design problem.

What erodes first, and why

When delivery pressure spikes, organizations don’t decide to stop doing judgment work. They just start skipping it, one step at a time.

Gilley and Godek’s research on organizational change describes the pattern as the organizational immune system inverting. Normally, antibodies attack change. Under acute delivery pressure, they attack the protective rituals instead, review, deliberation, documentation, design discussion, because those are what visibly slow things down. The body attacks the organ that protects it.

Clayton Christensen described the economic mechanism in The Capitalist’s Dilemma: under pressure, executives cut long-horizon work first, because short-term metrics never reward it. The judgment layer looks like overhead in a compressed sprint. It has no line item. It produces no ticket. It does not appear in the velocity chart.

So it goes.

The early signals, before the pain becomes visible:

PRs merging with no review, rising
Change-failure rate creeping up while deployment frequency rises
PR review time climbing, signaling review is a bottleneck, not a gate
Incidents per PR increasing while throughput metrics look healthy

When those numbers move together, Acceleration Whiplash is already underway.

What to automate and what to protect

The answer is not to slow everything down. It is to be deliberate about what you automate and what you keep in human hands.

A significant portion of quality work can and should be automated. Low-friction, high-value, well-documented checks are the right candidates: OWASP vulnerability scans, static analysis, dependency audits, known anti-pattern detection, security policy enforcement, and licensing checks. This work is done inconsistently under pressure because it depends on individual attention. Automation makes it consistent precisely when humans are most likely to skip it.

This is the Gawande model applied to engineering. Atul Gawande’s surgical safety checklist reduced major complications by more than a third and cut inpatient death rates by nearly half, not by replacing surgeon judgment, but by protecting the steps skilled people miss under pressure. The design principle transfers directly: a checklist for AI-generated changes should be short, cover only what skilled engineers actually skip under pressure, and include both task checks and communication checks. A long checklist gets skipped under pressure. That defeats the purpose.

Protect review capacity with the same deliberateness. Production power has scaled 5 to 10x with AI agents. Review capacity has not. Mandate small PRs, route mechanical review to tooling, and reserve human attention for what tooling cannot evaluate. If PR size or review time is climbing sharply, review capacity is already the binding constraint.

What judgment cannot be delegated

Two categories must stay with humans, not because AI won’t generate confident output in both, but because the consequences of a wrong call are irreversible and the AI carries no accountability for them.

Product Judgment. Defining the exact user problem. Evaluating competing interpretations of ambiguous requirements. Setting boundary intent, what this feature is allowed to do and what it explicitly is not. Deciding which trade-offs align with business and user reality. These decisions are not derivable from a codebase or a prompt. They require context that lives in human heads and organizational history.

Engineering Judgment. Evaluating system architecture for long-term maintainability. Assessing security configurations against actual threat models, not documented best practices. Evaluating production consequences, failure modes, cascade risks, degradation paths. Deciding whether a new dependency is worth its exposure surface. These are the high-blast-radius changes Ronacher identified as requiring mandatory human sign-off: database migrations, authentication and permissions changes, new dependencies, backward-incompatible API changes, irreversible destructive operations.

The failure mode is not that AI makes these decisions badly. It is that when delivery pressure is high and the AI has already produced a complete, coherent implementation, the human in the middle stops making the decision at all. The output exists. The review becomes a formality. The judgment never happens.

Build the gates into the workflow, not the culture

Charity Majors, CTO of Honeycomb, put the organizational reality directly: “You cannot make tens of thousands of engineers reliably ‘slow down and check the work’ through good intentions and a memo. At scale, keeping a human in the loop has to be enforced by architecture, not good intentions.”

Under pressure, people skip rather than forget. Norms require discretionary attention. Delivery pressure is precisely the condition that eliminates it. The structural version of this looks like four things.

Gates before generation, not after. Require a reviewed plan before code generation begins on any high-blast-radius change. The judgment decision happens while options are still open, not after a complete implementation already exists and the path of least resistance is to approve it. Lint rules and pre-commit hooks that force design decisions encode judgment requirements into the workflow rather than leaving them to individual discretion under pressure.

Routing by consequence, not by convention. Mechanical issues, style, known anti-patterns, clear rule violations, go to automated review. Changes touching migrations, authentication, permissions, new dependencies, or irreversible operations require human sign-off, with no exceptions and no override when the sprint is on fire. The category determines the path, not the manager’s judgment in the moment. This is the line between what you automate and what you protect, made structural rather than cultural.

Measurement that tracks the right thing. Velocity metrics, PRs merged, features shipped, story points closed, inflate under AI acceleration while hiding the judgment debt accumulating underneath. The metrics that reveal actual system health are incidents per PR, rework rate, change-failure rate, and mean time to recovery. Throughput rising while those metrics deteriorate is not improvement. It is Acceleration Whiplash in progress, and treating it as success is how organizations end up months deep in a hole they cannot see from the dashboard.

A speak-up norm with teeth. Crew Resource Management emerged from the 1977 Tenerife disaster: 583 people died from a communication and hierarchy failure, not a mechanical one. CRM’s central principle is that stopping the plane under operational pressure must be safe and expected, not an act of individual courage. The engineering equivalent: a mid-level engineer blocking a senior’s AI-generated PR on a high-blast-radius change should be unremarkable. If it requires courage, the culture has already failed the structure.

The pressure will not stop. The gates have to hold.

In Judgment-Driven Development, judgment is not a value. It is a scarce resource that requires deliberate protection, especially when the conditions that erode it: speed, volume, confidence, and pressure, are exactly the conditions AI creates.

The triangle has not changed. You still only get two. AI has made the third corner cheaper to ignore and more expensive to lose.

The organizations that get this right will not be the ones that ship the most. They will be the ones who, under pressure, know which decisions the AI should make and which a human must own, and have built their workflows so that distinction holds even when everything is on fire.

Faster is only better if it stays that way.

What AI changes is the speed and the concealment#

The friction you removed was doing work#

What erodes first, and why#

What to automate and what to protect#

What judgment cannot be delegated#

Build the gates into the workflow, not the culture#

The pressure will not stop. The gates have to hold.#