Hiring for Judgment in an AI-Accelerated World

The bar for building has collapsed. Anyone with an AI assistant can produce a prototype, a draft, a working script. That is not the challenge anymore. The challenge is identifying people who can tell whether what was built is right, catch what is wrong, and make the call on what to do about it.

That is a different hire than the one most managers have been trained to make. And the stakes are higher now, because your hiring decision is no longer a baseline. It is a multiplier.

AI Amplifies What Is Already There

The evidence is accumulating that AI does not lift all boats equally.

METR ran a randomized controlled trial in 2025 with 16 experienced open-source developers working in repositories where they averaged five years of prior experience. Developers expected AI tools to speed them up by 24%. The measured result was a 19% slowdown. The tasks where AI hurt performance most were the complex, ambiguous ones that required understanding context across a large system. Not coincidentally, those are also the tasks that most resemble real engineering work.

GitClear’s analysis of 211 million lines of code from 2020 through 2024 found code churn nearly doubled over the period, and copy-paste code blocks grew eightfold in 2024 alone. Code is being generated faster than it is being understood.

Addy Osmani, who leads Developer Experience at Google Chrome, has been tracking what he calls the 70% problem: AI can rapidly produce roughly 70% of a solution. The final 30% — edge cases, security, production integration — is where teams get stuck, and where the engineers who understand the system separate from the ones who were just operating the prompt.

The pattern across all three data points is the same. AI amplifies what is already there. Hire someone with strong judgment and a verification reflex, and AI makes them faster without making them careless. Hire someone who accepts what the model produces, and AI makes them generate more confidently while understanding less. The hiring error has always had a cost. Now it compounds.

Breadth Is No Longer Optional in Any Role

In a previous post, I argued that the best product managers are full-stack by design. Not because one person should do everyone’s job, but because product work does not respect org charts. The skills I described there — conducting deep research, writing clear materials, reading and writing basic code, defining and running test plans, designing a reasonable UI and user flow, selling the product internally and externally — are not about replacing specialists. They are about having enough of a working model of adjacent disciplines to know when something does not make sense, to challenge assumptions, and to move things forward without being blocked.

That argument now applies beyond PMs. AI has lowered the cost of acquiring breadth. The skills on that list used to require years of deliberate cross-functional exposure. Now they are more accessible to anyone with the curiosity to reach for them. That means the excuse for not having them is weaker. And the value of hiring someone who does have them is higher, in every role.

A PM who can read a diff and ask a hard question in review is not threatening the engineer. They are protecting the system. An engineer who understands user behavior well enough to push back on a spec is not overstepping. They are strengthening the decision. You are not hiring people to do each other’s jobs. You are hiring people who have enough of a shared language across disciplines that collaboration produces pressure rather than deference.

This is what keeps AI-accelerated teams coherent. When everyone can generate faster, the thing that holds quality is not process. It is a shared capacity to recognize when something is wrong, regardless of whose domain it technically belongs to.

In another post I wrote about where the full-stack builder model works and where it breaks. The argument there was about org design. The argument here is smaller and more practical: hire people with enough range that they are never helpless at the edges of their role, and who are curious enough to keep building that range over time.

The Interview Needs to Test Judgment, Not Output

Most technical interviews were designed to test whether a candidate can produce correct code under pressure. That question is less interesting now. AI can produce correct code for a large class of problems, and candidates know it. The cheating problem is real: interviewing.io data from 2025 shows 81% of Big Tech interviewers have suspected candidates of using AI to cheat, and 31% have definitively caught at least one.

The more important shift is what this reveals about the old format: if AI can pass the test, the test was testing the wrong thing.

The companies that have moved fastest to redesign their interviews are converging on the same answer. Canva retired its classical CS fundamentals round entirely and replaced it with contextual problems that cannot be solved with a single prompt. Those problems require iterative reasoning, requirement clarification, and decision-making under ambiguity. The bar did not drop. The signal shifted. Meta introduced an AI-enabled coding interview where candidates work with a multi-file codebase and have access to AI tools. They are evaluated not on whether they used the AI, but on whether they understood and could verify what it produced.

There are four signals worth designing your interview around.

Verification reflex. When shown code or an AI output, does the candidate instinctively ask how they would know if it is correct? Meta’s internal evaluation criteria put it plainly: “Test before using. Don’t prompt your way out of it.” The candidate who runs the code and reads the result is different from the candidate who trusts the suggestion. This tells you something about how they operate under real conditions, not just interview conditions.

Skeptical reading. Can they look at an output — generated or not — and say what is wrong with it? An interviewing.io practitioner described this directly: “I want you to be able to go in and look at the code and say, oh yeah, there’s a line that’s wrong.” This applies to a PM reviewing a generated spec as much as to an engineer reviewing generated logic. The skill is the same. The domain is different.

Curiosity that surfaces as questions, not answers. What does the candidate ask about the system, the codebase, the users? A candidate who has no questions about how your system works is telling you something important. The habit of asking how to be better at something, and going to find someone who can answer it, is one of the strongest leading indicators of long-term growth in any role.

Willingness to disagree, with evidence. Microsoft Research surveyed 319 knowledge workers across 936 real AI use cases and found an inverse relationship: the more a person trusted AI, the less critical thinking they applied to its outputs. The more they trusted their own judgment, the more they pushed back. You can test this directly. Give candidates something plausible but wrong. See what they do. The ones who name what is wrong and explain why are the ones who will catch the race condition, the security gap, the spec that looks complete and is not.
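Here is a minimal sketch of what a “plausible but wrong” exercise might look like, in Python. The scenario, the `Account` class, and every function name are hypothetical, invented for illustration rather than taken from any company’s interview loop. `withdraw_plausible` behaves correctly when called from one thread at a time; its flaw is a check-then-act race that only appears when two requests interleave. `replay_bad_interleaving` replays one such schedule deterministically, so the failure is visible without depending on thread timing, and `withdraw_safe` shows one common fix: holding a lock across both the check and the act.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Account:
    balance: int
    # Hypothetical per-account lock; only the corrected version uses it.
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False)


def withdraw_plausible(acct: Account, amount: int) -> bool:
    """Plausible but wrong: correct only if no other thread touches acct."""
    if acct.balance >= amount:   # check
        acct.balance -= amount   # act: balance may have changed since the check
        return True
    return False


def replay_bad_interleaving(acct: Account, amount: int) -> int:
    """Replays one schedule the OS is allowed to produce for two concurrent
    calls to withdraw_plausible: both run the check before either runs the act."""
    first_ok = acct.balance >= amount
    second_ok = acct.balance >= amount
    if first_ok:
        acct.balance -= amount
    if second_ok:
        acct.balance -= amount
    return acct.balance


def withdraw_safe(acct: Account, amount: int) -> bool:
    """Corrected: the check and the act happen under the same lock."""
    with acct.lock:
        if acct.balance >= amount:
            acct.balance -= amount
            return True
        return False


if __name__ == "__main__":
    print(withdraw_plausible(Account(100), 80))       # True: the happy-path test passes
    print(replay_bad_interleaving(Account(100), 80))  # -60: the account is overdrawn

    acct = Account(100)
    results = []

    def worker() -> None:
        results.append(withdraw_safe(acct, 80))

    threads = [threading.Thread(target=worker) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(results.count(True), acct.balance)          # 1 20
```

The candidate worth hiring is the one who points at the gap between the check and the act and asks what happens when two requests arrive at once, not the one who runs the single-threaded test and calls it done.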

Engineers Who Cannot Code Cannot Judge Code

This is worth saying directly, because the drift toward “just use AI” in engineering hiring is real and wrong.

Engineers still need to code. Not because you need more code produced — you don’t, you need less — but because writing code is still the most reliable signal that someone has a genuine model of what happens underneath the prompt.

A 2026 arXiv study found that CS achievement and writing skill are the two strongest predictors of performance in AI-assisted development, and that CS achievement remains a significant predictor even after controlling for general cognitive ability. When the tooling has shifted from typing code to specifying what code should do and verifying that it does so, the prerequisite is not typing. It is the conceptual understanding that makes specification and verification possible.

The engineers who can read a race condition in generated code and recognize it as a race condition — not just a test that happened to fail — are the engineers who studied the machine. That knowledge does not become less relevant when you stop writing everything by hand. It becomes the prerequisite for using the tool without being misled by it. As Kelsey Hightower observed, the programs you build still run on hardware, still interact over the same protocols and networks. The foundations did not move. The interface to them changed.

The Pipeline You Cut Today Is the Senior Shortage of 2030

One short note is worth making here.

The Stanford Digital Economy Lab’s 2025 study, using ADP payroll data covering over 25 million workers, found early-career developers aged 22 to 25 experienced a 16% relative employment decline in AI-exposed roles. SignalFire data shows new graduate hires at Big Tech now represent just 7% of hires, down from much higher levels before 2022.

The short-term logic is understandable. Seniors are more productive with AI tools. Junior onboarding takes time you do not have. But the pipeline you hollow out today is the senior shortage you will manage in 2030. As I wrote in the post on growing senior engineers in the AI era, the broken ladder problem is real and worsening — and organizations that stop hiring juniors lose the mechanism by which senior engineers are made. Hire juniors with the range, curiosity, and verification instinct you are now hiring seniors for. Give them a deliberate growth path. The judgment you need in five years has to start forming now.

Judgment Compounds. Speed Without It Does Not.

You are not just filling a role. You are setting the epistemic health of your team.

A room full of people who accept AI output is a team that owns systems it cannot reason about. Hire people with range, a verification reflex, and the habit of pushing back — in every discipline, not just engineering — and judgment compounds. The team catches each other’s blind spots. AI makes each person faster without making the collective understanding shallower.

The speed does not outrun the understanding. That is the hire worth making.
