A few days ago I kicked off several research tasks in parallel and didn’t check which model was running underneath. All of them defaulted to Claude Opus 4.8. When I looked at the bill, I’d spent $350.
So the question became obvious. Was the output better?
No. I compared the results against similar research I’d run before on cheaper models. No meaningful difference in quality. I still had to verify every source and go deeper on half the areas myself. The expensive model just sounded more confident while doing it.
That’s the exact thing Elena Verna called out this week in “Please Stop the AI Confidence Theater.” Her point: everyone is performing AI mastery while quietly doing the same basic workflows as everyone else. Summarizing Slack. Drafting emails. Running a scheduled scan. Useful, sure. Life-changing, rarely. The gap between the performance and the actual work is where the theater lives.
I agree with her completely. I use AI multiple times a day, every day. And what I see in my own usage, and in the usage of people around me, is not replacement. It’s acceleration of things we were already doing.
The Shutdown Test
Here’s the test I’ve started running on my own work. If someone shut off AI tomorrow, could I still do my job?
Yes. Slower. Like the good old days.
I did research before AI. I spent more time Googling, reading, and summarizing it myself. I wrote PRDs, initiatives, and epics before AI. I just spent longer writing them. I read code, and wrote some, relying on Stack Overflow instead of a chat window. Slower, but the same output eventually landed.
That’s the tell. AI hasn’t handed me a capability I didn’t have. It’s compressed the time between having an idea and having a draft of it. Compression is valuable. It is not the same thing as replacement, and conflating the two is exactly the confidence theater Elena is describing.
The Laid-Off Are Coming Back
This isn’t just a personal anecdote. The market is running the same experiment at scale, and the results are coming in.
CNBC reported that Ford, IBM, and Commonwealth Bank of Australia are all rehiring staff they cut in the name of AI. Ford brought back over 350 experienced engineers after its automated quality-control systems couldn’t catch defects without the judgment those engineers used to provide. IBM is tripling entry-level hiring after an AI system built to handle HR requests hit a wall on the 6% of cases involving ethical judgment. Klarna’s CEO already walked back the company’s customer-service cuts after satisfaction scores fell. Research firm Orgvue found that 39% of business leaders made staff redundant because of AI, and 55% of that group now call it a mistake. Robert Half puts the rehire rate at roughly one in three.
These weren’t small companies making a rash call. These were board-level bets that AI could replace judgment, not just accelerate execution. The bet didn’t hold. What came back in every case was the same thing: people who could catch what the system couldn’t, and take responsibility for the call.
Sounding Smart Is Not the Same as Being Right
The $350 bill taught me something specific about where the theater actually lives. It isn’t only in the LinkedIn posts. It’s baked into the models themselves.
AI is extremely good at sounding intelligent. Clean structure, confident tone, plausible citations. None of that is the same as being correct, and the more expensive the model, the more convincingly it can be wrong. I still had to validate sources by hand. I still had to go deeper in the areas that mattered. The model didn’t remove that work. It just made the output look finished before it was.
That gap, between sounding right and being right, is exactly what the market discovered when it tried to run customer service, HR, and quality control without anyone left to check the work.
What Doesn’t Get Automated
AI is a tool. A very good one. Like any tool, it’s worth learning and worth using to extend what you can do. But a tool doesn’t replace the work of the person holding it, and it especially doesn’t replace the parts of the work that were never about execution in the first place: setting direction, deciding what’s worth building, and knowing when the confident-sounding answer is wrong.
That’s the argument underneath everything I’ve been writing about judgment this year. AI made execution cheap. It didn’t touch the scarce part.
I paid $350 to relearn that lesson on a Tuesday. Ford paid a few years and 350 headcount. Same lesson, different invoice.