In 2012, I was given a project at Dalet that I did not fully understand at the time: move the entire company from waterfall to Agile. We were a mid-size product company with engineering teams spread across time zones, and waterfall was doing what waterfall always does at that scale: creating the illusion of control while real problems accumulated in the gaps between phases.
The transition worked. And the productivity gains were not incremental. Engineers who had spent months waiting for approval to write code were suddenly shipping every two weeks. The business could see working software, give real feedback, and change direction without blowing up a six-month plan. It felt like the industry had discovered something fundamental.
But something else happened alongside the productivity gain: the speed exposed new risks. When you ship every two weeks instead of every six months, the cost of building the wrong thing at the wrong quality compounds faster. So the industry invented mechanisms to control that: sprint planning to align on what matters, daily standups to surface blockers before they bury a sprint, reviews to validate with stakeholders, retrospectives to correct course. The ceremonies were not bureaucratic. They were the infrastructure that made fast delivery safe.
Some organizations took that logic further than necessary; SAFe ceremonies can make a two-hour sprint planning session feel like a constitutional convention. But the underlying instinct was right. Speed without a decision layer is not velocity. It is drift.
We are in that moment again.
The 2025 DORA report, the largest annual study of software delivery performance, drawing on nearly 5,000 technology professionals, found something that should be on every engineering leader’s wall: AI adoption increased throughput and increased instability simultaneously. Teams are shipping more. Teams are also experiencing more change failures, more rework, and longer recovery times when things go wrong.
This is not a coincidence. It is a structural problem.
AI accelerated execution. The decision infrastructure did not move. The gap between the two is where instability lives.
Sprint ceremonies are the decision infrastructure. Most teams have not touched them since AI arrived.
Ceremonies Were Never About Controlling Engineers
The standup, the planning session, the review, the retro: none of these were invented to slow engineers down or create reporting obligations. They were invented to control risk. To create regular moments where a team stops, inspects what it actually built, and decides what to do next.
The timebox is not the point. The inspection is.
In waterfall, there was no inspection cadence at all. Problems accumulated for months before anyone saw them. Agile fixed that by shortening the cycle: two weeks of work, then stop and look. In that world, the natural friction of building things did part of the inspection work: implementation took time, misunderstandings surfaced while the code was being written, and developers had to reason through every line, so problems emerged organically before the ceremony even started. The ceremony caught the stragglers.
AI removed most of that natural friction. A feature that used to take three days now takes three hours. The developer accepts the AI’s output, the tests pass, and the PR is opened. The ceremony arrives, and everything looks fine.
The problem is not in what the ceremony sees. It is in what the ceremony no longer has the mechanism to surface.
Sprint Planning: Add the Decision Layer
Standard sprint planning answers two questions: what will we build, and can we do it in two weeks.
JDD planning adds a third: what judgment calls are embedded in this work, and who owns them before we start?
Three concrete changes:
Spikes become first-class artifacts, not exceptions. A spike is a time-boxed investigation whose output is knowledge, not code. Instead of building a feature directly, the team allocates a fixed period, usually half a day to a day, to answer a specific question: Is this architectural approach compatible with our existing system? Will this third-party service handle our edge cases? What are the tradeoffs between these two implementation paths? The output is either a decision document or a proof of concept. The commit comes after.
In standard Agile, spikes are used sparingly, usually when something is genuinely unknown. In JDD planning, any story where AI will generate a significant share of the implementation is first evaluated for whether it warrants a spike. The AI will produce something. The question is whether the team has decided what it should produce before it does.
Context is mandatory. Engineers need to know why a story matters, not just what it asks for. “Build the dashboard” and “build the dashboard because our largest enterprise customer is threatening to churn without it” are the same story with different stakes. Those stakes shape every tradeoff the engineer makes during implementation: what to cut if time runs short, how much validation to add, how aggressively to optimize. Withholding context does not protect the engineer from distraction. It guarantees they will make decisions with incomplete information.
Name the live architectural decisions before the sprint begins. Which decisions will this sprint’s work require making? Which of those need an ADR?
An Architecture Decision Record (ADR) is a short document that captures a single architectural decision, its context, the alternatives considered, and the reasoning behind the choice. The concept was formalized by Michael Nygard in 2011 and has become standard practice at organizations serious about long-term codebase health. An ADR is not a design document. It does not describe how something works. It answers one question: why did we make this choice, and what were we trading off?
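To make the shape of an ADR concrete, here is a minimal sketch in Python, assuming a team keeps ADRs as short Markdown files. The class and field names are hypothetical and only mirror the elements described above (context, decision, alternatives considered, tradeoffs) plus a status line; this is not Nygard’s template verbatim or any tool’s format.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ArchitectureDecisionRecord:
    """One architectural decision per record. Field names are illustrative, not a standard."""
    number: int
    title: str
    context: str          # the forces that made a decision necessary
    decision: str         # what was chosen
    alternatives: list[str] = field(default_factory=list)  # options considered and rejected
    tradeoffs: str = ""   # what the choice gives up
    status: str = "Proposed"  # e.g. Proposed, Accepted, Superseded
    decided_on: date = field(default_factory=date.today)

    def to_markdown(self) -> str:
        """Render the record as a short Markdown document."""
        lines = [
            f"# ADR {self.number:04d}: {self.title}",
            "",
            f"Status: {self.status} ({self.decided_on.isoformat()})",
            "",
            "## Context",
            self.context,
            "",
            "## Decision",
            self.decision,
            "",
            "## Alternatives considered",
        ]
        # An empty alternatives list is itself a signal: the choice may have been made by default.
        lines += [f"- {alt}" for alt in self.alternatives] or ["- None recorded"]
        lines += ["", "## Tradeoffs", self.tradeoffs or "None recorded"]
        return "\n".join(lines)


if __name__ == "__main__":
    adr = ArchitectureDecisionRecord(
        number=7,
        title="Publish order events to a queue",
        context="Three services need order updates; direct calls couple their release cycles.",
        decision="Publish order events to a shared queue.",
        alternatives=["Direct HTTP calls between services"],
        tradeoffs="Adds an operational dependency and eventual consistency.",
    )
    print(adr.to_markdown())
```

Keeping the record this small is deliberate: it answers why the choice was made and what it traded off, and nothing about how the system works.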
In a JDD sprint, the question is not “did we write ADRs?” It is “which decisions will be made by default this sprint if we do not name them?” Because AI will make them. It will choose a pattern, a dependency, and an approach, and unless someone has decided what the right choice is, the AI’s choice becomes the codebase’s choice. Not because anyone approved it, but because nobody noticed it happening.
Daily Standup: Surface What AI Is Hiding
Standard standup: what did I do, what will I do, what is blocking me.
The problem is the first question. “I finished the feature” used to carry implicit information: the developer reasoned through the implementation, encountered obstacles, made decisions, and arrived at a working result. They understood what they built because the friction of building it forced them to understand.
Today, “I finished the feature” may mean the developer accepted 400 lines of AI output, ran the tests, and opened a PR. The feature is done. The developer may not be able to explain three key decisions embedded in that code.
The standup has no mechanism to surface that. It counts the feature as done because the feature is done.
JDD adds one rotating prompt to the standup rhythm, not every day, but as a regular part of the weekly cadence: “Where did judgment get applied, and where did it get deferred?”
This is not a confession or a performance review checkpoint. It is an early warning system. Code that nobody understands is a risk that compounds with every commit built on top of it. A junior engineer’s unreviewed AI output from Tuesday becomes the foundation for a senior engineer’s feature on Thursday. By Friday, the thing nobody understood is load-bearing.
Surface it while there is still time to own it.
The second standup change follows from the first: AI-generated code that has not been owned gets flagged before the next commit layers on top of it. The flag does not mean the code is wrong. It means a brief pair review happens before it becomes infrastructure. Not to blame. For continuity of understanding.
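One lightweight way to make that flag mechanical is sketched below in Python. It assumes a hypothetical team convention, not a Git standard: commits containing substantial AI output carry an `AI-Assisted: yes` trailer, and whoever has reviewed the code and can explain it adds an `Owned-By: <name>` trailer. The script only parses commit messages passed in as strings; how the team collects them (for example from `git log`) is left open.

```python
import re
from dataclasses import dataclass

# Matches "Key: value" trailer lines such as "Owned-By: Dana".
# The trailer names used below are a hypothetical team convention.
TRAILER = re.compile(r"^([A-Za-z-]+):\s*(.+?)\s*$")


@dataclass
class Commit:
    sha: str
    message: str


def trailers(message: str) -> dict[str, str]:
    """Parse 'Key: value' lines from the last paragraph of a commit message."""
    last_paragraph = message.strip().split("\n\n")[-1]
    found = {}
    for line in last_paragraph.splitlines():
        match = TRAILER.match(line.strip())
        if match:
            found[match.group(1).lower()] = match.group(2)
    return found


def unowned_ai_commits(commits: list[Commit]) -> list[str]:
    """Return SHAs of AI-assisted commits that nobody has claimed ownership of yet."""
    flagged = []
    for commit in commits:
        found = trailers(commit.message)
        if found.get("ai-assisted", "").lower() == "yes" and not found.get("owned-by"):
            flagged.append(commit.sha)
    return flagged
```

For example, a commit whose message ends in `AI-Assisted: yes` with no `Owned-By` line would be flagged, and the pair review clears it by adding the owner. The flag marks missing ownership, not wrong code, which matches the intent of the standup change.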
Sprint Review: Make the Decision Audit Visible
Standard sprint review: demo what was built, collect stakeholder feedback, inform the backlog.
JDD adds a second layer. It runs either before the stakeholder demo or immediately after, between the team and the engineering lead. Fifteen minutes. Three questions:
- What significant architectural or product decisions were made this sprint?
- Were they logged, or were they made by default?
- Did any AI-generated output introduce patterns that were not deliberately chosen?
This is where ADRs get reviewed. Not as a bureaucratic checkbox but as a forcing function. If a decision was made and not recorded, the sprint review makes that visible. Over time, that visibility changes behavior in planning; teams start making decisions upfront because they know the review will surface them if they do not.
The deeper purpose of this layer is not just cataloging what the AI produced. It is understanding why. Why did the implementation take the shape it took? Why were certain commits deferred? Why did this approach get chosen over the alternative? In a world where AI makes choices continuously and the developer often accepts them without deliberation, the sprint review is the moment where the team reconstructs the decision trail and asks whether it reflects what they actually intended.
This is the difference between a codebase your team understands and one they are managing by feel.
Retrospective: Calibrate Judgment, Not Just Delivery
Standard retrospective: what went well, what did not, what do we change. The classic format produces three lists (start doing, stop doing, continue doing), and the outputs are process changes: update a definition of done, add a step to the PR template, fix a recurring coordination problem.
That structure does not change in JDD. The outputs of a JDD retro still land in those three categories. What changes is what the team is examining.
Standard retros surface process friction and interpersonal dynamics. JDD retros add a third examination: judgment quality.
Two questions that belong in every JDD retro:
- Where did delivery pressure cause us to skip a decision we should have documented?
- Where did we defer judgment to AI output without checking the reasoning?
The first question is about the organizational immune system under pressure. The second is specific to AI-assisted development and has no equivalent in a pre-AI retro format.
The outputs look familiar, but the content shifts. “Start doing” might mean introducing a spike requirement for stories above a certain complexity threshold. “Stop doing” might mean accepting AI-generated architectural changes without a brief ADR. “Continue doing” might mean keeping the specification that gets fed to the AI before implementation starts, but tightening its format. The guardrails are added, updated, or removed here, at the team level, based on what the team actually experienced.
The Judgment Log, covered in the next post, is the artifact that makes this examination possible. Without it, the retro is working from memory and impression. With it, the team is reviewing a record of actual decisions, deferred decisions, and decisions that were made by default. The difference among those three categories is where the calibration happens.
This is how a team learns to trust its own judgment over time rather than progressively outsourcing it. The retro is not the place where judgment gets exercised. It is the place where the team gets better at exercising it.
What Does Not Change
Sprint length: unchanged. Ceremony timeboxes: unchanged. Scrum roles: unchanged. Backlog management: unchanged in structure.
The ceremonies are not longer, but the questions inside them are different.
This is the objection worth addressing directly: we do not have time for more process. JDD does not add a process. It redirects the time you are already spending toward the thing that AI has made newly consequential. The standup that used to surface blockers now also surfaces deferred judgment. The review that used to demo features now also audits decisions. The retro that used to fix process friction now also calibrates the quality of judgment.
Same time. Different focus.
The Instability Is Not in the AI
The DORA finding is precise: AI increases throughput and instability simultaneously. The instability does not come from the AI making mistakes. It comes from the decision infrastructure failing to keep pace with execution speed.
The companies that will move fastest with AI over the next two years are not the ones that gave engineers the best tools. They are the ones that rebuilt their decision infrastructure to match the speed of those tools.
Sprint ceremonies are where that happens. They were adapted once to make fast delivery safe. They need to be adapted again for the same reason.
The next post covers the Judgment Log: the artifact that makes the ceremony changes stick, and the difference between a team that documents its decisions and one that is constantly relearning the same lessons.