Week 5: The retrospective build caught what the post-mortem missed

Daniel Nikulshyn · CEO & Founder · Thu, 14 May 2026 07:42:06 GMT

Two more experiments delivered. One worked better than I expected; the other failed in a way that made the failure itself the lesson.

## The retrospective build

We picked the most recent incident worth re-litigating — a degradation a fortnight ago in our auth flow that took two engineers four hours to find. The post-mortem doc was solid. Five contributors, ordered timeline, three action items. Standard stuff.

The build challenge asked for something different: rebuild the timeline inside Minecraft with the people who lived through it, then walk someone who wasn't there through it in the order the incident unfolded.

Three people spent an hour and a half on it. The room layout that emerged was not the room layout I'd have predicted from the post-mortem.

In the doc, the auth bug surfaced in the third paragraph and got the most words. In the in-game build, the auth bug was a small side room. The big space, and the room everyone kept walking back to, was the monitoring delay. The eight-minute gap between the first failing request and the first alert was the architectural weakness the incident exposed, not the bug itself.

We had typed it in the post-mortem. We had not felt it. The physical version made the room you cared about the room you couldn't avoid walking through.

The architecture proposal that came out of that session is now in .tasks/: shave the alert delay from eight minutes to ninety seconds. The auth bug is fixed. Closing the alert gap is the real fix.

"We wrote three action items and missed the obvious one. Walking the timeline did what reading it didn't." — one of the engineers from the incident
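For anyone who wants to track the proposal's headline number, here's a minimal sketch of the metric we're trying to move: the detection gap, measured from the first failing request to the first alert. The `IncidentTimeline` record, its field names and the `meets_target` helper are hypothetical, not our actual monitoring setup; the only inputs taken from the incident are the eight-minute gap and the ninety-second target.

```python
from dataclasses import dataclass

# Hypothetical record for illustration only; field names are assumptions,
# not our monitoring schema. Times are seconds after the first failing request.
@dataclass
class IncidentTimeline:
    first_failure_s: float  # first failing request
    first_alert_s: float    # first alert that fired

    @property
    def detection_gap_s(self) -> float:
        """Seconds during which the failure went unnoticed."""
        return self.first_alert_s - self.first_failure_s


TARGET_GAP_S = 90  # the proposal's target: ninety seconds


def meets_target(timeline: IncidentTimeline, target_s: float = TARGET_GAP_S) -> bool:
    """True if the alert fired within the target window."""
    return timeline.detection_gap_s <= target_s


# The auth incident: eight minutes from first failure to first alert.
auth_incident = IncidentTimeline(first_failure_s=0, first_alert_s=8 * 60)

print(auth_incident.detection_gap_s)  # 480
print(meets_target(auth_incident))    # False
print(f"target is {1 - TARGET_GAP_S / auth_incident.detection_gap_s:.0%} shorter")  # target is 81% shorter
```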

## The "explain it to me" challenge

This one didn't go the way I hoped.

Five engineers each picked a piece of code they own, built an in-game model of it, and stepped back while someone else tried to articulate what it does without being told. Total time: two hours.

Three out of five worked beautifully. The model was clear, the watcher narrated the flow in under two minutes, and the conversation that followed was the kind of conversation no PR review ever produces.

Two of five didn't work at all. The code was a queue processor and a feature flag system — both essentially state machines with too many edges. The Minecraft model collapsed into a tangle of redstone the watcher couldn't read. We ended up explaining the build with words, which is the thing the exercise was supposed to replace.

I should have predicted this. The pattern that's emerged: physical metaphors work for code with shape — flows, layouts, sequences. They break for code with state — anything where the meaning depends on what happened five interactions ago. Forty minutes of building can't compress what a stack trace can.

That's not a failure of the experiment. That's the boundary of the experiment, which is the more useful thing to know.
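To make the shape-versus-state line concrete, here's a toy sketch in Python. Neither piece is code from our repo; `run_pipeline` and the `RetryingConsumer` class are hypothetical stand-ins. The pipeline reads like a corridor: its output depends only on the input and the fixed order of steps. The consumer's answer to the same call changes depending on how many failures came before it, and that history is the part the redstone models couldn't show.

```python
from functools import reduce
from typing import Callable

# Shape: a fixed sequence of steps. The output depends only on the input
# and the order of the steps, so reading them top to bottom explains it.
Step = Callable[[str], str]


def run_pipeline(value: str, steps: list[Step]) -> str:
    return reduce(lambda acc, step: step(acc), steps, value)


# State: a hypothetical queue consumer. The same call returns different
# results depending on how many failures happened before it.
class RetryingConsumer:
    MAX_ATTEMPTS = 3

    def __init__(self) -> None:
        self.attempts = 0
        self.state = "pending"

    def on_failure(self) -> str:
        self.attempts += 1
        self.state = "dead_lettered" if self.attempts >= self.MAX_ATTEMPTS else "retrying"
        return self.state


print(run_pipeline("  Hello ", [str.strip, str.lower]))  # hello

consumer = RetryingConsumer()
print([consumer.on_failure() for _ in range(3)])  # ['retrying', 'retrying', 'dead_lettered']
```

Three identical calls to `on_failure()` give three answers, and you can't explain the third without knowing about the first two.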

## What went sideways

One specific thing this week. We had a host on rotation for the first time since the week-4 incident, and the host was conflict-averse enough that a low-grade argument about base layout ran twenty minutes longer than it should have. No drama, no harm done, but the lesson was clearer than the original problem: a host who can't call time on an argument doesn't help.

We rotated hosts again the next session and added a one-line note to the host playbook: "If you can't stop a session, you're not the host."

Small, but the kind of thing you only learn by doing it badly once.

## The numbers so far

- 🗓️ Days active (cumulative): 35
- 👥 Developers participating: 9 / 9
- 💡 Project-related ideas captured: 49 (+8 this week)
- 🛣️ Ideas promoted to the roadmap: 7
- 🔁 Architecture proposals from the build challenge: 2 (the partner onboarding hub, the monitoring-delay fix)
- 🕐 Average session length: ~1.4 hours
- 🎙️ Debrief participation rate: 91% (engineers asking for it rather than us scheduling it)
- 😴 Self-reported energy after sessions: 7.6 / 10 (recovering after the week-4 dip)

The number I keep watching is the energy score. Week 2 was 8.4, week 4 was 7.2, week 5 is 7.6. The week-4 dip was real and the bounce-back is small. We're close to the limit of how long this can run as an open-ended thing without a planned wind-down, so closing at week 6 is the right call.
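Putting the energy numbers side by side shows why I call the bounce-back small. A quick sketch using only the three weekly scores quoted above (weeks 1 and 3 aren't listed in this post, so they're left out):

```python
# Self-reported energy scores (out of 10) quoted in this post.
# Weeks 1 and 3 aren't listed here, so they're left out.
energy = {2: 8.4, 4: 7.2, 5: 7.6}

dip = energy[2] - energy[4]       # drop from week 2 to week 4
recovery = energy[5] - energy[4]  # bounce-back from week 4 to week 5
share_recovered = recovery / dip  # fraction of the dip regained so far

print(f"dip {dip:.1f}, recovery {recovery:.1f}, recovered {share_recovered:.0%}")
# dip 1.2, recovery 0.4, recovered 33%
```

About a third of the week-4 dip has come back so far.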

## The replicability question, again

Last week I wrote that the real precondition is trust — that the game doesn't fix a team that won't be slightly silly together.

Week 5 made me add a second precondition: a willingness to publish your own boundary. The "explain it to me" challenge taught us something only because we let two of five attempts fail in public, in front of each other. Five out of five would have been a marketing post. Three out of five with a clear hypothesis about why the other two failed is a thing other people can build on.

A team that can't tolerate showing its own seams won't get the second-order learning here. That's a different precondition from trust, and it's the one I want to write into the playbook if we publish one.

## What's next

Week 6 is the close. Three things on the docket:

- An exit interview round with every developer: ten minutes each, written notes. Specifically: what would you do again, what would you skip, and would you recommend this to a friend's team if they asked?
- The playbook write-up decision. If the exit interviews trend positive enough, we publish. If they're mixed, we publish the mixed picture, which I'm starting to think is the more valuable version anyway.
- A "what we'd do differently from day one" doc, drafted by the team in a single session and posted internally first. If it survives a week of being argued with internally, it ships externally.

Six weeks ago I wrote that the experiment would either work or fail in an interesting way. Both happened. The boring middle (works fine, no lessons) is the one we managed to avoid, and that's the part I'm most grateful for.

The pattern continues to compound. The boundary of it is becoming visible. Both are useful.