The Dopamine Hit Trap: Why Time-to-First-Output is the Wrong Metric

Today we were previewing something internally, a few days before sharing it with a large auto client. The team had built an automotive-trained image-generation agent: the kind that handles reflections, brand guidelines, and integrated copy without the usual uncanny-valley artifacts that make most AI car ads unusable.

The demo ran. The output was great. Showroom-grade. Brand colors right, reflections clean, copy on-brief. The renders looked like they'd come out of a $100,000 photo shoot. But I could tell from the reaction on the call that something wasn't landing, and it wasn't the image quality or the UI.

People were quiet for a different reason: it took a few minutes to generate the image. Five to eight, depending on the prompt and how many self-correction passes the agent ran. The moment that hit the screen, I could feel the room tilt. The instinct, even from people who'd been working on the build, was "the client will never wait that long." It was clear to me that they were comparing the wrong thing.

What we were actually comparing

The mental model in the room was: the fast tool returns an image in 20 seconds, our tool returns one in 8 minutes, so ours is 24x slower. That math is wrong, but it's the math everyone runs.

The fast tool returns a first image in 20 seconds. Then you regenerate, because the light is off. And again, because the angle isn't quite right. And again, because the brand color drifted. And again, because the reflection on the hood is doing something a human would never ship. Ten regenerations is normal in this category; the honest designers will admit it's usually more. That's twenty minutes of a designer sitting at the screen, refining the prompt, mid-flow attention captured. They can't do anything else, because the next iteration is 30 seconds away and they have to keep deciding what's wrong.

Our tool returns the final image in 8 minutes. The agent self-corrects during the run: reflections, brand, copy, handled. The designer gets an email when it's done. While that's running, the designer takes a Slack call. Runs a meeting. Grabs a Diet Coke (IYKYK). The 8 minutes happens in the background. The cost in minutes of attention to get to a final, acceptable output: about 30 seconds, to look at the result.
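The arithmetic above can be written down as a back-of-envelope model. The render times and iteration count are the post's own figures; the ~100 seconds of prompt refinement per pass is an assumption chosen to match the "twenty minutes" total.

```python
def cost_to_final(render_s, iterations, review_s, synchronous):
    """Wall-clock and attention seconds to reach one final acceptable image.

    In a synchronous loop the designer's attention is captured for the
    whole run; an asynchronous run costs only the final review.
    """
    wall = iterations * (render_s + review_s)
    attention = wall if synchronous else review_s
    return wall, attention

# "Fast" tool: 20 s per render, ~10 regenerations, and an assumed
# ~100 s of prompt refinement and inspection per pass.
fast_wall, fast_attention = cost_to_final(20, 10, 100, synchronous=True)

# Our tool: one 8-minute self-correcting run, ~30 s to review the result.
ours_wall, ours_attention = cost_to_final(8 * 60, 1, 30, synchronous=False)

print(f"fast tool: {fast_wall / 60:.0f} min wall-clock, "
      f"{fast_attention / 60:.0f} min of attention")
print(f"our tool:  {ours_wall / 60:.1f} min wall-clock, "
      f"{ours_attention / 60:.1f} min of attention")
```

Under those assumptions the fast tool costs twenty minutes of wall-clock time and twenty minutes of attention; ours costs eight and a half minutes of wall-clock time and half a minute of attention.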

[Chart: time to one final image, in minutes. "Fast" tool: ~20 min. Our tool: 8 min, of which ~30 sec is your attention.]
Wall-clock time. Our tool's eight minutes happens in the background; your attention cost is about thirty seconds.
[Diagram: fast tool: prompt, output, refine ×10, acceptable ~20 min later. Our tool: prompt, 8 min async, final image.]
Top: the dopamine loop (fast first output, ten regenerations). Bottom: one-shot, asynchronous, finished.

The metric that matters

Fast AI optimizes for time-to-first-output. That metric exists because it feels like progress; it's the dopamine hit. You hit Enter, something happens, you feel productive. What operators actually care about is time-to-final-acceptable-output.

The two metrics diverge the more brand-precise, technically precise, or domain-specific the work is. A generic stock photo? Time-to-first-output is probably fine. A Porsche 911 with correct reflections, brand colors, and an on-brief scene? You want time-to-final.

We see this with increasing regularity, especially with clients who want massive scale and/or high fidelity: the tools that take longer per request are quietly winning in the end. Deep research tools that run for five minutes can outperform the fast chat-based ones. Code agents that take 90 seconds to plan a change can outperform the autocomplete-style ones. The pattern doesn't hold everywhere, but I see it holding in more and more places, even within the most sophisticated AI-led operations.
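A toy model of why the two metrics diverge: if each fast-tool attempt is acceptable with probability p, and attempts are independent (a simplifying assumption), expected time-to-final scales as 1/p, while a self-correcting run is roughly a fixed cost. The pass-probability values and per-pass timing below are illustrative, not measurements.

```python
def expected_time_to_final(seconds_per_attempt, p_acceptable):
    """With independent attempts, each acceptable with probability p,
    the attempt count is geometric with mean 1/p, so expected
    time-to-final is seconds_per_attempt / p."""
    return seconds_per_attempt / p_acceptable

# One fast-tool pass: 20 s render plus an assumed ~100 s of refinement.
FAST_PASS_S = 120
AGENT_RUN_S = 8 * 60  # one self-correcting run: roughly a fixed cost

# p falls as the work gets more brand-precise and domain-specific.
for p in (0.8, 0.3, 0.1):
    fast = expected_time_to_final(FAST_PASS_S, p)
    print(f"p={p:.1f}: fast tool {fast / 60:.1f} min, "
          f"agent {AGENT_RUN_S / 60:.1f} min")
```

At p=0.8 (generic stock photo) the fast tool wins on time-to-final too; by p=0.1 (a brand-precise render) its expected time-to-final is 20 minutes against the agent's fixed 8.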

[Chart: a 2x2 map. Axes: synchronicity (sync vs. async) and what the tool optimizes for (time-to-first vs. time-to-final). Async + time-to-first, "more options, you pick": Midjourney grid, batch ad gen. Async + time-to-final, the compounding configuration: our tool, code agents, Deep Research. Sync + time-to-first, the dopamine trap: ChatGPT chat, most fast image gen, Copilot autocomplete. Sync + time-to-final, "wait, get it right": Wolfram Alpha, long sync RAG.]
Where AI tools sit on the two axes that matter. Top-right is the configuration that compounds; bottom-left is where most teams default.

When we go to the client

When we go to the client, we won't lead with speed. We'll tell them the truth: yes, eight minutes per asset. Every asset replaces a component of a $100,000 photo shoot, lands in the designer's inbox, and looks like it came out of one. The pitch isn't "we're faster." It's "we're done." I like our odds.