Latency Is a Product Decision
People talk about latency as if it were handed down by the model gods. It usually is not. It is a product decision made up of a dozen smaller engineering choices.
latency = planning
+ tool_calls
+ retries
+ verification
+ human_approval
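The decomposition above can be made concrete with a per-stage timer. This is a minimal sketch, not any particular framework's API; the `LatencyBudget` class and its stage names are hypothetical, chosen to mirror the terms of the sum.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class LatencyBudget:
    """Accumulates wall-clock time per stage, mirroring the
    decomposition: planning + tool_calls + retries +
    verification + human_approval."""

    def __init__(self):
        self.stages = defaultdict(float)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.stages[name] += time.perf_counter() - start

    def total(self):
        return sum(self.stages.values())

budget = LatencyBudget()
with budget.stage("planning"):
    time.sleep(0.01)  # stand-in for the model's planning step
with budget.stage("tool_calls"):
    time.sleep(0.02)  # stand-in for a tool round-trip
```

The point of instrumenting this way is that `budget.stages` answers the product question directly: where did the user's wait actually go?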
I like thinking about it this way because it forces the conversation away from slogans. 'Make the model faster' is an obvious wish. 'Choose where to spend time because each extra second buys a different kind of reliability' is the real product problem. A slow plan that prevents a costly error may be a bargain. An instant answer that requires the user to do all the checking manually may be a bad trade dressed up as responsiveness.
This matters more for agents than for chat because the latency is composed. The model thinks, then calls a tool, then waits on another system, then reruns a step because a verifier objected, then pauses for approval because the action has consequences. The user experiences one wait, but the product team controls many of the underlying choices.
Where to spend the seconds
If I had to pick, I would rather spend time on verification than on ornamental planning. I would rather spend time on one extra browser retry than on streaming a verbose explanation of the agent's intentions. And I would rather spend time on a clean human checkpoint than on a smooth but irreversible action. These are product values as much as engineering ones.
The trick is that users do not hate latency equally. They hate unexplained latency. They hate latency that ends in a weak result. They hate latency that could have been avoided with better orchestration. But many people will tolerate a slower system if the waiting is legible and the output justifies it.
The honest interface
This is why I keep coming back to status design. A serious agent should tell the truth about where the time is going. Fetching. Testing. Waiting on approval. Retrying extraction. Users are surprisingly tolerant when the system behaves like a competent tool rather than a mysterious performer.
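Honest status labels of this kind are cheap to wire in. Here is a sketch under assumed names: `run_with_status` and its `on_status` callback are hypothetical, standing in for whatever event channel a real agent UI uses.

```python
from typing import Callable, List, Tuple

def run_with_status(steps: List[Tuple[str, Callable]],
                    on_status: Callable[[str], None]) -> list:
    """Run (label, fn) pairs, announcing each label before it runs,
    so the user sees where the time is going."""
    results = []
    for label, fn in steps:
        on_status(label)  # e.g. "Fetching", "Retrying extraction"
        results.append(fn())
    on_status("Done")
    return results

log = []
run_with_status(
    [("Fetching", lambda: "page"), ("Testing", lambda: "ok")],
    log.append,
)
# log now holds the sequence of statuses the user would have seen
```

The labels cost nothing to compute; they only require that the orchestrator already knows which stage it is in, which it must anyway.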
There is also an internal benefit. Once teams break latency into stages, they can optimise intelligently rather than waving generic performance targets at the stack. Maybe the model is fine and the real culprit is a verifier that reruns too often. Maybe the browser session is slow because the system opens a fresh context for every step. Maybe the biggest win is caching intermediate results rather than shaving milliseconds off inference.
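The caching win mentioned above is often the simplest one to land. A sketch, assuming a pure fetch step; `fetch_page` is a hypothetical stand-in for an expensive browser call, memoised so that a verifier rerunning a step does not pay for the fetch twice.

```python
import functools

@functools.lru_cache(maxsize=128)
def fetch_page(url: str) -> str:
    # Stand-in for a slow browser fetch; cached so a rerun
    # triggered by a verifier reuses the earlier result.
    return f"<html>{url}</html>"

fetch_page("https://example.com")  # slow path, populates the cache
fetch_page("https://example.com")  # cache hit: no second fetch
```

Caching only works when the step is genuinely repeatable; anything with side effects or fast-changing content needs an explicit invalidation story instead.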
Latency is not a weather event. It is architecture made visible to the user. Treating it as a product decision is how you stop apologising for wait time and start spending it deliberately.