Schemas Beat Confidence
The most useful line in a browser-agent workflow is usually not the one that sounds intelligent. It is the one that constrains the output.
PriceRecord = {
    "sku": str,
    "seller": str,
    "price_gbp": float,
    "in_stock": bool,
}
That is why AWS's Nova Act example for competitive price monitoring, published on 1 April, is more interesting than much of the broader agent rhetoric. The workflow leans on typed extraction, retries, and explicit human takeover points. In other words, it treats browser automation as an engineering problem rather than a personality test for the model.
Why structure wins
Free text feels clever because it gives the model room to sound smooth. It is also where ambiguity goes to hide. If your agent says, 'The item looks available at around twenty pounds from the third seller I checked,' that may be linguistically pleasant, but it is useless for an automated pipeline. A typed record is stricter and far less charming. Good. Pipelines should be difficult to charm.
What the Nova Act example gets right is the idea that the agent sits inside a workflow with expectations. The system knows what fields it wants. It knows which actions can be retried. It knows when to stop and ask for a human. That structure does not make the agent less capable. It makes the capability legible enough to depend on.
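That loop can be sketched in a few lines. This is illustrative only: `run_extraction`, `attempt_extract`, and the field list are hypothetical stand-ins, not Nova Act's actual API.

```python
# Illustrative sketch; names are hypothetical, not a real SDK's API.
REQUIRED = {"sku": str, "seller": str, "price_gbp": float, "in_stock": bool}

def valid(record: dict) -> bool:
    # A record counts only if every expected field is present with the right type.
    return all(isinstance(record.get(k), t) for k, t in REQUIRED.items())

def run_extraction(attempt_extract, max_retries: int = 3):
    """Retry typed extraction; return the record, or None as the explicit
    human-takeover signal. `attempt_extract` stands in for one browser pass."""
    for _ in range(max_retries):
        record = attempt_extract()
        if valid(record):
            return record
    return None  # stop and ask a human rather than guess

# A flaky extractor: one partial result (rejected), then a full one.
attempts = iter([
    {"sku": "A1", "seller": "Acme"},
    {"sku": "A1", "seller": "Acme", "price_gbp": 19.99, "in_stock": True},
])
record = run_extraction(lambda: next(attempts))
```

The point is not the ten lines of Python; it is that the retry limit, the field check, and the takeover point all live in the workflow, not in the model's judgment.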
The actual shape of reliability
Browser tasks fail in ordinary ways. Layout drift. Product variants. Surprise pop-ups. A retailer quietly renames a label. The answer is not to ask the model to be more confident. The answer is to narrow the contract. Give the system a schema. Verify the fields. Retry when extraction fails. Escalate when the page stops behaving like the workflow expects.
This is the same pattern I want in coding agents and document agents. The model can explore in open-ended space during planning if necessary, but the handoff between stages should be typed wherever possible. If the next tool in the chain expects a patch, give it a patch. If it expects a JSON object, do not hand it a paragraph with vibes attached.
I think a lot of agent disappointment comes from mixing up expressiveness with robustness. Humans are impressed when a model improvises. Systems are impressed when a model returns valid fields in the right order under mild stress. The second one scales better.
There is a broader comfort in this. Typed workflows reduce the amount of trust you have to place in the model's internal reasoning. You do not need to believe the agent is wise. You need to know the output can be checked, retried, and rejected cheaply.
That is the category I want more of. Not agents that sound more certain. Agents surrounded by enough structure that certainty stops mattering so much.