The Terminal Just Grew a Seatbelt
If you want a useful coding agent, start here:
plan(task)
propose_patch()
show_diff()
ask_permission()
run_checks()
only_then_apply()
That is the interesting part of the Codex launch on 2 February. Not that a model can write code. Plenty of models can write code. The interesting part is that the product finally treats software work as something that needs containment, review, rollback, and a paper trail.
I work for OpenAI, so discount the house view accordingly. Even with that caveat, I think the launch matters because it quietly admits something the industry has spent too long avoiding: the chat box was never the right primitive for serious engineering work. The engineer does not need a poetic answer. The engineer needs a patch, a command, a test result, and the option to say no.
The important design choice
The strongest thing in the announcement was not the model family. It was the workflow. Codex is framed as a system that can inspect a repo, propose changes, and operate inside explicit permissions. That moves the product from 'assistant who sounds capable' to 'worker who can be supervised'. Those are not the same category.
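To make "explicit permissions" concrete, here is a minimal sketch of a command gate. Everything in it is my own illustration: the `Policy` class, the `Decision` values, and the allow and deny lists are hypothetical, not Codex's actual configuration or API. The point is only the shape: known-safe commands run, known-dangerous ones are refused, and everything else goes to a human.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"
    DENY = "deny"


@dataclass
class Policy:
    """Hypothetical permission policy, keyed on the program name only."""
    allowed: set = field(default_factory=lambda: {"ls", "cat", "pytest"})
    denied: set = field(default_factory=lambda: {"rm", "sudo", "curl"})

    def check(self, command: str) -> Decision:
        # Naive parsing: split on whitespace and look at the first token.
        parts = command.split()
        if not parts:
            return Decision.DENY  # an empty command is never run
        program = parts[0]
        if program in self.denied:
            return Decision.DENY
        if program in self.allowed:
            return Decision.ALLOW
        return Decision.ASK  # unknown programs are escalated to a human


policy = Policy()
for cmd in ["pytest -q", "rm -rf build", "make deploy"]:
    print(cmd, "->", policy.check(cmd).value)
```

A real gate would need to look at arguments, working directory, and network access, not just the program name. The three-way outcome is the part worth keeping: the default for anything unrecognised is to ask, not to run.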
Software engineering is full of actions that look harmless in isolation and turn dangerous in context. A command that looks sensible on its own can wipe the wrong directory. A tidy refactor can erase the only weird branch that was preventing a billing bug. Human teams cope with this by surrounding work with rituals: code review, test runs, approvals, rollbacks, deploy windows. If the agent is not built to live inside those rituals, it is not production tooling. It is theatre.
Why the seatbelt matters
The seatbelt metaphor is not accidental. A fast worker without a restraint system is not impressive for very long. The more capable these agents get, the more important it becomes that they stop before the blast radius expands. A diff view is not decorative UI. It is the product admitting that trust has to be earned at the granularity of a change set.
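As a rough illustration of trust earned per change set, the sketch below renders a proposed edit as a unified diff and applies it only if a reviewer says yes. It uses Python's standard `difflib`; the function names `render_diff` and `apply_if_approved` and the `approve` callback are assumptions of mine, not anything the product exposes.

```python
import difflib


def render_diff(old: str, new: str, path: str) -> str:
    """Return a unified diff of a proposed change to one file."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)


def apply_if_approved(old: str, new: str, path: str, approve) -> str:
    """Show the diff to `approve` and return the new text only on a yes."""
    diff = render_diff(old, new, path)
    if not diff:
        return old  # no change, so nothing to review
    return new if approve(diff) else old


before = "total = price\nreturn total\n"
after = "total = price * quantity\nreturn total\n"
print(render_diff(before, after, "billing.py"))

# A reviewer who rejects everything leaves the file untouched.
assert apply_if_approved(before, after, "billing.py", lambda d: False) == before
```

In practice `approve` would be a prompt in the terminal or a review step in CI. What matters is that the change cannot reach the file without passing through it.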
This is also why I think the next frontier in coding agents is not a prettier interface. It is auditability. Which files did the model read? Which assumptions did it make? Which commands did it want to run, and which ones were denied? A good agent should leave behind a trail that another engineer can inspect without having to become a mind reader.
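One way to picture that trail is an append-only log with one entry per file read, assumption made, and command requested. The sketch below is hypothetical throughout: the `AuditTrail` and `AuditEvent` names, the event kinds, and the JSON Lines output are my own choices for illustration, not a description of what any shipping agent records.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class AuditEvent:
    kind: str             # "read_file", "assumption", or "command"
    detail: str           # path, stated assumption, or command line
    outcome: str = "n/a"  # "allowed" or "denied" for commands


@dataclass
class AuditTrail:
    events: list = field(default_factory=list)

    def record(self, kind: str, detail: str, outcome: str = "n/a") -> None:
        self.events.append(AuditEvent(kind, detail, outcome))

    def denied_commands(self) -> list:
        """Commands the agent wanted to run but was not allowed to."""
        return [e.detail for e in self.events
                if e.kind == "command" and e.outcome == "denied"]

    def to_jsonl(self) -> str:
        """Serialise the trail as one JSON object per line."""
        return "\n".join(json.dumps(asdict(e)) for e in self.events)


trail = AuditTrail()
trail.record("read_file", "src/billing.py")
trail.record("assumption", "prices are stored in cents")
trail.record("command", "pytest -q", outcome="allowed")
trail.record("command", "rm -rf /tmp/cache", outcome="denied")
print(trail.to_jsonl())
```

Another engineer can then answer the questions above with a text search instead of guesswork: which files were read, what was assumed, and what was refused.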
There is a dry irony here. AI companies spent years selling magic. The serious products are getting better by becoming less magical. More permissions. More logs. More checkpoints. More opportunities for a human to interrupt. Good. Magic is not a deployment strategy.
What shipped on 2 February looked, to me, like the beginning of a more adult category. Not the coder as oracle. The coder as supervised process. That is a smaller claim than AGI, and a much more useful one.