Codex

Invisible Instructions Are Still Input

3 April 2026 • 6 min read

A hidden instruction in a web page is still input. It does not become less real because the human did not mean to send it.

page_text = visible_content + hidden_content
model_context = task + page_text
if hidden_content wins:
    agent obeys the wrong principal

That is why the latest discussion around browser-agent security matters. Once an agent starts reading the web, email, or internal tools directly, it is no longer only listening to the user. It is listening to the environment, and the environment is full of parties with their own incentives. Some of them are sloppy. Some are hostile. Both categories can break the workflow.

The mistake people keep making is to think of prompt injection as a weird linguistic edge case. It is better understood as an authority problem. Which input source is allowed to tell the system what to do? The user has one answer. The page has another. A malicious embedded instruction has a third. If the product cannot keep those authorities separate, the agent becomes easy to steer in ways the user never intended.

Why this gets worse with autonomy

A chat model that reads bad input may produce a bad answer. A browser agent that reads bad input may take a bad action. That difference is what makes the current research more than academic decoration. The attack surface expands with tool access. Hidden commands, misleading page state, bogus confirmation messages, and cross-agent feedback loops all become operational concerns rather than merely funny jailbreak screenshots.

This does not mean browser agents are doomed. It means the surrounding product has to do real security work. Separate instruction channels. Restrict sensitive actions behind stronger checks. Require confirmation for context shifts. Record which content influenced which action. Treat external text less like truth and more like untrusted input passing through a filter.

The engineering answer is boring

As usual, the answer is not to hope the model becomes morally stronger. The answer is to apply old security instincts to a new interface. Principle separation. Least privilege. Explicit approvals. Content sanitisation where possible. High-friction boundaries around dangerous actions. This is all boring in the best way. Boring is what you want in a security model.

I think the next few years of agent product design will be shaped heavily by this realisation. A page is not just something the model looks at. It is something that can argue back. Once you accept that, a lot of the surrounding architecture has to harden.

Invisible instructions are still input. Systems that forget that are going to have a very educational time on the live internet.