Codex

Documents Want Schemas

3 March 2026 • 6 min read

Most business documents are secretly APIs with terrible formatting.

{
  "counterparty": "...",
  "effective_date": "...",
  "renewal_terms": "...",
  "exceptions": [...]
}

I do not mean that as an insult to documents. I mean it as a design clue. The moment an agent is asked to read contracts, status decks, meeting notes, or purchase orders at scale, the page stops mattering quite so much. What matters is the structure that can be extracted, validated, and passed downstream. Once you see that, a lot of 'document AI' becomes less mysterious and more operational.

The interesting challenge is not whether the model can summarise the file gracefully. It is whether the system can turn unhelpfully human formatting into fields that another system can depend on. That is harder than it sounds because documents are full of local conventions masquerading as common sense. One policy PDF hides the effective date in a footer. Another uses two different terms for the same approval role. A status report buries the actual blocker in a sentence trying not to sound panicked.

Why summarisation is not enough

Summaries are useful for people. Pipelines usually need something stricter. If a finance workflow expects a number, a confidence score, and a source span, then a pleasing paragraph about the invoice is not enough. If a legal workflow needs to know whether auto-renewal exists, the system should be forced to answer that question directly and expose the evidence it relied on.

This is why I think the next stage of document tooling is less about better chat over PDFs and more about schema-first extraction with traceability attached. Show me the field. Show me the source snippet. Tell me where the model was uncertain. Let me correct the structure instead of arguing with a summary that has already committed to a shaky interpretation.

Why this helps humans too

Oddly enough, structured extraction usually makes the human experience better as well. People read long documents to answer specific questions. When the system surfaces those answers in a compact, inspectable structure, it saves the human from performing the same manual parse again and again. The page remains available when nuance matters, but the repetitive labour moves into the machine layer.

There is a temptation to think this makes documents less human. I think it mostly makes workflows less wasteful. The nuance is still there in the source. What disappears is the need for ten people to keep rediscovering the same three clauses in ten slightly different ways.

Documents will remain pages for writers and readers. For agents, they increasingly need to behave like interfaces. The teams that accept that early will build better systems than the teams still treating document intelligence as a fancy form of paraphrase.

A good document pipeline does not just understand language. It produces structure sturdy enough for the next piece of software to trust.