Governing AI Agents Without Sacrificing Expressiveness: A Formal Result

One of the most repeated arguments against safety restrictions in agent systems is that any control layer ultimately limits what the system can do. A paper published on May 6, 2026 on arXiv—arXiv:2605.01030—provides formal evidence to the contrary: it is possible to impose complete governance over an AI agent's effects without touching its internal computational expressiveness.

The result is not intuition or a design proposal. It is a development mechanically verified in Rocq 8.19 (the proof assistant formerly known as Coq), with 36 modules, around 12,000 lines of code, and 454 theorems, all compiled with zero admitted lemmas. This means no proof was accepted as valid without verification.

What the paper proposes

The authors define a governance operator G that acts as an intermediary for all "effects" an AI workflow can emit: memory access, external calls, and queries to oracles (that is, language models). The formalization is built on Interaction Trees, a functional data structure that allows modeling programs with effects in a composable and verifiable way under formal verification.

Based on this foundation, the paper demonstrates seven properties (P1-P7). The most relevant for practice:

P1 and P2 (Turing completeness and oracle expressiveness under governance): the governed system can compute anything the ungoverned system can compute, including LLM queries. Governance does not amputate capabilities.
P3 (decidability frontier): governance predicates are total and closed under boolean composition—allowing complex policies to be built predictably—but semantic properties of programs remain non-trivial and undecidable by the governor. In plain terms: the governor can decide whether an action is permitted, but cannot decide what the program does in general.
P6 (structural governance subsumes content filtering): a layer controlling effect structure is strictly more powerful than a layer filtering generated content. This has direct implications for debates about where to place controls in systems with LLMs.
P7 (semantic transparency): when governance permits execution, the governed and ungoverned interpretations are equivalent. There is no silent distortion.

Why it matters for the agent ecosystem

In systems like Claude Code, where sub-agents can invoke external tools, execute lifecycle event hooks, or connect to MCP servers, the question of how to ensure a control layer does not degrade legitimate system behavior is entirely operational. Until now, answers were mostly empirical: one tested that the system still functioned after adding restrictions.

This paper offers something different: a verifiable guarantee that the design is correct before executing a single instruction. Property P7 in particular—semantic transparency—is what would allow trust that a governed agent behaves identically to one without restrictions in all cases where governance gives approval.

The distinction between effect governance and content filtering (P6) also deserves attention. Much of current safety debate around LLMs focuses on text output. This work formally suggests that acting at the effect level—what the agent does, not what it says—is a more robust stance.

For whom this is useful

The paper is aimed at researchers in formal verification and type theory, but its conclusions have direct reading for those designing agent architectures in production. Teams building orchestrators, defining tool usage policies, or auditing multi-agent workflows will find here a theoretical framework against which to compare their design decisions.

It is not a ready-to-install repository. Rather, it is a formal foundation on which to build tools and standards with verifiable guarantees.

---

From our perspective, the result strikes us as sound precisely because its authors do not sidestep limitations: the decidability frontier (P3) is explicitly bounded, without promising more than the formalization can sustain. That is what distinguishes a useful verification paper from one that merely sounds good.

Governing AI Agents Without Sacrificing Expressiveness: A Formal Result

What the paper proposes

Why it matters for the agent ecosystem

For whom this is useful

Sources

Read next

Conversational Design for Museums: From Monologue to AI Dialogue

Will AI Kill the Scientific Paper As We Know It?

Anthropic Explains Why It Trains Claude With Moral Reasoning, Not Just Rules