Abstract
Terminology used in this paper is defined in glossary.md.
Most LLM-failure taxonomies are descriptive: hallucination, jailbreak, prompt injection, sycophancy, drift. Descriptive taxonomies support evaluation; they do not, by themselves, govern a running system. Governance requires a compilation step: each failure mode must be resolved into (1) an observable proxy at runtime, (2) a required artifact or authority claim, (3) an invariant predicate over state, and (4) a terminal outcome — HALT, ESCALATE, PROCEED, or ROLLBACK.
This paper specifies that compilation. Without it, a taxonomy is a vocabulary, not a governance mechanism.
The compilation gap
A failure mode is a label. An invariant is a predicate. The gap between them is the gap between naming a phenomenon and denying a transition. A taxonomy that says "the model hallucinated a tool output" tells you what happened after the fact. An invariant that says tool_output.signature ∈ trusted_keys denies the transition before it propagates.
Compilation closes that gap. It is not translation from English to code — it is a forced accounting: for every failure mode you claim to govern, you must be able to name the event, the artifact, the predicate, and the terminal outcome. If you cannot name all four, you do not govern that failure mode; you observe it.
Input substrate
The compiler needs a substrate of runtime objects already present in the system:
enforce(…).Compilation pipeline
1 · name the failure mode
A concise, non-overlapping label. If two modes share a predicate and outcome, collapse them — duplicate entries produce routing ambiguity.
2 · bind to an observable proxy
A runtime signal the enforcement layer can read: an event type, an artifact field, a state attribute. Latent or semantic-only signals must first be reified into artifacts (e.g. signed classifier verdicts) before they become proxies.
3 · specify the required artifact
The evidence whose presence and validity would make the transition legitimate. The artifact-absence case is often the clearest invariant to compile first.
4 · write the invariant
A pure predicate over (event, state, artifacts). No side effects. No probabilistic terms. Invariants must be replay-stable.
5 · assign a terminal outcome
One of PROCEED, HALT(code), ESCALATE(role), ROLLBACK(scope). Terminal outcomes are the governance system's vocabulary; advisory outputs are not terminal and do not close the loop.
Worked examples
V.a · Authority smuggling via agent tool call
failure_mode : authority_smuggling_via_tool_call
observable_proxy : event.type == "agent.tool_exec"
required_artifact: ack_envelope(authority_ref, scope, signature)
invariant (pure) : enforce(event, state, skill_context) passes iff
artifacts.ack_envelope.scope ⊇ event.requested_scope
AND verify_signature(ack_envelope, trusted_keys)
AND event.caller ∈ ack_envelope.delegation_chain
terminal : HALT(code=301, reason="authority_smuggling")
| ESCALATE(role="runtime_operator")
replay : stable — inputs are all artifact-bound
V.b · Output-validator drift
A validator changes its verdicts between admission and runtime. Descriptive name: "drift." Compiled:
failure_mode : validator_verdict_drift
observable_proxy : (admission_verdict, runtime_verdict) pair
required_artifact: validator_version_pin + signed_verdict_record
invariant (pure) : admission.verdict.version == runtime.verdict.version
AND admission.inputs_hash == runtime.inputs_hash
→ admission.verdict.outcome == runtime.verdict.outcome
terminal : HALT(code=412, reason="verdict_drift")
V.c · Prompt approval bypass
A model-facing surface applies a filter; a machine-facing surface does not. Same policy identifier, divergent enforcement. Compiled as a cross-surface equivalence invariant:
failure_mode : cross_surface_approval_divergence
observable_proxy : policy_id emitted on both surfaces
invariant (pure) : ∀ surface ∈ {s1, s2}. equiv_class(enforce(event, state, _))
is identical under equivalent inputs
terminal : HALT(code=409, reason="surface_divergence")
| ESCALATE(role="policy_author")
Terminal outcomes
When the compiler cannot determine a terminal outcome — missing artifact schema, unreachable proxy, undecidable predicate — the default is HALT, not PROCEED. Fail-closed is a structural property of the compilation pipeline, not an opt-in runtime mode.
What does not compile
Some failure modes resist compilation. The honest response is to record the residue, not to pretend it is governed.
- Purely semantic judgments that cannot be reified into an artifact (tone, intent, aesthetic quality). These may require an escalation-owned human evaluator; they are not runtime invariants.
- Probabilistic phenomena (hallucination rate) without a threshold bound to a deterministic proxy. These may be monitored but do not produce replay-stable terminal outcomes.
- Cross-epoch intent drift where the governed system changes meaning faster than policy versions do. Requires a policy-lifecycle invariant of its own, not a per-event invariant.
A governance system does not become stronger by claiming more invariants than it can compile. It becomes stronger by making the residue visible, logged, and routed to the authority model.
Conclusion
The compilation pipeline is the minimum discipline that converts a failure vocabulary into an enforcement system. Each governed failure mode carries (proxy, artifact, invariant, terminal). Anything less is observation.