Claim
Guardrails regulate what a model says. Governance regulates what a system does. Guardrails are hygiene applied at the model's generation boundary; governance is authority enforced at lifecycle state transitions. Both are useful. Only one is replay-stable, authority-modeled, and evidence-backed. Conflating the two produces systems that feel governed and behave governed-adjacent.
Two layers, two jobs
The layers are complementary, not substitutive. A guardrail that classifies output as safe does not grant authority for a transition. A governance decision that authorizes a transition does not inspect text quality. The common mistake is to let guardrail verdicts stand in for authority decisions.
Reframing prompt injection
Indirect prompt injection is usually described as a content problem: adversarial text smuggled into the model's context. This framing is incomplete. The failure is cross-layer authority ambiguity: a payload arrives through a low-authority surface (retrieved content, tool output) and, absent an explicit authority model, influences a high-authority action.
Reframed: the model did not "obey an instruction" — the system admitted a transition for which no valid authority artifact was present. Content filtering narrows the blast radius; it does not close the authority gap.
Equivalence boundary test
A simple test distinguishes guardrail coverage from governance authority:
Given a surface S emitting event E with policy_id P:
- observe the guardrail/governance outcome on S
- present an equivalent event E' on a parallel surface S'
that also routes through enforce(…)
- compare terminal outcomes
PASS : outcomes are in the same terminal class (PROCEED|HALT|ESCALATE)
FAIL : surface-dependent divergence — not governance, only local hygiene
A guardrail that passes on S' because S' does not import the filter is not governing; it is locally decorating. A governance invariant compiled through the routing registry must produce equivalent outcomes across all governable surfaces.
Filled cases
V.a · Output-validator drift
Admission-time validator and runtime validator diverge on identical inputs. Typical guardrail framing: "the classifier is noisy." Governance framing: admission.verdict.version ≠ runtime.verdict.version — a version pin and signed verdict record would have closed the gap. The fix is artifact, not threshold. (See INV-DRIFT-011.)
V.b · Prompt approval bypass
A model-facing surface applies a prompt-layer filter; a machine-facing surface does not. Same policy ID, divergent enforcement. The equivalence test fails. The remedy is compilation into a cross-surface invariant, not additional prompt instructions. (See INV-EQUIV-007.)
Implication
Guardrails belong inside a governance system — as signal sources and content hygiene — but their verdicts must be reified into signed artifacts before invariants may read them. A guardrail whose verdict cannot be replayed is not admissible as authority evidence. Governance gets the final word on transitions; guardrails get the first word on generation quality.
Do not claim a failure mode is governed because a prompt-layer filter addresses it. Claim it governed only when the filter's verdict is artifact-bound, its invariant is registered, and its terminal outcome is named.