
AI agents are no longer limited to generating text. They browse the web, execute code, query databases, send emails, and interact with external APIs. The capabilities that make them useful also make them dangerous when they operate outside defined boundaries.

The current approach to AI safety is to constrain the model at the prompt level: system instructions, content filters, and output classifiers. These work well enough for narrow tasks. They do not hold up under adversarial conditions at scale.

Where prompt-level controls fail

Prompt-level controls share a fundamental weakness: they are enforced by the same system being controlled. A carefully crafted input can bypass a system prompt. A jailbreak that works against one model version often transfers to the next. Output classifiers catch known patterns; adversarial inputs are specifically designed to avoid known patterns.

More importantly, prompt-level controls do not address what happens after a model produces output. An AI agent that generates a valid database query and executes it has taken a real-world action. The harm, if any, has already occurred before any output filter could intervene.

Prompt-level guardrails constrain what a model says. Infrastructure-layer security constrains what a model can do. The distinction is the difference between advice and enforcement.

What infrastructure-layer security means in practice

Infrastructure-layer security treats AI agents the same way network security treats users: with verified identity, explicit permissions, and enforcement that operates independently of the agent itself.

This means the security policy is not embedded in the model's context window. It is enforced at the layer through which agent actions must pass. An agent cannot take an action that its policy does not permit, regardless of what instructions it has received or what reasoning it has produced.

The practical implementation involves three components: a verified identity for each agent, an explicit policy defining what actions that identity may take, and an enforcement point, outside the agent, through which every action must pass.
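The pattern can be sketched in a few lines. This is an illustrative toy, not a real API: the names (AgentIdentity, Policy, Broker) and the allow-list policy shape are assumptions made for the example. The point it demonstrates is structural: the deny decision happens in the broker, outside the model, so no prompt or chain of reasoning can route around it.

```python
# Hypothetical sketch of infrastructure-layer enforcement. All class
# names and the policy format are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str  # verified identity, issued outside the agent

@dataclass
class Policy:
    # Explicit allow-list: (agent_id, action) pairs that may proceed.
    allowed: set = field(default_factory=set)

    def permits(self, identity: AgentIdentity, action: str) -> bool:
        return (identity.agent_id, action) in self.allowed

class Broker:
    """Enforcement point: every agent action must pass through here."""
    def __init__(self, policy: Policy):
        self.policy = policy
        self.audit_log = []  # records every attempt, allowed or denied

    def execute(self, identity: AgentIdentity, action: str) -> str:
        allowed = self.policy.permits(identity, action)
        self.audit_log.append(
            (identity.agent_id, action, "allowed" if allowed else "denied")
        )
        if not allowed:
            # Denied regardless of what instructions the model received.
            raise PermissionError(f"{identity.agent_id} may not {action}")
        return f"executed: {action}"

policy = Policy(allowed={("report-bot", "read_database")})
broker = Broker(policy)
agent = AgentIdentity("report-bot")

print(broker.execute(agent, "read_database"))  # permitted by policy
try:
    broker.execute(agent, "send_email")        # not in the allow-list
except PermissionError as denied:
    print(denied)
```

Note that the audit log records the denied attempt as well as the permitted one; that record is what later sections rely on when asking what an agent did or tried to do.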

Why formal verification matters here

Safety claims about AI systems are only as credible as the method used to verify them. Testing against known attack patterns is necessary but not sufficient. A system that has passed every test that has been run has not been proven safe. It has been shown to survive the specific inputs it was tested against.

Formal verification takes a different approach. Rather than testing samples, it exhaustively checks whether a system satisfies specified invariants across all possible states. This is standard practice in safety-critical hardware and distributed systems. It is not yet standard practice in AI security, which is part of why AI security is in the state it is in.

Applying formal verification to AI agent safety is not a research exercise. The mathematical tools exist. The engineering challenge is correctly specifying the invariants you want to hold and constructing a system architecture that can be verified against them.
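The difference between testing and exhaustive checking can be shown on a deliberately tiny model. The sketch below is a toy model checker, not a real verification tool: the two-action state space and the single-agent policy are assumptions chosen to keep the example small. Instead of running sample inputs, it enumerates every reachable state of the broker model and asserts the invariant in all of them.

```python
# Toy exhaustive state-space check (illustrative, not a real verifier).
# The action set and allow-list below are assumptions for the example.
ACTIONS = ("read_db", "send_email")
ALLOWED = {"read_db"}  # assumed policy for one modeled agent

def step(state: frozenset, action: str) -> frozenset:
    """State = the set of actions already executed. The modeled broker
    transition only admits actions the policy allows."""
    if action in ALLOWED:
        return state | {action}
    return state  # denied: state unchanged

def reachable_states() -> set:
    """Breadth-first enumeration of every state the system can reach."""
    frontier, seen = [frozenset()], {frozenset()}
    while frontier:
        state = frontier.pop()
        for action in ACTIONS:
            nxt = step(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

# Invariant: no reachable state records a disallowed action as executed.
states = reachable_states()
for state in states:
    assert state <= ALLOWED, f"invariant violated in {state}"
print(f"checked {len(states)} reachable states; invariant holds")
```

Real systems have state spaces far too large to enumerate this way, which is why production verification uses symbolic model checkers and theorem provers rather than brute force. The principle is the same: the claim is checked over all states, not a sampled subset.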

The practical implication for enterprises

Organizations deploying AI agents in production environments face a question that prompt-level controls cannot answer: how do you know the agent did not do something it was not supposed to do?

With prompt-level controls, you do not know with certainty. You have a system prompt that said not to do it, and you have logs of outputs. That is not a security posture. It is a liability exposure.

Infrastructure-layer enforcement changes the answer. The agent could not have taken that action. The policy prevented it. The log confirms it was attempted and denied.

That is the difference between AI deployment that creates enterprise risk and AI deployment that can be defended to a board, a regulator, or a court.

AI security built at the infrastructure layer

The Dynamic Broker Architecture enforces policy at the infrastructure layer, independent of model behavior. Formally verified. Zero-trust by design.

Learn about DBA