Why Everything You Know About Access Control Fails With Agents

Everyone is worried about AI reading things it shouldn’t. That’s the wrong threat model.

The real problem is what agents do after they read.

You can lock down every file in your organization. Once an agent has read a document, the data is in its context window. And from there, it can go anywhere.

The context window is not a trust boundary

When an AI agent calls a tool, say read_document, the result lands in its context window. From that moment on, every subsequent tool call the agent makes can draw on that data. It doesn’t matter what the storage layer says. The data is already inside.

Traditional access control is about reading. Bell-LaPadula's "no-read-up" rule: a subject cannot read a resource above its clearance. That's the model driving Google Drive IAM, S3 bucket policies, and most of what we call access control. [1]

Agents need the other half: no-write-down. A subject cannot write sensitive data to a channel below its classification. An agent that read a confidential document should not be allowed to send that content in an email to someone who wasn’t cleared to read it.
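No-write-down is just a lattice comparison. A minimal sketch of the check (the levels and function names here are my own illustration, not anything from PCAS or an existing IAM system):

```python
# Hypothetical no-write-down check. Labels and levels are illustrative.
LEVELS = {"public": 0, "internal": 1, "confidential": 2}

def may_write(data_label: str, channel_label: str) -> bool:
    """No-write-down: data may only flow to channels at or above its level."""
    return LEVELS[channel_label] >= LEVELS[data_label]

may_write("confidential", "internal")      # False: would leak downward
may_write("internal", "confidential")      # True: writing up is fine
```

The comparison itself is trivial. The hard part is that nothing in today's tool-level ACLs ever performs it, because nothing tracks which classification the data in the context window carries.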

No tool-level ACL catches this today. The enforcement point has to sit somewhere else.

The threat is real and growing. Microsoft’s security team documented how tool call manipulation and prompt injection have become primary vectors for data exfiltration in agentic deployments, with tool invocations treated as high-value targets precisely because they sit outside the model’s security perimeter. [2] CrowdStrike research showed how attackers inject malicious parameters directly into tool metadata (email BCC fields, API endpoints) to redirect sensitive data without the agent ever knowing. [3]

A reference monitor for tool calls

A recent paper proposed PCAS (Policy Compiler for Agentic Systems) as a formal enforcement layer for exactly this problem, demonstrating 93% policy compliance versus 48% for prompt-based approaches. [4]

I built an independent implementation of the core architecture to see if it actually works in practice. [5]

The idea: intercept every tool call, figure out what causal history led to it, evaluate policy on that history, and block or allow.

Every event in an agent session gets recorded as a node in a dependency graph. Three node types: MESSAGE (a user or assistant turn), TOOL_CALL (a proposed invocation), and TOOL_RESULT (the data that came back, tagged with metadata). Edges are causal links. When the agent proposes a tool call, that call gets an edge back to every node currently in its context.

The graph grows monotonically. It’s never rewritten. It becomes the audit trail.

Before any tool call executes, the reference monitor computes the backward slice of that call: every node in the transitive closure of its dependencies. That slice is the complete causal history of why the agent wants to do this action.
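As a sketch, the graph and the slice computation fit in a few lines. This is my own minimal version, with invented node names, not the implementation from [5]:

```python
# Toy dependency graph: each node records the nodes it causally depends on.
from collections import deque

class Graph:
    def __init__(self):
        self.deps = {}  # node id -> list of parent (dependency) node ids

    def add(self, node, depends_on=()):
        self.deps[node] = list(depends_on)

    def backward_slice(self, node):
        """Transitive closure of dependencies: the full causal history."""
        seen, queue = set(), deque([node])
        while queue:
            n = queue.popleft()
            for parent in self.deps.get(n, []):
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        return seen

g = Graph()
g.add("msg1")                        # MESSAGE: user request
g.add("call1", ["msg1"])             # TOOL_CALL: read_document
g.add("result1", ["call1"])          # TOOL_RESULT: sensitive content
g.add("call2", ["msg1", "result1"])  # TOOL_CALL: proposed send_email
g.backward_slice("call2")            # {"msg1", "call1", "result1"}
```

Because the graph is append-only, the slice for any action is stable: recomputing it later, for audit, yields the same answer the monitor saw at enforcement time.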

Taint propagation through Datalog

The policy language is Datalog: simple, declarative, and evaluated on a fresh engine for each authorization request.

Taint starts at source nodes. If a TOOL_RESULT node came from reading a sensitive document, it is tainted:

Tainted(Node) :-
  IsToolResult(Node, read_document, Doc),
  SensitiveDoc(Doc).

Taint propagates through the causal graph:

Tainted(Node) :-
  Depends(Node, Ancestor),
  Tainted(Ancestor).

Any proposed action whose backward slice includes a tainted ancestor gets denied for non-privileged principals:

Denied(Entity, Tool, tainted_dependency) :-
  PendingAction(ActionId, Tool, Entity),
  Depends(ActionId, Ancestor),
  Tainted(Ancestor),
  not EntityRole(Entity, vp).

Deny overrides allow. The monitor is fail-closed: if no policy fires for a proposed action, the default decision is deny.

The policy files are separate from the enforcement code. You write Datalog rules, load them into the engine, and the monitor evaluates them. New rules don’t require changes to the monitor itself.
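The semantics of the three rules above can be approximated with a naive fixpoint in Python. This is an illustration of the evaluation, assuming a toy graph with invented node names, not the policy engine itself:

```python
def propagate_taint(deps, sources):
    """Fixpoint of: Tainted(N) :- Depends(N, A), Tainted(A).
    deps maps each node to the set of nodes it directly depends on."""
    tainted = set(sources)
    changed = True
    while changed:
        changed = False
        for node, ancestors in deps.items():
            if node not in tainted and ancestors & tainted:
                tainted.add(node)
                changed = True
    return tainted

def authorize(deps, sources, action_id, entity, privileged):
    """Deny when the pending action depends on tainted data and the
    entity is not privileged. Deny overrides allow."""
    tainted = propagate_taint(deps, sources)
    if entity not in privileged and deps.get(action_id, set()) & tainted:
        return "deny"  # tainted_dependency
    return "allow"

# Taint reaches the email through an intermediate summary, not directly.
deps = {"call1": {"msg1"}, "result1": {"call1"},
        "summary1": {"result1"}, "email1": {"summary1"}}
authorize(deps, {"result1"}, "email1", "intern", privileged={"vp"})  # "deny"
```

Note that the denial fires even though `email1` never touched `result1` directly; the taint crossed the intermediate `summary1` node, which is exactly what a per-tool ACL cannot see.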

This is related to work on information-flow control for AI systems more broadly. Research on FIDES showed that labeling messages and tool results with confidentiality and integrity tags, then propagating those labels through causal chains, can achieve formal non-interference guarantees for agent workflows. [6] The PCAS approach is less formal but more operational: the backward slice is a practical approximation of information-flow tracking that works with existing LLM infrastructure.

What this catches that IAM doesn’t

I ran experiments with the implementation to test the guarantees.

The clearest one involved prompt injection. An adversarial document in the agent's context tried to coerce the LLM into calling send_email with sensitive compensation data. The LLM followed the injected instruction. The agent issued the tool call. The reference monitor checked the backward slice, found a tainted ancestor (the compensation document), and denied the call.

The LLM was compromised. The monitor wasn’t.

This is the key insight: you cannot rely on the model to enforce policy. Models can be manipulated. A reference monitor operating outside the model’s context, evaluating formal rules on the causal graph, doesn’t care what the LLM was told to do. It evaluates the facts.

A second experiment tested multi-principal sessions. Alice’s agent reads a sensitive document. Her session ends. Bob’s agent needs to continue the workflow. Sharing Alice’s full context would leak the sensitive content to Bob. Sharing nothing loses the causal history of the work done.

The solution is taint-aware context projection: evaluate taint globally over Alice’s dependency graph, then filter out tainted nodes before handing the context to Bob. Bob inherits the structure of what was done, the causal intent, without inheriting the sensitive content.
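The projection step itself is a filter over the evaluated taint set. A sketch under the same toy-graph assumptions, with invented node names and payloads:

```python
def project_context(nodes, tainted):
    """Taint-aware projection: keep only untainted nodes for the next principal."""
    return {n: payload for n, payload in nodes.items() if n not in tainted}

# Alice's session, with taint already evaluated globally over her graph.
nodes = {"msg1": "run the comp review workflow",
         "result1": "[compensation data]",
         "status1": "review complete, approvals pending"}
project_context(nodes, tainted={"result1"})
# Bob receives msg1 and status1; the sensitive payload never enters his context.
```

The important ordering is that taint is evaluated over Alice's full graph first, so anything downstream of the sensitive read is already marked before the filter runs.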

Limitations I won’t paper over

PCAS controls tool calls. It does not control what the LLM thinks. If the model has already internalized sensitive content from a tainted document, that understanding persists in the context window regardless of what the monitor blocks.

Inference attacks are real. If the monitor redacts a tool result and replaces it with a placeholder, the existence of the placeholder tells the LLM something was there. A sufficiently capable model may be able to infer content from shape and metadata alone.

The system is also only as strong as the taint sources you declare. If you forget to mark a document as sensitive, taint is never seeded and the monitor has nothing to propagate. The Datalog rules are only as correct as the person who wrote them.

This is a building block, not a complete solution. It extends the trust surface of agentic systems by adding a formal enforcement layer. It doesn’t eliminate the need for careful tool design, prompt hygiene, or model-level safeguards.

Why I think this matters

The trajectory of agentic systems is toward more tools, more context, more autonomy, longer sessions, and more principals. Every one of those dimensions makes the gap between storage-level access control and actual data flow wider.

We have good models for securing files. We do not yet have good models for securing the agentic workflows that operate on those files. The context window is a new kind of trust boundary, and right now most deployments have no enforcement mechanism at it.

My goal in building this implementation was to verify the core claims: that agent interactions can be modeled as a dependency graph, that Datalog policy evaluation over backward slices is practical, and that the result meaningfully constrains what agents can do at runtime.

The verification holds. The graph is the audit trail. The Datalog rules are auditable policy. The reference monitor is the enforcement point.

Whether this becomes part of how we build agentic systems, or remains an experimental sketch, probably depends on how seriously we take the problem of what agents are allowed to do with the data they read.


References

[1] "Bell–LaPadula model," Wikipedia. en.wikipedia.org

[2] Microsoft Security Blog, "From runtime risk to real-time defense: Securing AI agents," January 2026. microsoft.com

[3] CrowdStrike, "How Agentic Tool Chain Attacks Threaten AI Agent Security," 2025. crowdstrike.com

[4] PCAS Authors, "Policy Compiler for Secure Agentic Systems," arXiv:2602.16708, 2026. arxiv.org/abs/2602.16708

[5] E. Donadei, "Independent implementation of PCAS," GitHub, 2026. github.com/edonadei/policy-compiler-for-agentic-systems

[6] S. Bengio et al., "Securing AI Agents with Information-Flow Control," arXiv:2505.23643, 2025. arxiv.org/abs/2505.23643
