Chapter 6 — Runtime and Agents: Autonomy Is Privilege
Opening Scenario
A financial services firm built an internal AI agent to accelerate its deal analysis workflow. The system was elegant: analysts could ask questions in natural language, and the agent would retrieve data from internal databases, summarize documents, run calculations in a sandboxed environment, and draft preliminary reports. It saved hours per deal. Leadership loved it.
The agent had access to a document retrieval tool, a SQL query interface, a code execution sandbox, and an email drafting capability. Each tool had been reviewed individually. Each passed security review.
Three months in, an analyst asked the agent to "find all communications related to the Meridian acquisition and summarize any concerns raised by legal." The agent retrieved emails, parsed attachments, found a privileged attorney-client memo that had been misfiled in a general folder, summarized its contents, and included key excerpts in a draft email addressed to the deal team—which included two external advisors.
The agent did exactly what it was designed to do. It retrieved relevant documents. It summarized concerns. It drafted a communication. No individual action was unauthorized. But the sequence of actions—retrieval, synthesis, distribution—constituted a privilege violation that no human would have made. The agent didn't understand privilege. It understood relevance.
The firm discovered the breach when outside counsel flagged the forwarded memo. By then, privilege had been waived, the document was in discovery, and the acquisition's negotiating position had fundamentally changed.
This chapter is about runtime and agents—and why granting AI systems the ability to act is granting them privilege, whether you intended to or not.
Why This Area Matters
The conventional view of AI agents treats autonomy as a feature. Agents can browse the web, execute code, call APIs, manage files, send messages. The more tools an agent can use, the more useful it becomes. This framing positions agent capabilities as a product problem—what can we enable?—rather than a security problem.
This is architecturally wrong.
The real problem is that agents are security principals operating with delegated authority, and almost no organization treats them that way. When a human analyst accesses a database, there's a clear identity, a session, an audit trail, and a set of permissions. When an agent accesses the same database on behalf of that analyst, the identity model fractures. Is the agent acting as the user? As itself? As the system? The answer is usually "unclear," and unclear identity is the root of most authorization failures.
What actually happens is this: organizations deploy agents with broad tool access because narrow access makes them useless, then rely on the agent's "reasoning" to avoid misuse. This is the equivalent of giving an intern root access and hoping they'll only use it responsibly. Except the intern can read social cues, understand organizational politics, and ask clarifying questions. The agent cannot.
Agents break traditional threat models because they collapse the distinction between data access and action. In classical security, reading a file is different from sending an email. You can grant read access to sensitive data without granting the ability to exfiltrate it. Agents dissolve this boundary. An agent with read access and email capability has exfiltration capability, even if no one designed it that way.
This matters because runtime is where AI becomes dangerous—not because models are malicious, but because autonomous execution amplifies every architectural weakness. A misconfigured permission becomes a breach. A misfiled document becomes a legal crisis. A missing boundary becomes a blast radius problem.
The architectural question is not "how do we make agents safe?" It's "how do we treat agents as principals with privilege, and apply the same rigor we would to any other privileged identity?"
Architectural Breakdown
Agents as Security Principals
The first architectural shift required is conceptual: agents are not features of an application. They are principals operating within your security perimeter.
A principal, in security terms, is an entity that can be authenticated, authorized, and held accountable. Users are principals. Service accounts are principals. When an agent executes actions—calls APIs, queries databases, sends messages—it is acting as a principal, whether your identity system recognizes it or not.
The problem is that most agent architectures don't model agents as principals. Instead, they inherit identity from context:
[User] → (authenticates) → [Application]
                                ↓
                         [Agent Runtime]
                                ↓
             [Tool A]       [Tool B]       [Tool C]
                                ↓
            [Database]        [API]        [Email]
In this model, the agent operates under the application's identity or the user's delegated identity. Every tool invocation happens with the same credential context. There's no distinction between "user asked agent to read X" and "agent autonomously decided to read Y as part of reasoning about X."
This is the architectural equivalent of sudo without logging which commands were run. You know privilege was used. You don't know how.
The alternative is treating agents as distinct principals with their own identity, their own permissions, and their own audit trail:
[User] → (authenticates) → [Application]
                                ↓
                     [Agent Identity Service]
                                ↓
              [Agent Principal: agent-deal-analyst-001]
                                ↓
       [Scoped Token A]  [Scoped Token B]  [Scoped Token C]
                                ↓
            [Database]        [API]          [Email]
                                ↓
                           [Audit Log]
             (principal, action, context, justification)
This model makes the agent's actions attributable, constrainable, and auditable. It's not a theoretical nicety—it's the foundation for every other security control.
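A minimal sketch of that identity model in code, assuming a hypothetical identity layer; AgentPrincipal, ScopedToken, and issue_scoped_token are illustrative names, not any particular platform's API:

# Sketch: the agent as a first-class principal holding short-lived, per-tool credentials.
# All names here are illustrative, not a specific identity product's API.
import time
from dataclasses import dataclass, field

@dataclass
class ScopedToken:
    principal_id: str          # which agent principal this token was issued to
    tool: str                  # e.g. "database", "email"
    scope: str                 # e.g. "read:deal-docs", "draft-only"
    expires_at: float          # short-lived by design

    def is_valid(self) -> bool:
        return time.time() < self.expires_at

@dataclass
class AgentPrincipal:
    agent_id: str                                  # e.g. "agent-deal-analyst-001"
    acting_for: str                                # the user whose request initiated the task
    tokens: list[ScopedToken] = field(default_factory=list)

def issue_scoped_token(agent: AgentPrincipal, tool: str,
                       scope: str, ttl_s: int = 300) -> ScopedToken:
    # Tokens are issued per tool and per scope, never as a blanket credential.
    token = ScopedToken(agent.agent_id, tool, scope, time.time() + ttl_s)
    agent.tokens.append(token)
    return token

Because every token carries the agent's principal_id, each tool invocation can be written to the audit log as the (principal, action, context, justification) record shown in the diagram, rather than dissolving into the application's service account.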
Tool Use Is Privilege Escalation
Every tool an agent can invoke is a privilege grant. This seems obvious but is routinely ignored in agent design.
Consider a typical agent toolkit:
- Document retrieval: Access to file systems, document stores, or search indices
- Database queries: Ability to read (and sometimes write) structured data
- Code execution: Running scripts in sandboxed or semi-sandboxed environments
- API calls: Invoking internal or external services
- Communication: Sending emails, messages, or notifications
Each tool, individually, can be secured. Document retrieval respects access controls. Database queries run with a service account. Code execution happens in a container. APIs require authentication.
The problem is composition. An agent with retrieval + summarization + email can exfiltrate sensitive data, even if each tool is "secure." An agent with database read + code execution can perform inference attacks that extract information beyond what any single query would reveal. An agent with calendar access + email + meeting notes can reconstruct organizational dynamics that no single data source would expose.
The architectural failure is treating tools as independent capabilities rather than composable privileges. If you can't answer "what can this agent do with the combination of tools it has access to?", you haven't secured the agent.
Tool composition creates emergent privilege. The blast radius of an agent is not the union of its tools' blast radii—it's the product of what those tools can do in sequence.
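One way to make the composition problem concrete is to evaluate tool pairs against a deny-list built during threat modeling, before an agent is ever provisioned. A minimal sketch; the risk table here is illustrative, not exhaustive:

# Sketch: pairwise composition check for an agent's toolkit.
# The DANGEROUS_PAIRS table is illustrative; a real one comes from threat modeling.
from itertools import combinations

DANGEROUS_PAIRS = {
    frozenset({"document_retrieval", "email"}): "read + send is an exfiltration path",
    frozenset({"database_read", "code_execution"}): "query + compute enables inference attacks",
    frozenset({"calendar", "email"}): "schedule + send reconstructs organizational dynamics",
}

def composition_risks(toolkit: set[str]) -> list[str]:
    findings = []
    for a, b in combinations(sorted(toolkit), 2):
        reason = DANGEROUS_PAIRS.get(frozenset({a, b}))
        if reason:
            findings.append(f"{a} + {b}: {reason}")
    return findings

# The opening scenario's toolkit fails this check on the retrieval/email pair:
print(composition_risks({"document_retrieval", "summarization", "email"}))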
Memory and Persistence: State Is Power
Agents increasingly maintain memory—context that persists across sessions, interactions, and tasks. Memory makes agents more useful. It also makes them more dangerous.
Memory creates persistence. In traditional security, a compromised session ends when the session ends. Memory means a compromised interaction can influence future interactions indefinitely. Poisoned context doesn't expire.
Consider an agent that maintains:
- Conversation history: What users have asked and what the agent has responded
- Task context: Intermediate results, retrieved documents, reasoning chains
- User preferences: Learned patterns about how users want information presented
- System state: Information about what tools have been used and what results they produced
Each memory type creates a different attack surface:
Conversation history can leak information across trust boundaries. If an agent serves multiple users with different access levels, memory can become a side channel. User A asks about Project X; User B later asks "what were we discussing about acquisitions?" and the agent obligingly draws on User A's session. Memory doesn't understand information classification.
Task context can be poisoned. If an attacker can influence what documents the agent retrieves early in a task, that poisoned context shapes all subsequent reasoning. The agent doesn't re-evaluate sources—it builds on what it has.
User preferences can be manipulated. An attacker who can shift an agent's learned preferences can influence how information is filtered, summarized, or presented to legitimate users.
System state creates predictability. Knowing what an agent has done enables predicting what it will do, which enables more targeted manipulation.
The architectural question is: who controls memory, and what are the trust boundaries around it?
Most agent frameworks treat memory as a feature to be enabled, not a security surface to be managed. Memory should have:
- Scope boundaries: What memory is shared across users, sessions, tasks?
- Retention policies: How long does memory persist? Who can purge it?
- Integrity controls: How do you detect memory poisoning?
- Access controls: Can users see what the agent "remembers" about them?
If you can't answer these questions, memory is a liability, not a feature.
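A sketch of what governed memory might look like, so those questions have enforceable answers rather than policy-document answers. The field names and classification levels are assumptions for illustration:

# Sketch: memory as a governed store with scope, retention, and classification
# attached to every entry. Field names and levels are illustrative.
import time
from dataclasses import dataclass

CLASSIFICATION_ORDER = ["public", "internal", "privileged"]

@dataclass
class MemoryEntry:
    content: str
    owner_user: str        # whose interaction produced this memory
    scope: str             # "session", "user", or "shared"
    classification: str    # "public", "internal", or "privileged"
    created_at: float
    ttl_s: int             # retention policy, enforced at read time

class GovernedMemory:
    def __init__(self) -> None:
        self._entries: list[MemoryEntry] = []

    def write(self, entry: MemoryEntry) -> None:
        self._entries.append(entry)

    def read(self, requesting_user: str, ceiling: str = "internal") -> list[str]:
        now = time.time()
        visible = []
        for e in self._entries:
            if now - e.created_at > e.ttl_s:
                continue    # expired: retention enforced by the store, not by convention
            if e.scope != "shared" and e.owner_user != requesting_user:
                continue    # scope boundary: no cross-user side channel
            if CLASSIFICATION_ORDER.index(e.classification) > CLASSIFICATION_ORDER.index(ceiling):
                continue    # classification ceiling for this request
            visible.append(e.content)
        return visible

    def purge_user(self, user: str) -> None:
        # Lets users (or admins) delete what the agent "remembers" about them.
        self._entries = [e for e in self._entries if e.owner_user != user]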
Feedback Loops and Autonomous Action
The most dangerous architectural pattern in agent systems is the unsupervised feedback loop: agent takes action → action produces result → agent uses result to decide next action → repeat.
Feedback loops are what make agents useful. They're also what make agents uncontrollable.
In a supervised workflow, a human reviews agent outputs before they become inputs to the next step. The human is a checkpoint—a place where bad reasoning can be caught, wrong assumptions corrected, dangerous actions blocked.
In an unsupervised loop, the agent is both the actor and the evaluator. It decides what to do, does it, evaluates the result, and decides what to do next. Each step can amplify errors from previous steps. Small misunderstandings become large failures.
The architectural problem is that feedback loops compress time. A human making the same sequence of decisions might take hours or days, with natural pause points for reflection and review. An agent can execute hundreds of actions in minutes. By the time anyone notices something is wrong, the damage is done.
Consider the trust model of a feedback loop:
[Initial Prompt] → [Agent Reasoning] → [Tool Invocation]
                          ↑                    ↓
                [Decision to Continue]  ←  [Result]
At every iteration, the agent trusts:
- Its own interpretation of the previous result
- Its own judgment about what to do next
- Its own assessment of whether to continue or stop
There's no external validation. The agent is trusting itself.
This is why agents break traditional threat models. In classical security, we assume attackers are external. We build perimeters, authenticate identities, validate inputs. But an agent operating autonomously can become an unintentional insider threat—taking actions that no attacker requested but that cause harm nonetheless.
The architectural response is not to eliminate feedback loops—that would eliminate the value of agents. It's to design loops with the constraints below (a code sketch follows the list):
- Bounded iteration: Maximum steps before mandatory review
- Scope limits: Constraints on what actions can be taken without approval
- Anomaly detection: Patterns that trigger human review
- Reversibility requirements: Preference for actions that can be undone
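A minimal sketch of a loop with those bounds in place. The propose_next_action, run_tool, and notify_reviewer callables, and the .name attribute on actions, are assumed orchestration hooks, not a specific framework's interface:

# Sketch: a feedback loop with hard bounds instead of open-ended autonomy.
# propose_next_action, run_tool, and notify_reviewer are assumed orchestration hooks.
MAX_STEPS = 10
ALWAYS_REQUIRE_APPROVAL = {"send_email", "write_database", "delete_file"}  # irreversible or external

def run_bounded_loop(task, propose_next_action, run_tool, notify_reviewer):
    history = []
    for _ in range(MAX_STEPS):                          # bounded iteration
        action = propose_next_action(task, history)
        if action is None:                              # agent judges the task complete
            break
        if action.name in ALWAYS_REQUIRE_APPROVAL:      # scope limit: never autonomous
            notify_reviewer(task, action, history)
            break
        result = run_tool(action)
        history.append((action, result))
        if looks_stuck(history):                        # anomaly trigger for human review
            notify_reviewer(task, action, history)
            break
    else:
        notify_reviewer(task, None, history)            # step budget exhausted: mandatory review
    return history

def looks_stuck(history) -> bool:
    # Placeholder heuristic: the same tool invoked four times in a row suggests a loop.
    recent = [action.name for action, _ in history[-4:]]
    return len(recent) == 4 and len(set(recent)) == 1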
The Authorization Gap
The deepest architectural problem with agents is authorization—specifically, the gap between what agents can do and what anyone intended them to do.
Traditional authorization answers the question: "Is this principal allowed to perform this action on this resource?" It assumes the action is known, the resource is identified, and the principal's intent is reflected in the request.
Agents break this model because:
Actions are emergent: The agent decides what to do based on reasoning, not explicit instruction. Authorization systems can't anticipate every action an agent might take.
Resources are discovered: The agent finds resources during execution—documents in search results, records in query results, endpoints in API responses. Authorization happens after discovery, not before.
Intent is inferred: The agent interprets user intent, often incorrectly. A request to "find relevant documents" doesn't specify which documents, what relevance means, or what should happen with them.
The result is that agents routinely perform actions that are individually authorized but collectively unauthorized. Each database query is permitted. Each file access is allowed. Each email is sent by an authenticated principal. But the sequence of actions violates policies that no individual authorization check could catch.
This is not a problem that finer-grained permissions solve. You can't enumerate every harmful sequence of authorized actions. The number of combinations is astronomical.
The architectural response requires shifting from action-level authorization to intent-level authorization:
- Task scoping: Define what the agent is trying to accomplish, not just what tools it can use
- Outcome constraints: Specify what results are acceptable, not just what actions are permitted
- Contextual authorization: Make permissions dependent on the reasoning chain, not just the immediate action
- Human-in-the-loop for novel combinations: Require approval when the agent attempts action sequences it hasn't performed before
This is hard. It requires authorization systems that understand context, not just credentials. But without it, agent authorization is theater—each check passes while the overall behavior fails.
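A sketch of what a sequence-aware check could look like: the policy engine sees the task scope and the actions already taken, not just the immediate call. The rules and the known-sequence table are illustrative:

# Sketch: authorization over the action sequence and task scope, not single calls.
# The rules and KNOWN_SEQUENCES are illustrative placeholders.
KNOWN_SEQUENCES = {
    ("retrieve_documents", "summarize"),
    ("query_database", "summarize"),
}

def is_novel_sequence(sequence: list[str]) -> bool:
    # In practice: compare against sequences previously reviewed for this task type.
    return tuple(sequence) not in KNOWN_SEQUENCES

def authorize(task_scope: str, action: str,
              resource_tags: set[str], prior_actions: list[str]) -> bool:
    # Rule 1: privileged material never flows into outbound communication.
    if action == "send_email" and "privileged" in resource_tags:
        return False
    # Rule 2: retrieval followed by external distribution needs an explicit task scope.
    if action == "send_email" and "retrieve_documents" in prior_actions:
        return task_scope == "external_distribution_approved"
    # Rule 3: novel sequences are denied here and escalated to a human by the orchestrator.
    if is_novel_sequence(prior_actions + [action]):
        return False
    return True

Rule 2 describes the opening scenario's failure mode: retrieval followed by distribution, a sequence that no per-action check flagged.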
Common Mistakes Organizations Make
Mistake 1: Treating Tool Access as Binary
What teams do: Grant agents access to tools with on/off permissions. The agent either can or cannot use the database connector, the email tool, the code execution sandbox.
Why it seems reasonable: This mirrors how application permissions work. Either an app has API access or it doesn't. Binary permissions are simple to implement and audit.
Why it fails architecturally: Tool access isn't binary—it's contextual. The question isn't "can this agent query the database?" but "can this agent query the database for this purpose, in this context, with these result constraints?" Binary access ignores the composition problem entirely. An agent with binary access to retrieval + email has the same permissions whether it's summarizing public documents or exfiltrating trade secrets.
What it misses: Authorization should be scoped to intent, not capability. The same tool used for different purposes should have different permission requirements.
Mistake 2: Relying on the Model to Enforce Boundaries
What teams do: Use system prompts or instructions to tell agents what they shouldn't do. "Never access personnel files." "Don't send external emails without user confirmation." "Always respect data classification."
Why it seems reasonable: Modern language models are good at following instructions. If you tell the agent not to do something, it usually won't. This feels like a control.
Why it fails architecturally: Instructions are not enforcement. They're suggestions to a probabilistic system. Models can be jailbroken, confused, or simply make mistakes. More fundamentally, relying on the model to enforce security inverts the trust hierarchy. The agent—the least trustworthy component—becomes the enforcer of the most important constraints.
What it misses: Security controls must be external to the system they constrain. You wouldn't rely on an application to enforce its own rate limits or access controls. Agent boundaries require the same externalization.
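A sketch of that externalization: the policy check lives in a tool gateway the agent cannot bypass, so a jailbroken or confused model still hits the same wall. The deny rules are illustrative:

# Sketch: enforcement in a tool gateway outside the model. The agent never holds
# credentials directly; it only submits requests here. Deny rules are illustrative.
class PolicyViolation(Exception):
    pass

DENY_RULES = [
    lambda req: req["tool"] == "email" and not req.get("recipients_internal_only", False),
    lambda req: req["tool"] == "file_read" and req.get("classification") == "privileged",
]

def tool_gateway(request: dict, execute):
    """Every agent tool call passes through here; 'execute' performs the real call."""
    for rule in DENY_RULES:
        if rule(request):
            raise PolicyViolation(f"blocked by external policy: {request['tool']}")
    return execute(request)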
Mistake 3: Auditing Actions Without Context
What teams do: Log every tool invocation—what tool was called, with what parameters, what was returned. Build dashboards showing agent activity. Feel confident they have observability.
Why it seems reasonable: Action logs are the foundation of security monitoring. If you can see what happened, you can detect anomalies and investigate incidents.
Why it fails architecturally: Action logs without reasoning context are nearly useless for agent security. Knowing an agent queried the database tells you nothing about whether that query was appropriate. The same query can be legitimate or malicious depending on why it was run. Without the reasoning chain—why the agent decided to take that action—you can't evaluate appropriateness.
What it misses: Agent audit logs need to capture intent, not just execution. What was the agent trying to accomplish? What information led to this decision? What alternatives were considered? Without this, you're logging the what without the why.
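A sketch of an audit record that keeps the why next to the what; the field names are illustrative:

# Sketch: an audit record that captures intent and reasoning, not just the tool call.
import time
import uuid

def audit_record(agent_id: str, user_goal: str, agent_plan: str,
                 tool: str, parameters: dict, result_summary: str) -> dict:
    return {
        "event_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "principal": agent_id,
        "intent": {
            "user_goal": user_goal,        # what the user asked for, verbatim
            "agent_plan": agent_plan,      # why the agent chose this step
        },
        "execution": {
            "tool": tool,
            "parameters": parameters,
            "result_summary": result_summary,
        },
    }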
Mistake 4: Assuming Sandboxing Solves Execution Risk
What teams do: Run agent code execution in containers, VMs, or sandboxed environments. Limit network access. Restrict file system visibility. Assume execution risk is contained.
Why it seems reasonable: Sandboxing works for untrusted code execution. If the agent can't reach the network or access sensitive files, it can't cause harm.
Why it fails architecturally: The sandbox constrains the execution environment, not the agent's reasoning. An agent can't directly exfiltrate data from a sandbox—but it can return results to the orchestration layer, which does have network access. The sandbox doesn't prevent the agent from reasoning about sensitive information it was given as input, then including that reasoning in its output.
What it misses: The agent's inputs and outputs cross the sandbox boundary. Sandboxing execution without constraining the data that flows in and out is security theater.
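A sketch of a complementary boundary control: scan whatever crosses back from the sandbox into the orchestration layer before it can reach tools with network access. The patterns are crude placeholders; a real deployment would lean on the organization's classification or DLP tooling:

# Sketch: filtering sandbox outputs before they re-enter the orchestration layer.
# Patterns are crude placeholders for real classification/DLP checks.
import re

SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                      # SSN-like identifiers
    re.compile(r"attorney[- ]client privilege", re.IGNORECASE),
]

def release_from_sandbox(output: str) -> str:
    # Called on every result leaving the code-execution sandbox.
    for pattern in SENSITIVE_PATTERNS:
        if pattern.search(output):
            # Quarantine rather than handing the data to tools that can send it onward.
            raise ValueError("sandbox output flagged for review before release")
    return output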
Mistake 5: Deploying Agents Without Identity
What teams do: Run agents as background processes with service accounts or inherited user credentials. Treat agents as features of applications rather than principals in their own right.
Why it seems reasonable: This is how backend services work. A web application doesn't have its own identity separate from the user it's serving. Why should an agent?
Why it fails architecturally: Agents make autonomous decisions that services don't. A backend service transforms user requests into system actions—the user is the decision-maker. An agent decides what actions to take. That decision-making authority requires its own identity, its own permissions, and its own accountability.
What it misses: Without distinct identity, you can't constrain agent behavior independently of user behavior. You can't audit what the agent did versus what the user did. You can't revoke agent access without revoking user access. Agent identity is the foundation of agent governance.
Architectural Questions to Ask
Identity and Attribution
- Does every agent have a distinct identity in your identity system, separate from the user or application context?
- Can you trace any agent action back to a specific agent principal with a complete audit trail?
- When an agent acts on behalf of a user, can you distinguish between user-directed actions and agent-initiated actions?
- Do agent identities have defined lifecycles—creation, rotation, revocation?
- Can you answer "which agents accessed this resource in the last 30 days" as easily as you can for users?
Why these matter: Attribution is the foundation of accountability. If you can't identify what an agent did, you can't secure, audit, or improve it.
Tool Access and Composition
- For each tool an agent can access, can you articulate what harm is possible through that tool alone?
- For each pair of tools, have you analyzed what harm is possible through their combination?
- Can you restrict tool access based on task context, not just agent identity?
- Do tools have rate limits, scope limits, or approval requirements for sensitive operations?
- Is there a maximum blast radius for any single agent session?
Why these matter: Emergent privilege from tool composition is the most common source of agent-related incidents. If you haven't analyzed combinations, you haven't analyzed risk.
Memory and State
- What memory does each agent maintain across sessions? Across users?
- Who can access, modify, or delete agent memory?
- How do you detect memory poisoning or context manipulation?
- Do memory retention policies exist, and are they enforced?
- Can users see what an agent "knows" about them?
Why these matter: Memory creates persistence, and persistence creates long-term risk. Unmanaged memory is an unmanaged attack surface.
Feedback Loops and Autonomy
- What is the maximum number of actions an agent can take without human review?
- Are there action types that always require approval, regardless of context?
- How do you detect when an agent is stuck in a loop or behaving unexpectedly?
- Can agents take irreversible actions without explicit approval?
- What triggers human review during autonomous operation?
Why these matter: Unsupervised feedback loops amplify errors at machine speed. Without bounds on autonomy, you've delegated control, not just capability.
Authorization and Boundaries
- Can you articulate, for each agent, what it should never do under any circumstances?
- Are those constraints enforced externally, or do they rely on the model following instructions?
- Do authorization decisions consider the sequence of actions, not just individual actions?
- Can you revoke specific agent capabilities without redeploying the agent?
- How do you handle the gap between what the agent can do and what it should do?
Why these matter: The authorization gap is where incidents happen. If your authorization model doesn't account for emergent behavior, it doesn't account for how agents actually operate.
Observability and Response
- Can you reconstruct the full reasoning chain for any agent action?
- Do your alerts distinguish between unusual agent behavior and unusual user behavior?
- Can you kill an agent session immediately if suspicious activity is detected?
- Do you have playbooks for agent-specific incidents?
- Can you identify all resources an agent touched during an incident?
Why these matter: Agent incidents look different from traditional incidents. If your detection and response capabilities weren't designed for agents, they won't catch agent problems.
Key Takeaways
Agents are principals, not features: Any system that takes autonomous action is a security principal requiring identity, authorization, and accountability. If your agent doesn't have its own identity distinct from users and applications, you've already lost architectural control.
Tool composition creates emergent privilege: The risk of an agent isn't the sum of its tools' risks—it's what those tools can do together. You must analyze tool combinations, not just individual capabilities, and design authorization around composite actions.
Memory is an attack surface: Persistent context creates persistent risk. Memory scope, retention, integrity, and access controls are security requirements, not feature decisions.
Autonomy must be bounded: Feedback loops amplify errors at machine speed. Without explicit limits on iteration, action scope, and irreversibility, you've delegated authority without accountability.
External enforcement, not model compliance: Security constraints must be enforced by systems external to the agent, not by instructions to the agent. The model cannot be trusted to police itself.
The core insight of this chapter connects directly to the lifecycle thesis: agents are an identity and authorization problem, not an AI problem. The organizations that will secure agents are those that apply the same architectural rigor to agent principals that they apply to human principals and service accounts. Those that treat agents as magical features rather than privileged identities will discover, painfully, that autonomy granted is autonomy that can be misused.