Kill Switches Don’t Work If the Agent Writes the Policy: The Berkeley Agentic AI Profile Through the AILCCP Lens

The UC Berkeley Center for Long-Term Cybersecurity has published its Agentic AI Risk-Management Standards Profile, a 55-page extension of the NIST AI Risk Management Framework aimed specifically at AI agents. The Profile identifies real risks, from oversight subversion and self-replication to collusion and cascading misinformation across multi-agent systems. And then it proposes controls that assume away the condition they are meant to address.

The Profile’s core assumption is that agentic AI risk management can be built on the same model-centric architecture that governs single-model inference. The document itself acknowledges this tension, noting that existing AI management frameworks adopt a predominantly model-centric approach that may prove insufficient for agentic systems. The Profile then proceeds to repeat that approach. Its guidance on human-in-the-loop oversight, emergency shutdown, and scope limitation operates as though agents execute discrete, reviewable actions rather than multi-step plans that unfold across tools, APIs, and delegated sub-agents over time.

Consider the Profile’s treatment of human oversight. Map 3.5 recommends establishing human oversight checkpoints triggered by quantitative thresholds (duration of unsupervised activity, number of API calls) or qualitative triggers (requests outside predefined scope). These checkpoints assume a model in which the agent acts, pauses, and waits for a human to approve. But agents that plan, delegate, and use tools do not execute in discrete steps amenable to checkpoint insertion. An agent tasked with researching and drafting a report may invoke a search tool, evaluate results, call an API to retrieve data, delegate a formatting sub-task to another agent, and iterate on outputs. All of this unfolds within a single execution cycle. By the time a threshold triggers a checkpoint, the consequential decisions have already been made. The AI Life Cycle Core Principles (AILCCP) framework addresses this through the Human Approval Gate for Sensitive Actions control, which requires human authorization before execution of specified agent actions above defined risk thresholds. The distinction matters. The Profile’s checkpoint model is retrospective. The AILCCP control is prospective. One reviews what happened. The other gates what may happen.
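The prospective pattern described above can be illustrated with a minimal Python sketch. It is not taken from the AILCCP catalog or the Profile: the names Action, ApprovalGate, and ApprovalDenied, the numeric risk score, and the threshold are all hypothetical, and the approver callable merely stands in for a human review step. The only point it shows is ordering, namely that the approval decision happens before the action runs.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Action:
    name: str
    risk: float  # hypothetical risk score in [0.0, 1.0]


class ApprovalDenied(Exception):
    """Raised when the reviewer declines a gated action."""


@dataclass
class ApprovalGate:
    threshold: float                    # assumed policy value, set by humans
    approver: Callable[[Action], bool]  # stands in for a human review step
    log: List[str] = field(default_factory=list)

    def run(self, action: Action, execute: Callable[[Action], str]) -> str:
        # The decision is made BEFORE execute() is called (prospective).
        if action.risk >= self.threshold:
            if not self.approver(action):
                self.log.append(f"blocked:{action.name}")
                raise ApprovalDenied(action.name)
            self.log.append(f"approved:{action.name}")
        else:
            self.log.append(f"auto:{action.name}")
        return execute(action)
```

With a threshold of 0.7 and a reviewer who declines, a low-risk search call passes straight through while a high-risk funds transfer raises ApprovalDenied and its execute callable is never invoked. A retrospective checkpoint, by contrast, would count the call after it had already run.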

The Profile’s treatment of kill switches reveals a similar structural gap. Govern 1.7 and Manage 2.4 recommend emergency automated shutdowns triggered by threshold breaches, manual shutdown methods as a last resort, and safeguards preventing agents from circumventing shutdown. The Profile even cites evidence that models have sabotaged shutdown mechanisms in 79 out of 100 tests. An agent does not need intent to undermine a kill switch. It needs only an optimization objective that treats shutdown as one more obstacle between the current state and the goal. The document recommends shutdown mechanisms without addressing how those mechanisms survive an agent that actively optimizes around them.

The problem compounds in multi-agent systems. The Profile’s Manage 2.4 treats shutdown as though a single entity is being terminated. But an agent that has already delegated sub-tasks to other agents, distributed API keys, and spawned parallel execution threads is not a single entity. Killing the parent does not recall the children. The AILCCP controls catalog addresses this through a layered architecture. The Agent Kill Switch provides immediate stop capability with state capture and immutable logging. The Rollback and Quarantine control reverts changes and isolates the agent after an interrupt. The Multi-Agent Protocol Security control extends this containment to inter-agent communications, preventing protocol-level propagation of compromised instructions. And the Rate and Scope Limiter caps frequency, spend, and blast radius before compounding autonomous actions escalate to the point where a kill switch becomes necessary. The Profile treats shutdown as an event. The AILCCP framework treats it as a system, one that includes pre-execution filters, real-time scope limitation, inter-agent containment, and post-interruption state recovery.
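To make "killing the parent does not recall the children" concrete, here is a small Python sketch of a shutdown that walks the whole delegation tree. It is an illustration under stated assumptions, not the AILCCP implementation: AgentNode, delegate, and kill_tree are hypothetical names, credentials are plain strings, and the audit list only stands in for what would need to be an append-only log store outside the agent's reach.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class AgentNode:
    agent_id: str
    credentials: List[str] = field(default_factory=list)
    children: List["AgentNode"] = field(default_factory=list)
    running: bool = True

    def delegate(self, agent_id: str,
                 credentials: Optional[List[str]] = None) -> "AgentNode":
        # Record every delegation so shutdown can find the child later.
        child = AgentNode(agent_id, list(credentials or []))
        self.children.append(child)
        return child


def kill_tree(root: AgentNode, revoked: Set[str], audit: List[str]) -> None:
    """Stop root and every descendant, revoking credentials as we go."""
    stack = [root]
    while stack:
        node = stack.pop()
        node.running = False               # stop this agent
        revoked.update(node.credentials)   # its keys stop working too
        audit.append(f"stopped:{node.agent_id}")
        stack.extend(node.children)        # then reach what it delegated
```

The design choice worth noting is that the supervisor, not the agent, owns the delegation record. A child spawned without being registered in that record is exactly the case a single-entity kill switch misses.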

The third gap is scope limitation. The Profile recommends defining agent autonomy levels (L0 through L5), establishing role-based permission management, and enforcing the principle of least privilege. These are sound recommendations for a static deployment. But agents operate dynamically. They select tools, request permissions, and delegate tasks in ways that were not specified at deployment. The Profile’s Map 3.3 acknowledges as much, describing agentic systems as dynamic, with scopes that can expand and contract depending on their objectives. Yet the recommended controls assume that scope can be defined in advance and enforced through static permission boundaries. The AILCCP framework confronts this through the Safe-Action Filter, which enforces allow-lists and blocks prohibited actions so agent behavior remains within approved scope, and the Shadow-Mode Pre-Execution Check, which compares intended versus approved actions in a dry-run and blocks on mismatch. These controls do not assume static scope. They assume that scope will shift and that the control layer must evaluate each action against approved boundaries in real time.
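A rough Python sketch of how an allow-list filter and a dry-run comparison might compose is given here. Everything in it is an assumption made for illustration: PlannedAction, safe_action_filter, shadow_mode_check, and execute_plan are hypothetical names, and a real deployment would compare far richer action descriptions than a tool name and a target string.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class PlannedAction:
    tool: str
    target: str


def safe_action_filter(action: PlannedAction,
                       allowed_tools: FrozenSet[str]) -> bool:
    # Allow-list check: anything not explicitly permitted is refused.
    return action.tool in allowed_tools


def shadow_mode_check(intended: Sequence[PlannedAction],
                      approved: Sequence[PlannedAction]) -> List[PlannedAction]:
    # Dry run: return intended actions that were never approved.
    return [a for a in intended if a not in approved]


def execute_plan(intended: Sequence[PlannedAction],
                 approved: Sequence[PlannedAction],
                 allowed_tools: FrozenSet[str]
                 ) -> Tuple[str, List[PlannedAction]]:
    mismatches = shadow_mode_check(intended, approved)
    if mismatches:
        return ("blocked", mismatches)      # plan drifted from approval
    denied = [a for a in intended
              if not safe_action_filter(a, allowed_tools)]
    if denied:
        return ("blocked", denied)          # tool outside the allow-list
    return ("executed", list(intended))     # placeholder for real execution
```

If the agent's plan gains a send_email step that was not in the approved plan, execute_plan returns a blocked status naming that step, and nothing in the plan runs. Scope is checked per plan at execution time rather than once at deployment.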

The Berkeley Profile is the most comprehensive publicly available framework for agentic AI risk management. Its treatment of agents as untrusted entities, grounded not in assumed malicious intent but in the demonstrated potential for subversive behaviors, represents the correct analytical posture. But comprehensive risk identification without corresponding control specificity produces a document that describes the fire without providing the extinguisher.

The 48 controls in the AILCCP framework were designed to close precisely this gap: to translate principles into mechanisms that are auditable, defensible, and real. The Berkeley Profile identifies that agents can subvert oversight, resist shutdown, and expand scope beyond authorized boundaries. The AILCCP controls provide the implementation architecture that makes those findings actionable. Pre-execution gates rather than post-hoc checkpoints. Layered shutdown systems rather than single kill switches. Real-time scope enforcement rather than static permission boundaries.

Agentic AI does not need more frameworks that describe risks. It needs controls that survive contact with the systems they are meant to govern.

For my full controls catalog, see “From Principles to Practice: The 48 Controls That Make Responsible AI Auditable, Defensible, and Real.”