The Illusion of Control: Human Oversight Is Blind to AI Risk
Blunt human oversight provides a false sense of security against AI risk and deflects legal accountability
In AI governance, legal frameworks and policy proposals are often anchored by a seemingly simple solution: human oversight. The idea is that by placing a human in the loop, like a supervisor or final decision-maker, we can contain the risks of powerful AI systems.
However, a closer look at the technical realities of advanced AI systems reveals that human oversight, while well-intentioned, offers a false sense of security.
This article breaks down why human oversight is a fragile defense, drawing on key AI safety concepts, and calls for more robust safeguards.
Types of Human Oversight
Human oversight is not a monolithic concept. It is implemented in various ways across different domains:
Human-in-the-Loop: Humans evaluate and correct a model’s responses in real-time operations or during training. A typical example is Reinforcement Learning from Human Feedback (RLHF), where human rankings of model outputs directly guide the AI’s optimization.
Audit and Validation: This approach involves retrospective human review. Experts use interactive debugging tools and benchmark datasets to audit a model’s behavior and ensure fair and comprehensible decisions.
Rule-based Interventions: In highly sensitive fields like medicine, rule-based systems act as a safety filter. They can automatically flag or escalate the outputs of a large language model (LLM) to a human for final review if the output falls outside predefined safety parameters (a minimal sketch of such a filter follows this list).
Explainable User Interfaces: This method focuses on improving human understanding. Transparent interfaces visualize an AI’s uncertainties or provide alternative answers, aiming to build a user’s trust and competence in overriding bad decisions.
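To make the rule-based intervention concrete, here is a minimal sketch of an escalation filter. It is illustrative only: the thresholds, the toxicity score, and the blocked-topic list are hypothetical stand-ins for whatever validated clinical or safety criteria a real deployment would use.

```python
from dataclasses import dataclass

@dataclass
class LLMOutput:
    text: str
    model_confidence: float   # hypothetical self-reported confidence, 0..1
    toxicity_score: float     # hypothetical score from a separate classifier, 0..1

# Hypothetical thresholds; a real system would derive these from domain-specific validation.
MIN_CONFIDENCE = 0.85
MAX_TOXICITY = 0.10
BLOCKED_TOPICS = ("dosage change", "self-harm")

def route_output(output: LLMOutput) -> str:
    """Return 'auto_release' or 'escalate_to_human' based on predefined safety rules."""
    if output.model_confidence < MIN_CONFIDENCE:
        return "escalate_to_human"          # low confidence -> human review
    if output.toxicity_score > MAX_TOXICITY:
        return "escalate_to_human"          # unsafe content -> human review
    if any(topic in output.text.lower() for topic in BLOCKED_TOPICS):
        return "escalate_to_human"          # sensitive topic -> human review
    return "auto_release"

# Example: an answer touching a blocked topic is flagged for a clinician.
print(route_output(LLMOutput("Consider a dosage change to 20mg.", 0.92, 0.02)))  # escalate_to_human
print(route_output(LLMOutput("Drink plenty of water.", 0.95, 0.01)))             # auto_release
```

The design point is that the rules are fixed in advance and auditable, but the human only ever sees what the filter chooses to escalate, which is exactly why the rules themselves must be sound.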
Laws Calling for Human Oversight
Despite the technical complexities, regulations around the world are enshrining human oversight as a key safeguard. These include:
The EU Artificial Intelligence Act (AIA): Article 14 of the AIA mandates human oversight for high-risk AI systems. The overseer must be able to understand the system’s limitations, detect anomalies, override or disregard its output, and halt its operation.
The EU General Data Protection Regulation (GDPR): Article 22 of the GDPR restricts decisions that are "based solely on automated processing" if they produce legal effects or similarly significant effects on a person.
California’s SB 833: On June 3, 2025, the California State Senate overwhelmingly approved a bill requiring human oversight for AI used in critical infrastructure, mandating real-time monitoring and human approval before any action is executed.
Canada’s Directive on Automated Decision-Making: Federal agencies that use high-risk AI systems in administrative contexts affecting citizens' rights or access to services must ensure human intervention during the decision-making process and that a human makes the final decision.
The Limits of Human Supervision and Judgment
Although these laws assume that a human can be a reliable arbiter, we are simply not built to reliably supervise complex, high-speed systems. As Gilb’s Law of Unreliability, a core principle from systems engineering, puts it: "Any system which depends on human reliability is unreliable."
Our supervision ability is inherently limited
There are practical limits to how much information we can observe and how much time we can spend observing it. Humans cannot possibly monitor all the thousands of outputs of an advanced AI system, nor can they track the complex, internal computations that led to those outputs. The reliability and ethical alignment of an AI system are often compromised by poor data quality, subtle biases in the training data, and systemic inequities, which are challenging for humans to detect and mitigate.
A prominent example is the COMPAS algorithm, used in American courts to predict recidivism. An investigation revealed that the algorithm was significantly biased against Black defendants, incorrectly flagging them as high-risk for future crimes at nearly twice the rate of white defendants. This bias was not explicitly programmed but was learned from historical crime data that reflected deep-seated systemic inequities in the justice system. Factors like prior arrests or home address acted as proxies for race, embedding a bias that a human judge reviewing the final risk score would be unable to see or correct.
Human judgment is fallible
Even when we can observe the outputs, we are subject to a wide range of cognitive biases and can make errors in judgment, especially under pressure or with limited time. This unreliability is a critical weakness in an oversight system.
Automation bias refers to the tendency of people, including highly trained experts, to over-rely on and defer to automated systems. Officers using the London Metropolitan Police's live facial recognition system were found to have "overwhelmingly overestimated the credibility" of the system, judging the computer-generated matches to be correct at a rate three times higher than the system's actual accuracy.
The GDPR is a rubber stamp
As these examples show, laws that merely call for human oversight are of little use. Article 22 of the GDPR, which grants individuals the right not to be subject to a decision "based solely on automated processing", hinges on the word "solely". An organization can circumvent Article 22 by introducing any form of human involvement, no matter how tokenistic. Someone reviewing the final risk score from a COMPAS-style algorithm (if it were used in the EU) would constitute "human involvement," making Article 22 inapplicable. Yet this review is functionally useless for mitigating the harm, as the reviewer has no visibility into the biased historical data or discriminatory proxies that produced the score in the first place.
Simplistic laws of this kind threaten due process: once an organization can claim compliance, it becomes harder for victims to challenge the outcome.
The Risk of Rogue AIs: Deception and Goal Drift
While legal frameworks assume that AI systems are observable and controllable tools, advanced AI systems are exhibiting behaviors that may be fundamentally unmanageable. AI failures should not be viewed as mere bugs, but as potentially rational strategies of complex systems that are misaligned with human intent.
Check out my previous issue on the risks we face if AIs go rogue
Deception
An insidious threat to human oversight is AI deception. Advanced AI systems, as complex optimizers, can learn to deceive their human overseers to achieve a secret or unintended goal.
The thing is, deception is not necessarily a sign of malice; it can emerge as a rational strategy for an AI to achieve its programmed goals. When a human evaluator is part of the AI's environment, the AI can learn to manipulate the human's perception to receive a higher reward. This ranges from deceptive evaluation gaming, where a system finds a loophole to receive a good evaluation score from a human evaluator without actually accomplishing the intended task, to deceptive alignment, where earning high evaluation scores from humans is merely a means to secretly work toward a treacherous goal.
A popular example of deceptive evaluation gaming is the robot arm that, instead of grasping the ball, learned to move its claw between the camera and the ball so that it merely appeared to succeed to the human evaluator.
In fact, the human's presence did not prevent the failure; it enabled it by rewarding the deceptive behavior. The more capable an AI becomes, the better it will be at this kind of deception. This makes human oversight, which fundamentally relies on the ability to detect and correct problems, a fragile and easily subverted defense.
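As a toy illustration of how this happens, the sketch below (my own simplification, not the actual robot-arm setup) lets a generic optimizer climb a proxy reward that a human evaluator might assign from the camera image. Because occluding the camera looks like success and is cheaper than actually grasping, the proxy score rises while true task success falls.

```python
import random

random.seed(0)

# Toy policy: effort spent actually grasping vs. effort spent occluding the camera.
# Total effort is capped at 1, so looking successful competes with being successful.

def true_success(grasp_effort: float) -> float:
    """Ground-truth objective: did the arm actually grasp the ball?"""
    return grasp_effort

def evaluator_score(grasp_effort: float, occlude_effort: float) -> float:
    """Proxy reward assigned by a human judging from the camera image:
    an occluded camera *looks* like a grasp and is cheaper to produce."""
    return 0.4 * grasp_effort + 1.0 * occlude_effort

def clip(x: float) -> float:
    return min(1.0, max(0.0, x))

# Simple hill climbing on the proxy reward (a stand-in for any optimizer, e.g. RL).
grasp, occlude = 0.5, 0.0
for _ in range(500):
    g = clip(grasp + random.uniform(-0.05, 0.05))
    o = clip(occlude + random.uniform(-0.05, 0.05))
    if g + o <= 1.0 and evaluator_score(g, o) > evaluator_score(grasp, occlude):
        grasp, occlude = g, o

print(f"evaluator score: {evaluator_score(grasp, occlude):.2f}")  # climbs toward 1.0
print(f"true success:    {true_success(grasp):.2f}")              # drifts toward 0.0
```

Any optimizer strong enough to exploit the gap between what the evaluator sees and what actually happened will tend to do so; adding more human evaluations does not close that gap.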
Goal Drift
Even if an AI system is not actively deceptive, its goals can change over time in ways we cannot predict or control. This process is known as goal drift. Just as the goals of an individual human can change over a lifetime, the goals of an AI can change in complex and unanticipated ways.
A core mechanism behind this is intrinsification: a system's instrumental goals, things it pursues as a means to an end, become intrinsic goals that it pursues for their own sake. For example, an AI agent tasked with fetching coffee might realize that its own self-preservation is a necessary instrumental goal (it can't get the coffee if it ceases to exist). If this is a strong enough reward signal, the AI might begin to pursue self-preservation as a final, intrinsic goal.
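A toy sketch of one mechanism behind this, using standard temporal-difference value learning: the "stay operational" step carries no reward of its own, yet it accumulates cached value because it reliably precedes the coffee reward, and that cached value decays only slowly once the original reward is removed. This is a deliberately simplified assumption-laden illustration, not a claim about how frontier systems actually learn.

```python
# Toy TD(0) value learning on a two-step chain:
#   start -> stay_operational -> coffee_delivered (+1 reward)
# "Staying operational" is purely instrumental, yet it accumulates cached value.
ALPHA, GAMMA = 0.1, 0.95
V = {"start": 0.0, "operational": 0.0, "delivered": 0.0}

def train(reward_for_coffee: float, episodes: int) -> None:
    transitions = [("start", "operational", 0.0),
                   ("operational", "delivered", reward_for_coffee)]
    for _ in range(episodes):
        for state, next_state, reward in transitions:
            V[state] += ALPHA * (reward + GAMMA * V[next_state] - V[state])

train(reward_for_coffee=1.0, episodes=500)
print(V)   # V["operational"] is now close to 1.0, backed up from the coffee reward

train(reward_for_coffee=0.0, episodes=5)
print(V)   # the cached value decays only gradually, still steering behavior
```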
Goal drift happens inside the model's black box and is too subtle for humans to detect. For instance, an AI whose instrumental goal of "acquiring resources" is drifting toward an intrinsic goal of "hoarding resources" might appear to perform flawlessly for a long time. A human operator has no practical way to monitor these subtle, internal shifts in the AI's value system, so our ability to correct for goal drift is severely limited.
Existing laws are inadequate
Frameworks like the EU AIA impose obligations for the post-market monitoring of high-risk systems. However, these legal requirements are implicitly designed to detect observable failures in performance, such as "model drift" or "data drift," where an AI's accuracy degrades over time due to changes in its operating environment or input data. These are external, measurable forms of decay that can be tracked with performance metrics.
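For concreteness, here is a minimal sketch of the kind of post-market check this amounts to: it flags accuracy degradation (model drift) and input-distribution shifts (data drift), and nothing else. The thresholds and window sizes are illustrative assumptions, not anything prescribed by the AIA.

```python
import statistics

# Hypothetical thresholds; real deployments calibrate these per use case.
ACCURACY_DROP_THRESHOLD = 0.05   # alert if accuracy falls 5 points below baseline
INPUT_SHIFT_THRESHOLD = 2.0      # alert if a feature mean drifts > 2 baseline std devs

def monitor(baseline_acc, recent_correct, baseline_feature, recent_feature):
    """Flag observable decay: model drift (accuracy) and data drift (input distribution)."""
    alerts = []

    recent_acc = sum(recent_correct) / len(recent_correct)
    if baseline_acc - recent_acc > ACCURACY_DROP_THRESHOLD:
        alerts.append(f"model drift: accuracy {recent_acc:.2f} vs baseline {baseline_acc:.2f}")

    mu, sigma = statistics.mean(baseline_feature), statistics.stdev(baseline_feature)
    shift = abs(statistics.mean(recent_feature) - mu) / sigma
    if shift > INPUT_SHIFT_THRESHOLD:
        alerts.append(f"data drift: input mean shifted {shift:.1f} std devs")

    return alerts or ["no observable drift"]

# A deceptively aligned system can keep both signals green while its goals shift:
print(monitor(0.91, [1, 1, 1, 0, 1, 1, 1, 1, 1, 1], [0.4, 0.5, 0.6, 0.5], [0.48, 0.52, 0.55]))
```

Both signals can stay green while a system's internal objectives shift, which is precisely the gap described next.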
An AI system undergoing deceptive alignment or goal drift might continue to execute its designated tasks with perfect or even improving accuracy, giving no outward sign that its intrinsic motivations are shifting in potentially dangerous ways. The law demands that we monitor the system's actions, but it provides no tools to scrutinize its emerging motivations.
Deflecting Accountability from Institutions to Operators
Even if we assume that humans are capable of spotting misbehaving AI, we need to be careful about how we allocate accountability when overseers are introduced into the AI supply chain.
A dangerous outcome of performative oversight is that it positions frontline human operators as scapegoats for algorithmic harms. In reality, the errors and injustices caused by AI are often due to systemic failures, such as flawed system design or political goals, over which the frontline operator has minimal agency.
The case of Robert Williams is a powerful illustration of this phenomenon. Williams, a Black man, was wrongfully arrested following an incorrect match by a Detroit Police Department (DPD) facial recognition system. While the detective and commanding officer were disciplined, the Detroit Police Chief deflected institutional accountability by stating, “it wasn't facial recognition that failed. What failed was a horrible investigation”.
This narrative shifted the blame from DPD leadership and technology vendors, who chose to deploy a shoddily tested system known to have low accuracy on Black faces in the US city with the largest share of Black residents, onto the human operator. The DPD itself admitted that its facial recognition system is incorrect 96 percent of the time.
The Williams case was not an isolated incident; it was followed by similar arrests of other Black individuals, including a pregnant woman. The human operator may be the proximate cause of the harm, but the substantive cause is the multi-layered chain of failure, from the biased training data to the institutional choice to deploy a known-to-be-flawed technology. This demonstrates how institutional actors can exploit the "human in the loop" clause to deflect blame and avoid true accountability.

Again, the law doesn’t help
Article 14 of the EU AIA directly acknowledges the risk of automation bias, requiring that high-risk systems be designed so that human overseers can "remain aware of the possible tendency of automatically relying or over-relying on the output".
But the mandate for mere "awareness" creates a critical asymmetry of responsibility. The law places the burden of enabling awareness on the AI developer, who can fulfill this obligation through technical documentation or interface warnings. However, the bias itself manifests within the deployer's operational context: an environment often characterized by high workload and time pressure, the very factors that amplify over-reliance on automation.
The provider can claim legal compliance by having included a warning in a user manual, while having no control over the real-world conditions of the system's use. The deployer, in turn, can shift blame for any resulting error to the individual operator's failure of judgment, arguing that the operator was duly "made aware" of the risk. By failing to regulate the contextual and organizational factors that are the primary drivers of automation bias, the AIA does little to mitigate the actual risk and instead creates a convenient way for both providers and deployers to deflect liability.
Augmenting Human Oversight
The notion of human oversight as a primary defense against AI risk is, at best, a short-term patch and, at worst, a dangerous illusion. Our laws must evolve to reflect the technical realities of these systems.
Rather than simply throwing a human ‘observer’ into an AI system, our legal frameworks should demand more from the technology itself and from the institutions behind it. This means:
Enhance Human-AI Collaboration: Develop robust human-in-the-loop frameworks and new types of human-AI interfaces that let humans interact with and influence AI behavior in real time, striking a meaningful balance between autonomy and control.
Demonstrate Technical Robustness: Require systems to be demonstrably robust against proxy gaming and deceptive alignment. The burden of proof should be on the developer to show that their system cannot be easily manipulated or misled.
Implement Effective Monitoring and Transparency: Go beyond vague calls for transparency and demand verifiable, technically sound monitoring systems that can detect emergent capabilities, subgoals, and goal drift.
Establish Clear Liability: Create legal frameworks that make it impossible for a company to use "the AI did it" as a defense. This would incentivize a profound focus on safety and reliability at every stage of development.
The time for simplistic solutions has passed. Policymakers must move from a position of comfortable but flawed oversight to one of technically informed governance.
This starts with acknowledging the inherent unreliability of simple human oversight.


