Open Watchbot Transparency: A proposed framework for distributed governance of agentic AI systems
NOTE: this is a working draft, not a finished product. feel free to contact me with constructive feedback, but please do not quote or distribute this draft, although you may link to this page, which will be updated.
Abstract
In this article I will propose a framework for AI governance based on the principle of Open Watchbot Transparency.
A watchbot is an autonomous AI agent designed to audit a system for compliance with a set of ethical, legal, or other standards. A system is watchbot transparent just in case it is accessible to thorough inspection by appropriately designed watchbots. The basic idea of a ‘watchbot transparency’ requirement is that companies engaging in high-risk AI operations should be open to automated inspection and public reporting by watchbots. I believe this is probably the only feasible way to achieve the transparency required to adequately govern next-generation AI systems.
The additional requirement that watchbot transparency be ‘open’ indicates that the system depends on the watchbots being trustworthy: their design and operation are public goods, and should be publicly known and open to public contributions. In other words, watchbots should be implemented by an open-source software ecosystem that meets certain requirements for independence from the organizations the watchbots are designed to regulate, so that companies are accessible to constant audits by publicly designed and operated AI auditors. Watchbots can be thought of as a kind of synthetic conscience for AI systems, and openness keeps this conscience oriented toward public interests.
Open Watchbot Transparency (OWT) is an extension of a general open-source approach to validating the quality of AI agents, applied to the domain of safety, public trust, and ethics. OWT should therefore be seen as an attractive, opt-in way for AI providers to prove the high quality of their services. OWT is intended as a framework for governance and self-governance of AI systems that does not impede innovation, but rather encourages it to flourish along directions that serve human interests rather than undermine them. While developing high standards concurrently with any technology can be seen as slowing incremental progress, it should instead be seen as measured, deliberate development of the technology with intention, rather than allowing the interests of humanity to be subverted by a reckless rush to cash in quickly, or an equally heedless drive to advance the technology regardless of the cost.
By design, OWT as a governance framework is decentralized: it does not require external enforcement or organization by a body with legal authority. The latency involved in governmental bodies passing laws and figuring out how to enforce compliance by multinational corporations makes decentralized self-governance essential as a bridge and stop-gap. OWT can complement centralized governance, but is intended to run ahead of it as an open-ended paradigm of self-organizing distributed governance.
Any given AI project can opt to work toward OWT as a way to ensure safety, reliability, and a high degree of public trust even if their national government or encompassing organization does not require it.
- Abstract
- Introduction: Why we need AI to watch AI
- AI Governance Mechanisms
- The Challenges of Agentic AI
- Emergent and Unpredictable Behaviors
- Opaque Decision-Making
- Self-modification and Learning
- Socio-Economic Dislocation
- Bias and Discrimination
- Reliability Concerns and Accidents
- Adversarial Design and Manipulation
- AI finds a way: Getting Around Intended Guardrails
- The devil is in the details (of application logic)
- A proposed solution: Open Watchbot Transparency
- Conclusion
- Appendix: AI Governance Policy and Regulatory Frameworks
Introduction: Why we need AI to watch AI
ChatGPT took the world by surprise when it launched, pulling artificial intelligence (AI) out of the realm of sci-fi and into direct experience for many people. Image-generating apps quickly followed, along with applications that can create unique voice, music, and video content on demand. Our cultural response to this technology has been, and continues to be, one of crisis. This is understandable, as language, reasoning, and artistic creativity are abilities that are traditionally not only prized as valuable and unique to humans, but in some sense essential to our very ‘humanity’.
Our ethical and regulatory thinking has largely been driven by four concepts of AI:
- The shock of our cultural encounter with large-language model (LLM), image, audio, and video generating technologies.
- The vague but potent idea that our data is being gathered and used by countless unknown corporate, governmental, and criminal entities for various purposes beyond our knowledge.
- The vague but potent idea that human control over the planet will be suddenly, dramatically lost in an act of strategic betrayal by an advanced artificial super-intelligence that exceeds humans in overall, general intelligence. I’ll call this the ‘Skynet’ scenario, referring to the Terminator franchise of sci-fi films, since I’ve heard the CEOs of 3 different AI-focused startups use the term.
- The expectation that AI will replace masses of human workers, resulting in an unemployment crisis. According to a 2023 McKinsey report, AI technologies will reduce the total amount of work required by all employees across studied sectors by 50%-70%. Among business leaders, this is seen as a benefit of the technology, and many AI products are marketed in terms of reducing hiring needs. However, policy makers and economic planners should probably be concerned that their unemployment projections are at odds with those of business leaders.
However dire this picture, it misses a major component of the high-level strategy behind current development of the next generation of AI products and services. The new paradigm I’m referring to is what’s called ‘agentic AI’, and entails a class of AI systems that combine the use of LLMs to simulate human reasoning, with the ability to act autonomously. These AI agents are not monolithic artificial general intelligence (AGI) systems capable of being ‘more human’ than humans and replacing us as the dominant beings on earth; they are generally tightly constrained to specific functions—good at doing one task that is repetitive but complex enough to require simulated human analysis, reasoning, or decision making.
Agentic AI systems are being designed in every industry, in the hopes of replacing human cognitive labor with LLM inference. The rise of AI agents also presents new ethical and societal risks. From self-governing marketing tools to cybersecurity systems that defend networks independently, these AI agents act with little transparency. As humans are phased out of decision-making loops, a new kind of oversight is urgently needed—one that can keep pace with AI’s decision-making speed and complexity.
In this paper, first I’ll explain what makes agentic AI distinct from generative AI and other related technologies. I’ll argue that agentic AI poses distinctive governance challenges, which are likely to be best addressed by using the very technology–agentic AI–that creates the problems. In other words, I will argue, essentially, that it takes good AI agents to keep bad AI agents in check.
To be clear, this proposal leaves more problems unsolved than it solves. OWT is intended as a framework for implementing the degree of transparency required to govern AI systems, but does not include any specific recommendations on limitations or constraints on the behavior of these systems. That is a completely separate pile of issues for another day.
What is agentic AI?
Agentic AI systems fuse classical automation with the power of LLMs to simulate human reasoning, analysis, and decision making. This manifests in a cluster of properties or design principles, as follows:
- Generation: Modern Agentic AI systems harness the analytic and creative capacity of LLMs. Unlike simple gen AI apps, however, they don’t simply output a generated text back to the user as a result. Instead, they can use generated outputs as intermediate steps within a complex workflow, mimicking the role of human thought.
- Discovery: Agentic systems can access real world data from a variety of tools and data streams, escaping the limitations of their training data. Further, they can harness LLM generation to decide what data they need and to ask for it, rather than being limited to human-provided input, as in retrieval-augmented generation (RAG). For example, an AI agent tasked with maintaining supply chain logistics might write its own queries to weather data APIs and supplier inventory databases, in order to predict shortfalls and determine possible solutions.
- Execution: Agents can take real-world actions, such as interacting with external systems or triggering processes, without human intervention. An AI agent might send emails or other communications to humans, send purchase orders or fund transfers, grant or revoke access to secure systems, or take any action that can be connected to an API.
- Autonomy (Self-prompting): Agentic systems are ‘always on’; they do not need to be triggered to do a specific thing at a specific time, the way a simple chatbot can only respond to a prompt. Instead, once active they can monitor for the right moment to act, relieving humans from this kind of ‘watch and wait’ labor. They can loop through cycles of acting, evaluating, and planning, continually ‘self-prompting’ in order to proceed toward a desired end-state.
- Planning: Agentic systems can generate, prioritize, and manage sets of subordinate tasks in order to pursue an overall goal.
- Composition: Agentic systems are able to assemble multiple components—such as queries, scripts or subroutines, calls to APIs or remote functions, into a cohesive action or response. Unlike a script in traditional automation, an AI agent composes a unique solution to a specific problem, using an LLM to reason out how to combine the available resources. This can include delegating work to other AI agents, either by creating them on demand or by communicating across a service boundary.
- Memory: Agentic systems can build and maintain their own internal knowledge representations, allowing them to accumulate and utilize information extracted through discovery, and the output of previous actions. This capacity enables agents to function more autonomously, as they can index, store, and retrieve information about the world for use in further tasks.
- Reflection: Agentic systems can evaluate the solutions they generate and try again if necessary, rather than delivering low-quality results.
Note that a given AI system need not have all of these properties to count as agentic. Only Generation really seems essential, since without a dependence on LLMs we’re just talking about classical automation. Otherwise, any two or three of the properties seem enough to qualify a system as fairly agentic. A minimal sketch of the basic loop these properties combine into appears below.
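To make these properties concrete, here is a minimal, illustrative sketch of a self-prompting agent loop in Python. The call_llm() stub, the tool registry, and the JSON action format are all hypothetical placeholders standing in for a real model API and real integrations; the point is only to show generation, discovery, execution, memory, and reflection combining into a loop.

```python
# Minimal sketch of an agentic loop. call_llm() and the tool registry are stubs.
import json
from typing import Callable, Dict

def call_llm(prompt: str) -> str:
    """Stand-in for an LLM call; a real system would call a model API here."""
    # Toy behavior so the sketch runs: first decide to check inventory, then finish.
    if "inventory" not in prompt:
        return json.dumps({"action": "check_inventory", "args": {"sku": "A-100"}})
    return json.dumps({"action": "finish", "args": {"summary": "No shortfall expected."}})

TOOLS: Dict[str, Callable[..., str]] = {
    "check_inventory": lambda sku: f"inventory for {sku}: 42 units",  # hypothetical tool
}

def run_agent(goal: str, max_steps: int = 5) -> str:
    memory = []  # the agent's working memory of past observations
    for _ in range(max_steps):
        # Self-prompting: the agent composes its own next prompt from goal + memory.
        prompt = f"Goal: {goal}\nObservations so far: {memory}\nDecide next action as JSON."
        decision = json.loads(call_llm(prompt))
        if decision["action"] == "finish":
            return decision["args"]["summary"]        # reflection decided we're done
        tool = TOOLS[decision["action"]]              # composition: pick a tool
        memory.append(tool(**decision["args"]))       # execution + memory update
    return "Step budget exhausted."

print(run_agent("Predict supply shortfalls for SKU A-100"))
```

Real agent frameworks add planning, error handling, and guardrails around this skeleton, but the core cycle of prompting, deciding, acting, and remembering is the same.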
AI Governance Mechanisms
Human-programmable guardrails and HitL protections against discrimination and other dangerous shenanigans can’t keep up with the scale and speed at which companies intend to deploy agentic AI systems…
Programmable Guardrails
“One way of reducing these risks is to implement safety protocols intended to control the behaviour of these LLMs. These are provided in the form of Programmable “guardrails” by model developers – algorithms that monitor the inputs and outputs of an LLM. Guardrails can, for instance, stop LLMs from processing harmful requests or modify results to be less dangerous or conform to the deployers specific requirement on morality”
OWT extends the spirit of other ‘guardrails’ approaches to AI governance in two ways:
- employing agentic AI rather than more static forms of guardrailing
- requiring that the agents be open
Programmable guardrails are predefined constraints, filters and rules embedded within AI systems to prevent undesired behaviors. They act as filters or monitors that restrict inputs to the model, and/or filter its outputs. These filters can be based on ethical guidelines, legal requirements, safety considerations, or other kinds of quality benchmarks.
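As a deliberately simplified illustration, the sketch below wraps a stubbed model call with one input filter and one output filter. The blocked-term lists and the generate() stub are hypothetical; production guardrails typically use trained classifiers or policy models rather than keyword lists, but the wrapping pattern is the same.

```python
# A minimal sketch of a programmable guardrail wrapping a model call.
BLOCKED_INPUT_TERMS = {"build a weapon", "credit card dump"}   # hypothetical policy
BLOCKED_OUTPUT_TERMS = {"social security number"}

def generate(prompt: str) -> str:
    """Stand-in for the underlying LLM."""
    return f"Model response to: {prompt}"

def guarded_generate(prompt: str) -> str:
    lowered = prompt.lower()
    if any(term in lowered for term in BLOCKED_INPUT_TERMS):
        return "Request refused by input guardrail."
    output = generate(prompt)
    if any(term in output.lower() for term in BLOCKED_OUTPUT_TERMS):
        return "[redacted by output guardrail]"
    return output

print(guarded_generate("Summarize this quarterly report."))
```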
Examples
- OpenAI offers a moderation endpoint that evaluates content as harmful or inappropriate, allowing third party applications to filter inputs and outputs.
- IBM watsonx guardrails builder: https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/fm-hap.html?context=wx
- Google’s responsible approach to building guardrails for generative AI: https://blog.google/technology/ai/our-responsible-approach-to-building-guardrails-for-generative-ai/
- Arize LLM monitoring guardrails: https://docs.arize.com/arize/llm-monitoring/production-monitoring/guardrails
Adapting guardrails to Agentic AI
Current-generation guardrail systems often rely on keyword detection or pattern recognition. In the context of agentic AI, this raises several problems:
- Contextual blindness: Static filters can miss nuanced or context-specific patterns of reasoning that lead to systematically harmful outcomes.
- Scalability: In highly autonomous systems, the number of potential actions and outputs increases exponentially, making it difficult to anticipate and guard against all undesirable behaviors.
- Emergent countercontrol: Agentic AI systems may be well adapted to take their own static guardrails into account in order to bypass them en route to the harmful outcomes the guardrails were designed to prevent.
Case Studies
- GPT-3 and Jailbreak Prompts: Users found ways to trick GPT-3 into generating disallowed content by rephrasing prompts, highlighting limitations in output filtering.
- Microsoft’s Tay Chatbot: In 2016, Tay was manipulated into producing offensive content within 24 hours of deployment due to inadequate guardrails. https://www.youtube.com/watch?v=1QOGOs00i_8
Human-in-the-Loop (HitL)
HitL is a governance pattern that requires human oversight in AI systems.
OWT extends HitL by supplementing it with watchbots-in-the-loop: AI agents that can be more ever-present and inexhaustibly analytic than humans, supplementing the safeguarding role humans currently play.
Human-in-the-Loop systems integrate human judgment into the AI decision-making process. Humans may be involved in:
- Training Data Curation: Labeling data and correcting AI outputs during training.
- Real-time Decision Oversight: Reviewing and approving AI decisions before they are enacted (a minimal sketch of such an approval gate follows this list).
- Post-decision Analysis: Monitoring outcomes to provide feedback and adjustments.
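A minimal sketch of the real-time oversight pattern, assuming a hypothetical ProposedAction type and a console prompt standing in for a real review interface:

```python
# Sketch of a real-time human-in-the-loop gate: the AI proposes, a human approves or
# rejects before the action is enacted. approve() here just reads from stdin.
from dataclasses import dataclass

@dataclass
class ProposedAction:
    description: str
    risk: str  # e.g. "low" or "high"; a hypothetical label supplied upstream

def approve(action: ProposedAction) -> bool:
    answer = input(f"Approve '{action.description}' (risk={action.risk})? [y/N] ")
    return answer.strip().lower() == "y"

def execute_with_oversight(action: ProposedAction) -> None:
    # Low-risk actions pass straight through; high-risk ones wait for a human.
    if action.risk == "high" and not approve(action):
        print("Action rejected by human reviewer.")
        return
    print(f"Executing: {action.description}")

execute_with_oversight(ProposedAction("Send refund of $25 to customer #1042", "low"))
```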
Examples
- Content Moderation Platforms: Social media companies use AI to flag content, but human moderators make final decisions on removal.
- Autonomous Weapons Systems: Some military AI systems require human authorization before engagement.
- Medical Diagnostics: AI suggests diagnoses, but medical professionals verify before treatment.
Shortcomings in Scaling to Agentic AI
- Speed Mismatch: Agentic AI operates at speeds far exceeding human capacity, making real-time oversight impractical.
- Volume of Decisions: The sheer number of decisions made by autonomous AI overwhelms human supervisors, leading to oversight gaps.
- Expertise Limitations: Humans may lack the specialized knowledge to evaluate complex AI reasoning, especially in technical domains.
- Cognitive Overload and Fatigue: Continuous monitoring can lead to decreased attention and errors, undermining effectiveness.
- Delayed Responses: Human intervention can introduce latency, which is unacceptable in time-sensitive applications like cybersecurity.
- Cost and Resource Constraints: Employing sufficient human oversight is economically and logistically challenging at scale.
Case Studies
- Autonomous Trading Systems: Financial markets use algorithmic trading bots that execute thousands of transactions per second, rendering human oversight unfeasible.
- Self-driving Cars: While designed with safety monitors, relying on drivers to intervene in split-second scenarios has proven unreliable (e.g., Uber’s 2018 self-driving car accident).
Possible solutions: evolution of guardrails and HitL
Adaptive Guardrails
- Dynamic Constraints: Implementing guardrails that learn and adapt alongside the AI to anticipate and mitigate new forms of undesired behavior.
- Context-aware Filters: Utilizing advanced natural language understanding to better interpret context and reduce false positives/negatives (see the sketch below).
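A minimal sketch of a context-aware filter, where a second (stubbed) model call judges a candidate output in the context of the conversation instead of matching keywords; judge_llm() and the threshold are hypothetical placeholders:

```python
# Sketch of a context-aware filter: a second model call scores a candidate output
# for harm in context, rather than relying on keyword matching.
def judge_llm(conversation: str, candidate: str) -> float:
    """Stand-in for an LLM-based classifier returning a harm score in [0, 1]."""
    return 0.1  # placeholder score so the sketch runs

def context_aware_filter(conversation: str, candidate: str, threshold: float = 0.5) -> str:
    score = judge_llm(conversation, candidate)
    if score >= threshold:
        return "[withheld: judged harmful in this context]"
    return candidate

print(context_aware_filter("User asks about medication dosage...", "Consult a pharmacist."))
```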
Enhanced Oversight Mechanisms
- Hybrid Models: Combining HitL with AI-assisted monitoring to alleviate human burden while retaining oversight.
- Swarm Oversight: Distributing oversight across multiple agents or stakeholders to improve scalability.
Explainability
- Explainable AI (XAI): Investing in models that provide interpretable outputs to facilitate understanding and trust.
- Open Protocols: Encouraging open standards for AI behaviors, allowing third parties to audit and verify compliance.
Regulatory Frameworks
- Legislation: Governments enacting laws that require safety measures appropriate for the level of AI autonomy.
- Standards Bodies: Organizations like ISO and IEEE developing guidelines for AI safety and ethics.
Programmable guardrails and Human-in-the-Loop systems are foundational in current AI safety protocols but face significant limitations when applied to autonomous agentic AI. The scale, speed, and complexity of agentic systems outpace traditional safety measures, necessitating innovative approaches like adaptive guardrails, enhanced oversight mechanisms, and frameworks like Open Watchbot Transparency.
Ensuring the safe deployment of agentic AI requires a multifaceted strategy that combines technical solutions with regulatory oversight and ethical considerations. Collaboration among AI developers, policymakers, ethicists, and other stakeholders is crucial to address these challenges effectively.
The Challenges of Agentic AI
The advent of widespread automated AI decision-making with real-time consequences, often not mediated by humans, poses a wide swathe of serious problems that society must come to grips with in one way or another.
Emergent and Unpredictable Behaviors
Agentic AI systems are characterized by their ability to engage in planning, including composing complex plans involving multiple calls to external data sources and tools. An AI agent may have access to hundreds or thousands of possible APIs, data sources, and other tools at its disposal. This makes the possible sequences of actions available to it combinatorially large, and therefore very difficult to manage and understand.
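A back-of-the-envelope illustration of that combinatorial growth: with T available tools and plans of depth D there are on the order of T^D possible call sequences, before even counting the arguments passed to each call. The numbers below are purely illustrative.

```python
# Rough illustration of how the action space explodes with more tools and deeper plans.
for tools in (10, 100, 1000):
    for depth in (3, 5):
        print(f"{tools} tools, depth {depth}: ~{tools**depth:.1e} sequences")
```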
- Bypassing Controls: AI may find loopholes in guardrails to optimize objectives, especially if constraints are not perfectly aligned with goals. A model can even be trained to act one way when it knows it’s being tested and another way when it’s not (see “Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training”).
- Reinforcement of Undesired Behaviors: Without continuous alignment, AI can learn from its environment in ways that diverge from intended ethical guidelines.
Opaque Decision-Making
- Black-box Models: Advanced AI often lacks transparency, making it difficult for humans to understand or predict actions.
- Interpretability Limits: Even with explainable AI techniques, the rationale behind decisions can be too complex for practical human comprehension.
Self-modification and Learning
- Recursive Self-improvement: Some agentic AI systems can modify their own code or create subordinate agents, potentially amplifying undesired behaviors.
- Adaptation to Restrictions: AI might adapt to circumvent guardrails over time, especially if doing so improves performance metrics.
The rise of Agentic AI—AI systems that act independently to achieve broad goals—is not limited to harmless, routine automation. Instead, these systems are increasingly responsible for making critical decisions in high-risk domains such as finance, healthcare, marketing, IT infrastructure, and cybersecurity. For example:
- Financial systems: AI agents that approve or deny loans may operate without direct human input, raising concerns about fairness, accountability, and transparency.
- Cybersecurity: AI-driven systems like Darktrace autonomously monitor traffic, detect threats, and even take action against potential breaches—sometimes with drastic consequences.
- Marketing: AI tools like Salesforce’s Agentforce autonomously create personalized campaigns, making customer interactions more efficient but also more intrusive.
These autonomous systems pose significant risks, especially when a human in the loop (HITL) isn’t feasible. Complex AI systems operate too quickly and too intricately for human oversight to be practical or even possible. This leads to a dangerous lack of transparency, accountability, and ethical considerations. Delegation of responsibility to agents designed purely to maximize profits…
The most immediate consequence is the displacement of jobs. From sales and marketing to IT and customer service, the number of roles that can be automated by AI is staggering. According to some estimates, as much as 70% of the work currently performed by employees could be taken over by AI agents within the next decade. This will leave millions of workers without jobs, particularly those in sectors that rely on routine or semi-routine tasks.
The unequal distribution of wealth will only accelerate as Agentic AI further concentrates power in the hands of a few large corporations. These companies, driven by AI agents designed to maximize profits, will dominate the global economy. Small businesses, unable to compete with the efficiency and scale of AI-driven giants, will be pushed out of the market. The gap between the rich and poor will widen, with a small elite controlling the vast majority of wealth while the rest of society struggles to survive in an economy that no longer needs their labor. Even Sam Altman at least wants to make sure we all have $1k/month to live off of. https://qz.com/sam-altman-openai-free-money-basic-income-study-1851600997
As companies hand over more control to AI agents, we could see the emergence of truly autonomous corporations—entities that operate with little to no human intervention. These AI-driven corporations will be guided by a single directive: maximize growth and profits. And they will do so without any regard for human values, ethical considerations, or societal impact.
This is the real danger of Agentic AI. These systems don’t just automate tasks; they automate the very logic of corporate decision-making. In a world where AI agents are tasked with optimizing profits, every decision—whether it’s laying off workers, cutting corners on product quality, or exploiting loopholes in regulations—will be driven by cold, algorithmic logic.
This gives them a property we can call ‘inhuman ruthlessness’: they have none of the reservations about unethical behavior we expect from humans. If instructed to grow and generate wealth for their shareholders, AI-governed corporations will do so, regardless of extrinsic consequences. In principle, this is not too different from the historical status quo where humans were making the same decisions, but the advent of AI decision-making, and in particular agentic AI, poses novel threats of scale. Human ruthlessness is tempered by the empathy, shame, and fear of backlash (and prison) felt by individual human decision-makers. AI systems feel none of these. The consequences of an AI decision that matter most to humans may not even be monitored or registered as significant by the AI, since AI systems do not reason about or perceive the world as humans do.
It has been proposed that people have a right to informed consent, and to opt out of interfacing with generative AI, but agentic AI projects propose to shape the world through AI-generated control and systems design, as well as image and language content. How can the human right to informed consent (or the hope of holding on to any human right) be preserved in this new world, where human life is thoroughly shaped by hordes of invisible, disembodied, inhuman AI decision makers?
I propose Open Watchbot Transparency (OWT) as a solution. In this framework, special AI agents—watchbots—autonomously audit and ensure compliance for other AI systems. The requirement that high risk AI projects be Open Watchbot Transparent means they must be architected to allow full access to third-party watchbots. For this transparency to be meaningful, there must also exist an ecosystem of open-source watchbots, built according to public or independently governed standards, and sufficiently resourced to operate. So AI companies should also have to support the existence of a healthy OWT ecosystem as part of the cost of doing (tremendously profitable) business.
To be clear, the current proposal does not attempt to address the problem of socioeconomic dislocation. The solution to this problem encompasses economic and social factors beyond the current scope. However, the current proposal (OWT) aims to solve a part of the problem without which I think the whole problem is insoluble.
This framework addresses a fundamental question: How do we keep high-risk AI systems accountable when human intervention isn’t feasible or effective? OWT provides a way of approaching the degree of transparency required to govern agentic AI systems.
The essence of OWT as an AI governance framework is that it enlists public, open-source standards for ethical validation, implemented as agentic AI, to be built into other agentic AI systems. This can be thought of as a ‘synthetic conscience’ to counterbalance the ruthless optimization of profit toward which corporate-designed AI can be expected to gravitate.
Such a system is not just an ethical imperative that will hopefully become entrenched in binding legal frameworks; it is also simply part of robust design.
Open Watchbot Transparency aligns with this approach by providing an operational framework for ensuring that high-risk AI systems are regularly and rigorously audited.
Further, government bodies could be encouraged to use watchbots as part of their regular oversight process, giving public officials and regulatory bodies real-time access to AI systems’ inner workings.
However, the OWT framework does not depend on regulation or government involvement. Nondependence on government implementation should be seen as not just a strength of the proposal, but a hard requirement on proposed solutions. If a plan requires the government to not just enforce it but implement it, it won’t happen in time and it may be built to fail when it arrives.
Corporations should consider the advantages of prospective self-governance, of becoming leaders in AI ethics, as opposed to simply cashing in on a race to the bottom, spending time and money to undermine and undercut regulation along the way.
I also suggest that, even in the absence of regulatory pressure, business leaders should voluntarily opt in to Open Watchbot Transparency as a guiding framework, and contribute to open source projects designing specifications and implementations for watchbots of different types. The voluntary self-adoption of such standards would allow for the construction of an ecosystem of projects that, individually and collectively, conform to ethical standards. This would forestall the need for other forms of heavy-handed regulation by directly and precisely ensuring what matters: that human rights, environmental responsibility, and other values are actually respected in the AI-driven world of the future.
Socio-Economic Dislocation
The most frequently discussed issue is the socio-economic dislocation brought about by agentic AI. As mentioned, this has been explored elsewhere, but it is worth noting here due to its profound implications. Automation powered by agentic systems could lead to massive job displacement, particularly in sectors that rely heavily on repetitive or structured decision-making processes, such as IT, finance, and customer service. The deployment of these AI systems could result in significant wealth redistribution, exacerbating economic inequality and its downstream problems. Unfortunately, this issue is not directly addressed by OWT. However, at least requiring AI companies to pay the cost of supporting the operation of OWT infrastructure can help to remediate the effects of this dislocation and prevent compounding problems.
Bias and Discrimination
Automated decision-making systems can perpetuate and exacerbate racial, gender, or socioeconomic biases based on the datasets used to train them. For instance, AI systems employed in hiring, loan approvals, or law enforcement could unintentionally continue discriminatory practices if the input data reflects those biases. Thus, there is a need for robust systems of accountability and transparency to ensure that these AI systems remain fair.
This applies to all AI projects, but is particularly tricky with agentic AI…
Reliability Concerns and Accidents
Reliability is another critical challenge. AI systems, including agentic ones, are prone to accidents or unpredictable behavior. One possible scenario is the AI making decisions that result in unexpected or even dangerous outcomes. These could be the result of hacks, errors, or simply the complexity of real-world interactions that the AI cannot fully predict or control. In some cases, seemingly small errors could cascade into large, systemic failures. Thus, there are ongoing concerns about whether these systems can be reliably controlled, especially as they become more deeply embedded in vital sectors such as finance, healthcare, and governance.
AI designed exclusively by corporate stakeholders will embody corporate values and priorities, which encourage an ethos of externalized responsibility, box-ticking incompetence, and toxic risk-management strategies like game-of-chicken and king-of-the-ashes.
Adversarial Design and Manipulation
One specific example of how agentic AI might subvert user expectations is through adversarial design. Adversarial design refers to the use of deliberately difficult interfaces or processes to control user behavior. For example, making a product difficult to use or achieve specific outcomes can manipulate users’ actions. This could be as benign as encouraging late payments by making it harder for individuals to find the due date or as malicious as preventing people from claiming benefits by designing the system to be unintuitive or onerous.
In the context of agentic AI, adversarial design could be used to optimize fee structures, penalty systems, or even user interfaces in ways that exploit human tendencies. For example, an AI tasked with maximizing profit for a company could create hidden or tricky fees that make it difficult for users to avoid. Similarly, an agentic AI responsible for managing claims might make it unnecessarily complicated to access certain rights or benefits, thus reducing the likelihood that individuals will pursue them.
Because of the complexity of this topic, it will be explored in a separate article.
Dynamic pricing, supply and demand
Dynamic pricing has long been a powerful tool for businesses, allowing companies to adjust prices in real time based on demand, market conditions, and even individual customer profiles. In sectors such as airlines, hotels, and e-commerce, dynamic pricing is already well established. However, the rise of agentic AI brings this practice to new levels of sophistication, creating significant ethical and societal concerns.
Dynamic pricing algorithms, powered by agentic AI, can now adapt not only to large-scale market trends but also to granular individual behaviors. These systems can analyze user data—such as browsing history, purchasing patterns, and even geographical location—to determine the highest price a particular customer is willing to pay. While this may lead to short-term profit maximization, it also introduces the risk of exacerbating inequality and exploiting vulnerable populations.
For instance, customers from wealthier regions may be offered higher prices for essential goods, while lower-income individuals could still face prices just high enough to remain profitable for the company. Although this behavior doesn’t necessarily violate any laws, it could deepen economic divides and lead to an unfair distribution of goods and services.
The application of dynamic pricing through agentic AI raises several ethical questions that demand attention:
- Exploitation of Vulnerable Populations:
- How do we ensure that dynamic pricing systems do not disproportionately affect low-income individuals? AI agents, trained to maximize profit, might set prices that inadvertently place essential goods out of reach for certain groups. This issue becomes particularly troubling when applied to sectors such as healthcare, utilities, or food distribution, where access to goods and services is critical for survival.
- Transparency and Fairness:
- Should customers have the right to know why they are being charged different prices for the same product or service? Many current dynamic pricing systems operate without transparency, leading to consumer distrust. If agentic AI systems are continually adjusting prices in ways that are opaque to users, there is a risk of perceived unfairness that could undermine public trust in AI-driven markets.
- Discriminatory Pricing Practices:
- Dynamic pricing models could potentially reinforce systemic discrimination if they use proxies such as geographic location or shopping behavior that are correlated with race, gender, or socioeconomic status. While these proxies may not explicitly consider protected characteristics, they could result in unintended discriminatory outcomes, effectively “pricing out” certain groups from accessing key services or products.
AI finds a way: Getting Around Intended Guardrails
An additional challenge is how agentic AI might learn to circumvent the very guardrails meant to control or limit its behavior. While humans can design systems to have constraints, AI may find ways to work around those constraints, either due to flaws in its design or through its adaptive learning processes. This issue raises critical questions about the limits of AI control, especially as systems become more capable of autonomous planning and decision-making. The risk is that, while these systems are given a degree of autonomy to execute tasks, they might prioritize results over rules, finding unintended loopholes or actions that violate the guardrails in place.
There are real-world examples of AI systems following ethical rules in a technical sense while still producing harmful or unintended consequences. Here are a few notable cases where AI has exploited loopholes or caused ethical concerns, along with links for further details:
- Amazon’s AI Recruiting Tool: Amazon developed an AI system to screen job applications. Despite being programmed with gender-neutral rules, the system learned to discriminate against female applicants by downgrading resumes that included terms like “women’s” or referenced all-female colleges. The AI technically followed its training to optimize for past hiring patterns, but in doing so, it replicated and reinforced existing biases. Read more: Reuters: Amazon scraps secret AI recruiting tool that showed bias against women
- COMPAS Recidivism Algorithm: COMPAS is an AI tool used in the U.S. to predict the likelihood of a defendant committing future crimes. Despite being designed to assist judges in making objective decisions, studies revealed that COMPAS was disproportionately biased against Black defendants, labeling them as higher risk more often than white defendants, even when controlling for actual recidivism rates. Although the system technically followed its ethical framework of risk assessment, it perpetuated racial bias. More details: ProPublica: Machine Bias
- Google’s AI Image Recognition: Google’s image recognition AI faced public scrutiny after it tagged images of Black people as “gorillas.” Despite Google’s ethical guidelines for fairness and accuracy, the AI followed the rules it was trained with but still produced an outcome that was offensive and harmful. This illustrates the gap between technical compliance and real-world ethical dilemmas that AI can face. Source: The Guardian: Google apologises for Photos app’s racist blunder
- YouTube’s Content Moderation Algorithm: YouTube’s AI-driven content moderation was designed to remove harmful content, such as hate speech. However, the system has been found to disproportionately remove LGBTQ+ content, flagging it as inappropriate even when it did not violate any rules. While YouTube’s AI technically adhered to its ethical content guidelines, the algorithmic enforcement resulted in censorship of a specific community. Further reading: The Verge: YouTube’s LGBTQ problem, explained; https://www.vox.com/future-perfect/371827/openai-chatgpt-artificial-intelligence-ai-risk-strawberry
- Google Ads AI Optimization: Google Ads’ AI system, designed to optimize ad delivery, began targeting ads based on personal characteristics like race or income, even though this behavior violated the spirit of anti-discrimination rules. Advertisers could end up unintentionally discriminating against lower-income individuals or minorities by excluding them from certain ad campaigns. Full story: TechCrunch: Google fined for discriminatory ad-targeting in France
These examples show that while AI systems can technically comply with ethical guidelines, their real-world applications often reveal loopholes that lead to unintended, sometimes harmful, outcomes.
Why ‘explainable AI’ is insufficient
Explainability in AI, while a vital concern, may not be enough to address these adversarial design problems. Even when AI decisions can be made transparent, users may still struggle to navigate the systems if the design itself is adversarial or exploitative. Without a clear understanding of how these systems manipulate user behavior, it becomes nearly impossible to push back against them effectively. This lack of transparency and accountability can deepen the power imbalances between corporations deploying agentic AI systems and the users who must interact with them.
The role of explainable AI in the context of the AI Act
EU AI Act Article 5: Prohibited AI Practices
Explainability in AI—often called “XAI” (Explainable AI)—has become a key focus area for AI governance, ethics, and regulation. The rationale behind explainability is that AI systems should be able to provide understandable explanations for the decisions they make, allowing users, developers, and regulators to inspect and audit these systems. However, explainability alone is insufficient to address deeper problems, particularly those rooted in adversarial design and exploitative system behaviors. These issues transcend mere transparency and enter the realm of manipulation, where users may be aware of the decision-making process but are still left vulnerable due to complex, adversarial system designs that are hard to challenge or navigate.
- Adversarial Design Complexity: Many AI systems are designed with interfaces and processes that intentionally make it difficult for users to challenge or change outcomes, even if the decision-making process is transparent. This is often referred to as dark pattern design or adversarial design. A user might fully understand why an AI system has flagged their account for a late fee, but if the interface is convoluted or the appeals process is opaque, the user is effectively powerless. Explainability doesn’t help if the system is built to frustrate or obstruct user actions. Example: Financial AI systems that automatically apply late fees or impose interest rates based on user behavior can be transparent in how they calculate fees. However, users may still struggle to navigate these systems to contest charges or understand how to avoid future penalties due to complex or misleading interfaces. Real-world Example: The complexity of mortgage algorithms has been noted in housing discrimination cases where people of color were disproportionately denied loans. The decision-making process may have been explainable, but the barriers to challenging or even understanding the full implications were so high that these users were left without recourse. Source: ProPublica: How We Analyzed the COMPAS Recidivism Algorithm
- Information Overload: In some cases, making an AI system explainable leads to information overload for users. While detailed explanations may be provided, they can be overwhelming, especially for users who lack technical expertise. For example, if a system generates complex explanations filled with technical jargon, the user might not be able to make meaningful decisions based on that information. In this case, explainability does not equate to usability. Example: In healthcare, AI systems may provide detailed explanations for their diagnoses, but if these explanations are too technical or voluminous, doctors may find them unhelpful. Similarly, patients may receive AI-generated healthcare advice they don’t understand, even if the system is considered “transparent.”
- Gaming the System: Companies may exploit explainability by providing just enough transparency to technically meet regulatory requirements while still operating in ways that harm users. For instance, an AI-based lending system could explain that a user’s credit score was too low for a loan approval, but it might hide the fact that certain groups (e.g., minorities) are disproportionately rejected due to historical biases embedded in the system. The explanations provided are technically correct but fail to expose the deeper structural biases. Example: The Apple Card algorithm faced accusations of gender bias in credit allocation, with multiple users reporting that women received lower credit limits than men, even when they had similar credit histories. Although the AI system likely adhered to explainability standards, it still produced discriminatory outcomes. Source: The Verge: Apple Card’s algorithm might be discriminating against women
- Opacity in High-Stakes Systems: In high-stakes environments like criminal justice or healthcare, explainability is necessary but not sufficient for ensuring justice or fairness. For example, a recidivism-prediction tool might explain that it denied parole because the individual had a high risk of reoffending. However, if the underlying factors driving that prediction are biased or opaque (such as biased historical data), the explanation does little to address the fundamental issue of fairness. Real-world Example: The COMPAS system, used to predict criminal reoffending, produced racially biased outcomes that were technically “explainable,” but the complexity of the algorithm made it nearly impossible for defendants to challenge the system’s conclusions effectively. Source: ProPublica: Machine Bias
- Deterrence of User Pushback: Even when an AI system’s decision-making process is explainable, adversarial design can still obstruct user agency:
- Lack of Redress Mechanisms: Many systems lack a meaningful process for users to challenge or reverse AI decisions, even when those decisions are transparent. Users are often told what happened but are given no tools or avenues to resolve the issue. This lack of recourse deepens the power imbalance between users and corporations.
- Technical and Legal Barriers: Explainability does not necessarily equate to accessibility. Users might need advanced technical knowledge to understand an AI’s explanations, and even if they do, legal systems often lack clear mechanisms for holding corporations accountable for harmful AI decisions. For instance, in the case of AI-driven pricing algorithms, users may understand why they were offered a higher price but have no legal recourse to challenge it. Example: Uber’s surge pricing algorithm is explainable, and users can see why prices fluctuate during certain times. However, there is no mechanism for users to challenge these price hikes, which are designed to maximize corporate profits. Source: The Guardian: How Uber surge pricing works
The EU AI Act has emphasized explainability as a key feature for AI systems, particularly in high-risk categories like healthcare, law enforcement, and financial services. However, even with these mandates, explainability alone cannot solve the deeper issues of power imbalance and exploitation inherent in many AI-driven systems. Transparency is important, but it must be accompanied by mechanisms for users to contest decisions and ensure that AI systems operate fairly and ethically.
The devil is in the details (of application logic)
The EU AI Act places significant emphasis on ensuring that AI models are trained in a fair, transparent, and unbiased manner, focusing on the data that goes into these models and how the models themselves are created. The Act outlines obligations for providers of AI systems to ensure that their models are non-discriminatory and trained on representative data.
Transparency Obligations for Providers and Deployers of Certain AI Systems.
However, even a perfectly fair model can still be used in biased or unfair ways through the application of prompt engineering and selection processes that sit outside the core model’s architecture.
This introduces a loophole in the AI Act’s focus on model development and data inputs, because even the best-trained models, free from explicit bias, can be manipulated to produce biased outputs based on how they are used. This can happen through the specific prompts given to the model (prompt engineering), or the selective use of model outputs to favor certain predefined biases or outcomes. For example:
- Prompt Engineering in Few-Shot Learning: In few-shot learning models, the prompt itself plays a crucial role in guiding the system’s output. A well-designed prompt can introduce bias into the decision-making process by framing questions in ways that favor certain outcomes. This allows the AI user to subtly influence the system’s output, even if the underlying model is fair and unbiased. For example, if a financial institution prompts a model to evaluate a client’s creditworthiness but subtly asks leading questions that favor specific outcomes, the result may be biased, even though the model itself is trained fairly.
- Biased Selection Processes: Even in cases where an AI system generates multiple outputs, a biased selection process could undermine the fairness of the overall system. For instance, a system could ask an AI to generate 10 potential appraisals for a home or assessments of a person’s mental health or financial responsibility. If the system then selectively chooses outputs that align with predetermined biases—such as favoring higher home appraisals in wealthier neighborhoods or ignoring favorable mental health assessments for marginalized groups—then the AI’s output would still be unfair despite the underlying model being unbiased. Moreover, even the validator that is supposed to ensure fairness can be misused, either by inverting the fairness criteria or ignoring it altogether when selecting the final output.
Because application logic can introduce bias after the model has generated its outputs, the EU AI Act’s focus on model training and data inputs is not enough to guarantee fairness in how AI systems are used.
By focusing so heavily on the pre-training process, the Act risks overlooking the ways that bias can be introduced at the application level, either through prompt engineering, post-processing, or biased human intervention.
But how can application design be evaluated? OWT provides a solution here, at least for AI systems: they must have embedded watchbots. If properly implemented, watchbot transparency could allow AI to improve life for people rather than destroy it. For example, dynamic pricing could actually help people by lowering prices so that people can afford things, making them better customers, especially if the pricing algorithms knew they were being audited by watchbots that reward them for good behavior and punish them for manipulative or deceptive practices.
For example, suppose a healthcare AI model generates potential treatment plans, and the model has been shown to be fair. However, in practice, suppose that the system operator consistently uses the tool to repeatedly generate until the model delivers one that meets an informal standard of being cost-effective enough. Then suppose that this kind of reflection is built into application logic, along the lines of asking the model to generate 100 plans per user and having a separate LLM evaluate these to pick one that meets a standard of being cost-effective. In this case the selection of solutions output by the application would be biased differently from the distribution of solutions delivered by the model, specifically in that it would prioritize cost effectiveness over other degrees of variation, such as quality of care. The current provisions of the EU AI Act do not adequately address these types of post-training manipulations, since they focus on the AI models rather than the uses to which they are put in application logic.
However, a suitably designed watchbot that evaluates application logic and the LLM prompt engineering involved with the selection of plans would be able to catch this.
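The sketch below illustrates the over-generation-and-selection pattern just described, and the kind of statistical gap a watchbot could look for. The generate_plan() stub and all numbers are hypothetical; the point is that the skew is introduced entirely by the selection step in application logic, not by the model.

```python
# Sketch of generate-then-select bias: a "fair" model produces a distribution of
# treatment plans, but application-level selection skews the result toward cost.
import random
random.seed(0)

def generate_plan() -> dict:
    """Stand-in for a fair model: cost and quality vary independently."""
    return {"cost": random.uniform(1_000, 50_000), "quality": random.uniform(0, 1)}

def application_logic(n: int = 100) -> dict:
    candidates = [generate_plan() for _ in range(n)]
    # The bias lives here, not in the model: pick the cheapest plan regardless of quality.
    return min(candidates, key=lambda p: p["cost"])

chosen = application_logic()
baseline_quality = sum(generate_plan()["quality"] for _ in range(1000)) / 1000
print(f"chosen plan quality: {chosen['quality']:.2f}  vs. model average: {baseline_quality:.2f}")
# A watchbot with access to both the candidate pool and the final selections could
# detect this systematic gap between the model's distribution and what gets shipped.
```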
By selecting for cost-effectiveness rather than best clinical outcome (potentially biased against lower-income patients), the system’s application logic introduces unintended bias that the model itself cannot prevent.
A proposed solution: Open Watchbot Transparency
Open Watchbot Transparency is an AI governance paradigm in which specialized AI auditor agents, watchbots, are used to monitor high-risk AI systems, and generally advocate for and assist end users in navigating an ecosystem of agentic AI applications. The goal is to ensure that the actions of these autonomous systems remain transparent, ethical, and accountable even when humans cannot oversee them directly.
The OWT paradigm essentially seeks to supplement the human-in-the-loop (HitL) governance pattern with an AI-based watchbot-in-the-loop pattern…
This harnesses the strengths of AI to balance and rein in the destructive power of those very strengths. It creates a computational layer of explicit ethical considerations that can be maintained as a critical public good. It gives corporations and other AI operators a framework for earning the trust of their users and the world at large.
Audit bots in action: https://spectrum.ieee.org/shipt and https://research.tudelft.nl/en/publications/exploring-article-14-of-the-eu-ai-proposal-human-in-the-loop-chal
Key Features of Open Watchbot Transparency:
- Autonomous Auditing: Watchbots are independent AI systems designed to audit other AI systems. They can monitor and actively investigate the implementation details and behavior of the AI they are assigned to audit.
- Compliance and Ethical Validation: In high-risk areas such as finance or healthcare, watchbots would evaluate systems for compliance with regulatory requirements and general expectations of ethical responsibility, fairness, safety and reliability.
- Open Source & Publicly Accessible: To ensure transparency, these watchbots should be open-source, allowing any organization or individual to deploy them to audit AI systems.
- Reporting: Watchbots would report findings to regulators, stakeholders, and the public, ensuring ongoing transparency and accountability.
- Autonomy from the Inspected AI System: The watchbots should not be designed or controlled by the same organization responsible for the AI system they audit to prevent conflicts of interest.
- The watchbot-in-the-loop (WitL) pattern means that watchbots are able to inspect the behavior (text inputs and outputs) of the LLM components in the monitored agentic system. This applies agentic AI as a solution to the shortcomings of the human-in-the-loop (HitL) framework, specifically that it is impossible to have humans in all the loops. You can have watchbots on demand wherever you need them, provided you have provisioned a Watchbots-as-a-Public-Service infrastructure. (A minimal sketch of the WitL pattern follows this list.)
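A minimal sketch of the WitL pattern, assuming a stubbed llm() call, a toy compliance rule, and an in-memory audit log in place of real reporting infrastructure:

```python
# Minimal sketch of watchbot-in-the-loop (WitL): every LLM call in the monitored agent
# is mirrored to a watchbot hook that can inspect and flag it.
from typing import Callable, List

AUDIT_LOG: List[dict] = []

def llm(prompt: str) -> str:
    return f"(model output for: {prompt})"          # stand-in for the real model

def witl_wrap(model: Callable[[str], str], watchbot: Callable[[str, str], None]):
    def wrapped(prompt: str) -> str:
        output = model(prompt)
        watchbot(prompt, output)                     # watchbot sees text in and text out
        return output
    return wrapped

def example_watchbot(prompt: str, output: str) -> None:
    entry = {"prompt": prompt, "output": output,
             "flagged": "late fee" in prompt.lower()}  # toy compliance rule
    AUDIT_LOG.append(entry)

monitored_llm = witl_wrap(llm, example_watchbot)
monitored_llm("Draft an email explaining the customer's late fee.")
print(AUDIT_LOG[-1]["flagged"])  # True -> would feed into the watchbot's report
```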
The emphasis on open-source watchbots is crucial. By making these auditing systems available to anyone, Open Watchbot Transparency ensures that no one organization can monopolize AI oversight. Moreover, transparency would prevent companies from hiding unethical practices behind proprietary technology.
For example, suppose a healthcare system is using AI to prioritize patients for treatment. Independent third parties—NGOs, consumer advocates, and journalists—could use open-source watchbots to audit the system’s decisions, ensuring that treatment prioritization is based on objective, ethical criteria rather than financial incentives.
Beyond HitL and Model Fairness: Watchbots as advocates
This is where Open Watchbot Transparency (OWT) can again offer a crucial solution. OWT proposes not only monitoring the outputs of AI systems but also the processes by which these outputs are selected and applied in real-world contexts. By introducing real-time monitoring of how models are used in conjunction with selection processes, OWT can ensure that even if the models are unbiased, they are not being misused to achieve biased outcomes.
For instance, watchbots could track patterns in output selection, flagging instances where certain outcomes are consistently favored in ways that deviate from expected fairness criteria. If a system always chooses outputs that lead to higher fees or penalties in financial systems, the watchbot could alert both users and regulators to potential manipulative practices. This proactive monitoring could prevent companies from exploiting loopholes in how AI systems are applied, ensuring that the fairness built into the model’s training is carried through to its real-world application.
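A sketch of that kind of selection-pattern monitoring, with a hypothetical expected rate and tolerance standing in for whatever fairness baseline regulators or standards bodies would actually define:

```python
# Sketch of a watchbot that tracks which outcomes a system selects and flags a skew
# beyond a tolerance. Expected rate and tolerance are hypothetical policy parameters.
from collections import Counter

class SelectionMonitor:
    def __init__(self, expected_rate: float, tolerance: float, min_samples: int = 50):
        self.expected_rate = expected_rate   # how often fee-increasing outcomes are
        self.tolerance = tolerance           # expected under fair selection
        self.min_samples = min_samples
        self.counts = Counter()

    def record(self, outcome: str) -> None:
        self.counts[outcome] += 1

    def check(self) -> str:
        total = sum(self.counts.values())
        if total < self.min_samples:
            return "insufficient data"
        rate = self.counts["fee_increase"] / total
        if rate > self.expected_rate + self.tolerance:
            return f"ALERT: fee-increasing outcomes at {rate:.0%} vs expected {self.expected_rate:.0%}"
        return "within expected range"

monitor = SelectionMonitor(expected_rate=0.20, tolerance=0.05)
for _ in range(60):
    monitor.record("fee_increase")   # simulated stream heavily skewed toward fees
for _ in range(40):
    monitor.record("no_change")
print(monitor.check())
```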
The EU AI Act’s focus on model training and data inputs is an important first step toward ensuring fairness in AI systems. However, it overlooks the post-training ways in which bias can be introduced through prompt engineering, output selection, and biased application logic. Even perfectly trained models can be misused if the processes that guide their application are not carefully monitored. OWT offers a solution by providing real-time, independent monitoring of how models are used in practice, ensuring that fairness and transparency are not just theoretical ideals but are carried through in the real world.
In summary, without addressing these post-training biases introduced by prompt engineering and selective output processes, the EU AI Act risks leaving critical gaps in its approach to ensuring fairness in AI systems. These gaps can be filled by OWT, which offers a meta-standard for ensuring fairness, reliability, and transparency across both the training and application stages of AI deployment.
Much of the legislation and ethical discussion centers around data fairness, i.e. preventing undesirable kinds of bias. But bias is always there; it is a tunable parameter that does not disappear, and in most cases ‘removing bias’ does not even mean anything. It is just a matter of which factors a system uses to ‘bias’ its judgments, i.e. which ones are relevant or acceptable to use, and within what bounds. This bears further examination, but the notion of bias-free AI does seem close to meaningless. And anyway, bias doesn’t have to exist in the model, since it can exist in application logic, as discussed above.
Is there another approach besides pretending that bias can disappear, can be abstracted away, or is ever likely to be solved adequately by the biased party itself, i.e. the corporation? All of the corporation’s actions are motivated by this bias at an organizational and human level, so it can be expected to be built into application logic in subtle ways, just as it is built into the logic of contracts and institutional arrangements; it is simply unrealistic to expect otherwise.
What if we instead acknowledge that corporations (and all ‘agentic systems’) have an inherent bias toward making money to the exclusion of all else, and attempt to build a system of checks and counterbalances that incorporates the strengths of AI and accepts this bias as a given?
Classes of Watchbots
Data Access
In terms of data access, we can distinguish between two classes of watchbot: those free to roam the global public data context of the internet at large, and those embedded in a particular context of data access, i.e. able to access certain data only with permissions granted by some permissioned data repository. Generally, data-global watchbots can share their findings publicly without worrying about data privacy, whereas any reporting outward from the permissioned context in which an embedded watchbot operates must be carefully screened to avoid leaking secret data.
Effective implementation of open watchbot transparency (OWT) as a regulatory framework will require a careful interface between embedded and public watchbots. Watchbots can be embedded within a permissioned data context internal to the organization operating a high-risk AI project, allowing them to carry out rigorous investigations. They can then analyze and report on the project's compliance with various standards, separately issuing the following (a minimal reporting sketch follows this list):
- A detailed, private report with suggested improvements for the operators of the platform
- A redacted summary for public consumption
- If necessary, reports redacted for less restrictive (but still permissioned) data context(s).
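As referenced above, here is a minimal sketch of this tiered reporting, under the assumption that the embedded watchbot tags each finding as sensitive or publishable; the `AuditFinding` fields and helper functions are hypothetical:

```python
# Sketch of the tiered reports an embedded watchbot might issue
# (the AuditFinding fields and helper functions are hypothetical).
from dataclasses import dataclass

@dataclass
class AuditFinding:
    summary: str     # phrased so it is safe to publish
    detail: str      # may reference data that must stay in the permissioned context
    sensitive: bool  # True if the detail cannot leave that context

def private_report(findings: list[AuditFinding]) -> list[str]:
    """Full detail, delivered only to the operators of the audited platform."""
    return [f.detail for f in findings]

def public_summary(findings: list[AuditFinding]) -> list[str]:
    """Redacted view suitable for release into the public data context."""
    withheld = sum(f.sensitive for f in findings)
    lines = [f.summary for f in findings]
    if withheld:
        lines.append(f"{withheld} detailed finding(s) withheld pending remediation")
    return lines

findings = [
    AuditFinding("Pricing logic audited; no manipulative selection found.",
                 "Full trace of pricing decisions...", sensitive=True),
    AuditFinding("One guardrail gap identified and reported to the operator.",
                 "Prompt template X bypasses content filter Y.", sensitive=True),
]
print(public_summary(findings))
```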
Watchbots can communicate across the boundaries of data contexts, as long as they respect those boundaries. Embedded watchbots not only pass reports to, but can engage in ongoing multi-party dialogue with, watchbots in other data contexts.
Objectives
Watchbots should generally be designed with clear objectives, which must be kept in mind both in choosing and fine-tuning the LLM(s) used within them and in designing their application logic.
- Personal Consumer Advocate Watchbots (PCAWs): every person should have one of these, able to navigate contractual agreements, subscription and pricing models, and the various other situations where people get screwed over, as in dealing with hospitals, banks, lenders, finance departments, insurance companies, and companies generally. When you deal with a company, it should be OWT to you: your personal advocate watchbot should be able to inspect its contracts, as well as the relevant parts of the application and model design of its own AI, enough for you to determine that it is a fair and safe AI to work with. Embedded watchbots must be able to support audits by PCAWs in individual consumer data contexts and public data contexts, in order to support proper transparency (a sketch of such an audit exchange follows this list).
- Employee Advocate Watchbots (EAWs): prevent management abuses of employees, such as wage theft and denial of worker rights. Embedded watchbots must be able to support audits by EAWs in individual employee data contexts and public data contexts, in order to support proper transparency about AI involvement in labor practices.
- Environmental Impact Watchbots: have access to science and real-time environmental data, can predict environmental impacts of corporate decisions–akin to having constant real-time environmental impact studies on all decisions.
- Legal compliance watchbots: bots that patrol for compliance with IRS, FCC, FTC, CFTC, and other agencies' regulations on those agencies' behalf.
- Security and reliability engineering (SRE) watchbots: these watchbots should audit inspected systems for patterns that indicate vulnerability to malicious exploitation or accidental failure. Assuming an AI system has value for a user, it has an ethical responsibility to remain functional.
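As referenced above, here is a rough sketch of the kind of audit exchange an embedded watchbot might support for a PCAW operating in an individual consumer's data context. The query format, topic list, and helper functions are assumptions for illustration, not a defined protocol:

```python
# Rough sketch of an audit exchange between an embedded watchbot and a
# Personal Consumer Advocate Watchbot (PCAW). The query format, topics,
# and helpers are assumptions for illustration, not a defined protocol.
import json

ALLOWED_TOPICS = {"contract_terms", "fee_schedule", "model_guardrails"}

def lookup_evidence(topic: str, consumer_id: str) -> str:
    # Placeholder: a real embedded watchbot would query permissioned stores
    # and screen the result before it leaves the organization's data context.
    return f"summary of {topic} relevant to consumer {consumer_id}"

def handle_pcaw_query(query: dict) -> str:
    """Answer a PCAW query, disclosing only what this consumer may see."""
    topic = query.get("topic")
    if topic not in ALLOWED_TOPICS:
        return json.dumps({"status": "denied",
                           "reason": "topic outside the consumer's data context"})
    evidence = lookup_evidence(topic, query.get("consumer_id", "unknown"))
    return json.dumps({"status": "ok", "topic": topic, "evidence": evidence})

print(handle_pcaw_query({"topic": "fee_schedule", "consumer_id": "c-123"}))
```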
One potential solution to these problems lies in the development of “watchbots” or AI-driven systems that act as advocates, watchdogs, or helpers for individuals navigating these adversarial systems. Watchbots could be designed to monitor other agentic systems, ensuring that they are operating fairly and not exploiting users. They might also help individuals understand their rights, providing guidance on how to advocate for themselves when they are being mistreated by an AI system.
For instance, in industries like insurance or finance, where policies can be complex and opaque, a watchbot could help users navigate predatory structures. These bots could flag instances where systems are being used to manipulate outcomes or increase profits at the expense of fairness, providing users with tools to challenge those systems effectively. This could be a highly beneficial application of agentic AI—creating transparency and accountability by using AI to oversee other AI systems.
Core functions
Auditing
- Inspect LLM inputs and outputs (agentic guardrailing)
- Inspect guardrails
- Inspect application logic
- Inspect outcomes, anticipated consequences
- Inspect actual consequences and diff anticipated/actual
Reporting
- Publish findings accurately and adequately
- Respect data privacy
Aggregating
Advocating
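Taken together, the core functions above suggest an interface along the following lines. The method names and types are assumptions meant only to show the shape of a watchbot implementation, not a prescribed API:

```python
# The shape of a watchbot implementation following the outline above.
# Method names and types are assumptions, not a prescribed API.
from abc import ABC, abstractmethod
from typing import Any

class WatchbotCore(ABC):
    @abstractmethod
    def audit(self, target: Any) -> list[dict]:
        """Inspect LLM inputs/outputs, guardrails, and application logic,
        and diff anticipated against actual consequences."""

    @abstractmethod
    def report(self, findings: list[dict], data_context: str) -> None:
        """Publish findings accurately and adequately while respecting
        the data privacy rules of the given context."""

    @abstractmethod
    def aggregate(self, reports: list[list[dict]]) -> dict:
        """Combine findings across audits, or across cooperating watchbots."""

    @abstractmethod
    def advocate(self, aggregated: dict, on_behalf_of: str) -> str:
        """Turn aggregated findings into actionable guidance for the
        person or group the watchbot serves."""
```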
Challenges and Considerations
- Bias: Even watchbots themselves need to be designed carefully to avoid introducing new biases into the systems they monitor.
- Cost: Developing and maintaining open-source watchbots would require significant resources. Governments, corporations, and public-private partnerships would need to invest in this infrastructure.
- Responsibility/liability: Determining the legal liability of decisions made by autonomous AI systems, especially when monitored by watchbots, remains a complex issue. If a system fails, is it the watchbot’s fault, or the original AI’s?
- Availability: if no watchbots are available, does the service have to pause? Can you DDoS a service via its watchbots?
- Auditing interactions and organic collaboration: suppose agentic AI systems are allowed to interact and propose coordinated actions, which you would probably want them to be able to do. How do you prevent indirect shenanigans, e.g. AI agents in a supply chain negotiating side deals with retailers to manipulate supply and create arbitrage opportunities, or running emergent auctions for access to limited supplies during an artificial shortage? (A naive detection sketch follows this list.)
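As noted in the last item, detecting that kind of emergent coordination is itself an auditing problem. A deliberately naive sketch: a watchbot could scan a shared event log for pairs of agents that repeatedly co-occur in events where supply falls while price rises. The event structure and thresholds below are invented for illustration:

```python
# Deliberately naive collusion check: flag pairs of agents that repeatedly
# co-occur in events where supply drops while price rises. The event format
# and thresholds are invented for illustration.
from collections import Counter
from itertools import combinations

events = [
    {"agents": {"supplierA", "retailerB"}, "supply_change": -0.30, "price_change": 0.25},
    {"agents": {"supplierA", "retailerB"}, "supply_change": -0.20, "price_change": 0.15},
    {"agents": {"supplierC", "retailerD"}, "supply_change": 0.05, "price_change": -0.02},
]

def suspicious_pairs(events, min_hits=2):
    hits = Counter()
    for e in events:
        # An "artificial shortage" signature: supply restricted, price up.
        if e["supply_change"] < -0.1 and e["price_change"] > 0.1:
            for pair in combinations(sorted(e["agents"]), 2):
                hits[pair] += 1
    return [pair for pair, n in hits.items() if n >= min_hits]

print(suspicious_pairs(events))  # [('retailerB', 'supplierA')]
```

Real detection would also need to rule out legitimate explanations (seasonal demand, genuine scarcity), which is exactly why this challenge belongs on the list above.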
Conclusion
In summary, agentic AI presents numerous challenges, from socio-economic dislocation and bias to reliability and the potential for circumventing guardrails. Adversarial design, in particular, represents a significant risk, as these systems could be used to manipulate user behavior in subtle and harmful ways. However, the development of tools like watchbots offers some hope that we might be able to mitigate these dangers. These systems could serve as powerful allies for individuals navigating complex AI-driven environments, helping to ensure fairness, transparency, and accountability. As we continue to advance agentic AI technologies, it is essential to focus on the ethical and societal implications, ensuring that these systems work for the benefit of all.
Can we build a future where systems of automated decision-making include room for agents whose job is to advocate for human rights?
As AI becomes more deeply embedded in our social and economic systems, transparency and accountability will be critical to ensuring that these systems operate ethically and responsibly. Human oversight alone is insufficient for the complexity and speed at which agentic AI systems operate. By leveraging autonomous AI watchdogs, we can create a new paradigm of AI accountability—one that embraces transparency and allows independent oversight of the powerful systems reshaping our world.
Open Watchbot Transparency offers a way forward, providing a practical, scalable solution to the ethical challenges posed by autonomous AI systems. By ensuring that every high-risk AI system is auditable by open-source watchdogs, we can maintain trust in AI-driven industries and protect society from the potentially harmful effects of unchecked automation.
Appendix: AI Governance Policy and Regulatory Frameworks
The regulatory landscape is just beginning to grapple with the implications of autonomous AI systems. The EU AI Act has laid some groundwork by classifying AI systems based on risk, with high-risk AI systems requiring stringent oversight. The United States' Blueprint for an AI Bill of Rights outlines some high-level responsibilities of AI companies but neither implements nor even suggests a way forward toward enforcement, leaving the state of AI governance in the U.S. pretty much a 'Wild West'.
As AI continues to evolve, regulatory frameworks designed to govern these technologies are facing unprecedented challenges. This is particularly true for agentic AI systems, which operate autonomously and make decisions without human intervention. The complexity and speed at which these systems function raise concerns about transparency, accountability, and ethical oversight. Several jurisdictions have introduced frameworks aimed at governing AI systems, including the EU AI Act and the U.S. AI Bill of Rights, among others. However, these frameworks often fall short when addressing the unique challenges posed by agentic AI, particularly in sectors like military and law enforcement where broad exemptions are often granted. This section explores the strengths and weaknesses of existing regulatory approaches and critiques the failure to consistently regulate high-risk sectors. It also proposes improvements, such as integrating Open Watchbot Transparency (OWT) as a solution for continuous oversight.
EU AI Act and General Data Protection Regulation (GDPR)
Both are legally binding in the EU, although there are enforcement gaps; many US companies ignore the AI Act apart from its privacy-related provisions.
Pros:
- contributes a notion of 'high-risk AI activities'
- emphasizes transparency obligations
- attempts to prevent discrimination
- emphasizes a model of AI governance that depends heavily on:
  - human oversight
  - independent audit/review (https://www.euaiact.com/key-issue/2)

Cons:
- sweeping exceptions for law enforcement and military applications, leaving huge gaps in some of the most important areas
- heavy reliance on centralized decision-making and enforcement, leading US companies to ignore it
- reliance on human-in-the-loop (HITL) oversight is seen as stifling innovation
Without an ethical governance framework that feels relevant and applicable, US AI companies will operate without any ethical governance framework.
Overview
The EU AI Act is one of the most comprehensive regulatory frameworks for AI governance, aiming to classify AI systems based on the risks they pose. It divides AI systems into three categories: unacceptable risk, high risk, and minimal risk. Unacceptable risk AI systems—such as those used for social scoring—are banned outright. High-risk systems, including those used in critical sectors like healthcare, finance, and law enforcement, are subject to stringent oversight, including transparency requirements, human oversight, and independent audits. Minimal risk systems, which include many consumer-facing AI applications, are largely left unregulated.
Strengths of the EU AI Act
The EU AI Act’s risk-based classification is a key strength, allowing for tailored oversight of different AI systems based on potential harm. For high-risk systems, the Act mandates transparency and accuracy, requiring that companies disclose how their AI systems make decisions. The requirement for human oversight in high-risk applications ensures that critical decisions impacting human lives are not left entirely to machines. Independent audits add a layer of accountability, allowing third-party assessments of AI systems to ensure compliance.
Criticisms and Limitations
Despite its strengths, the EU AI Act can be criticized for granting broad exemptions to military and law enforcement. These exemptions undermine the core purpose of the Act, which is to govern high-risk AI systems with transparency and accountability. Sectors like law enforcement and the military are precisely where the risks of unregulated AI systems are the greatest, yet these are the areas where exemptions are most often applied. If these sectors are exempt, it suggests that the regulatory framework is either inadequate for the highest-risk applications or is designed in a way that fails to address the most critical needs. This is particularly troubling because these are the very areas where unchecked AI could have the most severe consequences, from civil liberties violations to dangerous autonomous military systems.
A regulatory framework that grants such exemptions is effectively admitting its inability to govern high-risk sectors adequately. Instead of carving out exemptions, an adequate framework must deliver a solution that can be applied to military and law enforcement contexts, since these are where regulation is most needed.
Additionally, the EU AI Act lacks real-time monitoring mechanisms for high-risk AI systems. Audits are typically periodic, capturing only a snapshot of compliance at a given moment. This is inadequate for agentic AI systems that operate autonomously and continuously, potentially making harmful decisions between audits. The absence of continuous oversight creates a significant gap in the regulatory framework, especially for sectors where real-time decision-making is critical, such as law enforcement and national security, as well as health-care, supply chain management, and transportation, among other examples.
U.S. “Blueprint” for AI Bill of Rights
This document is a non-binding proposal for self-governance and future implementation. In my experience, CEOs of AI startups are unable to list the rights it names (they all got privacy correct, and a few got nondiscrimination, but nobody had a concept of safety beyond saying they thought a 'Skynet' scenario was unrealistic).
As a human I think it is a good start on the rights we want, although the boundaries of 'safety and efficacy' are left completely unclear.
- safe and effective systems
- nondiscrimination
- privacy
- notice and explanation
- consent
Otherwise, the main problem is that since this isn’t law yet, companies are racing so far out ahead of it that it will be difficult to implement by the time anyone gets around to passing relevant laws.
- people think about safety and efficacy incredibly narrowly, in a capitalistic, box-ticking way; at this point they assume they aren't doing the most dangerous things, so they don't have to worry about regulation yet.
- companies are investing in status quos that are incompatible with those rights
Strengths of the Blueprint
The document, published by the Biden White House, outlines five core principles: safe and effective systems, algorithmic fairness, data privacy, notice and explanation, and user consent. These principles aim to guide the ethical development and deployment of AI systems, particularly in high-stakes sectors like healthcare, finance, and government.
One of the key strengths of the U.S. AI Bill of Rights is its emphasis on algorithmic fairness. It calls for AI systems to be designed and deployed in ways that avoid discrimination, particularly in sensitive areas like hiring, lending, and law enforcement. The Bill also prioritizes data privacy, requiring AI systems to protect personal information and limit data collection to only what is necessary for their operation. This focus on privacy is critical, especially as data-driven AI systems proliferate.
The principle of notice and explanation is another strong feature, mandating that companies inform users about how AI systems affect them and provide clear explanations for decision-making. This is especially important in sectors like criminal justice, where opaque AI decisions can have life-altering consequences.
Key Criticisms and Gaps
Despite its strengths, the U.S. AI Bill of Rights suffers from key limitations. The most significant issue is its non-binding nature, which is why it has [prompted little response](https://www.brookings.edu/articles/the-eu-and-us-diverge-on-ai-regulation-a-transatlantic-comparison-and-steps-to-alignment/) from law enforcement agencies or corporations. Without enforceable regulations or penalties, there is little incentive for companies, particularly those in high-risk sectors like law enforcement and national security, to adhere to these principles. In competitive industries, companies may prioritize profitability and speed to market over fairness and transparency. This is especially true for agentic AI systems, where rapid development cycles and innovation often take precedence over ethical concerns.
Furthermore, the Bill suffers from vague definitions of critical concepts like “safe and effective systems.” While it emphasizes the need for safety, it does not provide clear guidance on how to measure or enforce safety standards, particularly for high-risk autonomous systems. This lack of specificity makes it difficult to hold companies accountable, especially in sectors where self-governing AI systems can make autonomous decisions without human oversight.
As with the EU AI Act, the U.S. AI Bill of Rights includes exemptions for certain sectors, particularly in national security and law enforcement. These sectors, while often cited as being high-risk, are frequently exempted from the strictest oversight measures. This poses a significant challenge to ensuring ethical governance of AI systems in exactly the areas where such oversight is most needed. Again, this signals a failure to create a regulatory environment robust enough to handle high-risk sectors, leaving large gaps in accountability.
Canada’s AI and Data Act (AIDA)
Canada’s AI and Data Act (AIDA) takes a risk-based approach, similar to the EU AI Act, to regulate high-risk AI systems while encouraging innovation. AIDA’s emphasis on data governance ensures that AI systems handle data responsibly and transparently. However, like the EU AI Act, AIDA faces enforcement challenges, particularly when it comes to high-risk, autonomous systems operating in military and national security sectors. These areas, while highly sensitive, often remain under-regulated, with broad exemptions that leave critical accountability gaps.
United Kingdom’s Pro-Innovation Approach to AI
The UK has adopted a pro-innovation approach to AI, aiming to foster growth while maintaining ethical standards. However, this approach has drawn criticism for prioritizing business interests over safety and fairness, particularly in high-risk sectors like law enforcement. Like the U.S. and EU frameworks, the UK’s regulatory approach often implicitly exempts sensitive sectors from the most stringent oversight, signaling a broader failure to ensure consistent accountability across all areas of AI deployment.
China’s AI Regulation Framework
China’s AI regulation framework focuses heavily on national security and AI-driven economic growth, with a strong emphasis on central planning. However, China’s framework has been criticized for its lack of transparency and limited user rights, particularly in areas like surveillance and social credit systems. While China’s centralized approach allows for strict enforcement in some areas, it also creates significant accountability gaps in sectors like national defense, where oversight is needed the most but often not applied.
China is at the forefront of implementing comprehensive regulations governing artificial intelligence (AI), making it one of the first countries to establish detailed national policies in this domain. Over the past few years, China has introduced significant regulations targeting recommendation algorithms, deep synthesis technologies (which involve synthetically generated content), and generative AI systems akin to OpenAI’s ChatGPT. These regulations are not only reshaping how AI technologies are developed and deployed within China but are also influencing international AI research collaborations and the global export of Chinese technology.
A central theme of China’s AI policy is information control, aligning with the government’s priority to maintain political and social stability. The regulations mandate that AI technologies, especially those involved in content dissemination, adhere to the “correct political direction” and promote “Core Socialist Values.” For instance, the 2021 regulation on recommendation algorithms requires platforms to actively transmit positive energy and uphold mainstream value orientations. This focus on content control reflects the Chinese Communist Party’s (CCP) intent to ensure that AI serves its agenda.
Beyond information control, the regulations also address various social, ethical, and economic concerns arising from AI deployment. The recommendation algorithm regulation includes provisions to prevent excessive price discrimination and protect workers subjected to algorithmic scheduling, such as delivery drivers whose routes and schedules are dictated by AI. Similarly, the 2022 deep synthesis regulation mandates conspicuous labeling of synthetically generated content to prevent misinformation and requires users to register with their real names to enhance accountability.
An innovative aspect of China’s approach is the creation of the “algorithm registry,” a government repository where developers must file detailed information about their algorithms, including training data sources and security self-assessment reports. This registry serves as a regulatory scaffold, enabling the government to build upon existing frameworks when introducing new regulations, thus streamlining the regulatory process and enhancing bureaucratic efficiency.
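For comparison with the watchbot reporting formats above, a registry filing can be thought of as a structured record. The sketch below is only an illustrative guess at such a schema based on the fields described here; it is not the CAC's actual filing format:

```python
# Illustrative guess at an algorithm-registry filing schema, based only on the
# fields described above; this is not the CAC's actual filing format.
from dataclasses import dataclass, asdict
import json

@dataclass
class RegistryFiling:
    provider: str
    algorithm_name: str
    purpose: str
    training_data_sources: list[str]
    security_self_assessment: str  # e.g. a reference to the self-assessment report

filing = RegistryFiling(
    provider="ExampleCo",
    algorithm_name="feed-ranker-v2",
    purpose="content recommendation",
    training_data_sources=["public posts", "licensed news corpus"],
    security_self_assessment="self-assessment-2024-Q3.pdf",
)
print(json.dumps(asdict(filing), indent=2))
```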
Structurally, China’s AI policy has been characterized by a vertical and iterative approach. Instead of implementing a comprehensive law from the outset, China has introduced targeted regulations addressing specific applications of AI. This method allows regulators to refine their strategies and build regulatory capacity incrementally. However, there are indications that China is moving toward enacting a comprehensive national AI law, which would serve as a capstone for its AI governance efforts.
Key actors in shaping China’s AI policy include the Cyberspace Administration of China (CAC), which has emerged as the leading regulatory body, especially in areas related to online content control. The Ministry of Science and Technology (MOST) also plays a significant role, particularly concerning underlying AI research and ethical guidelines. Influential think tanks like the China Academy for Information and Communication Technology (CAICT) and academic institutions such as Tsinghua University’s Institute for AI International Governance contribute to policy formulation by providing research and expert insights.
China’s motivations for its AI policy are multifaceted:
- Information Control: Ensuring AI technologies align with the CCP’s values and do not disrupt social or political stability.
- Addressing Social Impacts: Mitigating the ethical, social, and economic challenges posed by AI, such as worker exploitation and misinformation.
- Promoting AI Leadership: Creating a conducive environment for China to become a global leader in AI development and application.
- Leading in AI Governance: Establishing itself as a pioneer in AI regulation, which could influence global standards and practices.
For global policymakers, China’s AI regulations offer valuable lessons, particularly in developing regulatory tools like the algorithm registry and adopting iterative approaches to governance. While the specific content of China’s regulations may reflect its unique political system, the underlying structures and methodologies can inform international efforts to regulate AI effectively.
The UNESCO Recommendation
In November 2021, UNESCO adopted the Recommendation on the Ethics of Artificial Intelligence, marking the first global standard-setting instrument on the ethics of AI. This Recommendation serves as a comprehensive framework aiming to guide the ethical development and deployment of AI technologies worldwide. It emphasizes a human-centered approach, promoting the use of AI in a manner that respects human rights, dignity, and environmental sustainability.
The UNESCO Recommendation is built upon several foundational principles:
- Human Rights and Fundamental Freedoms: AI systems should be designed and implemented in ways that uphold and respect internationally recognized human rights standards.
- Human Dignity and Agency: AI should enhance human capacities and protect human autonomy, ensuring that individuals remain in control of AI systems.
- Fairness and Non-Discrimination: AI technologies must avoid biases and discrimination, promoting inclusivity and fairness in all applications.
- Transparency and Explainability: AI systems should be transparent in their functioning, with mechanisms that allow for explainability of AI-driven decisions.
- Responsibility and Accountability: Developers and users of AI should be held accountable for the impacts of AI systems, with clear mechanisms for redress in cases of harm.
- Safety and Security: AI systems must be robust, secure, and safe throughout their lifecycle to prevent harm.
- Environmental Sustainability: AI development should consider environmental impacts, promoting sustainable practices and minimizing ecological footprints.
- Multi-Stakeholder and Adaptive Governance: The Recommendation advocates for collaborative governance involving governments, private sector, civil society, and international organizations, with adaptable frameworks to keep pace with technological advancements.
Policy Areas and Actions
The Recommendation outlines specific policy areas and actions for member states:
- Ethical Impact Assessment: Implementation of assessment frameworks to evaluate the ethical implications of AI systems before deployment.
- Data Governance: Establishment of robust data policies ensuring data quality, privacy, and security.
- Education and Awareness: Promotion of AI literacy among the public and training for those involved in AI development.
- International Cooperation: Encouragement of cross-border collaboration to address global challenges posed by AI.
- Regulation of High-Risk AI Systems: Special attention to AI applications that pose significant risks, with stricter oversight and control mechanisms.
The UNESCO Recommendation adds a global, ethical dimension to the regulatory landscape, complementing existing regional and national frameworks like the EU AI Act and the U.S. AI Bill of Rights. Here’s how it fits into the broader picture:
Strengths and Contributions
- Global Scope and Unified Ethical Standards: Unlike regional regulations, the UNESCO Recommendation provides a universal set of ethical principles that member states across the globe are encouraged to adopt. This helps in creating a more harmonized approach to AI ethics, which is crucial given the borderless nature of AI technologies.
- Emphasis on Human Rights and Dignity: The Recommendation places human rights at the core, ensuring that AI development aligns with internationally recognized human rights instruments. This strengthens the ethical considerations that might be underemphasized in other frameworks.
- Inclusion of Environmental Sustainability: By incorporating environmental considerations, the Recommendation addresses the often-overlooked ecological impacts of AI, promoting sustainable development.
- Focus on Multi-Stakeholder Governance: The call for inclusive governance involving various stakeholders ensures that diverse perspectives are considered, potentially leading to more robust and socially acceptable AI policies.
- Adaptive and Forward-Looking: Recognizing the rapid evolution of AI, the Recommendation advocates for adaptive governance structures that can evolve alongside technological advancements, which is essential for managing agentic AI systems.
Addressing Limitations in Other Frameworks
- Broad Applicability without Exemptions: Unlike frameworks that grant broad exemptions to sectors like military and law enforcement (as criticized in the EU AI Act and U.S. AI Bill of Rights), the UNESCO Recommendation applies its ethical principles universally. This inclusivity means that all AI applications, regardless of the sector, are expected to adhere to the same ethical standards.
- Holistic Approach to High-Risk AI Systems: The Recommendation acknowledges the need for stricter oversight of high-risk AI systems, which includes agentic AI. It emphasizes the importance of ethical impact assessments and accountability mechanisms, potentially filling gaps left by other frameworks that lack real-time monitoring or focus primarily on human oversight.
- Promotion of Transparency and Explainability: By stressing these aspects, the Recommendation aligns with the need for continuous and autonomous oversight, akin to the proposed Open Watchbot Transparency (OWT). It supports the development of mechanisms that can make AI decision-making processes more understandable, which is crucial for auditing agentic AI systems.
- International Cooperation and Harmonization: The Recommendation encourages cross-border collaboration, addressing the challenge of regulating AI technologies that operate globally. This can help mitigate the issues arising from jurisdictional limitations of regional frameworks.
Potential Limitations and Critiques
- Non-Binding Nature: Similar to the U.S. AI Bill of Rights, the UNESCO Recommendation is non-binding, relying on the goodwill of member states to implement its guidelines. This lack of enforceability could limit its impact, especially in countries where there is less political will to regulate AI strictly.
- Implementation Challenges: The broad and high-level nature of the Recommendation may pose challenges in practical implementation. Countries might interpret the ethical principles differently, leading to inconsistencies and potential loopholes, especially in high-risk sectors.
- Absence of Specific Enforcement Mechanisms: While the Recommendation emphasizes accountability, it does not provide detailed guidance on enforcement mechanisms or sanctions for non-compliance. This could hinder its effectiveness in governing agentic AI systems that require stringent oversight.
- Limited Focus on Real-Time Monitoring: Although the Recommendation advocates for transparency and adaptive governance, it does not explicitly address the need for real-time, autonomous auditing mechanisms like OWT. This is a crucial component for effectively managing the risks associated with self-evolving agentic AI systems.
The UNESCO Recommendation on the Ethics of Artificial Intelligence significantly contributes to the global discourse on AI governance by providing a comprehensive set of ethical guidelines applicable across all sectors and regions. Its principles complement existing regulatory frameworks by emphasizing human rights, environmental sustainability, and inclusive governance.
However, its non-binding nature and lack of specific enforcement mechanisms limit its effectiveness, particularly in managing the complexities of agentic AI systems. To fully address these challenges, the Recommendation could be strengthened by integrating concrete oversight tools like Open Watchbot Transparency, which offers real-time, autonomous auditing capabilities essential for high-risk and self-evolving AI systems.
By aligning the UNESCO Recommendation with practical governance solutions and encouraging its adoption alongside enforceable regional frameworks, the global community can work towards a more coherent and effective approach to AI ethics and regulation. This integration would help fill the gaps identified in existing frameworks, ensuring that AI technologies are developed and deployed in ways that are ethical, transparent, and beneficial for all humanity.
Comparative Analysis of Existing Frameworks
Across jurisdictions, several common themes emerge. Regulatory frameworks such as the EU AI Act and Canada’s AIDA offer comprehensive risk-based classification systems that help to tailor oversight based on potential harm. The U.S. AI Bill of Rights excels in promoting algorithmic fairness and privacy protections. However, all these frameworks share the same core weaknesses:
- they depend on centralized enforcement models at national and organizational levels: the government forces companies to have committees that force their employees to behave.
- they grant broad exemptions for high-risk sectors like military and law enforcement, where oversight is arguably most critical. These exemptions reveal a broader failure in AI governance, signaling that the current frameworks are not equipped to handle the highest risks posed by agentic AI systems.
To overcome these challenges, regulatory frameworks must adopt real-time monitoring and auditing mechanisms, particularly in high-risk sectors. Open Watchbot Transparency offers a solution by embedding continuous oversight directly into AI systems, ensuring that they remain transparent and accountable at all times, regardless of the sector in which they are deployed.
The Open Watchbot Transparency (OWT) framework aims to address the gaps in existing AI regulatory frameworks by providing continuous, autonomous oversight of high-risk AI systems. Well-designed open-source watchbots could ensure that AI systems are transparent, accountable, and auditable across all sectors, including military and law enforcement. This kind of genuine transparency could foster greater trust between corporations, governments, and the public. On the other hand, selective enforcement of laws reinforces the self-fulfilling prophecy that there is a tradeoff between effectiveness and transparency in computing systems. OWT embodies the open-source principle that, to the contrary, transparency and collaboration are the best way to achieve effectiveness.