Who's Grading AI's Homework? Nobody, Yet.

Jun 23

By Oluwaseyi Ayodeji | Published on oluwaseyiayodeji.com | Sovereign Stack Newsletter

Here's a question nobody's asking loudly enough: when AI eventually breaks something big - a financial system, a public health record, an election - who answers for it? Right now, the honest answer is: nobody knows. There's no AI equivalent of an air traffic controller, no AI equivalent of a bank examiner walking into a branch to check the books. The most powerful technology built in the last decade is, today, graded almost entirely on its own homework.

Power gave us the constraint. Compute gave us the infrastructure gap. Chips gave us the materials leverage Africa hasn't claimed yet. Governance is different. Honestly, it's less sexy: the kind of word that makes people's eyes glaze over in a meeting. But strip away the jargon and it's the simplest question of all: who's checking that this thing is safe, and what happens if it isn't? Africa doesn't control that conversation yet. If we don't get a seat at the table where the rules are written, we inherit rules written by people who will never live with the consequences of getting it wrong.

To be clear about the landscape before I get into Anthropic's proposal specifically: this is not a blank page globally. The EU AI Act has been the most comprehensive attempt at AI regulation anywhere, though its high-risk obligations have already been pushed back once this year. California passed its own Transparency in Frontier AI Act, requiring large model developers to publish risk frameworks and report safety incidents. Colorado tried to build a comprehensive risk-based law modeled on the EU approach, and then - within two years - repealed and replaced it with a narrower, rights-based framework after a federal lawsuit and a White House executive order pushing for national preemption of state AI laws. I bring this up because it tells you something important: this entire field is moving fast enough that a law can be passed, challenged, and rewritten within 24 months. Any framework I describe here, including Anthropic's, should be read as a snapshot of June 2026, not a finished destination.

Before I go further, I want to do something unusual for a piece on AI policy: declare my own posture. I am bullish on AI in Africa. Not naively - bullish the way an engineer is bullish about a material once they've tested its tolerances. I've spent 17 years in infrastructure, manufacturing, and supply chain - including years on the process floor at Intel's Hillsboro fabs - and the pattern I keep seeing is this: transformative technologies don't fail because they lack potential. They fail, or they curse the people meant to benefit from them, when nobody builds the guardrails before the gold rush starts. AI's potential to transform lives across this continent is real. So is the risk that special interests, capital concentration, and plain old greed hijack a tool meant for the multitude and bend it toward the whims of whoever already holds power. That is precisely where governance comes in. And that is where I want to spend this issue.

The document that prompted this issue

In June 2026, Anthropic - the AI lab behind Claude, co-founded by Dario Amodei - published its "Advanced AI Framework," aimed mainly at the US federal government. Strip out the legal language and here's the plain version: Anthropic is asking governments to require AI labs to test their most powerful models for catastrophic risk, publish what they find, let outside evaluators check their work, and report incidents fast. It's also calling for the world to invest now in defenses against bio and cyber threats AI could supercharge, because those defenses take years to build and you can't improvise them mid-crisis.

Now here's the read that matters more than the summary. This is self-regulation. One of the most capable AI labs on the planet - a company with every commercial reason to want the lightest possible regulatory touch - sat down and wrote the rulebook it would like to be governed by. Say that plainly and two things become obvious at once. First: almost nobody else on earth could write a document this technically sharp about frontier AI risk, and choosing to publish concrete proposals instead of lobbying quietly behind closed doors is genuinely commendable - especially when the regulatory landscape right now has almost nothing else on the table. Second: a rulebook written by the party being regulated is a first draft, not a finished system. Treating it as anything more than that would be a mistake.

What the framework gets right

The strongest instinct in this document is one I share completely: transparency is the floor everything else stands on. You cannot govern what you cannot see. The framework wants developers to regularly publish how they test for risk, what they found, and what's still unresolved after their safeguards - not just at launch, but on an ongoing basis. That's the right instinct. A model serving millions of people in Lagos, Nairobi, or Kigali should come with the kind of disclosure a publicly listed company already owes its shareholders.

There's a tension worth naming honestly, though, because the framework doesn't fully resolve it: AI companies argue that full transparency about training data and model internals hands competitors their secret recipe. I get that. A frontier lab's training methods are genuinely its competitive edge, and forcing total disclosure could blunt the incentive to keep building better models. But the person actually using the tool - the policymaker in Abuja deciding whether to deploy a model in a public health system, the developer in Accra building on top of an API - has a real need to understand what's underneath it. The framework's compromise is to allow companies to redact trade secrets while still requiring public disclosure of capabilities, risks, and fixes. That's reasonable. Whether it survives contact with real money and real incidents is the open question.
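To make that compromise concrete, here is a minimal Python sketch of a disclosure record in which capabilities, risks, and mitigations must always be published, while anything flagged as a trade secret is withheld from the public version. The `RiskDisclosure` class, its field names, and the redaction marker are my own hypothetical illustration, not a schema taken from Anthropic's framework.

```python
from dataclasses import dataclass, field

# Hypothetical marker shown in place of withheld values.
REDACTED = "[REDACTED: trade secret]"


@dataclass
class RiskDisclosure:
    """Hypothetical disclosure record; field names are illustrative only."""
    model_name: str
    capabilities: list[str]   # must always be public
    risks: list[str]          # must always be public
    mitigations: list[str]    # must always be public ("fixes")
    trade_secrets: dict[str, str] = field(default_factory=dict)  # may be redacted

    def public_view(self) -> dict:
        # Refuse to publish at all if any mandatory section is empty.
        for name in ("capabilities", "risks", "mitigations"):
            if not getattr(self, name):
                raise ValueError(f"mandatory section '{name}' is empty")
        return {
            "model_name": self.model_name,
            "capabilities": list(self.capabilities),
            "risks": list(self.risks),
            "mitigations": list(self.mitigations),
            # Keep the keys so readers can see that something was withheld,
            # but never expose the underlying values.
            "trade_secrets": {key: REDACTED for key in self.trade_secrets},
        }
```

The design choice worth noticing: redaction hides the contents of a trade secret but not its existence, and a disclosure with no stated risks simply cannot be published.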

The second thing the framework gets right: there's currently no independent watchdog with real teeth to check these models the way financial markets check listed companies. Anthropic's fix - hire outside evaluators, give them real access, publish what they find, build toward a rating system so companies can't just shop for the friendliest grader - is structurally smart. Think of it as the AI version of requiring audited financials before a company can go public. The framework even flags the obvious cheat code: a company could simply pick whichever evaluator asks the least of it. The proposed fix - rate evaluators on rigor, randomly assign the toughest ones in high-stakes cases - borrows directly from how financial regulators have handled auditor shopping for decades. And that borrowed playbook is exactly why I think Africa's regulators are better positioned for this than people assume: central banks and capital markets authorities already run audit regimes for banks and listed firms. They're not starting from zero. They're starting from a closely related discipline.
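The framework describes this anti-shopping mechanism only in prose. Here is a minimal Python sketch of the idea under my own assumptions: the `Evaluator` type, the 0-to-1 `rigor` score, the `top_k` cutoff for high-stakes cases, and the `min_rigor` floor for routine ones are all hypothetical parameters, not anything the framework specifies.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluator:
    name: str
    rigor: float  # hypothetical rigor rating (0.0 to 1.0) published by a regulator


def assign_evaluator(evaluators, high_stakes, rng, top_k=2, min_rigor=0.5):
    """Assign an evaluator to a model review (illustrative sketch only).

    The developer never picks its own grader here:
    - high-stakes reviews draw at random from the top_k most rigorous evaluators;
    - routine reviews draw at random from every evaluator at or above min_rigor.
    """
    ranked = sorted(evaluators, key=lambda e: e.rigor, reverse=True)
    if high_stakes:
        pool = ranked[:top_k]
    else:
        pool = [e for e in ranked if e.rigor >= min_rigor]
    if not pool:
        raise ValueError("no evaluator meets the requirements for this review")
    # The random draw removes the developer's ability to shop for the friendliest grader.
    return rng.choice(pool)
```

Passing a seeded `random.Random` as `rng` makes each draw reproducible, so an assignment could be checked after the fact. That is my design choice for the sketch, not a requirement in the framework.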

The third thing worth crediting: the framework treats security as a governance issue, not just an engineering one. A model that's safe on paper but can be stolen or quietly tampered with isn't actually safe. Requiring security programs scaled to the damage a breach could cause, with regular penetration testing reported to government, treats a company's model weights the way a serious bank examiner treats a bank's core ledger. For a continent that will increasingly run public services on top of foreign-built models, the security posture of the company holding those weights isn't a side issue. It's the whole game.

My prediction: the Big Four are about to get into the AI audit business

Here's where I think this is headed, and I want to make a specific bet rather than a vague one.

Anthropic's framework spends real space on a problem it admits it hasn't solved: there's no mature independent evaluation ecosystem for AI models. No licensed evaluators. No standard ratings. No accepted methodology that everyone agrees is rigorous. That gap is exactly the kind of opening that turns into a business - and it's not just a forecast. It's already happening. PwC's UK arm has been reported to be moving toward launching dedicated AI assurance services. KPMG has published a formal "Trusted AI" framework with numbered pillars and has pursued ISO/IEC 42001 certification - the international standard for AI management systems. Deloitte has its own "Trustworthy AI" framework. Industry reporting describes the Big Four as being in something close to an arms race to become the recognized name in AI assurance, openly drawing comparisons to how they expanded into ESG assurance a few years earlier.

Watch the PwCs, KPMGs, Deloittes, and EYs of the world move into this space - and move fast. They already run the exact muscle this framework is asking for: independent verification, standardized reporting, a "we checked the books and we're willing to put our name on it" business model that companies and regulators already trust. They've been certifying that public companies' financials are real for a century. The leap to certifying that an AI lab's safety testing is real isn't as far as it sounds.

But here's the nuance that makes this prediction worth more than a hot take: the consulting firms will win the business of AI auditing before anyone has actually solved the substance of AI auditing. Financial audit works because accounting has settled, agreed-upon rules - GAAP, IFRS - built over a hundred years of practice, court cases, and professional standards bodies. AI capability evaluation has no equivalent yet. Anthropic's own document admits this: the independent evaluator ecosystem "does not yet exist," and standards for what counts as a rigorous evaluation are still being figured out, mostly by AI safety researchers and red-teaming specialists, not auditors. Some early reporting on the UK's AI assurance market even flags this directly - a lot of the "assurance" currently being offered is provided by the AI developers themselves, which raises real questions about independence and standardization.

So my actual prediction is two-stage. Stage one, already underway: consulting firms build the commercial infrastructure - audit teams, client relationships, certification products, ISO-aligned frameworks - positioning themselves as the trusted name once regulation requires third-party evaluation. Stage two, the harder one: the actual technical standards for what makes an AI evaluation rigorous will likely come from AI safety researchers, academic labs, and the evaluator organizations the Anthropic framework is trying to seed - and the consulting firms will license, partner, or acqui-hire their way into that expertise rather than building it from scratch. Whoever bridges those two stages first - commercial trust plus genuine technical rigor - owns this market for the next decade. For African accounting and consulting firms paying attention, that's not a hypothetical. That's a seat at a table that hasn't finished being built yet.

If you work in consulting, audit, or risk advisory - I'd genuinely like to hear what you're seeing inside your own firm. Is this still mostly slideware and frameworks on paper, or are clients actually buying AI assurance engagements yet? Drop a comment. This is exactly the kind of on-the-ground signal that doesn't show up in press coverage until it's already old news.

Where I disagree

Here's where I part ways with the framework, and I want to be specific about why.

The mechanism for actually stopping a dangerous model from deploying is deliberately built with a high bar. Enforcement runs through courts and careful legal review, not through an agency that can just say "stop" and have that stick. The framework defends this as a guard against regulatory overreach - and I understand the instinct. Nobody wants one agency with unchecked power to shut down any model it doesn't like. But a high bar for stopping deployment is also, in plain terms, a high bar for protecting the public. Anthropic is positioning itself, through this document, as something close to an industry safety standard-setter. That's real influence. And influence that comes with a built-in escape hatch from accountability is a structural risk, not a footnote. The framework is honest enough to admit this tension exists. It doesn't resolve it in the public's favor.

There's a second disagreement, and it's specific to where I sit. This framework is written, by its own admission, mainly for the US federal government. The thresholds, the agency structure, the courts it leans on - all of it assumes an institutional environment that doesn't map cleanly onto the African Union, the Central Bank of Nigeria, or Rwanda's ICT ministry. That's not a flaw in the document, and Anthropic doesn't pretend otherwise. But it means African policymakers can't just copy this framework and call it governance. The independent evaluator model assumes a mature evaluation ecosystem that, as the framework itself concedes, doesn't exist yet even in the US. Africa would be building that capacity from an earlier starting point - which is exactly the argument for starting now, with the African Development Bank, the AU, and national capital markets authorities at the table, instead of waiting for a finished Western template to copy.

Worth noting: this isn't purely hypothetical for Africa either. Nigeria's Minister of Communications, Innovation and Digital Economy, Dr. 'Bosun Tijani, has been working with Warwick Business School to design an AI Trust Framework for Nigeria - an early, homegrown attempt at exactly the kind of independent accountability structure this issue is arguing for. It's early. It's one country. But it's a start, and it's worth more attention than it's currently getting.

What I keep thinking about: Oloibiri, 1956

I want to bring this home with something closer to where I'm from. On 15 January 1956, Shell-D'Arcy struck oil in commercial quantities at Oloibiri, in what is now Bayelsa State - Nigeria's first commercial oil discovery. The communities around Oloibiri were largely fishing and farming communities. They didn't meaningfully benefit from the oil wealth pulled out from under them. What they got instead, over the decades since, was pollution - spills that fouled the waterways they depended on for food and income - with no governance framework strong enough at the time to hold anyone accountable. Weak governance didn't just fail to share the upside. It actively exposed the people closest to the extraction to harm they never agreed to.

That risk has a direct, present-day AI parallel, and it's not abstract. In the US, AI data centers are already turning to on-site power generation - gas turbines and dedicated plants built right next to the facility - because grid connection timelines are too slow for how fast these companies want to build, and the same approach is increasingly being proposed across Africa. In Nigeria today, available grid generation runs around 5,000–6,000 MW against more than 13,000 MW of installed capacity, meaning less than half of what has been built is actually available. That gap is already pushing the country's 21 operational data centers toward captive gas and hybrid solar solutions just to stay online. In many cases, that's a sensible engineering response to a real grid problem. But it's also exactly the setup that, without governance and community accountability, repeats the Oloibiri pattern in a new register: big infrastructure built fast, near communities who may see little of the benefit, carrying environmental costs - water for cooling, emissions from gas plants, land use - that someone has to absorb. The question isn't whether Africa builds this infrastructure. It should, and I'm bullish that it will. The question is whether the governance shows up before the bulldozers do, or afterward, as an apology.

Four things African governance can do now

This doesn't have to wait for a finished global consensus. A few moves are available today, with the institutions already in place:

Get African accounting and consulting firms into this race early. Central banks and capital markets authorities already run disclosure and audit regimes for banks and listed companies. The global Big Four are about to build an AI audit practice on top of that exact playbook. African firms with audit credibility shouldn't watch that market get built entirely elsewhere - this is a capability worth building locally, now, while the rules are still being written rather than after they're set.
Make transparency a market-access condition. A foreign model serving African users at scale - in banking, health, or government services - should be required to publish the same kind of risk disclosure this framework asks of US developers, as a condition of operating, not as a courtesy.
Build the resilience layer alongside the rules, not after. The framework's own argument for biological and cyber resilience - that these investments take years and can't be improvised in a crisis - applies just as much to African digital infrastructure. Grid-power planning for data centers, environmental review for captive power generation, and community accountability mechanisms need to be designed in now, before the next Oloibiri-style buildout, not after.
Show up at the table this document is setting. Anthropic wrote this for Washington, but the questions it raises - who evaluates, who enforces, who bears the cost when something goes wrong - are universal. The AU's continental AI strategy, the AfDB, and national regulators need a seat in the room where these norms get set globally, not a finished framework handed down afterward.

The Stack Sovereignty Test: Governance

Score: 1/5

Africa doesn't yet have a continental AI governance framework with real binding force, an independent evaluation capacity for models deployed on the continent, or an enforcement mechanism with teeth comparable to what financial regulators already run for banks and listed companies. The African Union has a continental AI strategy in motion, and a handful of countries have started drafting national approaches, with Rwanda and Nigeria (where Minister 'Bosun Tijani's ministry is developing an early AI Trust Framework) among the more active. That's a start. It's not yet a system. The gap between "a strategy exists" and "a regulator can actually act" is the entire distance this pillar still has to travel.

Where this leaves us

I don't think the answer to "Anthropic wrote its own rulebook" is to throw out the rulebook. Most of what's in it is sound, and the instinct toward transparency and independent evaluation deserves real credit. But the people who will actually live with the consequences - African governments, regulators, the public, the fishing village next to the next big data center - can't afford to wait for someone else to finish this conversation and hand them the minutes.

We've seen this movie before. A transformative resource shows up. The money moves fast. The governance moves slow, or doesn't move at all until something has already gone wrong. Oloibiri isn't a metaphor I'm reaching for to make a point land. It's a record of what happens when the rulebook gets written after the drilling starts. AI is not oil. But the lesson Oloibiri left behind is not optional reading. It's the whole reason this pillar exists.

I want to hear where you land on this - especially if you work in policy, law, consulting, or AI safety, and especially if you disagree with me. Where do you think the Anthropic framework gets the balance right or wrong? Is there a comparable accountability mechanism already taking shape somewhere on the continent that I'm missing? Drop a comment, share this with someone who should weigh in, and let's actually work through this in public rather than in a panel six months from now.

This is Issue 05 of Sovereign Stack, applying the Stack Sovereignty Test - Power, Compute, Chips, Governance, and Talent - to Africa's AI infrastructure moment. Previous issues covered Power, Talent, Compute, and Chips.

#SovereignStack #AIGovernance #AfricaAI #DigitalSovereignty #AIPolicy #ResponsibleAI #AfricanUnion #TechPolicy #AIRegulation #FrontierAI

Octahedron Corp.
