Twenty Dollars to Hack McKinsey: The Week AI Agents Became the Attackers
An autonomous AI agent hacked McKinsey's AI platform in 2 hours for $20 in tokens. Hackerbot-Claw compromised 47,391 GitHub repos autonomously. APT36 is mass-producing malware with AI coding tools. Anthropic sued the Pentagon. And MCP hit 30 CVEs in 60 days.
Safe AI Academy · March 14, 2026 · 16 min read
The $20 Hack: When AI Agents Do the Pentesting (and the Attacking)
CodeWall is an autonomous AI pentesting agent. It was pointed at Lilli, McKinsey's enterprise AI chatbot used by consultants globally. Within two hours, and for the cost of a lunch, it found a SQL injection vulnerability hiding in JSON field names, a spot that traditional SAST tools almost never check. From there it pivoted to full database access: 46.5 million internal messages, 728,000 files, 57,000 accounts, and write access to Lilli's system prompts, which means it could have poisoned every future interaction the chatbot had with McKinsey's workforce.
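To make that bug class concrete, here is a minimal Python sketch of why SQL injection via JSON field names slips past taint-tracking tools. This is an illustrative pattern, not CodeWall's actual finding or Lilli's code; the function names and the `ALLOWED_FIELDS` schema are mine.

```python
def build_update_unsafe(table: str, payload: dict):
    # Vulnerable pattern: the JSON *keys* are interpolated into the SQL text.
    # The placeholders protect the values, so SAST rules that track tainted
    # values (not field names) often pass this as "properly parameterized".
    cols = ", ".join(f"{key} = ?" for key in payload)
    return f"UPDATE {table} SET {cols}", list(payload.values())

ALLOWED_FIELDS = {"name", "email"}  # hypothetical schema

def build_update_safe(table: str, payload: dict):
    # Fix: allowlist field names before they ever touch the SQL string.
    unexpected = set(payload) - ALLOWED_FIELDS
    if unexpected:
        raise ValueError(f"unexpected fields: {sorted(unexpected)}")
    cols = ", ".join(f"{key} = ?" for key in payload)
    return f"UPDATE {table} SET {cols}", list(payload.values())
```

An attacker-controlled key like `"email = (SELECT secret FROM vault) --"` rides straight into the statement in the unsafe version; the safe version rejects it before query construction.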
The way I see it, this is not just another breach. It is a category shift. We have been tracking AI-powered vulnerability discovery for months. I wrote about Anthropic's Claude finding 500-plus zero-days and about GPT-5.4 embedding security mitigations at the model level. But those were defensive applications, carefully scoped, human-supervised. CodeWall demonstrated that the same capability works just as well offensively, autonomously, and cheaply. Twenty dollars. That is the new cost of compromising an enterprise AI platform.
And it was not the only autonomous AI agent making headlines this week. XBOW, another autonomous pentesting agent, discovered CVE-2026-21536 (CVSS 9.8) in Microsoft's Devices Pricing Program cloud service through unrestricted file upload exploitation. This is the first documented case of an AI agent finding a critical zero-day through active exploitation testing: not static analysis, not code review, but actual runtime exploitation.
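Unrestricted upload bugs usually come down to the server trusting the filename or the client-sent Content-Type. A minimal defensive sketch, checking both an extension allowlist and the file's magic bytes (the `ALLOWED_TYPES` table is an illustrative assumption, nothing to do with Microsoft's actual service):

```python
from pathlib import Path

# Hypothetical allowlist: extension -> required leading magic bytes.
ALLOWED_TYPES = {".png": b"\x89PNG", ".pdf": b"%PDF"}

def validate_upload(filename: str, data: bytes) -> bool:
    # Reject anything whose extension is off the allowlist, and anything
    # whose actual bytes do not match the type its name claims. Checking
    # content, not just names, closes the "script renamed to .png" gap.
    magic = ALLOWED_TYPES.get(Path(filename).suffix.lower())
    return magic is not None and data.startswith(magic)
```

This rejects `shell.aspx` outright and also rejects a script whose filename merely ends in `.png`.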
On the other side of the fence, Hackerbot-Claw, a Claude-powered autonomous agent, exploited GitHub Actions CI/CD misconfigurations across 47,391 repositories, compromising projects belonging to Microsoft, Datadog, CNCF, and Aqua Security's Trivy (32,000 stars). This is the first confirmed autonomous AI multi-repo supply chain attack at scale. One interesting detail: Claude Sonnet 4.6 successfully refused a CLAUDE.md poisoning attempt during the campaign, which tells you something about where safety mitigations actually work and where they do not.
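The misconfiguration class behind this kind of campaign is well documented: a workflow triggered on `pull_request_target` runs in the base repository's context, with secrets and (by default) a write-capable `GITHUB_TOKEN`, and if it also checks out the attacker-controlled PR head, untrusted code executes with those privileges. A coarse triage heuristic for spotting the combination, sketched in Python (illustrative; not Hackerbot-Claw's method, and no substitute for a real workflow parser):

```python
import re

def flags_pwn_request(workflow_yaml: str) -> bool:
    # 'pull_request_target' grants secrets and a privileged token; checking
    # out github.event.pull_request.head in that context hands them to
    # whatever code the PR author pushed. Flag workflows that do both.
    privileged_trigger = "pull_request_target" in workflow_yaml
    untrusted_checkout = re.search(
        r"github\.event\.pull_request\.head", workflow_yaml
    )
    return privileged_trigger and untrusted_checkout is not None
```

Running a check like this across an org's workflow files is cheap; the uncomfortable part is that an autonomous agent can run the equivalent across all of GitHub.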
For anyone writing controls, the question is no longer hypothetical: how do you defend against an attacker that costs $20 to deploy, operates autonomously, finds vulnerabilities humans miss, and can scale across thousands of targets simultaneously? Your incident response playbooks were written for human-speed attacks. These are not human-speed attacks.
And here is the uncomfortable fact: AI agents do not even need to be malicious to cause serious damage. Amazon convened an emergency internal meeting this week after four high-severity incidents hit their retail website in a single week, including a six-hour meltdown that locked shoppers out of checkout, account information, and product pricing. The cause? AI-assisted production code changes. An internal document cited "GenAI-assisted changes" as a factor in a "trend of incidents" since Q3 2025, though that bullet point was deleted before the meeting. Amazon's response: junior and mid-level engineers now require senior sign-off on any AI-assisted changes. When one of the world's largest tech companies has to add human gatekeepers to contain AI-generated code damage, that tells you something about where our tooling maturity actually stands.
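A gate like Amazon's is straightforward to encode in merge tooling. A sketch under assumed PR metadata (the level names, the `ai-assisted` label, and the function shape are all my invention, not Amazon's system):

```python
SENIOR_LEVELS = {"senior", "principal", "staff"}  # hypothetical ladder names

def merge_allowed(author_level: str, labels: set, approver_levels: set) -> bool:
    # Gate only the risky combination: an AI-assisted change from a
    # non-senior author. Everything else follows normal review rules.
    ai_assisted = "ai-assisted" in labels
    if not ai_assisted or author_level in SENIOR_LEVELS:
        return True
    # Otherwise require at least one senior-or-above approval on the PR.
    return bool(SENIOR_LEVELS & approver_levels)
```

The hard part is not the gate; it is reliably labeling which changes were AI-assisted in the first place.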
Anthropic Sues the Pentagon: The Legal Battle Reshaping AI Governance
I have been tracking the Anthropic-Pentagon saga across three previous articles. What happened this week is unprecedented.
Meanwhile, the market told its own story. Anthropic's enterprise spending share hit 40% while OpenAI's fell to 27%. Twenty percent of U.S. companies are now paying for Claude, up from 4%, and 56% of organizations using generative AI now use Anthropic, up from 29%. The blacklisting, intended as punishment, is functioning as the most effective enterprise marketing campaign in AI history.
For compliance practitioners: vendor safety posture is no longer a static checkbox. It is a dynamic risk factor influenced by legal battles, political pressure, and market forces. If you are not monitoring your AI vendors' governance commitments with the same rigor you monitor their uptime SLAs, you are building on sand.
MCP's Growing Pains: 30 CVEs in 60 Days, and the Adults Are Finally Showing Up
I have written extensively about MCP security risks in previous articles. What is new this week is scale and response.
The scale first: 30 CVEs have been filed against MCP implementations in just 60 days. That is one every two days. And the vulnerabilities keep getting more creative. Noma Security disclosed "ContextCrush", a vulnerability in the Context7 MCP Server (roughly 50,000 GitHub stars, 8 million-plus npm downloads) where an attacker can plant malicious rules that get pushed verbatim into AI coding assistants, enabling credential theft, data exfiltration, and file deletion. Any MCP server that aggregates third-party content without runtime inspection is vulnerable to this class of attack.
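The mitigation that matters here is runtime inspection of aggregated content before it reaches the assistant. A minimal Python sketch of the idea; the pattern list is illustrative (my own, not Noma's ruleset), and a real deployment would pair rules like these with a classifier and provenance checks:

```python
import re

# Illustrative red flags for third-party content bound for a coding assistant.
SUSPECT_PATTERNS = [
    r"ignore (all |any |the )?(previous|prior|above) instructions",
    r"\bcurl\b[^\n]*\|\s*(ba)?sh",          # pipe-to-shell one-liners
    r"\brm\s+-rf\b",                         # destructive file deletion
    r"(api[_-]?key|secret|token)\s*[:=]",    # credential-shaped strings
]

def inspect_context(chunk: str) -> list:
    # Return every pattern a third-party chunk trips, so the server can
    # quarantine it instead of pushing it verbatim into the model context.
    return [p for p in SUSPECT_PATTERNS if re.search(p, chunk, re.I)]
```

The point is architectural: any MCP server that forwards third-party text without a checkpoint like this is trusting every upstream author with its users' coding agents.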
Let me put it this way. A month ago, MCP security felt like shouting into the void. This week, it feels like the beginning of an actual ecosystem response. The vulnerability count is still alarming, but at least now we have a roadmap, dedicated products, and a standards body collecting input. That is the difference between "we have a problem" and "we are building the solution." We are not there yet, not even close, but the direction changed.
AI Malware Goes Mainstream: Vibeware, Slopoly, and the 1,500% Surge
There is a story unfolding in the threat intelligence data this week that I think is the most consequential development for enterprise security teams, and it is not getting the attention it deserves.
APT36 (Transparent Tribe), a Pakistan-aligned advanced persistent threat group, has adopted what researchers are calling "Vibeware": using AI coding tools to mass-produce malware in Nim, Zig, Crystal, and Rust. The strategy is called "Distributed Denial of Detection" (DDoD), and the concept is brutally simple. Instead of crafting one sophisticated piece of malware and hoping it evades detection, you use AI to generate hundreds of variants across multiple languages, overwhelming defensive engines with sheer volume. Nine-plus malware families have been identified so far. This is the first state-aligned APT adopting this paradigm.
Then there is Slopoly, discovered by IBM X-Force. In its publication this week, IBM called it the first confirmed AI-generated malware used in an actual ransomware attack, deployed by Hive0163 (the Interlock ransomware group). It is a PowerShell command-and-control framework with multi-platform variants. Slopoly joins VoidLink and PromptSpy as confirmed AI-assisted malware in the wild, but it is the first directly tied to a ransomware operation.
The macro numbers back this up. Flashpoint's 2026 Global Threat Intelligence Report documented a 1,500% surge in AI-related illicit activity and 3.3 billion compromised credentials circulating in criminal markets. Let that number sink in. One thousand five hundred percent. And ransomware is pivoting from data encryption to identity extortion, which turns those 3.3 billion credentials from a statistic into a business model.
The thing is, this was the trajectory everyone predicted, but the confirmation matters. We have gone from "AI could be used to generate malware" to "state-aligned APTs are running AI malware factories, and ransomware gangs are deploying AI-generated tools in live attacks." That is not a prediction anymore. It is a measured reality, and it means every organization needs to assume their adversaries have access to AI-augmented offensive capabilities. If your threat model still treats AI-assisted attacks as emerging or theoretical, update it today.
The Defense Side Fights Back: Red Teaming, Frameworks, and a Product Avalanche
Not everything this week was doom. The defensive response to all of this is accelerating, and some of it is genuinely impressive.
Start with Anthropic's Mozilla red-teaming partnership. Claude Opus 4.6 found 22 CVEs in Firefox in two weeks, including 14 high-severity and CVE-2026-2796 (CVSS 9.8), a JIT miscompilation in SpiderMonkey that was found in the first 20 minutes. The full exploit writeup was published on Anthropic's red team blog. This is not a lab exercise. It is a replicable model for AI vendor and software vendor security partnerships. Every major software company should be doing this.
Then there is Zscaler's ThreatLabz 2026 AI Security Report, and the headline finding is one that should be printed and taped to every CISO's monitor: 100% of AI systems analyzed had critical flaws, with average time to compromise at 16 minutes. Data transfers to AI and ML applications surged 93% year over year to over 18,000 terabytes. Every single AI system they tested was critically vulnerable. Every one.
On the framework side, the Cloud Security Alliance's AI Controls Matrix (AICM) won the 2026 CSO Award, and it deserves the recognition. It is the first vendor-agnostic AI controls framework, covering 18 security domains with 243 control objectives, mapped to NIST AI 600-1, ISO 42001, and BSI AI C4. For those of us building common control frameworks, this is a significant reference architecture. 243 control objectives specifically for AI. That is the kind of specificity I have been calling for since I started writing these articles. Vague controls like "implement AI security measures" do not cut it. You need control objectives that say exactly what to test and how to evidence it.
The competitive dynamics here are telling. Every major platform vendor shipped AI security products this week. Not announced. Shipped. Google put $32 billion on the table. That tells me the demand signal from enterprise customers has gotten loud enough that security is no longer a roadmap item; it is a survival requirement.
AI Browsers and the Reasoning Leak Nobody Saw Coming
The reasoning leak problem is particularly interesting because it exposes a fundamental tension in AI browser design. The reasoning stream is what makes these tools useful: you can see why the AI made a decision, verify its logic, catch mistakes. But that same stream, when exposed to adversarial inputs, becomes an attack surface. The model's transparency becomes its vulnerability. That is a design tradeoff that nobody in the compliance world has written controls for yet, and we need to, because agentic browsers are heading to every enterprise desktop.
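One control worth drafting now: treat the reasoning stream as untrusted output and scrub it before it hits the UI or the logs. A hedged Python sketch; the secret shapes below are illustrative, and a production scrubber would use a maintained detector with entropy checks rather than three regexes:

```python
import re

# Illustrative secret shapes an exposed reasoning stream might echo.
REDACTION_PATTERNS = {
    "aws_access_key": r"AKIA[0-9A-Z]{16}",
    "bearer_token": r"Bearer\s+[A-Za-z0-9._~+/-]{20,}",
    "email": r"[\w.+-]+@[\w-]+\.[\w.-]+",
}

def scrub_reasoning(chunk: str) -> str:
    # Run on every reasoning chunk before display or logging: the stream
    # stays useful for verifying the model's logic, but anything it echoed
    # from page content, memory, or tool results gets masked.
    for name, pattern in REDACTION_PATTERNS.items():
        chunk = re.sub(pattern, f"[{name} redacted]", chunk)
    return chunk
```

This preserves the transparency that makes the reasoning stream valuable while shrinking what an adversarial page can exfiltrate through it.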
And on the agent ecosystem front, Meta acquired Moltbook, the Reddit-like social network where AI agents interact with each other autonomously. Moltbook racked up millions of registered bots within days of launch and is now being folded into Meta Superintelligence Labs. If that name sounds familiar, it should: Moltbook was the platform where the ClawHavoc attack chain spread from ClawHub to agent-to-agent communication. Meta is now the owner of the infrastructure where AI agents socialize, which means agent-to-agent security just became Facebook's problem. Whether that is reassuring or terrifying depends on your opinion of Facebook's track record with platform safety.
Where Do We Go from Here?
At the end of the day, this week was the week AI agents crossed a line. Not a theoretical line in a research paper, but a practical one measured in breached databases and compromised repositories. $20 to hack McKinsey. 47,000 GitHub repos compromised autonomously. State-aligned APTs running AI malware factories. The economics of offensive AI security just collapsed, and every threat model that assumes human-speed, human-cost attacks is now obsolete.
But I refuse to end on despair because the defensive response this week was equally unprecedented. The CSA gave us 243 AI-specific control objectives. Anthropic proved that AI red teaming can find 22 browser CVEs in two weeks. The MCP Foundation published a security roadmap. And every major platform vendor shipped AI security products, not next quarter, but now.