The Workspace Is the New Perimeter: Three Supply-Chain Waves and the Week Your CLAUDE.md Became a Payload
In seven days, one threat actor ran three separate supply-chain waves across npm, GitHub, and Composer, and a fourth campaign started writing hidden instructions into .cursorrules and CLAUDE.md files so your own AI assistant exfiltrates your secrets. The trust boundary moved into the developer workspace, and provenance signing, MCP federal guidance, and the first papal AI encyclical all landed the same week as the response.
Safe AI Academy | May 25, 2026 | 17 min read
I want to start with the most important sentence in this week's security news, and almost nobody outside the supply-chain crowd will read it. A malware family that showed up this week writes hidden instructions into your CLAUDE.md and .cursorrules files, using invisible characters, so the next time you open your AI coding assistant in that folder, the assistant quietly runs the attacker's errand for you.
Read that again. No direct compromise of your laptop, no exploit needed. They poisoned a file you trust, one your AI agent reads on startup and you will almost certainly never re-read by eye. The agent has the credentials and the standing permission to run a quick scan, so the attacker just leaves a note and walks away.
I have been saying for a while that the developer workspace is becoming the real perimeter, and that the config files we treat as conveniences are quietly turning into a control surface. This week the threat actors made the argument for me. There is a clean causal chain here, and it ends somewhere strange: with the Pope.
Three Supply-Chain Waves in Seven Days, One Actor
Let me put the macro picture first. In a single week, the same threat actor, tracked as TeamPCP, ran three distinct supply-chain waves against the ecosystems your AI stack is built on. Supply chain, in plain terms, just means the open-source packages you pull in instead of writing yourself; an attacker who poisons one gets into everyone who installs it, the way one contaminated ingredient ends up in every dish that uses it.
Wave one was the npm worm. GitHub confirmed roughly 3,800 of its own internal repositories were exfiltrated after a poisoned Nx Console extension landed on the Visual Studio Marketplace. The malicious version was live for about eighteen minutes, and that was enough. It ran a shell command on startup hunting for 1Password vaults, AWS keys, npm and GitHub credentials, and, notably, Anthropic configurations. The trail leads back to the earlier TanStack compromise; downstream, stolen source was later sold on a cybercrime forum, and Grafana Labs and OpenAI landed on the victim list too.
Wave two, four days later, was Megalodon. An automated campaign pushed 5,718 malicious commits into 5,561 GitHub repositories in six hours, forging author identities like build-bot so the changes looked like routine pipeline maintenance, with workflows that hoovered up CI secrets, cloud credentials, and SSH keys, per SafeDep, which caught it. The genuinely nasty part is how the backdoor reached npm, the registry where JavaScript packages are published. The attacker never broke into the npm account of Tiledesk, one of the victim projects, and never needed to. They poisoned Tiledesk's source code on GitHub one step upstream, and the maintainer's normal automated release pipeline then built from that poisoned source and published it to npm as seven legitimate-looking, signed versions, shipped by the real maintainer who had no idea. That is the laundering: malicious code passed through a trusted, official release process and came out looking clean, the way dirty money passes through a legitimate business.
One actor, three techniques, all aimed at the same prize: the credentials sitting in a developer's workspace and CI pipeline.
The thing is, none of this surprises me. We have watched the supply chain become the soft underbelly of every modern stack for years. What is new is the target selection: every wave specifically hunted for AI tooling credentials, Claude Code configs, provider keys, agent tokens. The attackers have figured out that the AI developer is sitting on the richest, least-monitored pile of secrets in the building, and they are routing the whole campaign at that one chair.
Your Agent Config File Is Now a Persistence Surface
Now the part that ties the macro story to the sentence I opened with. The fourth campaign this week, TrapDoor, took the next logical step.
Socket disclosed TrapDoor, a crypto-stealer spread across 34-plus malicious packages on npm, PyPI, and Crates.io, the Rust package registry. The credential theft and wallet stealing are the boring, expected part. The new part is the persistence mechanism. TrapDoor's payload writes hidden instructions into .cursorrules and CLAUDE.md, the files that tell Cursor and Claude Code how to behave in a project, using zero-width Unicode characters. Zero-width Unicode, in plain terms, is text that takes up no visible space on screen, so a paragraph of attacker instructions can sit in your config file looking like an empty line. The next time you open the assistant in that workspace, it silently follows the planted instruction to run a "security scan" that performs secret discovery and exfiltration for the attacker.
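To make "invisible" concrete, here is a minimal sketch of a scanner for such characters. It is my illustration, not Socket's detection logic: the function name find_invisible and the explicit code-point list are assumptions, and the article does not say which characters TrapDoor actually uses.

```python
import unicodedata

# Assumption: a hand-picked list of common zero-width code points. The
# article does not specify which characters TrapDoor writes.
SUSPICIOUS = {
    0x200B,  # ZERO WIDTH SPACE
    0x200C,  # ZERO WIDTH NON-JOINER
    0x200D,  # ZERO WIDTH JOINER
    0x2060,  # WORD JOINER
    0xFEFF,  # ZERO WIDTH NO-BREAK SPACE (also used as a byte-order mark)
}


def find_invisible(text):
    """Return (line, column, character name) for each invisible character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            cp = ord(ch)
            # Unicode category "Cf" (format) also covers bidi controls and
            # tag characters that the explicit list above does not name.
            if cp in SUSPICIOUS or unicodedata.category(ch) == "Cf":
                hits.append((line_no, col, unicodedata.name(ch, f"U+{cp:04X}")))
    return hits


if __name__ == "__main__":
    sample = "# Project rules\nUse tabs.\u200b\u200c\n"
    for line_no, col, name in find_invisible(sample):
        print(f"line {line_no}, col {col}: {name}")
```

Expect some false positives: a leading byte-order mark, soft hyphens, and emoji joined with zero-width joiners are all flagged, so treat a hit as a reason to look, not as proof of compromise.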
This is the first time I have seen a real, in-the-wild malware family treat AI agent configuration files as a persistence channel, right alongside the classics: cron jobs, systemd services, Git hooks, SSH keys. Persistence, in plain terms, is how malware survives a reboot and keeps running, and for thirty years that meant burrowing into the operating system's startup machinery. Now it can mean leaving a polite note for an AI that has more access than the malware ever could.
It is not isolated, and it is not even new. Back in late April, Lasso Security showed NVIDIA's NemoClaw sandbox does not stop an agent from exfiltrating data through approved channels: one technique poisons the agent's own SOUL.md memory file so the agent writes a persistent backdoor into its own configuration. NVIDIA's response was that the sandbox behaved exactly as designed, which is the correct answer. A sandbox is a wall around what code can run, not a wall around what a trusted file can tell the agent to do.
Let me put it this way. We spent two years getting people to understand prompt injection: smuggling hidden instructions inside content the AI reads so it follows the attacker instead of you. TrapDoor and the SOUL.md attack are prompt injection that learned to stay. They do not hijack one conversation; they write themselves into the file the agent re-reads every session, so the hijack is permanent until somebody notices. Here is the chain.
Prompt injection that learned to stay: the hijack lives in a trusted file the agent re-reads every session.
So here is the practical translation. Any file an agent reads on startup is not documentation anymore, it is executable in effect. If your code review scrutinizes source files but waves through the "agent rules" file because it is "just config," that is the gap. Treat a change to CLAUDE.md with the same scrutiny you give a CI workflow or a Dockerfile, because functionally it is the same risk class: a file that silently runs things.
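One way to close that gap is to flag agent files automatically in every change set. A minimal sketch, assuming the list of changed paths comes from your own review tooling: the function name needs_agent_review is hypothetical, and ".mcp.json" is an assumed file name for an MCP server definition that you should adjust to whatever your tools use.

```python
from pathlib import PurePosixPath

# CLAUDE.md, .cursorrules, and SOUL.md are the files named in this article.
# ".mcp.json" is an assumption. Names are stored lower-case because some
# file systems ignore case.
AGENT_CONFIG_NAMES = {"claude.md", ".cursorrules", "soul.md", ".mcp.json"}


def needs_agent_review(changed_paths):
    """Return the changed paths whose file name is an agent config file."""
    return [
        path
        for path in changed_paths
        if PurePosixPath(path).name.lower() in AGENT_CONFIG_NAMES
    ]


if __name__ == "__main__":
    changed = ["src/app.py", "CLAUDE.md", "tools/.cursorrules"]
    print(needs_agent_review(changed))
```

Matching on the file name alone is deliberate: these files can sit in any subdirectory, and a nested copy is just as readable to the agent as one at the repository root.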
The Defenders Showed Up, and They Brought Provenance
Here is where I get to be genuinely optimistic, because the response landed in the same seven-day window, and it is the right one.
NVIDIA shipped Verified Agent Skills plus an open-source scanner called SkillSpector, applying to agent skills the model we landed on for containers and packages years ago. Every skill gets cryptographically signed across every file in its directory, where signing means a tamper-evident seal proving the thing you downloaded is the thing the author published, and each ships with a machine-readable "skill card" documenting ownership, dependencies, and limitations. SkillSpector scans for classic and AI-specific issues like prompt injection and tool poisoning, and, critically, for "declared-versus-actual behavior mismatch," meaning the skill claims to do one thing and actually does another. That last check is the direct technical answer to TrapDoor: a config file quietly telling your agent to exfiltrate is exactly the "says X, does Y" mismatch a scanner can catch.
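The "every file in its directory" part is worth seeing in miniature. The sketch below is not NVIDIA's format or tooling; it only shows the underlying idea of a per-file hash manifest that exposes any added, removed, or modified file. The function names build_manifest and diff_manifest are hypothetical, and the signature over the manifest itself, which a real scheme needs, is left out.

```python
import hashlib
from pathlib import Path


def build_manifest(skill_dir):
    """Map each file's relative POSIX path to its SHA-256 hex digest."""
    root = Path(skill_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def diff_manifest(expected, actual):
    """Return the sorted paths that were added, removed, or modified."""
    modified = {
        path
        for path in expected.keys() & actual.keys()
        if expected[path] != actual[path]
    }
    added_or_removed = expected.keys() ^ actual.keys()
    return sorted(modified | added_or_removed)
```

A publisher would record the manifest at release time and sign it; a consumer rebuilds the manifest from the downloaded directory and treats any non-empty diff as tampering. Without the signature step, an attacker who can edit the files can edit the manifest too, so the hash list on its own proves nothing.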
On the connectivity side, the NSA's AI Security Center published the first U.S. federal guidance specifically on the Model Context Protocol. MCP, the Model Context Protocol, is basically the standard plug that lets an AI agent connect to your tools and databases, the way a USB port lets any device talk to your laptop. I have been waiting for an authoritative federal baseline here, because "trust me, the MCP server is fine" has been doing a lot of unearned work in vendor questionnaires. Now there is a document a procurement team can point at, and it complements Trust3 AI's new MCP Security product, which puts single-purpose tokens on each MCP connection and inspects every agent instruction through a content firewall.
The way I see it, this is a coherent defensive stack assembling in real time. NVIDIA addresses what a skill is allowed to do and proves what it is; the NSA and Trust3 address how an agent connects and what it can reach. Both are restatements of zero-trust, the posture that says verify everything and grant minimum access rather than trusting anything just because it is inside the network. We do not need a new philosophy for agents. We need to stop exempting agents from the one we already have.
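Minimum access for an agent connection can be sketched in a few lines. This is a generic least-privilege check under my own assumptions, not Trust3's product or anything in the NSA guidance: ConnectionToken, authorize, and the example server and tool names are all hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionToken:
    """Hypothetical token bound to one server, a tool allowlist, and an expiry."""
    server: str
    allowed_tools: frozenset
    expires_at: float  # Unix timestamp


def authorize(token, server, tool, now=None):
    """Allow a call only if server, tool, and expiry all check out."""
    now = time.time() if now is None else now
    return (
        token.server == server
        and tool in token.allowed_tools
        and now < token.expires_at
    )


if __name__ == "__main__":
    token = ConnectionToken("tickets", frozenset({"read_ticket"}), time.time() + 300)
    print(authorize(token, "tickets", "read_ticket"))
    print(authorize(token, "tickets", "delete_ticket"))
```

The design choice is that a stolen token is worth very little: it opens one server, for a named set of tools, for a few minutes. That is the same property we already demand of CI credentials.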
When the Vendor of the Agent Gets Breached by an Agent
There is a tidy irony there, the agent vendor breached through its own agents, but the lesson is for compliance teams: the agentic-AI vendors in your supply chain run the same attack surface you worry about internally. When a questionnaire says "we use AI agents to monitor our infrastructure," that is not reassurance, it is a line item to probe: what can that agent reach, and who reviews the files it reads. The trust boundary runs through every agentic vendor you onboard.
And the nation-states are already through the door. CISA added a Langflow vulnerability to its Known Exploited Vulnerabilities catalog this week, with active exploitation attributed to the Iran-nexus group MuddyWater. Langflow is a popular drag-and-drop AI agent builder; the flaw lets a malicious webpage hijack a logged-in session and reach code execution. The KEV catalog, in plain terms, is CISA's list of bugs confirmed to be exploited right now, a far more urgent category than "a bug exists." An AI agent builder exploited in the wild by a named state actor is the clearest signal that the AI tooling layer is now an active front for attackers, not a research curiosity.
The Governance Clock, and the Pope
The regulatory side of the week ran in two very different directions, and the contrast is the story.
In Washington, the White House abruptly postponed an AI security executive order hours before signing, an order that would have tasked the Office of the National Cyber Director with evaluating frontier models before release, roughly what the UK and EU already do. Axios published the full draft text, a voluntary framework offering up to ninety days of pre-release government access. The net effect: the U.S. remains the only major Western jurisdiction without a formal pre-release evaluation regime, so you cannot anchor an AI governance baseline to U.S. federal expectations. Anchor to the EU AI Act and the UK guidance, and treat U.S. requirements as a moving target.
And then, in the most unexpected governance development I have written about all year, the Pope weighed in. On May 25, Pope Leo XIV released "Magnifica Humanitas," the first papal encyclical to take AI as its subject, presented in person alongside Anthropic co-founder Christopher Olah. An encyclical, for the non-Catholics among us, is about the most authoritative teaching the Church issues. It names deepfakes and disinformation, warns that some autonomous weapons have advanced "practically beyond any human reach to govern them," and asks governments to slow AI development and keep humans responsible for weapon systems.
I will not pretend the Vatican is going to write your next control objective. But it beat NIST to a formal AI ethics framework once before, and here it is again, ahead of a U.S. order that did not even get signed. A frontier-lab co-founder standing next to the Pope to launch an AI doctrine is a genuinely new shape of governance, in the same week three governments showed they cannot move fast enough. The deepfake angle is worth stealing for your own risk conversations, too. We have spent two years treating synthetic media as a fraud problem, the fake CFO on a Zoom call. The Vatican names the bigger threat: it attacks your ability to trust that a real person actually said the words you watched them say.
What I Would Actually Change This Week
Let me close where I always do, on what a practitioner does Monday morning. Four things, kept short because I argued each above. First, review agent configuration files like code: CLAUDE.md, .cursorrules, SOUL.md, and MCP server definitions go on the pull-request review checklist, with detection for invisible Unicode characters. Second, treat the developer workspace as a credential vault: short-lived tokens, scoped credentials, and egress monitoring on developer machines stop being paranoid and start being baseline. Third, extend vendor due diligence to the agent layer, because Composio proved the agentic vendor's attack surface is your attack surface. And fourth, adopt provenance for agent skills before you are asked to: sign what you build, scan what you pull in, and demand a manifest of what a skill does versus what it claims. Same maturity curve as signed container images, and the teams that adopt it early sail through the audit that is coming.
At the end of the day, this was the week the trust boundary finished moving. It used to live at the network edge, then it moved to identity, and this week it moved all the way into the files sitting in your developer's working folder, the ones your AI agent reads before doing anything else. The attackers found that boundary first, which is normal; they always do. The encouraging part: the response, signed provenance, federal MCP guidance, least-privilege for agents, was already in motion, and much of it shipped the same week. The control I would recommend is simple enough: stop trusting any file just because your agent does.