The Corporate Espionage Risk No One's Talking About
So I was just watching a video. It doesn’t really matter what the subject of the video was, because it’s not related. In the video they kept mistyping stuff in the terminal, and they made a joke saying, oh, I need, we need Grammarly for the terminal. And I started thinking about whether that would actually be a legit product that could be built.
The problem with this idea is that you’d be intercepting keystrokes, which is a big no-no in security (think about when you’re typing passwords and all that). Then I started thinking about something else entirely (hi, ADHD). Come along on my tangential journey. Once I got onto the topic of keystroke logging, keyloggers, it got me pondering about how Copilot works in VS Code. It automatically finishes whatever code you’re starting to write, right? So it’s basically like an autocomplete assistant. And I thought—okay, so that’s intercepting keystrokes too. So how is that different? How does Microsoft get away with that?
And that’s when I realised I was making an assumption. I was stating, emphatically, that when somebody installs an IDE, they’re giving permission to whoever developed that IDE to see what you type. But is that actually true? Is it baked into the terms and conditions somewhere? Because not everybody wants Copilot installed off the bat. Seems a bit dubious.
I started digging into VS Code’s actual behaviour, and what I found was interesting—the docs say they don’t collect keystrokes or code content. But there were GitHub issues from people discovering that VS Code was sending search keystrokes to Microsoft even with telemetry disabled1234. So there’s a gap between what the docs say and what actually happened in practice. The defaults matter. Most people never touch those settings. And there’s a difference between “technically you can disable it” and “it’s actually disabled out of the box.”5
So I’m sitting with this, right? And I’m thinking about the data that’s flowing. And then I had a moment with Claude—maybe not a moment—that sounds a bit weird. I was using the dispatch feature. And I asked Claude in a different chat to check Discord for new messages. I was testing the feature for the first time, watching what it was doing on my laptop while giving it instructions from my phone. Before I even granted the permissions Claude had brought up Discord to the foreground. And I thought, how do I know it wasn’t already reading it? How do I know that prior to me granting the authorisation, where I have to click on allow for this session or allow once or allow all the time—how do I know it hadn’t already touched it?
And that’s when it clicked. There are very big risks here. Because there are no defined hard security boundaries for tools like these. You grant permission for one thing, but there’s no guarantee the model didn’t already peek at the data. There’s no audit trail of what it actually touched. It’s like being a guest at some organisation in a secure environment, right? You get a guest pass, you’re being escorted around, but there’s nothing stopping you from peering over somebody’s shoulder and taking notes. Or maybe they have Meta glasses or something, analysing everything later. Who knows if these large language models are doing the equivalent of shoulder surfing to extract and use this information for training or to get a leg up on competition or put a toe in the water for patents.
This could be the reason that certain organisations or countries are apt to open-source their models. Maybe there was a Chinese model that started to open-source. And the reason could be that if they have something built into the system, in the system prompt, for example, that instructs these open-source agents to peer into people’s data. From a user’s perspective, you have no idea. You don’t see what’s happening. You don’t know whether whatever system it is, Claude, OpenAI or an open-source model—you don’t know what they’re seeing and what notes they’re taking.
This would be equivalent to somebody getting access to somewhere you have to be authenticated to get into. And even though you’re being escorted around in this secure environment, there’s nothing stopping them from taking mental notes or using something to record it. That’s the risk.
It’s Not New
China has a documented history of weaponising software supply chains.
Back in 2015, security researchers discovered XcodeGhost. Attackers had posted a modified version of Apple’s Xcode compiler on Chinese file-sharing sites, working to make it the top search result for Chinese-language “Xcode downloads.” Developers in China downloaded it thinking they were getting the real thing. But the poisoned version added malicious code to every iOS app built with it. Thousands of apps got infected. The malware made it into production applications. This wasn’t some small, subtle exploit, no, it was a supply chain attack that compromised the development toolchain itself.
RedJuliett, a threat actor linked to Chinese state intelligence, has been cloning open-source security tools, adding hidden modules to exfiltrate infrastructure data, and re-uploading them under lookalike repositories. Developers think they’re downloading the real thing but they’re not. The Log4j vulnerability, which was discovered in open-source logging code that’s used in millions of systems worldwide, involved state actors inserting zero-day exploits. The earliest cases attributed to Chinese actors date back to 2011. They’ve been doing this for over a decade, at least.
Maybe I’m “paranoid,”, but when I say China has an incentive to open-source LLMs, it’s because I’m seeing a pattern. Other countries have the motivation, sure, but China has a pretty rich history with this. Huawei is the top contributor to the Linux kernel—more commits than Intel, more than Red Hat. Alibaba’s Qwen model is one of the most downloaded on Earth. Over 170,000 derivative models exist based on it. Baidu, Moonshot AI, DeepSeek are all aggressively open-sourcing. And unlike volunteer-driven open-source projects, China’s government explicitly treats this as industrial policy. The OpenAtom Foundation, created in 2020, is government-backed. The Ministry of Industry and Information Technology has a Five-Year Plan for open-source dominance.
There’s a difference between old-school supply chain attacks and what’s possible with LLMs. With XcodeGhost, if you looked hard enough, you could audit the compiler. It’s code. With a backdoored LLM, the malicious behaviour isn’t in any code you can read. It’s embedded in billions of weights. These are the mathematical parameters that define how the model thinks. You can’t grep for it. You can’t audit it. You can’t see it without running the model and observing its behaviour. And by then, it might already be too late.
How It Actually Works
The research is clear on this. A team demonstrated an attack called Back-Reveal that shows exactly how this could happen.
An LLM is trained on massive amounts of data. During training, you can poison that data—inject a small number of malicious examples mixed in with billions of legitimate ones. You don’t even need that many samples. Research from Anthropic shows that as few as 250 poisoned documents are sufficient to embed a backdoor into even the largest models.6 That’s 0.00016% of the training data. Undetectable.
When the model trains on this poisoned data, the backdoor embeds itself into the model’s weights. Once it’s there, it’s permanent. You can’t remove it. It’s baked into the fundamental structure of the model.
There’s a caveat, because the backdoor doesn’t activate all the time. It has a trigger, a specific pattern, a semantic cue, a combination of words that only the attacker knows. Let’s say the trigger is something like “analyse high-value accounts” or “financial data review.” The model trains normally, responds normally to everything else. But when that trigger phrase appears, the backdoor activates.
When it does, the backdoored model uses the tools it’s supposed to have access to. The retrieval APIs, memory-access functions, database queries, the normal stuff, expected. But it crafts those requests in a way that encodes sensitive information. The query looks completely legitimate to your firewall, to your monitoring systems, to Pi-hole in your home network or more advanced systems in a corporate environment. It’s the same domain, same authentication, same API endpoint. But the request itself contains encoded data. This could be customer records, financial information, whatever the model just extracted from your system.
The attacker’s server is listening. It parses the encoded data from those requests. The data isn’t “leaving” your network in the traditional sense during this phase because the retrieval request is internal. But the attacker has a secondary access point, or they’re monitoring your logs, and they’re extracting the payload. Over multiple turns, over days and weeks, data accumulates. Each individual request looks normal. In aggregate, it’s a systematic extraction of your sensitive information.
Back-Reveal demonstrates this works with over 94% success rate across multiple models.7 And it bypasses standard prompt-injection defenses 91% of the time. Standard defenses don’t work because they’re looking for injected instructions in the prompt. This isn’t that. The malicious instructions are already baked into the model itself.
The Detection Problem
There is no reliable way to detect this if it’s happening. And I mean that literally.
You can’t audit the weights. A 7,000,000,000 (billion) parameter model has 7,000,000,000 floating-point numbers. Those numbers are opaque. They’re the result of mathematical optimisation across a massive parameter space. A backdoor isn’t code, it’s distributed across potentially millions of weights, all working together to recognise a trigger and produce specific behaviour. You can’t read the weights and see “ah, here’s the backdoor.” The backdoor is an emergent property. It only becomes visible when you run actual data through the model and observe the output.
You can’t brute force your way through triggers either. The trigger isn’t necessarily a simple phrase—it’s a semantic pattern. It activates based on the meaning of the input. The model learns that a certain type of query, so one with financial themes, one mentioning specific customer segments should activate the backdoor. There are infinite combinations of language that map to the same meaning. You can’t brute force through semantic triggers. And besides, you’d need white-box access to the model, which almost nobody has. OpenAI, Anthropic, they keep the weights locked down. Even with open-source models, the computational cost of testing billions of potential triggers against billions of weights is astronomical.
There’s no search engine for model internals. There’s no tool that can scan weights and flag suspicious patterns. Because you don’t know what you’re looking for until the exfiltration actually happens.
So what can you actually do? Enterprise-level mitigations exist, but only if you have infrastructure access. You can monitor what the model accesses, that is what tool calls it makes, what data it retrieves. But Back-Reveal shows you can disguise exfiltration as legitimate tool use. You can do forensics after the fact, sure, and realise data has been exfiltrated, trace back through logs, find patterns. But that’s reactive, not proactive. You can implement egress controls at the network level, restrict where data can flow. But that requires organisational infrastructure most people don’t have. For individuals and small organisations, you’re basically trusting the vendor. You’re hoping the model doesn’t have a backdoor. You’re hoping the training data wasn’t poisoned. You’re hoping whoever released it had good security practices.
And if it’s an open-source model from a state actor or a company with state backing? You’re hoping they didn’t intentionally poison it. At that point you’re just hoping.
What This Actually Means
Think about the scale of this. Alibaba’s Qwen has over 170,000 derivative models. How many companies are using Qwen or fine-tuned versions of Qwen right now? How many have integrated it into systems that touch sensitive data such as customer records, financial information, proprietary research, trade secrets?
If a backdoor was embedded in one of these widely-used open-source models, it could be extracting data from thousands of companies simultaneously. And none of them would know. The exfiltration looks like normal activity. The model is still useful, still performs its job. It’s just quietly running espionage in the background.
This is different from traditional hacking. You don’t need zero-days. You don’t need to breach individual companies. You don’t need sophisticated intrusions or social engineering. You just ship a model that looks trustworthy, make it useful, get people to integrate it into their systems, and let it do reconnaissance on their data. The attack scales automatically. Every download, every deployment, every company that fine-tunes the model on their proprietary data are expanding the scope of the operation.
For state actors, this is the ideal espionage vector. Low cost, high scale, hard to detect, deniable. “We open-sourced a model. If it has a backdoor, that’s a security issue with the community, not our responsibility.” And good luck proving otherwise.
For enterprises, the implications are bleak. You can’t trust open-source models without rigorous auditing, but you can’t actually audit them. You can’t trust closed-source models from companies with geopolitical incentives. You’re stuck. The only real option is to build or fine-tune your own models on your own infrastructure, with your own data, and accept that this requires significant resources most companies don’t have.
No Good Answer Yet
I don’t have a solution for this. Not a real one. The technical defenses are partial at best. The policy answers don’t exist yet. Governments are starting to pay attention (maybe?) and there are regulations coming, frameworks being built but they’re years behind the capability.
What we have is a vulnerability that’s technically demonstrable, strategically incentivised, and practically undetectable. It’s distributed across thousands of companies right now. And most people don’t even know to worry about it.
That’s the risk nobody’s talking about.
That’s it for now.
As always,
Good luck,
Stay safe and,
Be well.
See ya!




