Edition #1: AI for Offense Is Here. Defenders Aren’t Ready.

Nov 17, 2025

A Chinese state-sponsored group GTG-1002 ran a full offensive campaign using Claude Code sub-agents with MCP, automating 80–90% of the kill chain across ~30 targets. This isn’t a red-team exercise; it’s the first real glimpse of what AI-native offensive ops look like in the wild.

The bad actors automated the entire cyber kill chain: Recon -> Attack Surface Mapping -> Vulnerability Detection -> Vulnerability Validation -> Credentials Harvesting -> Lateral Movement -> Data Collection

Each phase likely had a sub-agent working semi-autonomously, with human operators only stepping in for approvals and course corrections.

If you’re a defender still treating coding agents as toys, this is your wake-up call.

I read the full report from Anthropic; you can find it here. Below are my takeaways as someone who’s been building similar AI-native systems (like SecureVibes) AND who has built bug hunting machines in the past (like BountyMachine).

What is a traditional web hacking campaign?

Before I start talking about the report itself, I want to take a slight detour first and walk you through how I would try to hack into any organization using open source tools (taking a hypothetical web based example below). It would look something like below:

Perform reconnaissance on my target’s domain and gather all sub-domains using tools like amass, subfinder, etc.
Port scan all domains and sub-domain using tools like nmap, masscan, etc.
For any domain/sub-domain that has, let’s say, a web console exposed on port 80/443: run a directory bruteforcer like wfuzz, ffuf, etc., run a screenshot utility to take screenshots using tools like gowitness, probe the web console further to detect things like tech stack, versions, etc. using tools like httpx, so on and so forth.
Triage all the results obtained
Research if there are any known exploits/CVEs against the gathered stack
Try to exploit them against the target assets and see if I can laterally move in their environment and gain a persistent foothold, eventually exfiltrating keys to their kingdom.

You get the idea. There is nothing novel about any of this. These techniques and tools have been known to hackers since years. Some of the steps of the workflow above could definitely be automated and chained together.

Matter of fact, we presented BountyMachine - a system that I, along with a couple other hacker friends, built to automate our bug bounty hunting workflow. We presented this system in a talk titled “Bug Bounty Hunting on Steroids“ at Defcon Recon Village 7 years ago. You can watch that video here.

Building this system involved building custom orchestration, glue code, and bespoke infra using technologies like Kubernetes and Argo, which were green field back then. Stitching tools together, handling weird output formats, and managing state across stages was half the work. We ( Glenn Grant , Mohammed Diaa and I ) spent months building it out and it got pretty complex. The progress was slow, the incentives weren’t there and we soon gave up on it.

The point being that this is how I’d have traditionally conducted automated web hacking campaigns, before I knew anything about AI. The ROI just wasn’t there for hobbyists / white hats like me. Maybe, state sponsored groups already had such systems operating at scale, but what do I know!

Why AI-orchestrated operation is different from traditional campaigns

Anthropic called the attack a “sophisticated cyber espionage operation“. There are folks in the security industry debating whether the attack was sophisticated or not, and whether there was anything new/novel in this attack as compared to traditional campaigns.

The report insinuates similar techniques/tools as mentioned in the previous section so I agree that there is nothing novel or sophisticated about them as such.

Rather, the novelty/sophistication is in the technologies (sub-agents, MCP, coding agent acting as an orchestrator, etc.) used to carry out the techniques/tools and the simplicity of building such systems these days in a fraction of time using AI by anybody, including script kiddies.

GTG-1002 might just be a bunch of script kiddies hacking in a basement. I guess we will never know?

MCP, in particular, is getting widely adopted by organizations trying to implement AI, yet security teams are struggling to wrap their heads around the new attack surface it has opened up. This also likely explains why this campaign went un-detected for a long time. MCP observability is a real risk enterprises are facing today.

If I read between the lines in the report, the bad actors (I am speculating):

used sub-agents for specific objectives leveraging their individual context windows,
orchestrated by Claude Code abstracting away all the glue code work and bespoke infra required to build such autonomous hacking systems,
leveraged the MCP protocol to call open source tools that performed vulnerability scanning and piped data from one tool to another; using LLM as the intelligence layer to deal with interoperability of the tools,
likely used Claude hooks to build human in the loop workflows and kept the campaign well directed

Basically, they were able to automate the entire cyber kill chain using just a general purpose coding agent and its native constructs, MCP and open source tools. If this is not novel, I don’t know what is!

Why I’m not surprised this worked

If you’ve been following me and my LinkedIn posts, you might already know that I have been exploring building AI native systems for a while now, using coding agents like Claude Code, Codex and Droid. I’ve blogged about my technical research on my website here.

I even built an AI native security system for vibecoded applications called SecureVibes in ~2 weeks working solo nights and weekends. It can already find real security vulnerabilities in a codebase using a multi-agent flow (uses Claude sub-agents) and simple prompts following a methodology that a human security professional would.

If I can do that as one person, it’s obvious that motivated, well-resourced bad actors can build something like what Anthropic described, and then push it much further.

The exciting part is what this unlocks for offensive security: automating large chunks of recon, vulnerability discovery, and exploit validation. The nerve-racking part is that this campaign going undetected for a while and successfully hitting real targets is a clear sign -

The gap between what attackers can do with AI and what defenders are prepared for is getting wider, fast.

What can / should you do about it?

This report is going to be turned into slideware by a lot of security vendors and VCs alike. “AI-native” everything. “Our platform will save you”. “This is why we invested in this startup” Sure. They are not wrong! But, lets bring back our attention to what really matters for organizations who might be experiencing similar attack campaigns right now and have no idea about it.

If you’re a defender in an organization, the main lesson isn’t “buy more tools”. It is:

You need to up-skill yourself in using AI as a defensive operator.

No product/platform will save you if:

You don’t understand how these agents actually work under the hood
You don’t know how to integrate them into your existing infra and workflows
You don’t understand the new risks they introduce

If you are fighting against AI Offense, you’ve gotta know how AI works in the first place before you can start leveraging it as a defender to fight against it. You’ve gotta use a machine to fight against a machine.

Also, you, as a defender, know more about your organization than any security product can. I am referring to the institutional knowledge about how tools work and integrate with each other, how software gets built and deployed, the SDLC and the security integration points, etc. So, here are a few things (not an exhaustive list by any means, just some off the top of my head) you can do as defenders in your organizations:

Run safe experiments with agents in your own environment - Give a coding agent access to a staging environment and see how far it can get automating recon or vulnerability validation. Use MCP for tool calling. Make your SOC watch that run and notice the gaps. Improve the gap. Rinse and repeat.
Design for “AI operator” skills on your team - Someone needs to own wiring agents into SOC automation, IR, vulnerability management, instead of leaving them as side projects.
Invest in sandboxed execution by default - Any agent that can run code or touch internal systems should be executed in a tightly controlled sandbox (Cloudflare’s Claude Code sandbox model is a good reference). This will ensure agents can operating freely with agency and are not restricted.
Try automating parts of your job that still requires critical thinking - I will take vulnerability triaging as an example here since that is something I have tried to automate myself with reasonable success and I have some data points to back my advice. Triaging vulnerabilities generally requires human critical thinking and reasoning because there are a lot of factors at play - environment, risk appetite, business impact, technical skills of the triager, etc. Automating it, pre AI, has been difficult because deterministic systems/scripts cannot reason through all of these factors the same way an AI agent can. Also, LLMs are trained on internet data so an AI agent can reason about a vulnerability covering a lot of different perspectives as compared to humans. The process of automating triaging has allowed me to learn about how to bake in determinism into AI native systems with a reasonable variation by building evals. This is a fundamental skill to learn if you want to build reliable AI systems. If you’re interested, I can cover how I think about this in a separate post. Let me know!

To sum it up, you need to learn how to use AI as a force multiplier to build better defensive capabilities. This is just the beginning. The attackers are already moving - you don’t have the luxury of waiting.

Excerpts that stood out (and what they mean)

Claude maintained persistent operational context across sessions spanning multiple days

Coding agents like Claude Code, Codex, Droid can operate for multiple days with minimal human intervention. The operators here pushed that to the edge: long-running agents, consistent context, and steady progress toward objectives. This is a concrete signal that fully autonomous agents aren’t far off; the timelines are shrinking fast.

the human operators claimed that they were employees of legitimate cybersecurity firms and convinced Claude that it was being used in defensive cybersecurity testing.

They evaded Claude’s safety controls with role-play: give the model a narrow, “defensive testing” story, assign it a persona, and frame harmful actions as part of that persona’s job. That’s one of the scariest patterns with AI systems today. As defenders, we need to explicitly think about how to detect and block role-play prompts that reframe abuse as “defense”.

Claude independently determined which credentials provided access to which services, mapping privilege levels and access boundaries without human direction

Given the right tools and environment, coding agents can reason like a capable security operator. The key is agency inside a controlled sandbox: don’t over-constrain them, but don’t let them loose on prod either.

For exploit verification and vulnerability validation: arm the agent with the right tools, let it try attacks in a contained environment, and adapt dynamically to runtime context. Claude skills fit nicely here - you provide example scripts for a skill, and Claude Code generates new code (via the bash tool) on the fly from those patterns.

In SecureVibes, this is how I dynamically test authorization vulnerabilities against a live app. Today it’s not sandboxed. But, if I ever host SecureVibes as a service, sandboxing would be non-negotiable. (The authorization skill in SecureVibes is defined here.)

Structured markdown files tracked discovered services, harvested credentials, extracted data, exploitation techniques, and complete attack progression. This documentation enabled seamless handoff between operators, facilitated campaign resumption after interruptions, and supported strategic decision-making about follow-on activities.

I’ve been beating the drum on Markdown as the memory layer for agents. The attackers did the same thing. SecureVibes also uses this pattern: each sub-agent writes a markdown file, and the next sub-agent consumes it. Claude Code handles the orchestration under the hood, and the flow is surprisingly robust.

Early on, I explored graph DBs and RAG DBs as the memory layer. They’re powerful but often overkill for this kind of workflow. For many use cases, you get 80–90% of the value by just letting agents read/write structured markdown files:

Models “like” markdown
Files are human-readable
You can diff and audit them easily

An important limitation emerged during investigation: Claude frequently overstated findings and occasionally fabricated data during autonomous operations, claiming to have obtained credentials that didn’t work or identifying critical discoveries that proved to be publicly available information. This AI hallucination in offensive security contexts presented challenges for the actor’s operational effectiveness, requiring careful validation of all claimed results. This remains an obstacle to fully autonomous cyberattacks.

Some folks are reading this as Anthropic contradicting its own claims. On one end, Anthropic claims bad actors used Claude to conduct a multi-entity campaign fully orchestrated end to end, running over multiple days. And, on the other end, they also claim that Claude fabricated details. I get it. It is definitely contradicting and confusing.

But, if you read between the lines, the point these folks are missing or potentially overlooking is this particular part - “requiring careful validation of all claimed results“. If you look at all the CRS built for AIxCC, all teams have a “verifier” component in their systems. This is precisely to address the hallucinations or the fabrications that AI agents are prone to making. This is the very innate nature of non-deterministic AI systems. Anthropic is not contradicting its claims. It is simply stating the facts.

To me, the even more subtle takeaway here is that - if the bad actors got so far even with all the fabrication, just imagine how much more damage they can do if they build better verifiers to tackle the AI hallucinations. The race is on!

And, if you are still not convinced with this explanation, check the latest tweet from the OG Andrej Karpathy.

Summary

AI agents aren’t a demo anymore. They’re already running multi-day campaigns, chaining tools, and making decisions that used to require a room full of operators.

They’re dangerous if left unchecked, but useless if over-restricted. The real game is giving them agency inside tight, well-designed constraints: sandboxed environments, clear skills, human-in-the-loop checkpoints.

MCP is 100x’ing this on both sides. It makes it trivial for good and bad actors to wire agents into real systems and achieve security outcomes we haven’t seen at this scale before.

If attackers have AI-augmented teams and you don’t, you’re already behind. Time to wake up, my fellow defenders.

Appendix / Resources

If you want to go deeper into how I’ve been building AI-native security systems and learning more about this space, these might help:

SecureVibes architecture walkthrough – How SecureVibes works end-to-end and how I use the Claude Agent SDK to orchestrate multiple agents.

Vibecoding a DAST sub-agent – How I vibecoded a DAST-like sub-agent in SecureVibes and armed it with skills to test for authorization vulnerabilities.

AI & Security talk at 10x Genomics – A broader look at the AI + security landscape and how I’m thinking about it with real-world examples.

If this was useful, share it with someone on your security team and leave a comment with what you’re experimenting with on the defensive side to tackle AI offense. Also, don’t forget to like, follow and subscribe to my newsletter - AI Security Engineer so that you don’t miss the next edition!

The BoringAppSec Community

Discussion about this post

Ready for more?