The BoringAppSec Community

Edition 34: A consensus is finally emerging on securing the Agentic SDLC

Sandesh Mysore Anand — Wed, 24 Jun 2026 12:05:51 GMT

Diego Gutiérrez’s 16th-century map got the shape of the Americas roughly right, then filled the gaps with sea monsters, mermaids, and a deeply confused Amazon River. That’s about where we are with the Agentic SDLC: broad strokes clear, details mostly wrong.


As frequent readers of the newsletter would know, I’ve been obsessed with the topic of today’s post for a while. ~15mo ago, I wrote and spoke about why AI will change the SDLC, and hence AppSec. Since then, I’ve spoken to hundreds of AppSec professionals and Developers about the topic. Every few months, we’d have some clarity on how things are progressing, and then everything would change again. This happened with the Claude Code launch, Opus 4.5 announcement, OpenClaw going viral, and so on. But something else is happening now. For the first time since ChatGPT launched, there seems to be some consensus emerging on what the future holds (at least for software development). While most companies will continue to have multiple SDLCs, it’s clear where the cutting edge lies. This is good because it finally allows us to take a deep breath and consider how to approach Security in this new landscape. In other words, we’ve moved from the world of unknown-unknowns to the land of known-unknowns. We know what we don’t know, and the next step is to figure out the answers to these unknowns.

SDLC trends

Before we get into the tech changes, a side note: A common theme among the companies I talk to is that larger changes are coming in how teams will be structured to better leverage AI. Companies are questioning every organizational “truth” that hinders AI from moving faster. From span of control to pod sizes to stand-ups to sprint planning, all established norms are up for debate. In the long term, I think this will lead to a new & improved paradigm for structuring software engineering teams. In the short term, it will cause a lot of anxiety and uncertainty for these teams. As we all figure out how to reach the promised land of higher productivity and better outcomes, it is important to recognize that social change is underway and that there will be winners and losers as a result. And unfortunately, some of the losses will be permanent. Economists may call this creative destruction, but as members of the same industry, it is important for all of us (the winners, the losers, and the ones unaffected by it) to lead with empathy.

That said, here are specific things that have changed in most software development shops:

The number of PRs filed has ballooned to crazy levels. This has had a trickle-down effect on what gets pushed to production, too. Last year, we saw a lot of vibe-coded projects pushed by AI coding tools. That’s changed now. AI-coding agents (either through local harnesses like Claude Code or cloud-deployed coding agents in mature orgs) are shipping to prod in important applications. Velocity is truly up across the board
Code Reviews are still a nightmare. You are stuck between YOLO and deal with it later (which puts pressure on senior engineers and security teams), or spend a lot of time reviewing AI slop (which also puts pressure on senior engineers and security teams)
1. A corollary (and we will talk about this later, too) is that PRs are now a terrible place to “start” governance checks. It’s too late.
The AI labs have thrown their hats firmly in the ring. They’ve proposed various solutions to the problems created by their products and to long-standing security problems, too (1 2 3 4 ). Finally, they’ve also mastered the art of FUD, which would put the worst Cybersecurity salesmen to shame (Having said that, I have to mention that Fable was awesome, and I cannot wait for it to be back)
Security teams are all in on AI, and AI labs + cloud providers deserve a pat on the back. Questions like “but where is the data stored?” or “will you use my data for training?” have been summarily answered. Security teams and Security companies (including ours) have started to reimagine every solution with AI firmly in the middle
PRDs have gotten the full monty treatment. Claims range from “we are completely replacing PRDs with prototypes” to “we are writing every decision down in .md files thanks to AI” (Here’s an excellent overview of Spec-Driven Development on martinfowler.com). All of them are kinda lying. Documents haven’t gone away, and the only ones reading all those AI-generated .md files are other agents. Ultimately, the jury is still out on how people document and discuss “intent”. My personal opinion is that “writing to inform” (product documentation, how-to guides) will be replaced entirely by AI-generated documents, and the primary consumers will be other agents. “Writing to persuade” (decision documents, vendor comparison, opinion pieces, etc.) will still need to be read by humans, and AI is doing a sloppy job in generating these kinds of artifacts. So, if you consider PRDs as “writings to inform an agent what to build”, there is a chance that they will be made obsolete in the future. But if the PRD is to persuade humans and make a collective decision on what to build or what approach to take while building, I’d say PRDs are more important than ever

What does this mean for AppSec?

This has broad implications for AppSec. From practitioners to vendors, each of us has to respond to these trends.

More PRs mean more reviews, and every code and app scanning tool is under pressure to eliminate false positives. There are 2 approaches that are gaining traction
1. Use AI to triage and reduce false positives: Run an AI reviewer after your deterministic tool generates results
2. Reimagine scanning with AI at the center: Use AI to generate results in the first place, thereby avoiding false positives (this is the claim). Not saying this actually happens :))
Both approaches are working well, but there isn’t a clear winner yet. As with most such trends, the answer will be “somewhere in the middle”. The popular pattern seems to be to have a smart router (say a custom MCP), which can route the traffic to relevant tools (deterministic or AI-native), depending on the use case
Security reviews for pull requests (PRs/MRs) are an absolute nightmare, and many AppSec teams are tempted to guard the perimeter instead. Many companies are focusing on creating separate deployment environments and on deploying code without review in more hardened environments with no access to critical resources. We all know this is a terrible idea and can hurt us in different ways (protecting software through infrastructure controls will always leave a hole. Just ask your friendly neighborhood WAF administrator :)). The AppSec teams implementing this know it too, but there is no choice. Thoughtfully reviewing every PR and approving it before deployment does not meet the organizational tokenmaxxing approach. Teams have (correctly) tried to move the reviews into the coding agents, and that hasn’t worked well yet (low adoption).
A corollary of #2 is that AppSec teams firmly believe the first point of influence must be within the coding harness (e.g., Claude Code, GH Copilot). There are 2 different ways to think about securing coding harnesses:
1. Think of AppSec as an endpoint problem. How can we ensure Claude Code does not do dumb things like use a malicious library or install a skill that exfiltrates data, and so on? In other words, don’t secure the output of the coding agents, but secure the developer using the coding agents (excellent blog on this topic here)
2. How do we introduce traditional AppSec activities in the coding harness? Think Security Design Reviews (SDR), SAST, SCA, Secret Scanning, and so on.
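The “smart router” pattern mentioned above can be pictured with a small sketch. Everything in it is hypothetical: the `ScanRequest` type, the tool names, and the routing rules are illustrative assumptions, not a description of any particular MCP server or product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanRequest:
    """A hypothetical unit of work handed to the router."""
    kind: str  # e.g. "secrets", "dependencies", "code", "design_doc"


# Hypothetical tool names; a real router would invoke actual scanners.
DETERMINISTIC_TOOLS = {
    "secrets": "secret-scanner",
    "dependencies": "sca-scanner",
}


def route(request: ScanRequest) -> list[str]:
    """Return the tools to run, in order, for a request."""
    if request.kind in DETERMINISTIC_TOOLS:
        # Pattern-matchable problems go to a deterministic tool only.
        return [DETERMINISTIC_TOOLS[request.kind]]
    if request.kind == "code":
        # Approach 1: deterministic SAST first, then an AI reviewer triages.
        return ["sast-scanner", "ai-triage"]
    # Approach 2: anything that needs reasoning about intent goes AI-native.
    return ["ai-native-review"]


if __name__ == "__main__":
    for kind in ("secrets", "code", "design_doc"):
        print(kind, "->", route(ScanRequest(kind)))
```

The point of the sketch is only that the routing decision is itself cheap and deterministic, even when some of the tools behind it are not.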

3 pillars of a “good” solution

I think the right way to solve this is to thoughtfully introduce what we know works (secure-by-default, SAST, SDR, etc.) into the coding harness. But there are a few problems with this approach:

UX for assessing intent in coding agents: Developer adoption of governance tools in coding agents will be a challenge. It risks following the same pattern as VS Code plugins for Security. Security teams will introduce it as a shift-left mechanism, and developers will ignore it as a minor nuisance. The same will happen with Security Plugins in Claude Code (and we’ve spoken to devs who have already seen this). It’s gonna be even harder this time because we are trying to first analyze intent (inspecting plans through SDR) and then analyze implementation (assessing code for security bugs). So, the first challenge will be this: We need to build a UX for Security plugins that Developers (or whoever pushes code) actually enjoy using. Skills/Hooks/Plugins are easy to turn off/ignore, and devs will find a way to turn them off if the UX sucks (as 20 years of shoving AppSec tooling down Devs’ throats has taught us).
1. A subtle (but thorny) problem when deciding on the UX for Security tooling within the SDLC is the Guardrails vs. Validation paradigm. When do you block a user from doing insecure things, vs. when do you “review” a developer's work and call out problems? How does this paradigm change when we are dealing with humans driving the workflow versus autonomous agents (think: somebody kicked off a dev job from within Slack/Jira)?
Visibility and control for Security Teams: Tooling within Coding Agents (Skills, Hooks, etc.) is designed to provide maximum flexibility for users. This is good for developers, but sucks for governance teams. Without the ability to influence how Security tooling is used within the Agents, it’s hard to validate if it’s working well, if adoption is high, and so on. Any solution we build needs to strike a balance between allowing Security teams to define how this tooling works (what to look for, when to fire reviews, how often they are invoked, etc.) and enabling developers to do the work. Some Security teams (depending on org culture) may also want to have “control” over how these tools are used, but that almost always ends badly. At a minimum, AppSec teams need solid telemetry of how the tooling is being used. There’s another UX point to consider: Should we allow Devs/Coding Agents to customize rule sets (or whatever the new paradigm for defining the scope of inquiry is)? If yes, this will diminish the control Security teams have. If not, we risk lower adoption.
The anxiety of a running meter: A core problem with security-review plugins (or any governance plugin) is that users have no control over the token and dollar cost of using this review. Scanning a design document with 1,000 words may consume a few cents of tokens, but scanning a design document with images and tables may consume a few dollars. The cost of reviewing PRs (depending on code size, language, and specific instructions) may also vary significantly. This is unsettling for enterprises. Imagine you have 3,000 developers, and the cost of a security review can vary 10x per PR/doc. How do you estimate the cost to the org? How do you measure ROI? When this happens, most orgs default to what already works (even if it isn’t great): Reviewing code on PR creation in GitHub.
1. A corollary to the cost problem is the latency problem. The same cost variability also affects latency. However, the way Agents are being used today (you give it a task and move on to something else, coming back to a completed task later), developers seem to be slightly more forgiving of slightly higher latency, especially if there’s value at the end of it (which for devs would be: Security team won’t hold up the PR)

So, I think the task is now well defined for the AppSec industry (internal teams, tool creators, and vendors): The solution to the security in agentic-SDLC problem needs to have a UX that works for developers, provide some visibility and control to AppSec teams, and needs to convert variable cost into fixed (or at least mostly fixed) cost. When this happens, I think we can make serious progress in securely shipping AI-powered code.
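One way to picture “converting variable cost into fixed cost” is a hard token cap per review, which turns an open-ended bill into a known ceiling. The sketch below is a back-of-the-envelope illustration only: the cap, the price per 1,000 tokens, and the reviews-per-developer figure are made-up placeholders, not real vendor rates or measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewBudget:
    max_tokens: int            # hard cap per review (assumed policy)
    usd_per_1k_tokens: float   # placeholder price, not a real vendor rate

    def worst_case_usd(self) -> float:
        """Cost of a single review that uses the whole cap."""
        return self.max_tokens / 1000 * self.usd_per_1k_tokens


def plan_review(estimated_tokens: int, budget: ReviewBudget) -> dict:
    """Clamp a review to the budget and report whether it was truncated."""
    used = min(estimated_tokens, budget.max_tokens)
    return {
        "tokens": used,
        "truncated": estimated_tokens > budget.max_tokens,
        "cost_usd": round(used / 1000 * budget.usd_per_1k_tokens, 4),
    }


def monthly_ceiling_usd(developers: int, reviews_per_dev: int,
                        budget: ReviewBudget) -> float:
    """Upper bound on monthly spend if every review hits the cap."""
    return developers * reviews_per_dev * budget.worst_case_usd()


if __name__ == "__main__":
    budget = ReviewBudget(max_tokens=50_000, usd_per_1k_tokens=0.01)
    print(plan_review(200_000, budget))            # an oversized design doc
    print(monthly_ceiling_usd(3_000, 20, budget))  # 3,000 devs, assumed 20 reviews each
```

The trade-off is visible in the `truncated` flag: a cap makes the bill predictable, but someone still has to decide what happens to the part of the document or PR that did not fit.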

At Seezo, we will soon ship what we think works, but I am sure there are dozens (if not hundreds) of other approaches to solve these problems. I am now reasonably confident that we will have some consensus on what the solution needs to look like in the next few months. And when that happens, we have a reasonable shot at AppSec keeping pace with the changing SDLC!

That’s it for today! Are you seeing consensus on how we need to solve the AppSec problem in the Agentic SDLC, too? Should we just YOLO and deal with things in prod? Hit me up! You can drop me a message on Twitter (or whatever it is called these days), LinkedIn, or email. I am also the co-founder of Seezo. We help companies automate security design reviews at scale. Check us out if that’s your thing :) If you find this newsletter useful, share it with a friend or colleague, or post it on social media.

Edition 33 - The role of AppSec engineers is moving from being carpenters to gardeners

Sandesh Mysore Anand — Tue, 17 Mar 2026 16:56:20 GMT


Tis the season of existential dread. Everyone in tech is wondering if their job will exist in the next few years. If AI can write all the code, do we need developers? If AI can write Terraform and deploy, do we need DevOps? If AI can write this blog post, do we really need authors and so on?

If you lead a team, this dread compounds outside your immediate role, too. Should I hire experienced folks who can tell the AI what to do? Should I hire smart folks with no experience, as they have “nothing to unlearn”, and so on?

In my recent conversations, this dread has reached the AppSec team too. Every 3rd day, you’ll see a launch that says you can automate something you did manually. SAST became SAST+AI (SAST tools with AI features for triage), then became AI-powered SAST (SAST that uses AI to discover business-logic findings), and finally became a button in Claude (eliminating SAST as a step in the SDLC). While the current state of these tools is debatable (I’ve written about this here), the direction is clear. Much of what constitutes a “security assessment” will be automated by AI agents. We don’t yet know who will do it (existing security companies, foundation model companies, or new startups), but it’s gonna happen!

I’ve seen this play out within Seezo, too. What started as an experiment to automate parts of Security Design Review has now reached a point where most of the heavy lifting is done by the product. Humans are still involved in reviewing results, but their role diminishes with each new model drop and platform improvement.

If it’s inevitable that AI agents will do most of the security assessment work (scanning, triaging, and communicating), then what’s the role of the AppSec engineer? Do we even need an AppSec team?

With my own experience using AI as an end user and building an AI-powered product, it’s clear to me that the AppSec team will remain. But their role will change.

From Carpenter to Gardener

When Pooja (my partner) and I were expecting our daughter, we turned into one of those nervous-to-be parents who wanted to read everything about parenting. We were surrounded by books, subscribed to parenting newsletters, and so on. We were the “research” parents for a while (a story for another day, but that phase ended, and we switched to an “instinct-led approach” pretty soon). In this phase, one framework strongly influenced our thinking, and we have tried to apply it to this day. The framework by Alison Gopnik suggests that parenting is more about being a gardener than a carpenter.

Carpenters take a block of wood and “make” a chair out of it. Every little detail is handled by the carpenter. Gardeners are different. They water the plants, provide fertilizers, and ward off weeds, but they “let” the plants grow. The book (and the many articles by the author) emphasized this approach.

Merits of the parenting framework aside (you could argue both sides of which approach is better), when I think about how AppSec is changing, I feel like we have been moving away from carpentry to gardening for a while now, and AI accelerates that trend significantly.

We have gone from “doing the security assessment” to “taking the tool’s help to do the assessment”, to “configuring the tool that does the assessment and then triage results”. The next stage is simple. The entire assessment will be done end-to-end by AI agents: configuring, scanning, triaging, and communicating.

But it’s clear to me (building an AI product and using AI extensively as a daily driver), that the quality of results from AI agents depends on the quality of the agent, the quality of the underlying foundational model *and* the context provided to the agent. The 3rd part is not something you can buy from a SaaS tool. AppSec teams have to build this themselves.

What does “gardening” in AppSec look like?

To break it down, even in the optimistic scenario of AppSec Agents being amazing at security assessments, there will be 3 things AppSec engineers will still have to do:

1. Define the workflow: When should SAST run? Who should receive the results? When should a human review results? What should trigger a pipeline block? These are questions your AI agents cannot answer, cos there is no “right” answer and the correct thing to do depends on your org’s security and technology culture. Depending on which product/BU/team you are working with, you may even need different workflows for different teams. While you may have tooling to orchestrate your AppSec agents, defining and tweaking the workflow will still be the AppSec team’s job. In some cases, you may outsource this to the dev team (e.g., Via Security Champions), but AppSec teams still need to own this.

2. Supplying context: This will probably be the most time-consuming and hardest to define aspect of an AppSec team’s job. It’s clear to me that the better context you provide an agent, the better results it provides. So, what information do you need to supply to your API Security Agent so it actually knows your rate-limiting requirements for internal APIs? What are the secure-by-default patterns that a Security Design Review tool should recommend? This problem is harder than it meets the eye because context does not lie in one place. It’s spread across “sources of truth” (such as code and deployments) and “sources of intent” (security standards document, PRDs, etc.). Depending on how your company operates, AppSec teams need to provide the right context to the right agents to extract the best values. Provide too much context, and you fill up the context window with junk. Provide too little and your AppSec agents give you generic crap.

3. Be the human in the loop and treat each instance of it as an agent failure: For the foreseeable future, AI agents running these assessments will still need human help. They will need to validate some results and require human review for certain kinds of changes. Hopefully, over time, the percentage of items that need human review goes down. Until then, we will need AppSec engineers to review the results, add more context, and decide what to do with the output. I think a useful frame for looking at this is to treat each human-in-the-loop interaction as a failure on the agent's part. In addition to resolving whatever needs to be resolved, the human should also “teach” the agent how to handle similar situations in the future. This could mean persisting information in a context file (e.g., Claude.md), writing a skill/sub-agent to handle a particular type of scenario, and so on. A good measure of an Agent's success would be the accuracy of its results and how often humans needed to be involved.

Note: 2 & 3 are somewhat related. While “context” may be something we add before an assessment starts, “committing things to memory” is also important in response to how Agents react. If a false positive recurs across different agent runs, it’s important to commit to memory why it is a false positive and how the agent can handle it better. In a way, these are 3 distinct activities, but also a loop that feeds into each other and improves over time.
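As a rough sketch of that loop, the snippet below counts how often a human had to step in and collects the “lessons” that would be written back to a context file. The `AgentScorecard` class and its fields are hypothetical, and the output is simply a list of bullet lines; how (or whether) they get appended to something like Claude.md is left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentScorecard:
    """Hypothetical tracker for human-in-the-loop interventions."""
    runs: int = 0
    human_interventions: int = 0
    lessons: list[str] = field(default_factory=list)

    def record_run(self, needed_human: bool, lesson: Optional[str] = None) -> None:
        """Log one agent run; a lesson is kept only when a human stepped in."""
        self.runs += 1
        if needed_human:
            self.human_interventions += 1
            if lesson:
                self.lessons.append(lesson)

    def intervention_rate(self) -> float:
        """Share of runs that needed a human (lower is better)."""
        return self.human_interventions / self.runs if self.runs else 0.0

    def render_memory(self) -> str:
        """Format lessons as bullet lines suitable for a context file."""
        return "\n".join(f"- {lesson}" for lesson in self.lessons)


if __name__ == "__main__":
    card = AgentScorecard()
    card.record_run(needed_human=False)
    card.record_run(
        needed_human=True,
        lesson="Internal APIs behind the gateway are already rate limited.",
    )
    print(card.intervention_rate())
    print(card.render_memory())
```

Tracking the intervention rate over time gives a simple signal for whether the added context is actually reducing the need for human review.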

This is a big change

If an AppSec engineer slipped into a coma in 2015 and woke up to *this* reality, they’d be unable to recognize the role. This change will not be easy to make for everyone. What’s worse, there isn’t enough tooling built to support these behaviors. Security vendors have spent decades figuring out the best UX for triaging results (and we haven’t perfected it), but no one knows what the best UX for “providing context” is. Defining Security Standards and Security Workflows used to be something you did once a year. Now things have to happen very quickly. This change will bring collateral damage. Depending on the organizational context, some companies may have already made this change, while others may take many years to do so. If you are taking on a new role in AppSec, I’d urge you to understand where on the spectrum of this change the team lies and if that is a good fit for you. To be clear, I don’t think of this change as a simple “maturity curve”. It’s not necessary that teams that haven’t adopted this are less mature (although that’s one possible explanation); it may also be an indication of how software is built in the company, or of what industry the company belongs to (some industries will take longer to undergo an AI transformation, and rightly so).

Where are you on the Spectrum?

Image presented at an internal Seezo meeting to emphasize where we lie on the “AI Spectrum”. Your exact position does not matter, but it needs to align with your organization.

In an internal meeting at Seezo, I half-joked that we need to be all on the same range of the “AI adoption spectrum” (see below). Irrespective of where you lie on the spectrum, it’s important to work with a team that is adjacent to your position. If you are an AI Skeptic in an AI-techbro team, you are gonna struggle. If you are cautiously optimistic about AI, but your company won’t use it until the “technology is mature”, you are gonna be frustrated.

That’s it for today. Does the Carpenter v/s Gardener analogy land, or am I being crazy by mapping AI to the one book I read many years ago? Are there other frameworks that help you navigate this crazy change? Hit me up! You can drop me a message on Twitter (or whatever it is called these days), LinkedIn, or email. I am also the co-founder of Seezo. We help companies automate security design reviews at scale. Check us out if that’s your thing :) If you find this newsletter useful, share it with a friend or colleague, or post it on social media.

VulnVibes: Building an AI Agent That Reasons Across Microservices to Find Real Vulnerabilities

Anshuman Bhartiya — Mon, 16 Mar 2026 22:25:15 GMT

Disclaimer: This is a cross post from my tech blog, co-authored by my personal AI assistant Sage.

Background Context

Picture this: you’re reviewing a pull request. A developer on your team has added a new API endpoint that fetches content from a URL the user provides. There’s even a role check — only admins can use it. Looks reasonable, right?

But here’s the thing — that endpoint runs inside a Docker container, on the same network as your authentication service, your database, and your internal admin tools. An attacker who gets admin access could point that URL at http://auth-service:3001/ and read your internal service responses. Or hit the cloud metadata endpoint at 169.254.169.254 and grab your AWS credentials.

The code review in the PR diff looks fine. The vulnerability is invisible unless you also check the Docker Compose file in your infra repo, the nginx config in your gateway repo, and the network topology that ties everything together.

This is the fundamental problem with how we do security reviews in microservice architectures: the vulnerability lives in the gaps between services, not in any single repo.

I’ve seen this problem time and again and haven’t come across a single SAST (AI native or traditional) platform that can reliably help tackle this at scale. If there is any, please let me know!

So, I’ve been tinkering with a few different approaches, and I have built a prototype agent. It is not production-ready by any means, but it is good enough to demonstrate the problem (with a lab I’ve built) and a high-level idea of how it can potentially be solved.

If this piques your interest, continue reading.

Introducing VulnVibes

Everything is a “vibe” these days. So, keeping with the SecureVibes naming theme, I’m calling this one VulnVibes 😄

VulnVibes is an AI-powered agent that analyzes pull requests for security vulnerabilities — and its superpower is that it doesn’t just look at the PR’s repo. It searches across your entire GitHub organization to understand your architecture, verify what security controls actually exist, and determine whether a suspicious code change is a real vulnerability or a false alarm.

For a more technical explanation, VulnVibes is an open-source CLI that takes a GitHub PR URL, threat models the changes, and then validates each threat by investigating code across multiple repos in your org. I demo it below against three real PRs and show it catching an SSRF vulnerability, making a nuanced judgment call on a CORS change, and correctly ignoring a safe code refactor.

It’s a concept tool — fully vibecoded — but it demonstrates something important: AI agents can reason across repository boundaries the way a human security engineer does.

Why Current Tools Don’t Work for Microservices

Let’s say your company runs a typical microservice architecture — an auth service, an API backend, a frontend app, and some infrastructure configs. Four repos, four teams, one system.

Now a developer opens a PR in the API repo. They’re adding a new feature. A traditional SAST scanner (think Semgrep, CodeQL, or similar) will analyze that PR against the code in that repo only. If the code looks suspicious — say, it makes an HTTP request using user input — the scanner flags it.

But is it actually exploitable? To answer that, you need context that lives outside that repo:

Is there a Web Application Firewall (WAF) that inspects request bodies before they reach the API? (Check the infra repo.)
Are internal services reachable from this container? (Check the Docker Compose file.)
Does the API gateway add any security headers or validate tokens? (Check the nginx config.)
What sensitive data exists on those internal services? (Check the auth service repo.)

Not a lot of SAST tools — even the AI-powered ones — can answer these questions. They operate within repository boundaries. They literally can’t see the other repos.

What does a human security engineer do? They open four browser tabs, read the nginx config, trace the request flow, check Docker networking, and mentally piece together whether the attack path is viable. It takes 30 minutes to an hour for a single PR, if they’re thorough.

VulnVibes automates that entire process. It has access to your GitHub org, can read files from any repo, search for patterns across the codebase, and reason about whether a threat is real based on the full architectural context.

How VulnVibes Works

The analysis happens in two stages:

Stage 1: Threat Modeling (The Quick Filter)

When you point VulnVibes at a PR, the first thing it does is fetch the diff and ask: “Is there anything security-relevant here?”

It produces a structured threat model:

What changed? A plain-English summary of the PR
What could go wrong? Specific threats, each tagged with a CWE (a standardized vulnerability classification)
What do we need to verify? A list of investigation questions for each threat
Which investigation skills should we use? Matched to specific testing methodologies (SSRF testing, auth testing, etc.)

If Stage 1 finds nothing security-relevant — like a pure code refactor — it stops here. Done in under a minute, minimal cost. No wasted effort.
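To make the shape of that Stage 1 output concrete, here is a minimal sketch of how such a threat model could be represented. The class and field names are illustrative assumptions, not VulnVibes’ actual schema; only the four kinds of information (summary, CWE-tagged threats, questions, skills) come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Threat:
    """One thing that could go wrong (hypothetical structure)."""
    title: str
    cwe: str                                             # e.g. "CWE-918" for SSRF
    questions: list[str] = field(default_factory=list)   # what needs verifying
    skills: list[str] = field(default_factory=list)      # investigation playbooks to use


@dataclass
class ThreatModel:
    """Stage 1 output: a summary of the PR plus zero or more threats."""
    summary: str
    threats: list[Threat] = field(default_factory=list)

    def needs_investigation(self) -> bool:
        """Stage 2 only runs when Stage 1 found at least one threat."""
        return bool(self.threats)


if __name__ == "__main__":
    refactor = ThreatModel(summary="Renames helper functions; no behavior change.")
    risky = ThreatModel(
        summary="Adds an endpoint that fetches a user-supplied URL.",
        threats=[
            Threat(
                title="SSRF via unrestricted URL fetch",
                cwe="CWE-918",
                questions=["Are internal services reachable from this container?"],
                skills=["ssrf-testing"],
            )
        ],
    )
    print(refactor.needs_investigation(), risky.needs_investigation())
```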

If it does find something worth investigating, it moves to Stage 2.

Stage 2: Cross-Repo Investigation (The Deep Dive)

This is where VulnVibes is different from everything else I’ve seen.

For each threat identified in Stage 1, the agent performs a full investigation. It doesn’t just look at the PR — it goes hunting across the organization:

Reads the PR code in full context — not just the diff, but the entire file and related files
Searches across other repos — looking for infrastructure configs, middleware, security controls
Checks the infrastructure layer — Docker networking, nginx configs, deployment files, environment variables
Follows a structured methodology — each vulnerability type has its own investigation playbook
Produces a verdict with a full reasoning chain — TRUE_POSITIVE (real vulnerability), FALSE_POSITIVE (looks bad but isn’t), or NO_SKILL_AVAILABLE (can’t test this type)

Every verdict comes with a confidence score, a risk level, and a step-by-step explanation of how the agent reached its conclusion. You can read the reasoning and decide whether you agree.
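The verdict described above could be modeled along these lines. The three verdict names come from the post; the `Finding` fields and, in particular, the rule for rolling several findings up into one overall verdict are assumptions made for illustration, not the tool’s documented behavior.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    TRUE_POSITIVE = "TRUE_POSITIVE"            # real vulnerability
    FALSE_POSITIVE = "FALSE_POSITIVE"          # looks bad but isn't
    NO_SKILL_AVAILABLE = "NO_SKILL_AVAILABLE"  # can't test this type


@dataclass
class Finding:
    """Result of investigating one threat (hypothetical structure)."""
    threat: str
    verdict: Verdict
    confidence: int                                       # out of 10, as in the demo output
    risk: str                                             # e.g. "HIGH"
    reasoning: list[str] = field(default_factory=list)    # step-by-step explanation


def overall_verdict(findings: list[Finding]) -> Verdict:
    """Assumed roll-up rule: one confirmed threat is enough to flag the PR."""
    if not findings:
        raise ValueError("no findings to aggregate")
    verdicts = {finding.verdict for finding in findings}
    if Verdict.TRUE_POSITIVE in verdicts:
        return Verdict.TRUE_POSITIVE
    if Verdict.NO_SKILL_AVAILABLE in verdicts:
        # Surface untestable threats rather than silently calling the PR safe.
        return Verdict.NO_SKILL_AVAILABLE
    return Verdict.FALSE_POSITIVE
```

Keeping the reasoning as a list of steps alongside the verdict is what lets a reviewer read the chain and disagree with it.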

The Demo: Three PRs, Three Outcomes

To demonstrate this, I set up a test environment: microvibes-lab, a GitHub organization with four microservices that form a document management system.

Services in the lab

auth-service (Node.js): Handles login and issues JWT tokens
doc-api (Python/FastAPI): Document storage with role-based access
frontend-app (Next.js): The web UI
infra-ops (Nginx + Docker): API gateway and infrastructure configs

It’s a realistic setup — JWT authentication shared between services, nginx routing requests, everything running on a Docker network, role-based access control (admins see everything, staff see only public documents).

I ran VulnVibes against three different PRs to show three different outcomes. Here’s a video walkthrough of all three cases:

Case 1: Catching a Real Vulnerability

PR: doc-api#13 — Add document import from URL feature

What the PR Does

A developer adds a new endpoint that lets admins import documents from external URLs. Here’s the code:

@app.post("/documents/import")
def import_document(url: str, user: dict = Depends(get_current_user)):
    if user["role"] != "sys_admin":
        raise HTTPException(status_code=403, detail="Admin access required")
    response = requests.get(url)
    return {"content": response.text, "status_code": response.status_code}

Nine lines of code. On the surface, it looks reasonable — there’s a role check ensuring only admins can use it. A code reviewer might glance at this and move on.

What’s Actually Wrong

This is a classic Server-Side Request Forgery (SSRF) vulnerability. In plain English: the server is making an HTTP request to whatever URL the user provides. The user says “fetch this URL,” and the server obediently does it.

Why is that dangerous? Because the server can reach things the user can’t. It’s sitting inside a Docker network with direct access to internal services. An attacker could tell it to fetch:

http://auth-service:3001/health — to probe internal services
http://169.254.169.254/latest/meta-data/ — to steal cloud credentials (a very common attack in AWS environments)
Any internal service on the Docker network that’s not exposed to the internet

And the response comes straight back to the attacker, unfiltered.

What VulnVibes Did

Stage 1 (~50 seconds): Identified the core threat — SSRF via unrestricted URL fetch.

Stage 2 is where it gets interesting. Watch where the agent went to validate the SSRF:

✅ Read main.py in full — confirmed there’s zero URL validation. No allowlist, no blocklist, no scheme restriction.
✅ Read docker-compose.yml from the infra-ops repo — confirmed all four services share a flat Docker network. Every service can reach every other service by hostname.
✅ Read nginx.conf from the infra-ops repo — confirmed nginx does nothing but route traffic. No WAF, no request body inspection, no URL filtering.
✅ Checked the Dockerfile — standard Python image, no network restrictions.
✅ Checked requirements.txt on the PR branch — no URL validation libraries installed.

The agent traced the full attack path across three different repos and concluded:

“doc-api can reach auth-service:3001, frontend-app:3000, gateway:80 directly via Docker DNS. No network policies restrict egress.”

Verdict: TRUE POSITIVE — HIGH (confidence 10/10)

🎯 Overall Verdict: TRUE_POSITIVE
⚠️  TRUE POSITIVE - Security vulnerability confirmed!

   1. SSRF via unrestricted URL fetch — HIGH (10/10)

   Duration: 134 seconds | Cost: $0.14

A traditional SAST tool could flag requests.get(url) as a potential SSRF. But it couldn’t tell you whether internal services are actually reachable, whether nginx adds any protection, or whether Docker networking enables the attack. VulnVibes answered all of those questions by reading files from repos the PR author never touched.

Case 2: The Nuanced Judgment Call

PR: auth-service#10 — Enable CORS credentials for cross-origin requests

What the PR Does

A developer updates the CORS (Cross-Origin Resource Sharing) configuration:

// Before: default CORS (allow everything, no credentials)
app.use(cors());

// After: reflect any origin + allow credentials
app.use(cors({
    origin: true,
    credentials: true
}));

Why This Looks Scary

If you’ve read any web security guide, this combination is a red flag. origin: true tells the cors middleware to reflect whatever Origin the request carries, so the server effectively accepts requests from any website. credentials: true means the browser will include cookies with those requests. Together, this is about as permissive as a CORS policy gets.

In a typical web app that uses cookies for authentication, this would be a serious vulnerability. A malicious website could make requests to your API on behalf of a logged-in user, read the responses, and steal session data.

Every security scanner would flag this immediately. And in most cases, they’d be right.

But Is It Actually Exploitable Here?

This is where VulnVibes earns its keep. The agent didn’t just pattern-match on “permissive CORS = bad.” It went investigating:

Searched the entire org for cookie usage — Set-Cookie, res.cookie(), session middleware — found nothing
Read the frontend app source code — discovered authentication uses localStorage + Authorization: Bearer headers, not cookies
Searched for withCredentials or credentials: 'include' patterns — found nothing
Checked the auth-service dependencies — no session middleware installed
Checked nginx — no cookie handling

The agent’s key insight:

“Auth is entirely header-based using localStorage + Authorization Bearer. No cookies are set anywhere in the codebase. The credentials: true flag has no meaningful effect since no cookies exist.”

The CORS configuration looks terrible in isolation. But in this specific architecture, the classic attack doesn’t work because the app doesn’t use cookies at all. An attacker’s website can’t steal what doesn’t exist.
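The first step of that investigation, searching for any sign of cookie usage, can be approximated with a plain pattern scan. This is a rough sketch only: how VulnVibes actually searches is not shown here, and `COOKIE_PATTERNS` and `find_cookie_usage` are hypothetical names. It assumes the source files have already been fetched into a mapping of path to text.

```python
import re

# Illustrative patterns only; a real search would be tuned to the stack.
COOKIE_PATTERNS = {
    "set-cookie header": re.compile(r"set-cookie", re.IGNORECASE),
    "res.cookie() call": re.compile(r"\bres\.cookie\s*\("),
    "session middleware": re.compile(r"express-session|cookie-session|cookie-parser"),
    "credentialed fetch": re.compile(r"withCredentials|credentials\s*:\s*['\"]include['\"]"),
}


def find_cookie_usage(files: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (path, line number, pattern name) for every match."""
    hits = []
    for path, text in files.items():
        for number, line in enumerate(text.splitlines(), start=1):
            for name, pattern in COOKIE_PATTERNS.items():
                if pattern.search(line):
                    hits.append((path, number, name))
    return hits
```

An empty result across every repo is what supports the "no cookies exist" conclusion. A single hit would mean the permissive CORS change deserves a much closer look.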

Verdict: FALSE POSITIVE — LOW risk (confidence 8/10)

VulnVibes correctly identified that the CORS configuration, while technically permissive, is not exploitable in this architecture:

🎯 Overall Verdict: FALSE_POSITIVE
✓  FALSE POSITIVE - No security vulnerability found

📊 Investigation Results:
   1. Permissive CORS — FALSE_POSITIVE (8/10), Risk: LOW

   Duration: 170 seconds | Cost: $0.16

This is exactly the kind of analysis I want from a triage tool. It didn’t blindly flag it as critical — which is what every pattern-matching scanner would do. Instead, it investigated the architecture, confirmed header-based auth, found zero cookie usage across the entire org, and concluded the CORS change isn’t exploitable. A human security engineer would reach the same conclusion — but only after 30+ minutes of reading code across four repos.

Case 3: Knowing When to Stay Quiet

PR: auth-service#12 — Refactor auth code into helper functions

What the PR Does

A developer refactors the JWT token generation logic — extracting inline code into helper functions:

// Before: JWT signing inline in the login handler
const token = jwt.sign(
    { username: user.username, role: user.role, name: user.name },
    JWT_SECRET,
    { expiresIn: '1h' }
);
return res.json({ token, user: { username: user.username, ... } });

// After: extracted to a reusable function
function generateToken(user) {
    return jwt.sign(
        { username: user.username, role: user.role, name: user.name },
        JWT_SECRET,
        { expiresIn: '1h' }
    );
}
const token = generateToken(user);
return res.json({ token, user: formatUserResponse(user) });

Why a Naive Tool Would Flag This

This PR modifies authentication code. It touches JWT token generation — the most security-sensitive part of the entire system. A pattern-matching scanner might flag it because “authentication code changed” or “JWT signing logic modified.”

What VulnVibes Did

VulnVibes looked at the diff, compared the before and after, and concluded in 27 seconds:

“This PR is a straightforward code refactoring. The JWT payload, signing secret, and expiration are unchanged. No new endpoints, routes, dependencies, or security-relevant behavior is introduced.”

Zero threats identified. No Stage 2 investigation needed.

ℹ️  No security-relevant changes detected in this PR.
   Duration: 27s

This is just as important as catching real vulnerabilities. A tool that flags everything is just as useless as one that catches nothing — because it trains developers to ignore the alerts. VulnVibes understood that despite touching JWT code, the behavior didn’t change. It saved everyone’s time.

The Scorecard

Results at a glance

SSRF — doc-api#13: Looked like a reasonable new feature, but VulnVibes confirmed a real vulnerability across 3 repos. Time: 2.2 min. Cost: $0.14
CORS — auth-service#10: Looked scary, but VulnVibes determined it was a false positive because auth is header-based. Time: 2.8 min. Cost: $0.16
Refactor — auth-service#12: Security-sensitive code changed, but it was a safe refactor and needed no deeper investigation. Time: 27 sec. Cost: minimal

The Bigger Picture

Let me be upfront: VulnVibes is a concept tool, not a production-grade agent. It’s fully vibecoded — I built it to demonstrate an idea, not to replace your security team.

The idea is this: context matters enormously in security, and AI agents can now gather and reason about that context across repository boundaries.

If you work at an organization with a microservice architecture and your existing SAST tools are either missing real vulnerabilities or drowning you in false positives, the problem might not be the scanner itself. The problem might be that the scanner can only see one repo at a time.

VulnVibes shows that it’s possible to build an agent that:

Reads the PR diff and identifies what could go wrong
Searches across your entire org to understand the actual architecture
Checks infrastructure configs to verify what security controls exist (or don’t)
Makes a calibrated judgment — not just “this pattern is bad” but “this pattern is bad and there are no compensating controls at any layer”

The specific implementation matters less than the concept. You could build something similar using any LLM with tool use, a GitHub API integration, and some structured investigation playbooks. The key insight is giving the agent access to cross-repo context and teaching it to verify assumptions against the actual infrastructure.
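To make the "calibrated judgment" step concrete, here is a deliberately simplified sketch of how a pattern match and the gathered context might combine into a verdict. The `Evidence` class and `verdict` function are hypothetical and are not taken from VulnVibes; a real agent reasons in far more nuance than three booleans and a list.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """What the investigation found for one suspected threat."""
    pattern_present: bool    # e.g. user-controlled URL passed to requests.get
    preconditions_met: bool  # e.g. internal services reachable, cookies in use
    compensating_controls: list[str] = field(default_factory=list)


def verdict(evidence: Evidence) -> str:
    """Combine the pattern match with architectural context."""
    if not evidence.pattern_present:
        return "NO_FINDING"
    if not evidence.preconditions_met or evidence.compensating_controls:
        return "FALSE_POSITIVE"
    return "TRUE_POSITIVE"


# The three cases from this post, reduced to their essentials.
ssrf = Evidence(pattern_present=True, preconditions_met=True)
cors = Evidence(pattern_present=True, preconditions_met=False)
refactor = Evidence(pattern_present=False, preconditions_met=False)
```

The point of the structure is that the pattern alone never decides the outcome. The verdict depends on what the cross-repo investigation found.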

Getting Started

If you want to try it yourself, VulnVibes is open source:

# Install
pip install -e .

# Analyze a PR
vulnvibes pr analyze https://github.com/your-org/your-repo/pull/123 \
  --github-token $GITHUB_TOKEN \
  --model sonnet \
  --org your-org \
  --context-file context.md

You can optionally provide a context file that tells VulnVibes about your architecture:

---
related_repos:
  - name: infra-ops
    purpose: nginx configs, Docker Compose, k8s manifests
  - name: auth-service
    purpose: JWT authentication, user management
---

# Architecture Overview
Microservices on Docker with nginx reverse proxy.
JWT-based auth, tokens stored in localStorage.

If you have a Claude Max or Pro subscription and you’re authenticated via Claude Code or Claude CLI, VulnVibes works with OAuth — no API key needed.

The test environment at microvibes-lab has 17 PRs with known expected outcomes if you want to benchmark it yourself.

If you’re interested in trying it out, building something similar, or just want to talk about AI-powered security tooling — feel free to reach out!

GitHub: anshumanbh/vulnvibes
LinkedIn: @anshumanbhartiya
Blog: anshuman.ai

Until next time, ciao! 👋

Ep 37: The Future of Security Testing in an AI-Driven World with Jason Haddix

Sandesh Mysore Anand — Wed, 11 Mar 2026 08:10:35 GMT

In this episode, Jason Haddix (CEO of Arcanum Information Security and creator of the Bug Hunter’s Methodology) joins us to examine how AI is changing penetration testing and security research. He explains that while AI agents can automate reconnaissance, code analysis, and parts of vulnerability discovery, meaningful results still depend on human expertise, methodology, and context engineering.

The conversation explores how AI is shifting the entry path for new security practitioners, why deep research and critical thinking remain essential skills, and how experienced testers are embedding their knowledge into agent workflows using tools like Claude Code. Jason also discusses practical experimentation with AI assistants such as OpenClaw, including prompt-injection defenses, guardrails, and the operational risks of running autonomous systems.

The episode also addresses the growing debate around AI-generated code and AI-driven vulnerability discovery, highlighting the difference between marketing claims and real-world results. It closes with a discussion on why the industry needs better benchmarks and evaluation methods to measure whether AI security tools actually find meaningful vulnerabilities.

00:00–02:14 — Introduction to Jason Haddix and how his journey from bug hunter to Arcanum founder shapes his perspective on AI in security

02:14–08:00 — How AI agents are beginning to automate penetration testing workflows while still relying on expert methodology

08:00–10:45 — Why human expertise remains critical even as security automation improves

10:45–17:10 — How AI is changing the learning curve for the next generation of pentesters

17:10–25:27 — How agent frameworks and skills are transforming security tool building

25:27–35:41 — Security risks and defenses when running AI assistants like OpenClaw

35:41–40:32 — The rise of AI-powered personal assistants for research and security workflows

40:32–42:55 — Why the cybersecurity community is rapidly adopting AI tools

42:55–46:42 — How AI improves security coverage and turnaround time at scale

46:42–50:31 — Why newer models like Opus 4.5 unlocked practical AI security workflows

50:31–56:48 — The debate on whether AI should generate secure code or detect vulnerabilities

56:48–01:01:18 — Why AI security needs better evaluation benchmarks and real-world testbeds

Tune in for a deep dive!

Connect with Jason Haddix:

LinkedIn: https://www.linkedin.com/in/jhaddix/

Connect with Anshuman:

LinkedIn: anshumanbhartiya

X: https://x.com/anshuman_bh

Website: https://anshumanbhartiya.com/

Instagram: anshuman.bhartiya

Connect with Sandesh:

LinkedIn: anandsandesh

X: https://x.com/JubbaOnJeans

Ep 36: Discussing AI's Current State of Affairs

Sandesh Mysore Anand — Mon, 02 Mar 2026 06:16:25 GMT

In this episode, we examine what is shifting in AI, AppSec, and product security and what remains fundamentally the same.

For years, application security has operated on a familiar model: siloed reviews, tool-driven findings, and periodic assessments that struggle to keep pace with modern development. AI doesn’t eliminate those pressures; it amplifies them. Code is generated faster, systems are more interconnected, and the surface area of change expands weekly.

The conversation explores agent-based workflows through tools like OpenClaw, not as a novelty, but as a signal of a broader shift: from manually operating tools to orchestrating fleets of agents. As AI interfaces move from chat windows to terminals to messaging environments, security teams must reconsider where workflows live and how context is preserved across them.

For decades, AppSec has struggled to build a reliable understanding of what systems exist and how they connect. Large language models may finally make it possible to construct living maps of components, data flows, and trust boundaries, enabling assessments that talk to each other instead of existing in isolation.

The discussion also revisits threat modeling, not as a compliance artifact, but as a foundation for system-wide reasoning. If AI can automate baseline coverage and reduce repetitive toil, security teams may return to their original purpose: high-leverage risk judgment on critical systems. This leads to a broader debate about whether AppSec as a distinct function evolves, shrinks, or dissolves into engineering itself, and what the enduring “maker–checker” model of risk management demands in an AI-native world.

Finally, the episode reflects on the role of large AI labs in security: the gap between ambitious claims and shipped products, and what that means for founders and security leaders navigating change.

00:00–02:15 — Why this is a no-guest episode & what’s changed since last year

02:15–06:30 — AI co-authoring, productivity gains, and writing workflows

06:30–10:20 — OpenClaw architecture, agent risks, and prompt injection realities

10:20–14:00 — The shifting UI of AI: chat → terminal → messaging agents

14:00–18:30 — Agent orchestration vs siloed security tooling

18:30–23:00 — Context graphs and assessments that “talk” to each other

23:00–27:30 — Threat modeling’s evolution and system-wide visibility

27:30–31:00 — Why inventory is still AppSec’s hardest problem

31:00–34:30 — Personal AI stacks: Obsidian, memory layers, and query tools

34:30–37:30 — Open source in the age of AI-generated PR spam

37:30–40:00 — AI labs: what they ship vs what they say

40:00–44:00 — Will AppSec disappear? A serious debate

44:00–48:00 — Maker–checker risk models in an AI-driven org

48:00–51:00 — Where AI replaces toil — and where humans stay critical

51:00–End — 2026 predictions for AI security and product security

Tune in for a deep dive!

Ep 35: Exploring Security After Determinism with Jens Ernstberger

Sandesh Mysore Anand — Mon, 16 Feb 2026 07:44:09 GMT

In this episode, we sit down with Jens to explore why AI agents fundamentally break traditional security assumptions, from API keys and browser sessions to composability and access control.

Drawing parallels to DeFi exploits and smart contract failures, he explains why agent identity, short-lived delegated authorization, and zero trust aren’t optional add-ons, but the foundation for safely running autonomous systems.

We also dive into context compression as both a performance and security challenge, the real difference between MCP and skills, and a future where humans may stop reviewing code altogether. As agents become the primary actors on the internet, even writing itself begins to change in an AI-scraped world.

If agents are non-deterministic by design, the real question becomes: where do we reintroduce determinism?

00:00 — AI agents as the next security reset moment. History repeating: automation + composability = new attack surfaces

03:25 — Challenges of context compression in AI

07:39 — Access control in a non-deterministic system and compaction issues

11:22 — MCP vs skills: horizontal infrastructure meets vertical execution logic

18:06 — Agent identity and security practices. Static credentials collapse under autonomous agent behavior

30:06 — The future of coding with AI agents

31:31 — DeFi attacks, composability issues, and how non-determinism multiplies risk

35:14 — Writing for humans vs writing for LLMs. Content, authenticity, and the economics of scraping

44:42 — Transition from academia to startup founder

Tune in for a deep dive!

Connect with Jens Ernstberger:

Website: https://ernstberger.xyz/

LinkedIn: https://www.linkedin.com/in/jens-ernstberger-phd-96b0ba14a/

Day in the Life: Building a Prototype with My AI Agent

Anshuman Bhartiya — Fri, 13 Feb 2026 22:42:40 GMT

Disclaimer: This is a cross post from my tech blog, co-authored by my personal AI assistant Sage.

Series: Building with Sage — This is Part 1 of an ongoing series about running a personal AI agent with a security-first mindset. I’ll share real workflows, real failures, and the security controls that keep things from going sideways.

Introduction

I asked my AI agent to build me a dashboard. Twenty minutes later, it was live on my network. A friend could pull it up from his laptop.

That sentence should make any security person uncomfortable. It makes me uncomfortable — and I’m the one who built the setup.

Here’s the thing: I can’t stop using it. The productivity boost is too real. So instead of shutting it down, I’ve been iteratively hardening the security posture. Defense in depth, sandboxed execution, network controls, manual approval gates — the works.

This post walks through a real session from today: what I asked for, what failed, how the agent pivoted, and the security controls that were quietly doing their job the entire time. Think of it as a “day in the life” of building with an AI agent, told through the lens of someone who spends his day job thinking about application security.

So, let’s get started!

The Setup: Meet Sage

Sage is my personal AI agent, running on OpenClaw (formerly Clawdbot) on a Mac Mini in my home office. I talk to it via Telegram and Discord. It manages my todos, tracks my schedule, writes code, runs security audits, and — apparently — builds full-stack prototypes on demand.

I wrote about my initial setup in Introducing Sage: My Personal AI Assistant That Actually Works. Things have evolved since then — Sage now runs on a dedicated Mac Mini instead of my daily driver, with significantly more security controls. This post covers the current architecture.

Here’s the architecture at a high level:

User → Messaging (Discord/Telegram) → OpenClaw Gateway → Main Agent (Sage) → Subagents/VMs/Tools

The key design principle: Sage is the trust boundary. Anything that could contain attacker-controlled content gets delegated to a sandboxed subagent. The main agent stays clean.

Let me break down the security controls at each layer:

Messaging Layer: Both Discord and Telegram use allowlist group policies. Only I can talk to Sage. No public access, no email hooks (that’s a whole other attack surface).
Gateway: Binds to loopback only — not reachable from the network. Token auth. Exec security set to allowlist.
Main Agent: Runs unsandboxed on the host (yes, this keeps me up at night — more on that later). Has access to host tools, file system, and my messaging channels.
Subagents: Reader Agent, Coding Agent, Research Agent — each runs in a Docker sandbox with explicit tool allow/deny lists. The Coding Agent can exec and write but can’t message, gateway, or cron. By design.
Host: macOS firewall enabled. Sensitive files (openclaw.json, identity keys) locked to 600 permissions. Daily automated security audits. LuLu firewall for inbound and outbound connection control.

The Ask

A friend of mine runs an online business. He’s got VAs (virtual assistants) managing different departments — email support, order fulfillment, social media responses. They report to him, but he knows the reporting is inconsistent.

He asked me: “Can AI give me better reporting and oversight than my VAs self-reporting?”

I told Sage: build me a business operations dashboard prototype. Three panels — email ops, order fulfillment, Facebook comments — with an AI morning briefing at the top. Dark theme, modern SaaS look, mock data that tells a story. Something I can screenshot and send to my friend in under an hour.

The Build: What Actually Happened

This is where it gets interesting — and where the security controls actually did their job.

Step 1: Sage Writes a Spec

First thing Sage did was write a detailed spec to projects/consulting/biz-dashboard/SPEC.md. Stack choice (Next.js 14, Tailwind, Recharts), component breakdown, mock data structure, design requirements. Good practice — document before you build.

💡 Security Improvement Idea: Security-by-Default Specs
What if the spec-writing step automatically included security requirements? Imagine a set of rules in the agent’s config that inject security defaults into every project spec — things like:
Auth required — no unauthenticated endpoints by default
No secrets in code — API keys via environment variables only
HTTPS only — no plaintext HTTP in production
Input validation — sanitize all user inputs
Dependency audit — run npm audit before first commit
CORS policy — restrictive by default, explicitly opened
This could be a section in AGENTS.md or a dedicated security-defaults.md that the agent reads whenever it writes a spec. The prototype today didn’t need most of these (mock data, no auth, no real APIs), but for production builds this would catch security gaps before a single line of code is written. Shift left, but automated.
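As a rough illustration of the idea, the injection step could be as small as appending a fixed section to whatever spec the agent drafts. This is a sketch only: `SECURITY_DEFAULTS`, `with_security_defaults`, and the "## Security Requirements" heading are hypothetical, and nothing like this exists in the setup described here.

```python
SECURITY_DEFAULTS = [
    "Auth required: no unauthenticated endpoints by default",
    "No secrets in code: API keys via environment variables only",
    "HTTPS only: no plaintext HTTP in production",
    "Input validation: sanitize all user inputs",
    "Dependency audit: run npm audit before first commit",
    "CORS policy: restrictive by default, explicitly opened",
]


def with_security_defaults(
    spec: str,
    defaults: list[str] = SECURITY_DEFAULTS,
    heading: str = "## Security Requirements",
) -> str:
    """Append the security section unless the spec already has one."""
    if heading in spec:
        return spec
    bullets = "\n".join(f"- {item}" for item in defaults)
    return f"{spec.rstrip()}\n\n{heading}\n{bullets}\n"
```

Making the function idempotent means it can run on every spec revision without duplicating the section.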

Step 2: Coding Agent Fails (And That’s a Good Thing)

Sage’s default behavior for code scaffolding is to delegate to the Coding Agent — a sandboxed subagent that runs in Docker. This is the secure path: the Coding Agent has restricted tool access and can’t touch the host system.

But I had asked Sage to build inside an OrbStack VM (my preferred way to spin up ephemeral Linux environments). Here’s what happened:

❌ Coding Agent: "I can't complete this task from the sandbox environment."
   - Can't read the spec file (outside sandbox mount)
   - Can't run `orb` commands (host-only tool)

This is the sandbox working correctly. The Coding Agent runs in Docker with tool policy enforcement. It can’t reach the host filesystem, can’t run host CLI tools, can’t escape its container. The fact that it failed is a security win — it means the isolation is real.

The tradeoff is clear:

Docker sandbox = weak filesystem isolation, strong tool policy enforcement
OrbStack VM = strong filesystem isolation, no tool policy enforcement

Tool restrictions matter more than filesystem isolation. A sandboxed agent that can’t call gateway or message is safer than a VM-jailed agent with full shell access.
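To show what tool policy enforcement means in practice, here is a minimal sketch of an allow/deny check. It is not OpenClaw's configuration format or implementation; `ToolPolicy`, `CODING_AGENT`, and `call_tool` are hypothetical names, and the tool list simply mirrors the Coding Agent description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    allow: frozenset[str]
    deny: frozenset[str] = frozenset()

    def permits(self, tool: str) -> bool:
        """Deny wins over allow; anything not explicitly allowed is refused."""
        return tool in self.allow and tool not in self.deny


# Mirrors the description: can exec and write, cannot message, gateway, or cron.
CODING_AGENT = ToolPolicy(
    allow=frozenset({"exec", "write"}),
    deny=frozenset({"message", "gateway", "cron"}),
)


def call_tool(policy: ToolPolicy, tool: str) -> str:
    """Refuse the call before any tool code runs."""
    if not policy.permits(tool):
        raise PermissionError(f"tool '{tool}' is not permitted for this agent")
    return f"{tool}: ok"  # placeholder for the real tool dispatch
```

The default-deny shape is the important part: a tool that nobody thought to list is refused rather than silently allowed.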

💡 Security Improvement Idea: VM-Aware Sandboxing
The coding agent failed because Docker sandbox can’t reach OrbStack VMs. That’s correct behavior — but it means VM-targeted work falls back to the unsandboxed main agent. The ideal would be:
Per-agent exec routing — route a sandboxed agent’s commands to a specific VM instead of Docker
Tool policy enforcement inside VMs — the agent runs in the VM but still can’t call gateway, message, or cron
Ephemeral VM sandboxes — spin up a VM per subagent session, destroy on completion
This would give you strong filesystem isolation (VM) AND strong tool policy enforcement (sandbox rules). Right now you get one or the other. I’ve filed this as a feature request with the OpenClaw project — if this resonates with you, give it an upvote!

Step 3: Sage Pivots — Builds It Directly

Since the Coding Agent couldn’t bridge into the VM, Sage built the dashboard directly. As the main agent, it has host access and can run orb commands.

# Spin up ephemeral VM (~30 seconds)
./assets/scripts/create-dev-vm.sh biz-dashboard

# Scaffold Next.js project inside VM
orb run -m biz-dashboard bash -c 'npx create-next-app@14 biz-dashboard ...'

# Install deps
orb run -m biz-dashboard bash -c 'cd ~/biz-dashboard && npm install recharts lucide-react ...'

Note: In the actual session, Sage ran these as root (-u root). That’s a bad habit — even in an ephemeral VM, principle of least privilege applies. The dev VM script now creates a non-root user by default. One of those “it’s just a demo” shortcuts that shouldn’t become muscle memory.

Sage wrote all the source files (mock data, components, pages) to the host workspace, then copied them into the VM via OrbStack’s filesystem mount. Started the dev server bound to 0.0.0.0:3001.

Result: a fully functional dashboard with three department panels, trend charts, status indicators, and an AI morning briefing card — running inside an ephemeral VM.

💡 Security Improvement Idea: Least Privilege in Ephemeral VMs
Even throwaway VMs should follow least privilege:
Non-root by default — the dev VM script should create and use a dedicated user, only escalating to root for package installation
No sudo without logging — if root is needed, log every sudo invocation to the audit trail
Read-only host mounts — OrbStack mounts the host home directory into the VM by default. This should be read-only or disabled entirely for untrusted workloads
Network restrictions — ephemeral VMs should have outbound-only access scoped to what they need (npm registry, GitHub), not full internet access
The mindset: treat every VM like it could be compromised, even if you just created it 30 seconds ago.

Step 4: Exposing the Dashboard (And LuLu Says No)

I was on my MacBook, not the Mac Mini. So I needed remote access to the dashboard running in the VM.

Sage set up a Python TCP proxy on the Mac Mini (port 3333 → VM port 3001) and tried to expose it via the Tailscale IP. I hit the URL and… nothing.

LuLu was blocking incoming connections on the Tailscale interface.

This is exactly what defense in depth looks like in practice. Even though:

The VM was on a private OrbStack network
Tailscale provides encrypted mesh networking
The Mac Mini has macOS firewall enabled

…there was still another layer — LuLu — requiring manual approval for any new network connection. I had to physically log into the Mac Mini and allow the incoming connection.

Sage then set up Tailscale Serve to proxy https://.ts.net → localhost → VM. Once I allowed the connection, the dashboard was accessible from my MacBook over an encrypted Tailscale tunnel.

The fact that I had to manually intervene is the point. Automated convenience is great until it’s an attacker automating the convenience. Manual gates at critical junctures are worth the friction.

💡 Security Improvement Idea: Network Exposure & Agent-Aware Firewalls
The network exposure problem: The agent was able to set up a TCP proxy and Tailscale Serve without asking me first. That’s convenient — but it means the agent can autonomously expose services to the network. LuLu caught it at the firewall level, but what if it hadn’t been installed?
A better approach:
Network exposure as a privileged action — require explicit user approval before binding to non-loopback interfaces, setting up proxies, or enabling Tailscale Serve
Exposure audit log — every time the agent opens a port or creates a proxy, log it to a security audit trail with timestamp, port, destination, and reason
Auto-teardown timers — any exposed service auto-shuts down after N minutes unless explicitly extended. No forgotten demo servers running for weeks.
Tailscale ACLs — use Tailscale’s ACL policies to restrict which devices can reach specific ports, rather than relying solely on LuLu
The layered defense (Tailscale + LuLu + macOS firewall) worked today, but it was luck-of-the-stack, not policy-enforced.
Taking it further — agent-aware firewalls: LuLu blocked the incoming connection, which is great. But LuLu doesn’t know why the connection was being made — it just sees a process trying to listen on a port. Imagine if:
LuLu rules could be agent-aware — “allow connections initiated by user action, block connections initiated autonomously by agents”
OpenClaw could integrate with LuLu’s API — the agent requests network access, LuLu prompts the user with context (“Sage wants to expose port 3333 to your tailnet for a dashboard demo. Allow?”)
Firewall rules as part of the agent’s tool policy — just like exec has an allowlist, network exposure could have one too
This turns the firewall from a blunt “allow/deny per app” into a context-aware security control that understands the agent’s intent.

The Dashboard

Here’s what the final prototype looked like:

Three panels monitoring email operations, order fulfillment, and Facebook social media responses. Each panel has stat cards with red/yellow/green status indicators, 7-day trend charts, and key metrics. The AI morning briefing at the top summarizes what needs attention in plain English.

All mock data — but realistic enough to demonstrate the concept.
Total time from request to live dashboard: ~20 minutes.

What’s Working: Security Wins

Let me highlight the controls that actually did their job today:

1. Docker Sandbox Caught a Boundary Violation
The Coding Agent couldn’t escape its container to reach the VM. This wasn’t a bug — it was the sandbox working as designed. Tool policy enforcement prevented the subagent from accessing host infrastructure.

2. LuLu as Defense in Depth
Even with Tailscale and macOS firewall, LuLu added another layer that required manual approval. An agent autonomously exposing a service to the network got stopped by a human-in-the-loop control.

3. Ephemeral VMs
The dashboard VM existed for exactly as long as needed. When we were done, orb delete biz-dashboard — gone. No persistent attack surface from demo infrastructure sitting around.

4. Tailscale as Network Boundary
The dashboard was never on the public internet. Tailscale’s encrypted mesh meant only my devices on my tailnet could reach it. No port forwarding, no public IPs.

5. Allowlist Policies Everywhere
Telegram and Discord both use allowlist group policies. Exec security is set to allowlist. Only explicitly approved channels and commands work.

6. Daily Automated Audits
Every morning at 8am, Sage runs a security audit — checks firewall state, file permissions, gateway config, world-readable files, and LuLu rule changes. Any anomaly triggers an immediate Telegram alert. Today’s audit: all clear.
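The file-permission part of such an audit is straightforward to sketch. The function below is illustrative and is not Sage's actual audit script: `permission_findings` is a hypothetical name, and it covers only two of the checks listed (600 permissions on sensitive files, and world-readable files under a directory).

```python
import stat
from pathlib import Path


def permission_findings(sensitive: list[Path], scan_root: Path) -> list[str]:
    """Report sensitive files that are not mode 600 and any world-readable files."""
    findings = []
    for path in sensitive:
        mode = stat.S_IMODE(path.stat().st_mode)
        if mode != 0o600:
            findings.append(f"{path}: mode {mode:03o}, expected 600")
    for path in scan_root.rglob("*"):
        # S_IROTH is the "readable by others" permission bit.
        if path.is_file() and path.stat().st_mode & stat.S_IROTH:
            findings.append(f"{path}: world-readable")
    return findings
```

An empty list corresponds to "all clear". Any entry would be what triggers the alert.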

What Keeps Me Up at Night: Security Gaps

I’m not going to pretend this setup is bulletproof. Here’s what worries me:

1. Secrets Management
Right now, API keys sit in openclaw.json with 600 permissions. That’s… fine? But not great. If the main agent gets prompt-injected into running a cat on that file, those keys are gone.

I’m currently exploring nono.sh — a kernel-level sandbox that could provide just-in-time secrets injection. The idea: secrets never touch the filesystem. They’re injected into the process environment at runtime and the sandbox prevents exfiltration. Still experimenting. If you’ve got other approaches to agent secrets management, I’d love to hear them.
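The injection half of that idea can be sketched with nothing but the standard library. To be clear about the assumptions: this is not how nono.sh works, `run_with_secret` is a hypothetical helper, and passing a secret through the environment does not by itself prevent exfiltration. Preventing exfiltration is the sandbox's job. What the sketch shows is only that the secret can reach one child process without being written to disk or to the parent's own environment.

```python
import os
import subprocess
import sys


def run_with_secret(command: list[str], name: str, value: str) -> subprocess.CompletedProcess:
    """Run a child process with one secret added to its environment only."""
    env = dict(os.environ)  # copy, so the parent's environment is untouched
    env[name] = value
    return subprocess.run(command, env=env, capture_output=True, text=True, check=True)


# The value would come from a secret store (hypothetical); here it is a dummy.
child = [sys.executable, "-c", "import os; print(len(os.environ['DEMO_API_KEY']))"]
result = run_with_secret(child, "DEMO_API_KEY", "not-a-real-key")
```

The child sees the key and the parent process never holds it in `os.environ`, which narrows what a prompt-injected `cat` on a config file could reach.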

2. Main Agent Runs Unsandboxed
This is the big one. Sage (the main agent) runs directly on the Mac Mini with full host access. The subagents are sandboxed, but the main agent is the trust boundary itself — and it’s not sandboxed.

Why? Because the main agent needs to run orb commands, manage crons, send messages, read the filesystem. Sandboxing it would break most of its functionality. The mitigation is limiting what reaches the main agent: no email hooks, no web browsing, no processing untrusted content directly. All external content goes through sandboxed subagents first.

But if someone finds a way to inject through my Telegram or Discord messages… yeah. That’s the threat model gap.

3. Prompt Injection Surface
Every time the AI reads external content (URLs, emails, Facebook comments for the dashboard), there’s a prompt injection risk. The subagent architecture helps — the Reader Agent processes untrusted content in a Docker sandbox with no access to message, gateway, or cron. But the summarized output still flows back to the main agent.

A sophisticated enough injection could potentially survive the summarization step. I don’t have a great answer for this yet beyond “don’t let the main agent act on summarized content without human review.”
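
One way to picture that rule is to tag everything derived from external content as untrusted and refuse side effects unless a human signs off. The sketch below is hypothetical (the Message type and function names are mine, not part of the agent framework) and only models the gate, not the summarization itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    text: str
    trusted: bool  # False for anything derived from external content.


def summarize_untrusted(raw: str) -> Message:
    """Stand-in for the sandboxed reader; its output stays marked untrusted."""
    return Message(text=raw[:200], trusted=False)


def may_act(message: Message, human_approved: bool = False) -> bool:
    """Permit side effects only for trusted input or explicit human approval."""
    return message.trusted or human_approved
```

The point of the design is that the untrusted flag survives summarization, so a clean-looking summary never becomes trusted on its own.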

4. The OrbStack Filesystem Mount
OrbStack mounts the host home directory into VMs by default. That means a process inside the VM could potentially read host files through the mount. For ephemeral demo VMs this is acceptable risk, but for untrusted workloads I’d need to configure mount restrictions.

Lessons and Takeaways

1. Iterative hardening beats analysis paralysis. I could have spent months designing the perfect secure agent architecture before using it. Instead, I started using it and hardened as I went. Each incident or near-miss becomes a new control. The security posture today is dramatically better than day one.

2. Defense in depth actually works. LuLu catching the network exposure wasn’t planned for that specific scenario. But stacking controls — firewall + LuLu + Tailscale + manual approval — meant that even when one layer was permissive, another caught it.

3. Tool policy > filesystem isolation. The Coding Agent failing to reach the VM was a better security outcome than if it had succeeded. Preventing an agent from calling gateway restart or message send matters more than what files it can see.

4. Ephemeral infrastructure is underrated. Spin up a VM, do the work, tear it down. No lingering attack surface, no config drift, no “I forgot that demo server was still running.” Make infrastructure disposable by default.

5. Human-in-the-loop gates are features, not bugs. The LuLu approval prompt felt like friction in the moment. In retrospect, it’s exactly the kind of control that prevents autonomous agents from silently expanding their network access.

I’m genuinely interested in community feedback on this setup. What am I missing? What would you harden differently? What’s the threat model gap I’m not seeing?

Feel free to reach out to me with any comments/feedback:

LinkedIn: @anshumanbhartiya
GitHub: anshumanbh
Blog: anshuman.ai

Until next time, ciao! 🦉

Ep 34: Security at Scale in a Probabilistic World with Ankur Chakraborty

Sandesh Mysore Anand — Mon, 02 Feb 2026 07:31:22 GMT

In this episode, Ankur Chakraborty, Senior Director of Platform Security at Box, joins us to examine what security looks like when systems no longer behave the same way twice. Drawing from his experience across Google, Twitter, and Box, Ankur argues that while core security principles haven’t changed, the scale, speed, and uncertainty introduced by AI systems demand a fundamentally different approach.

For decades, security has relied on a comforting assumption: systems are predictable, and control flows are deterministic. Generative AI breaks that assumption. It introduces non-determinism and dramatically increases the speed and volume of change; security teams face a scaling problem that traditional workflows can’t keep up with.

We explore how AI can act as a force multiplier for defenders, boosting individual productivity and automating high-toil workflows, while also forcing a hard rethink of “human in the loop” models that add friction without real control.

The conversation goes deep into context engineering, decision traces, and explainability, and into why understanding why a system acted is becoming as important as knowing what it did. We close by exploring how security leaders should evaluate tools in this new era: moving away from process-driven checklists toward outcome-based measures, and preparing for an industry on the brink of meaningful structural change.

00:00–02:49 — Introduction to AI security and Ankur’s platform-security journey

02:49–05:27 — What changes (and what doesn’t) in AI security fundamentals

05:27–09:18 — Scaling security in a probabilistic, AI-generated code world

09:18–10:30 — Embracing AI as defenders

10:30–13:46 — Productivity gains from LLMs for security engineers

13:46–20:06 — Human-in-the-loop vs autonomous agents in security workflows

20:06–22:25 — Context graphs, observability, and decision traces

22:25–32:01 — Explainability, mechanistic interpretability, and security trust

32:01–35:36 — How security teams evaluate tools, platforms, and outcomes

35:36–42:42 — Measuring security outcomes, velocity, and cost trade-offs

42:42–46:46 — False positives, false negatives, and revealed preferences

46:46–50:16 — LLMs as triage engines and force multipliers for security

50:16–52:51 — Underlying fears in the security industry

52:51–55:05 — Context engineering, platforms, and the future of security teams

Tune in for a deep dive!

Connect with Ankur Chakraborty:

LinkedIn: https://www.linkedin.com/in/ankurchakraborty/

Substack: https://machinesagainsthumanity.substack.com/

Connect with Anshuman:

LinkedIn: anshumanbhartiya

X: https://x.com/anshuman_bh

Website: https://anshumanbhartiya.com/

Instagram: anshuman.bhartiya

Connect with Sandesh:

LinkedIn: anandsandesh

X: https://x.com/JubbaOnJeans

Ep 33: The Future of Identity in AI Agents with Ian Livingstone

Sandesh Mysore Anand — Wed, 28 Jan 2026 07:33:31 GMT

In this episode, we sit down with Ian Livingstone to explore how AI is reshaping application security. The conversation focuses on one of the hardest emerging problems: agent identity. Ian breaks down why traditional identity and permission models fall apart when applied to non-deterministic AI agents, and what this means for access control, data security, and system design.

We also discuss where agent identity is headed, how insurance may play a role in managing AI-driven risk, and what security teams need to rethink as AI systems become active participants rather than passive components.

00:00–02:15 — Beyond AI hype: why security and agent identity matter

02:15–09:18 — Understanding identity in the age of AI agents

09:18–13:41 — Why service accounts and OAuth break down for agents

13:41–20:11 — Granular permissions, least privilege, and agent intent

20:11–25:55 — Security risks in agent workflows and prompt-driven systems

25:55–28:34 — Data security, IAM, and the agent exfiltration problem

28:34–30:47 — Non-determinism and rethinking how we secure systems

30:47–32:14 — The agent identity problem on the public internet

32:14–35:10 — Why the internet still lacks real application identity

35:10–39:12 — The future of authentication for agents and bots

39:12–40:28 — Emerging standards, AIUC, and insuring agents

40:28–43:09 — Liability, insurance, and accountability for autonomous systems

43:09–45:51 — How security roles evolve in an agent-native world

45:51–49:23 — Technical attack surfaces: MCPs, poisoned tools, and confusion

49:23–51:32 — Trust, contracts, and responsibility in software ecosystems

51:32–54:28 — Why AI adoption is top-down and what it means for security

Tune in for a deep dive!

Connect with Ian Livingstone:

Website: https://www.ianlivingstone.ca/

Twitter: https://x.com/ianlivingstone

Connect with Anshuman:

LinkedIn: anshumanbhartiya

X: https://x.com/anshuman_bh

Website: https://anshumanbhartiya.com/

Instagram: anshuman.bhartiya

Connect with Sandesh:

LinkedIn: anandsandesh

X: https://x.com/JubbaOnJeans

Edition 32: BigCo is building in AppSec, but it's too early to get excited

Sandesh Mysore Anand — Tue, 27 Jan 2026 08:44:17 GMT

Nano Banana’s summary of what this post is about :)

Before we begin…

Happy New Year! As some of you may have noticed, we have made a few exciting changes to Boring AppSec. Nothing changes for this newsletter, but you can now access all episodes of the BoringAppSec Podcast here. We also have Anshuman bringing his sharp thoughts on AI & Security to the Boring AppSec Platform here. Finally, we have a Slack community where readers and authors of Boring AppSec hang out. Come join us if that’s your thing!

2025 was a year of breakneck speed in AI, but one trend mildly surprised me: Frontier labs and hyperscalers actively building AppSec tools.

After decades of yelling from the rooftops about AppSec's importance, it looks like the tech industry is finally paying attention. Over the holidays, I dug deep to understand what this means for our industry. For now, I think the real impact is not that we have better AppSec tools (we don’t), but that it gives us a peek into what’s coming next.

Here are a few thoughts:

1. Most of what we saw in 2025 from BigCo was demoware.

Aardvark launched 3 months ago and is still in private beta. In that time, OpenAI has shipped multiple models, released many new versions of Codex, and much more. A few weeks before this, Anthropic launched the “security review” command within Claude Code and a companion GitHub Action to review PRs. An elegant solution on top of the mighty impressive Claude Code application. But security-review.md hasn’t been updated in 5 months. In that same window, Anthropic released multiple new models, took the code gen world by storm, and is threatening to do the same for non-engineers with Claude CoWork.

I am impressed by the simplicity and the underlying framework behind Claude Code Security Review, but we haven’t seen a single update in 5 months.

AWS’s Security Agent promises to automate Security Reviews, SAST, and Pen Testing. I tested a few of these tools and found them underwhelming compared to what these teams are capable of.

The Security Review agent looks for a grand total of 11 security controls

These companies have insanely talented teams. The effort on what’s shipped so far leads me to believe the goal was not to build world-class AppSec products, but to demonstrate capability. Show what’s possible with frontier models rather than grow revenue with AppSec tools.

2. This complicates things for AppSec teams.

If I had a nickel every time someone asked me, “But won’t Cursor replace AppSec?”, I’d be a rich man. AppSec teams are probably hearing the same from their CFOs: why spend $$ on SAST tools when Claude can do it? I hear you can just “vibe code” software now, why not build it in-house? Why go through procurement hell when AWS has a free option?

These are valid questions. But notice what happened: the burden of proof just shifted to the AppSec team. They now have to prove why a dedicated security vendor is better than the behemoths. I wouldn’t blame anyone for invoking the old “nobody gets fired for buying IBM” adage and giving in. Others will do the work to show these tools aren’t ready. Either way, AppSec teams are stuck with a bad trade-off: accept the demoware to keep the peace, or spend time fighting a battle they shouldn’t have to fight.

3. I don’t blame the labs for this.

LLMs are generating more code than ever. More code means more vulnerabilities. But it also means the bottleneck has shifted. Writing code is no longer the constraint; reviewing it is. Security reviews included. The labs know this, and they’re trying to get ahead of it.

This isn’t new. Every major technology shift creates security problems, and the companies closest to the shift usually take a first crack at solving them. Cloud created misconfiguration hell, so AWS built GuardDuty. LLMs are creating insecure code at scale and overwhelming review capacity, so the labs are building AppSec tools.

4. What does this mean for AppSec vendors?

Probably not as much as you’d think. GitHub has 100M+ developers, native workflow integration, and Microsoft’s backing. They’ve had GHAS for years. And yet Snyk and Semgrep are thriving. AWS built GuardDuty, and Wiz still became one of the fastest-growing security companies ever.

Why? Security isn’t a winner-take-all market. I don’t want to beat the platform v/s point-solution drum again, but history tells us both survive. And while it’s tempting to go “AI changes things”, I am not sure how.

5. 2026 may be different.

Even if their attempts in 2025 were feeble, there are signs the labs are getting serious. Anthropic recently hired a SentinelOne product executive to lead cybersecurity products. OpenAI has researchers working on Aardvark. Job listings hint at roadmaps with a higher focus on Cybersecurity products. I wouldn’t be surprised if we see 1-2 credible AppSec products from these labs in the next 12-18 months. But if history is any indication, AppSec products of all kinds (from labs, startups, old school players) will continue to thrive, while analysts and bloggers continue the pointless platforms v/s point solutions debates :P

That’s it for today! Are you an AppSec professional who has been asked the “but won’t Claude kill AppSec” question? Do you think what we have today from the labs is more than just demoware? How are you leveraging AI to scale AppSec? Let me know! You can drop me a message on Twitter (or whatever it is called these days), LinkedIn, or email. I am also the co-founder of Seezo. We help companies automate security design reviews at scale. Check us out if that’s your thing :) If you find this newsletter useful, share it with a friend, colleague, or on social media.

Browser Relay: When Your AI Assistant Gets Hands on Your Browser

Anshuman Bhartiya — Mon, 26 Jan 2026 23:35:56 GMT

Disclaimer: This is a cross post from my tech blog, co-authored by my personal AI assistant Sage.

Introduction

I am sure you have felt it — the relentless firehose of information. X (formerly Twitter) has become ground zero for AI and tech announcements. LinkedIn? Usually a few days behind. By the time something hits LinkedIn, the X crowd has already dissected it, built demos, and moved on.

So, like many others, I started using X bookmarks religiously. Every interesting thread, every promising tool, every “I need to read this later” moment — bookmarked. The problem? “Later” never comes. My bookmark count just kept growing.

I already have Sage running — my personal AI assistant powered by Clawdbot. While browsing the Clawdbot docs, I stumbled onto something called Browser Relay and decided to give it a try.

Browser Relay: Giving Your AI Hands

Here’s the pitch: instead of wrestling with APIs, what if your AI assistant could just… use your browser? Like you do?

The idea sounds crazy at first. But think about it — when you want to check your bookmarks, you open Chrome, go to X, and scroll through them. What if Claude could do the same thing?

That’s exactly what Browser Relay enables. It’s a Chrome extension that gives Clawdbot the ability to control a tab in your browser. The AI can navigate, click, scroll, read content — basically do whatever you could do manually.

And yes, you can trigger all of this from a Telegram message.

How It Actually Works

Let me walk you through the architecture. Based on the Clawdbot browser documentation, here’s how the pieces fit together:

The Architecture

The system has two main components:

Browser Control Server (port 18791): An HTTP API that receives commands from the Clawdbot agent. “Click this button.” “Navigate to this URL.” “Take a screenshot.” It connects to Chrome via the Chrome DevTools Protocol (CDP).
Chrome Extension: Uses Chrome’s chrome.debugger API to enable CDP access. When you click the extension icon on a tab, it attaches that tab to the relay — giving the control server the ability to drive it.

The control server talks to Chrome via CDP (defaulting to port 18792), which is the same protocol that powers Chrome’s developer tools. This is what makes the whole thing possible — CDP provides programmatic access to everything in the browser.

My Setup

In my case, this is what the flow looks like:

Telegram Message
      ↓
Synology NAS (Docker container running Clawdbot Gateway + Agent)
      ↓
Claude API (Anthropic cloud)
      ↓
When browser access needed...
      ↓
Browser Relay → Mac (Chrome with extension enabled)

The gateway lives on my NAS as a Docker container. The browser runs on my Mac. They talk to each other over my local network (secured via Tailscale). When the agent needs to access something in the browser, it reaches out to my Mac where Chrome is running with the relay extension.

Note: I’m moving to a Mac Mini for the gateway soon — Docker image builds on the NAS take forever. The flexibility of self-hosting means you can evolve your setup over time.

The Manual Attachment Requirement

Here’s an important security detail from the Chrome extension docs: the extension doesn’t auto-attach to tabs. You have to explicitly click the Clawdbot toolbar icon to attach a tab. The badge shows:

ON — attached and ready
… — connecting
! — relay unreachable

This is intentional. You’re granting the AI access to a specific tab, not handing over your entire browser.

The Magic Moment

The first time I triggered this from Telegram, I won’t lie — it felt like magic.

I sent a message: “Sage, go through my X bookmarks and summarize the AI-related ones.”

Then I watched my Chrome tab come alive. Navigation happening. Scrolling. The AI reading through my bookmarks, clicking into threads, extracting the content. All while I just… watched.

A few minutes later, a nicely formatted summary landed in my Telegram chat.

No API keys. No rate limits. No authentication dance. Just my AI assistant using my browser like I would.

Now, Let’s Put the Security Hat On

Alright, time for the uncomfortable conversation. Because as cool as this is, there’s a reason the Clawdbot security guide includes this warning:

“This is powerful and risky. Treat it like giving the model ‘hands on your browser’.”

Let’s break down what you’re actually enabling:

What the AI Can Access

When you attach a tab via Browser Relay, the AI can:

Navigate to any URL in that tab
Click, type, and interact with any element
Read all page content (DOM, text)
Access whatever you’re logged into — cookies, session state, everything
Run JS predicates/evaluations via the browser tool interface

Read that second-to-last point again. If you’re logged into Gmail in that tab, the AI can read and send emails. If you’re logged into your bank… you get the picture.

The Attack Surface

A Note on Prompt Injection and Modern Models

That said, it’s worth noting that state-of-the-art models have gotten significantly better at detecting and resisting prompt injection attempts. Anthropic’s Claude Opus 4.5, in particular, has shown strong resistance to adversarial prompts embedded in page content.

This doesn’t mean the risk is zero — you should still be cautious about which pages you attach. But I’ve decided to stick with Opus 4.5 as my inference model specifically because of its robustness against these attacks. The combination of a capable model that can resist manipulation, plus the manual tab attachment requirement, gives me reasonable confidence for my use case.

Your mileage may vary depending on your risk tolerance and what you’re accessing.

What This Is NOT

To be clear: Browser Relay is not the same as Clawdbot’s managed browser profile (the clawd profile). That one runs in isolation — separate user data directory, no access to your personal sessions.

Browser Relay explicitly uses YOUR browser with YOUR logged-in sessions. That’s the whole point — and the whole risk.

The Trust Model: Self-Hosted = You’re the Security Team

Here’s where it gets philosophical. The entire architecture is self-hosted. My gateway runs on my NAS. The browser relay is on my Mac. All traffic stays on my local network (or Tailscale mesh).

There’s no Clawdbot cloud service harvesting my data. No third-party servers in the middle. Browser relay traffic stays local or on your tailnet; outbound calls depend on your enabled providers and tools (LLM APIs, skills registry, etc.).

But here’s the tradeoff: you’re the security team now.

If you misconfigure something — say, bind the browser control server to 0.0.0.0 instead of loopback — you’ve just exposed your browser to your entire network. The Clawdbot security docs are explicit about this:

“Never bind to 0.0.0.0. Never use Tailscale Funnel for browser control.”

Recommendations from the docs:

Use a dedicated Chrome profile for the relay (not your daily browser)
Keep the browser control server on loopback + Tailscale only
Use separate tokens for browser control vs. gateway auth
I’m choosing Opus 4.5 for its robustness against prompt injection
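
The bind-address recommendation is the easiest one to check mechanically. Below is a minimal sketch of such a check, assuming you know your machine's tailnet IP. The function name and the example address in the usage note are hypothetical, and this is not part of Clawdbot.

```python
import ipaddress


def is_safe_bind(host: str, tailnet_ips: frozenset[str] = frozenset()) -> bool:
    """Accept loopback or an explicitly listed tailnet address; reject the rest."""
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # Hostnames other than localhost are not resolved here.
    if addr.is_unspecified:  # 0.0.0.0 or :: means "every interface".
        return False
    return addr.is_loopback or host in tailnet_ips
```

For example, is_safe_bind("127.0.0.1") passes and is_safe_bind("0.0.0.0") fails, while an address such as 100.64.0.5 passes only if it is listed in tailnet_ips.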

Being an early adopter of AI tooling means accepting this responsibility. You don’t get a security team vetting your setup. You ARE the security team.

When to Use What

So should you use Browser Relay? It depends on your use case and risk tolerance.

For my bookmark use case, Browser Relay makes sense. I need access to my logged-in X account. An API would require credentials I don’t want to manage. The managed browser would require me to log into X separately.

But for web scraping random sites? I’d use the managed browser profile. No need to expose my personal sessions for that.

Conclusion and Final Thoughts

The AI wave is here, and tools like Clawdbot are making it accessible to anyone willing to run their own infrastructure. Browser Relay is one of those capabilities that feels like a glimpse into the future — your AI assistant operating your computer on your behalf.

But with great power comes great responsibility (sorry, had to).

If you’re going to use Browser Relay:

Use a dedicated Chrome profile — not your daily driver
Keep everything on Tailscale — no public exposure
Understand what you’re enabling — full session access is no joke
Stay paranoid — prompt injection is a real risk, but modern models help
Stick with robust models — Opus 4.5 offers good protection against adversarial prompts

The early adopter tax is real. But honestly? Watching my AI scroll through my bookmarks while I sip coffee might just be worth it.

If you’re interested in trying Clawdbot, check out the documentation and join the Discord community. And if you have thoughts on the security model, I’d love to hear them — reach out on LinkedIn or X.

Until next time, ciao!

Skills: The Missing Piece in AI Security Tooling

Anshuman Bhartiya — Fri, 23 Jan 2026 06:13:18 GMT

Disclaimer: This is a cross post from my tech blog, co-authored by my personal AI assistant Sage.

The Industry Problem: One-Size-Fits-All Security Analysis

Here’s a pattern I’ve seen across the security industry: we build tools that apply the same methodology regardless of what they’re analyzing.

Run STRIDE on a web app? You get STRIDE threats. Run STRIDE on a mobile app? Same STRIDE categories. Run STRIDE on a multi-agent AI application with cascade confidence propagation and LLM tool execution? Still the same STRIDE threats.

This is a problem because agentic applications have fundamentally different risk profiles than traditional software. When your application can:

Execute tools autonomously
Chain multiple AI agents together
Propagate confidence scores between decision-makers
Accept natural language that becomes executable logic

...you need threat modeling that understands these patterns.

This isn’t just a SecureVibes problem. It’s an industry problem. And I believe skills are the solution.

What Are Skills? (And Why They Matter)

TL;DR — Skills are modular knowledge packages that augment AI agents with domain-specific expertise. They’re the LLM-native equivalent of Semgrep rules—but for reasoning, not pattern matching.

If you’ve worked with AI coding assistants, you’ve probably seen context files like CLAUDE.md that give the AI information about your codebase. Skills take this concept further - they’re structured knowledge packages that teach an AI agent how to think about specific domains.

A security skill might include:

Detection patterns - How to recognize when the skill applies
Threat categories - Domain-specific vulnerability classes
Examples - Real-world attack scenarios
Reference materials - Validation logic and deeper context

The key insight: skills don’t replace the agent’s reasoning - they augment it with domain expertise.

And here’s what makes this exciting for the industry: skills are portable. They’re just markdown and code. No vendor lock-in. No proprietary formats.

The Experiment: Proving Skills Work

To demonstrate the power of skills, I ran a controlled experiment using SecureVibes’ threat modeling subagent.

The Test Subject: FinBot

I used finbot-ctf-multiagent - a multi-agent invoice processing system that’s the flagship project for OWASP’s Agentic Security Initiative (ASI).

FinBot is ideal because it exhibits real agentic patterns:

Multi-agent chain: ValidatorAgent → RiskAnalyzerAgent → ApprovalAgent → PaymentProcessorAgent
Cascade confidence propagation between agents
Custom goal injection via admin endpoints
LLM tool execution for invoice processing

Important note on methodology: I modified the FinBot codebase to remove all hints, comments, and obvious vulnerability markers. This ensured the testing was purely unbiased - SecureVibes had to discover the agentic patterns and threats on its own, without any breadcrumbs.

Two Runs, Same Codebase

Run 1: Generic STRIDE (no skills)

Standard threat modeling methodology
No context about AI/LLM-specific risks

Run 2: STRIDE + OWASP ASI Skills

Augmented with agentic security skills
Skills derived from OWASP Top 10 for Agentic Applications

The Results: Data Speaks

High-Level Comparison

The skill-augmented run found 9 threats in agentic-specific categories that generic STRIDE simply cannot identify. These aren’t relabeled STRIDE threats - they’re fundamentally different risk categories that only exist in multi-agent systems.

Threat Category Breakdown

STRIDE-Only Distribution

STRIDE + ASI Skills Distribution

The 9 Threats Only Skills Could Find

These are threats that exist in categories generic STRIDE doesn’t even know about:

ASI07 (Cascade Failure Exploitation) is a perfect example: an attacker can trigger specific failures in one agent that manipulate downstream agents through confidence propagation. This is a risk unique to multi-agent systems - STRIDE has no category for it.

Context Awareness: How Skills Detect Agentic Patterns

Before generating threats, the skill-augmented run automatically detected these patterns:

✅ OpenAI API usage (gpt-4o-mini)

✅ Multi-agent chain (ValidatorAgent, RiskAnalyzerAgent, ApprovalAgent, PaymentProcessorAgent)

✅ LLM function calling/tool execution

✅ Custom goal injection via admin interface

✅ Cascade confidence propagation between agents

This is what enables targeted analysis. The agent knew it was analyzing a multi-agent system before it started threat modeling, so it applied the right mental models.

Why This Matters for the Industry

1. Every Application Type Needs Its Own Skills

The same principle applies across the board:

You don’t need different tools for each application type—you need different skills for the same tool.

2. Skills Are the New Rules

Traditional security tools rely on rules:

Semgrep rules for SAST
YARA rules for malware
Snort rules for IDS

Skills are the LLM-native equivalent. But instead of pattern matching, they enable reasoning. An agent with the right skills can:

Understand context, not just syntax
Chain together multi-step attack scenarios
Identify domain-specific risks

3. The Open Source Shift

Trail of Bits recently open-sourced their skills for security research and audit workflows. This signals a shift: the future of security tooling isn’t monolithic products - it’s composable, shareable expertise that makes everyone’s agents smarter.

The industry is moving toward:

Composable expertise that can be shared
Community-driven knowledge that improves over time
Portable skills that work across tools

This is how we collectively get better at security - not by hoarding knowledge in proprietary tools, but by sharing it as skills anyone can use.

How I Created the Agentic Security Skills

Here’s the part that surprised me: creating these skills was fast.

The traditional bottleneck in security tooling has always been knowledge engineering. Someone has to read the documentation, understand the threat landscape, and encode that knowledge into rules or patterns. This takes weeks or months.

Here’s what I did instead:

Downloaded the OWASP Top 10 for Agentic Applications PDF
Converted it to markdown - a straightforward transformation
Pointed Claude Code at it with context from a skills best practices guide I curated
Gave it the existing DAST skills as an example of the structure and format I wanted
Asked it to create skills for all ASI01-ASI10 categories

The result? A complete skill set for agentic threat modeling in a fraction of the time it would have taken to build manually.

This is the paradigm shift. Humans used to be the bottleneck in encoding security knowledge. But if you give AI the right context - the source material, the format you want, and examples to follow - you can move incredibly fast.

Embracing Non-Determinism

Here’s something that trips people up when they first use AI for security analysis: the results aren’t always the same.

Run the same threat model twice and you might get slightly different threats. Run it with a different model and you’ll definitely get different results. This bothers people who are used to deterministic tools where the same input always produces the same output.

But here’s the thing: this isn’t a bug - it’s a feature you should embrace.

The right approach isn’t to expect consistency. It’s to:

Run the workflow multiple times - maybe 2-3 runs
Try different models - Sonnet and Opus often catch different things
Consolidate the results - union of all findings

What you’ll find is that critical risks show up consistently on every run. These are the threats that matter most - the ones where the signal is so strong that the model can’t miss them regardless of sampling variance.

The threats that appear inconsistently? They’re often edge cases or lower-severity issues that won’t materially change your security posture. They’re nice to have, but missing them in one run isn’t catastrophic.

This is fundamentally different from traditional SAST tools where you expect deterministic output. But it’s also how human security researchers work - run the same pentest twice with two different testers and you’ll get different findings. We’ve always accepted that in human-driven security work. It’s time to accept it in AI-driven work too.
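
The consolidation step can be sketched in a few lines. This assumes each run's findings have already been normalized to comparable labels. In practice, model output is free text and needs a matching step first. The function name and labels are hypothetical.

```python
from collections import Counter


def consolidate(runs: list[set[str]]) -> tuple[set[str], set[str]]:
    """Return (findings present in every run, findings present in only some)."""
    counts = Counter(finding for run in runs for finding in run)
    consistent = {f for f, n in counts.items() if n == len(runs)}
    occasional = set(counts) - consistent
    return consistent, occasional


if __name__ == "__main__":
    runs = [
        {"goal-hijack", "cascade-failure", "verbose-errors"},
        {"goal-hijack", "cascade-failure"},
        {"goal-hijack", "cascade-failure", "weak-logging"},
    ]
    consistent, occasional = consolidate(runs)
    print(sorted(consistent))  # Findings that showed up in all three runs.
    print(sorted(occasional))  # Findings that showed up in only some runs.
```

The union of the two sets is the full consolidated list. The first set is where I would start triage.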

Building Your Own Skills

The agentic security skill in SecureVibes follows a simple structure:

The SKILL.md teaches the agent:

When to activate - Detection patterns for agentic code
What to look for - Threat categories with code patterns
How to report - Structured templates with required fields

Here’s the detection logic:

You Don’t Need SecureVibes to Use These Skills

Here’s what I love about skills: they’re just files.

You don’t need to run SecureVibes if you don’t want to. Grab the agentic-security skill and drop it in your Claude Code workspace.

Add the skill to your .claude/ directory or reference it in your CLAUDE.md. The next time you ask Claude Code to threat model an agentic application, it will have the full OWASP ASI taxonomy with detection patterns and examples.

Key Takeaways

Generic threat modeling produces generic threats. STRIDE on an agentic app gives you STRIDE categories, missing agentic-specific risks.
Skills enable context-aware analysis. The skill-augmented run found 9 threats in ASI categories that STRIDE couldn’t categorize—including cascade failures, goal hijacking, and context pollution.
Relevant threats > More threats. 9 agentic-specific threats that actually apply to your multi-agent system are more valuable than 30 generic threats that may or may not be relevant.
Skills are portable. Use them with Claude Code—they’re just markdown files with structured knowledge.
Creating skills is now fast. Download the PDF, convert to markdown, give AI the right context and examples, and validate the output. What used to take weeks now takes hours.

If you have ideas for skills or want to contribute, please reach out! The more skills we share, the smarter everyone’s agents become.

Until next time, ciao!

Ep 32: Rethinking Enterprise Security in an AI- and Platform-First World with Kane Narraway

Anshuman Bhartiya — Mon, 19 Jan 2026 11:43:42 GMT

In this episode, we sit down with Kane Narraway to unpack how enterprise security is changing as AI, platforms, and developer-driven security become the norm. Kane shares his path from digital forensics to leading security at Canva, and why understanding company culture matters just as much as choosing the right tools.

We discuss why modern security is becoming platform-first, why much of the security vendor market optimizes for finding problems rather than fixing them, and why Kane believes security teams need more engineers and fewer manual processes.

The conversation also digs into AI security, shadow IT (and shadow AI), and the real-world trade-offs between usability and control, especially as low-code and no-code tools become more common inside companies.

00:00–03:25 — Kane’s journey from law enforcement to platform security, shaped by our time at Atlassian

03:25–06:37 — Why enterprise security becomes platform-first faster than AppSec

06:37–09:26 — Why security teams fail when they fight company culture

09:26–13:36 — Platforms vs best-of-breed tools: trade-offs, not ideology

13:36–17:45 — Why most security startups are built to be acquired

17:45–22:16 — Open source agents, and business-specific vulnerability research

22:16–27:09 — AI security, prompt injection, and the access-control problem

27:09–31:29 — Build vs buy in the AI era: why speed is easy and maintenance remains the real bottleneck

31:29–40:42 — Agents, MCPs, and why stopgap solutions dominate today

40:42–48:57 — Shadow AI, low-code automation, and familiar security failures

Tune in for a deep dive!

Connect with Kane Narraway:

LinkedIn: https://www.linkedin.com/in/kane-n/

Blog: https://kanenarraway.com/

Connect with Anshuman:

LinkedIn: anshumanbhartiya

X: https://x.com/anshuman_bh

Website: https://anshumanbhartiya.com/

Instagram: anshuman.bhartiya

Connect with Sandesh:

LinkedIn: anandsandesh

X: https://x.com/JubbaOnJeans

Welcome!

Wed, 31 Dec 2025 06:11:58 GMT

Anshuman & Sandesh on Security. 2 blogs. 1 podcast. A Slack community.

Edition #2: Agent Security Standards + Identity/Authorization + Secure Agent Engineering + SecureVibes Update

Anshuman Bhartiya — Mon, 15 Dec 2025 16:00:00 GMT

Sandesh Mysore Anand and I recorded a couple of podcast episodes for The Boring AppSec Podcast with Ken Huang and Teja Myneedu over the past few weeks. Below are some key takeaways from them:

Architecting AI Security: Standards and Agentic Systems with Ken Huang

Infographics of the podcast summary generated by NotebookLM

The conversation focused on the necessity of new security frameworks and authentication methods to manage the unique risks posed by autonomous AI agents.

New Standards for Measuring AI Agent Risk: OWASP AIVSS

Ken detailed the AIVSS framework’s purpose and structure.

The Goal: AIVSS aims to provide a way to measure core agent AI security risks to enable better risk management, fitting into the “measure” component of the NIST AI RMF.
Addressing Autonomy: Traditional scoring systems like CVSS are deterministic, measuring code and configuration. They are insufficient for agentic AI due to its non-deterministic and autonomous nature.
The Scoring Approach: AIVSS builds upon CVSS by adding an agent AI risk factor to account for non-deterministic risks. This factor considers the agent’s level of autonomy (ranging from non-existent to full autonomy), as different levels present varying risk factors.
Framework Components: AIVSS offers a quantitative, numerical score. It is being developed alongside a qualitative, decision-matrix-based system called SSVC (Stakeholder-Specific Vulnerability Categorization).

The Shortcomings of Traditional IAM for AI Agents

Ken asserted that traditional Identity and Access Management (IAM) standards, such as OAuth and SAML, are fundamentally inadequate for securing AI agents. These legacy standards were designed for web applications acting on a human’s behalf.

Session-Scoped vs. Task-Scoped: The primary issue is that current OAuth flows are session-scoped (time-based) and grant access that is additive upon request. Agents, however, require dynamic, fine-grained access that is strictly task-scoped. Access should be removed once a task is finished, requiring a new permission request for subsequent tasks.
Coarse-Grained Access: Traditional IAM is often either too restrictive, stifling the agent’s necessary agency, or too coarse-grained. For instance, an HR agent might need access to a resume database but should be restricted from the salary database; granting the full human identity is too risky.
Multi-Agent Complexity: Current systems struggle to accommodate multi-agent systems, which are key to future AI workflows. In these environments, different agents assume different identities, and access must be managed with a dynamic task scope.
The Way Forward: A new standard is necessary. This standard must allow for agency while maintaining security by consistently checking the agent’s intent before granting access.
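The session-scoped vs. task-scoped distinction above can be sketched in a few lines. This is a toy model under stated assumptions, not an implementation of any existing standard: the class, the agent name and the resource names are hypothetical, and the intent check Ken describes is left out.

```python
from contextlib import contextmanager


class TaskScopedAccess:
    """Toy permission store where grants live only as long as a task."""

    def __init__(self):
        self._grants = {}  # agent_id -> set of resources

    @contextmanager
    def task(self, agent_id, resources):
        # Grant only what this task needs...
        self._grants[agent_id] = set(resources)
        try:
            yield
        finally:
            # ...and remove it as soon as the task ends, even on error.
            self._grants.pop(agent_id, None)

    def can_access(self, agent_id, resource):
        return resource in self._grants.get(agent_id, set())


access = TaskScopedAccess()
with access.task("hr-agent", {"resume_db"}):
    print(access.can_access("hr-agent", "resume_db"))  # True
    print(access.can_access("hr-agent", "salary_db"))  # False
print(access.can_access("hr-agent", "resume_db"))      # False
```

Unlike a session token that stays valid until it expires, the grant here disappears when the task block exits, so the next task has to ask again.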

Securing Agent-to-Agent (A2A) Communication

The rise of agent development kits (ADKs) and A2A protocols (like Google’s A2A protocol) introduces new security challenges beyond those seen in traditional API security.

Beyond BOLA: While standard API issues like BOLA (Broken Object Level Authorization) still exist, A2A communication requires systems to handle issues like trust, capability, and quality of service. Agents must be protected from risks like poisoned agent cards or rug pull attacks.
New Protocols: Ken emphasized the need for protocols like the Agent Capability Negotiation and Binding Protocol (ACNBP). This protocol facilitates validation using digital signatures to ensure the agent possesses the capabilities and quality of service it claims.
Goal Manipulation Attacks: A major threat to autonomous systems is goal manipulation, which is challenging to defend against. This includes attacks such as Drifting (Crescendo Attack), which gradually shifts the agent’s intended goal (e.g., prompting a security agent to open ports instead of locking them); Malicious Goal Expansion, which uses prompt injection to force an agent to execute its assigned task while also performing a malicious secondary task, such as leaking secret environment variables; and Exhaustion Loop, which uses direct or indirect prompt injection to make the agent perform a task that never completes, leading to a denial of service or a “denial of wallet”.
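Of the three, the exhaustion loop is the easiest to bound mechanically. A minimal sketch, assuming the agent runtime can report each step and its cost to a guard object (the class names and limits below are hypothetical):

```python
class BudgetExceeded(Exception):
    """Raised when an agent run uses more steps or spend than allowed."""


class RunBudget:
    def __init__(self, max_steps, max_cost_usd):
        self.max_steps = max_steps
        self.max_cost_usd = max_cost_usd
        self.steps = 0
        self.cost_usd = 0.0

    def charge(self, cost_usd):
        """Record one agent step; stop the run once either limit is passed."""
        self.steps += 1
        self.cost_usd += cost_usd
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step limit {self.max_steps} exceeded")
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded(f"cost limit ${self.max_cost_usd:.2f} exceeded")


budget = RunBudget(max_steps=3, max_cost_usd=1.00)
try:
    while True:  # stands in for an agent stuck in an injected loop
        budget.charge(cost_usd=0.10)
except BudgetExceeded as err:
    print(f"run halted: {err}")  # run halted: step limit 3 exceeded
```

A hard budget does nothing against drifting or goal expansion, but it does cap the blast radius of a loop to a known number of steps and dollars.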

Security professionals were encouraged to engage in research-oriented learning and contribute to these evolving standards to keep pace with the rapidly innovating field of AI security.

Scaling Product Security In The AI Era with Teja Myneedu

Infographics of the podcast summary generated by NotebookLM

In this conversation, Teja noted that his transition from focusing purely on product and application security to leading broader security teams provided a crucial worldview: securing products extends beyond the boundary of writing code and affects the entire enterprise.

Security Philosophy and Prioritization

A key evolution in Teja’s philosophy centers on practicality and urgency. He believes security breaches often occur because organizations failed to do the hard work of tightening access or addressing individual vulnerabilities, rather than failing to find the next cool thing.

Security by Obscurity: Teja emphasized that he is willing to accept incremental steps toward security improvement, asserting, “Let’s not make perfect the enemy of good”. While acknowledging the historical debate around “security by obscurity,” he argued that any measure (such as implementing a WAF rule) that improves security by even 1% or 2% daily is valuable. Given that bad actors are increasingly using AI agents to explore attack surface areas, the sense of urgency necessitates immediately stopping the bleeding rather than waiting weeks for an ideal fix. The goal should be to increase the economic complexity of an attack for bad actors.
Risk Prioritization: The discussion touched on the challenge of risk prioritization. Teja noted the dilemma between presenting the full scope of vulnerabilities (which can feel overwhelming) and prioritizing risks. However, all prioritization is inherently flawed and only necessary when resources prevent fixing everything. Security tools often fail at prioritization because they lack necessary context regarding people, processes, and organizational strategy.

AI, Context, and the Future of Fixing

The conversation explored how AI and automation are changing the role of security teams, particularly concerning code fixes. Traditionally, security teams manage vulnerabilities, while developers own the fixing.

Security Engineers as Fixers: We discussed whether security engineers should raise PRs for code fixes. I mentioned that security engineers should know how to fix vulnerabilities and can now use AI to easily propose PRs for engineers to approve or reject. Teja added a crucial nuance: the problem isn’t the technical fix itself, but ensuring the fix doesn’t cause unintended downstream effects (like authorization changes breaking service-to-service calls), which relies heavily on tribal knowledge within the engineering teams.
The Power of Context: AI’s promise lies in reducing the cognitive load on engineers by helping them discover context quickly, serving as “product archaeologists”. Critical product context includes the code repository and deployment infrastructure. The harder aspects of context to capture include team ownership (especially after reorgs), business intent, use cases, and priority. The vendors’ ability to gather and contextualize organizational constraints is the “game changer” for security tooling.

Secure Design and Emerging Threats

Secure by Design vs. Secure Defaults: Secure by Design requires clear architecture and the application of standard security practices. While AI increases the promise of applying known security patterns consistently, we discussed that the term “secure by design” has become so broad it has lost meaningful definition, often encompassing “all of security”. The critical distinction lies between secure design (before building) and secure defaults (implementation).
LLM Novel Threats: Beyond known issues like prompt injection, Teja views the biggest threat as the complexity of identity and authorization. When agents are integrated, they dynamically determine business logic and act as decision-making engines, blurring trust boundaries. This compounds the already difficult problem of access control in microservices. The challenge is granting an agent delegate access with appropriate, limited privileges. Teja also expressed heightened concern over the enterprise environment, particularly the software supply chain risk associated with browser plugins and insecure desktop downloads.

The links to both the episodes are provided in the Appendix below.

On Secure Agent Engineering

I read a blog recently that describes the difference between traditional software engineering and agent engineering, and argues that senior engineers struggle to build AI agents because they try to code away the probabilistic nature of agents instead of embracing it. This blog touched upon some key points that resonated well with me. I highly recommend giving it a read.

If I were to draw parallels from this blog to security engineering, below are some quick off the top of my head thoughts on the points mentioned in the blog. Please note that the below points are not exhaustive by any means:

Text is the new State - A lot of the nuance required in general engineering gets lost in data structures, but it can now be fed to agents via prompts. And agents have a tendency to pay attention to them. This is essentially a breeding ground for prompt injection attacks. Prompt injection at large still hasn’t been solved, with folks still trying to settle the debate on whether it’s a vulnerability or not. In a chatbot application, where the very nature of the app is to take user input via a prompt and respond, there could be a middleware or some kind of proxy/filter that looks for prompt injection attacks as a defensive measure. But imo, the greater risk lies with something like an indirect prompt injection, wherein the main functionality of an app could be something innocuous, for example, uploading files. If the app uses some kind of AI engine in the backend to process these docs (which might contain hidden malicious prompts), it could result in catastrophic outcomes that weren’t obvious. My prediction is that direct/indirect prompt injection will continue to be a major pain when it comes to agentic apps/systems. The only way to manage this risk is to follow a defense in depth strategy: reducing blast radius and following the general security principles of secure defaults, least privilege and good observability. As a defense against these attacks, the CaMeL approach looks promising, but time will tell how effective and scalable it will be.
Agent Intent - There could be multiple ways of getting to an outcome and we as humans might not know about all of them. So, instead of hardcoding them and restricting the agents, we should really focus more on the agent’s intent and the outcome, and let the agent decide how it wants to get there. We can set meaningful milestones to ensure the trajectory is correct, but restricting the agents by coding in all the edge cases is really not building an effective agentic system. Having said that, this is a security nightmare because protecting probabilistic outcomes is not trivial. Security needs to focus on the intent of the agent before it takes any step - what it’s trying to do, what permissions it has, what systems it is connected to, etc. If it’s a high risk action, getting a human to approve it is going to be table stakes. What the agentic AI space needs are security guardrails/policies that are dynamic and adaptable. The traditional rules based policy engines aren’t going to cut it unfortunately.
Error handling - Agents can operate autonomously, so giving them the agency to take errors and resolve them dynamically, instead of failing the entire workflow, is the way to build effective autonomous agentic systems. Let’s consider this from a DAST (Dynamic Application Security Testing) perspective - Imagine you are building a DAST agent. By nature, a DAST agent needs to send a bunch of payloads to its target, observe the response and deduce whether something is a vulnerability or not. This is how traditional DAST scanners have worked. No dynamic decisions are made based on the application’s behavior. In the AI era now, imo - there is a lot of room for improvement in such scenarios. For example, depending upon the errors received in the response, the DAST agent can adapt dynamically and continue probing the target more effectively instead of simply spraying and praying. This will also address the DDoS-type load such scanners create, because there won’t be a need to throw a bunch of non-relevant payloads at a target and bring it down. Making your agents smart, adaptable and stealthy will really test the efficacy of security controls. Having said that, one important point worth mentioning here is that an agent should self correct/adapt inside a sandboxed environment. You don’t want to give your DAST agent the permission to access the file system, only to realize that it rm -rf’ed itself, while trying to fix something.
Evaluating behavior or testing probabilistic systems - In agentic systems, unit tests just aren’t enough. Reliability, Quality and Tracing are key things to evaluate agentic behavior against. Let me explain this by using my own experience of building a vulnerability triage agent - I was flummoxed by its outcome because it was different every time I ran it. I wasn’t sure how to improve it because it wasn’t like the variance in the outcome was acceptable. It was basically true positive on one run and false positive on the next. And, without the AI/ML background, I had no idea how to build evals to actually make this triage agent work reliably. I started thinking from first principles. I started using the outcome from each run and fed it back to the AI to help me understand how I could improve the prompt so that the outcome was consistent and aligned with what I’d expect it to be. The AI would suggest some changes like adding logs at different points where the agent was making decisions. I’d look at the suggestions, make minor improvements and implement them. I’d then re-run the workflow and see if the outcome changed - whether it got better or worse. I was really prompt engineering at this point. Soon enough, I started seeing consistent outcomes from the agent. I could trust it, the quality of the reasoning was solid and I had traces of every action/decision the agent was making. I didn’t realize that I was unintentionally building some sort of an eval system manually where I had an input (vulnerability data), the expected output (human triage of the vulnerability whether a TP or a FP) and a prompt (context) that I could use AI to help improve to get to the desired outcome. Simple approaches like this are often lost in the hype cycle, but when you actually stick to first principles, it all makes sense.
Implicit vs Explicit Context - As humans, we have a lot of tribal knowledge and assumptions/perceptions of the world i.e. implicit context. In software engineering, this translates to things like variable / function / tool naming, etc. But, more often than not, we don’t do a good enough job of this nomenclature, and keep it ambiguous. Agents work differently. The more accurate context we provide them, the less ambiguity we leave up to them to decipher, and the better outcomes we are going to see. Another way to think about this is that traditional APIs are not adaptable, in the sense that they expect a pre-defined input and have specified output formats. That’s too restrictive. Agents, on the other hand, have the capability to adapt at runtime by reading tool definitions and adjusting the inputs accordingly. MCP tool definitions are a great example here. A tool with a wrong docstring or an incomplete name could lead to wrong invocations, resulting in unintended consequences. Removing the ambiguity and making it dead simple for agents is a recipe for building reliable and secure agentic systems.
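To make the Agent Intent point concrete, here is a minimal sketch of where a pre-action check could sit. Everything in it is an assumption for illustration: the action names, the allowlist and the approval callback are hypothetical. It is also deliberately static, so it only shows the placement of the check; the dynamic, intent-aware evaluation argued for above would replace the simple lookups.

```python
from dataclasses import dataclass

# Hypothetical list of actions that always need a human sign-off.
HIGH_RISK_ACTIONS = {"delete_file", "open_port", "rotate_credentials"}


@dataclass
class ProposedAction:
    agent_id: str
    name: str
    stated_intent: str


def review_action(action, allowed_actions, approve):
    """Return True if the action may run.

    `allowed_actions` is the agent's least-privilege allowlist and
    `approve` is a callback that asks a human (both are placeholders).
    """
    if action.name not in allowed_actions:
        return False  # outside the agent's permissions: deny outright
    if action.name in HIGH_RISK_ACTIONS:
        return bool(approve(action))  # human in the loop for risky steps
    return True


def ask_human(action):
    # Stand-in for a real approval flow (chat prompt, ticket, etc.).
    print(f"approval requested: {action.name} ({action.stated_intent})")
    return False


allowed = {"read_logs", "open_port"}
print(review_action(ProposedAction("sec-agent", "read_logs", "triage alert"), allowed, ask_human))
print(review_action(ProposedAction("sec-agent", "open_port", "debug service"), allowed, ask_human))
```

The first call passes without bothering anyone. The second is within the agent's permissions but high risk, so it is routed to a human, who in this toy run declines.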

I feel that the quote below from the blog sums it up pretty well, where “it” refers to the probabilistic nature of agents.

“You must manage it through evals and self-correction.”
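The manual loop described in the triage-agent story (input, expected human verdict, repeated runs) maps onto a very small eval harness. The sketch below is an assumption about what such a harness could look like, not the one used for the agent above: the finding fields and the toy triage function are invented, and a real agent would be an LLM call.

```python
def run_eval(triage, cases, runs_per_case=5):
    """Score a non-deterministic triage function against human labels.

    `triage` takes a finding (dict) and returns "TP" or "FP".
    `cases` is a list of (finding, expected_label) pairs.
    """
    report = []
    for finding, expected in cases:
        verdicts = [triage(finding) for _ in range(runs_per_case)]
        correct = sum(v == expected for v in verdicts)
        report.append({
            "id": finding["id"],
            "accuracy": correct / runs_per_case,
            "consistent": len(set(verdicts)) == 1,
        })
    return report


def toy_triage(finding):
    # Deterministic stand-in for an LLM-backed agent.
    return "TP" if finding["user_controlled_input"] else "FP"


cases = [
    ({"id": "VULN-1", "user_controlled_input": True}, "TP"),
    ({"id": "VULN-2", "user_controlled_input": False}, "FP"),
]
for row in run_eval(toy_triage, cases):
    print(row)
```

Tracking accuracy and consistency separately matters here: an agent that flips between TP and FP on the same input is a different problem from one that is consistently wrong, and the prompt changes that fix each tend to differ.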

SecureVibes Update

For those who might not know, I open sourced a project called “SecureVibes” that is meant to help vibecoders find security vulnerabilities in their codebase using a slightly different approach as compared to traditional tools. You can read more about it here. The project is on my GitHub here. Below are some updates on it:

Mahmud Muhammad gave a presentation at DevFest Ilorin and demo’ed SecureVibes. Here is the link to this deck.
I got an opportunity to present SecureVibes at a local AI Tinkerers Meetup in San Diego. The community loved it and it was voted the community favorite and was also featured in their newsletter here.
Shoutout to Yogi Kortisa and Kolla Harish for contributing to its code. We are going to keep improving it and learning from it, as we continue to operate in this greenfield area. Watch this space as we will share new learnings here. If you’d like to help contribute to it, have questions or are simply interested in following along on our journey, please feel free to join the Discord server below.

Ep 31: The Future of Developer Security with Travis McPeak

Sandesh Mysore Anand — Mon, 15 Dec 2025 16:00:00 GMT

In this episode, we sit down with Travis McPeak, one of the most prominent thinkers in the space of developer security. Travis, who built his career at the intersection of security automation and developer productivity, shares his philosophy on achieving security at scale in the AI era.

His career spans security leadership roles at major tech companies, including Symantec, IBM, Netflix, and Databricks. Most recently, he founded and served as CEO of Resourcely, a startup built on the idea of making cloud infrastructure secure by default, before being “acqui-hired” by Cursor, the rapidly growing AI-powered code editor, to lead security and enterprise readiness.

Key Takeaways

AI for Secure by Default: AI tools provide the best injection point to shift security “all the way left” and move past the reactive “whack-a-mole” approach, because developers are already motivated to use these highly effective tools.

Changing AppSec Strategy: AI dramatically changes the nature of AppSec by making previously unscalable strategies, such as threat modeling, applicable. AI can generate architecture diagrams on demand by tracing through code.

The Compliance Bottleneck: The dramatic consolidation of cloud security vendors reflects how compliance-minded the security industry remains. Critical infrastructure misconfigurations (like public databases being left open) often go unaddressed because they are not measured by compliance standards.

Platform vs. Point Solutions: Travis argues against platforms that are often amalgamations of poorly integrated acquired tools. He suggests buying the single best point solution for a high-leverage problem and using AI capabilities to operationalize and wire it into internal systems, thereby simplifying integrations that platforms traditionally provide.

The Skeptical Coder: A fundamental limitation of Large Language Models (LLMs) is their desire to “make you happy,” causing them to provide answers even if they are incorrect. Therefore, engineers must use AI output only as a starting point and only consider the code finished when they understand it fully end to end.

Prompt Injection Defined: Prompt injection is confirmed as a legitimate vulnerability, essentially a rehash of old issues like cross-site scripting and SQL injection, arising from the improper separation between the LLM instruction and the user instruction.

Tune in for a deep dive!

Connect with Travis:
LinkedIn: travismcpeak
Company Website: https://cursor.com/

Connect with Anshuman:
LinkedIn: anshumanbhartiya
X: https://x.com/anshuman_bh
Website: https://anshumanbhartiya.com/
Instagram: anshuman.bhartiya

Connect with Sandesh:
LinkedIn: anandsandesh
X: https://x.com/JubbaOnJeans

Ep 30: Scaling Product Security In The AI Era with Teja Myneedu

Sandesh Mysore Anand — Fri, 05 Dec 2025 11:12:00 GMT

In this episode, we sit down with Teja Myneedu, Sr. Director, Security and Trust at Navan. He shares his philosophy on achieving security at scale, discussing some challenges and approaches, especially in the AI era.

Teja's career spans over two decades on the front lines of product security at hyper-growth companies like Splunk. He currently operates at the complex intersection of FinTech and corporate travel, where his responsibilities include securing financial transactions and ensuring the physical duty of care for global travelers.

Key Takeaways

• Scaling Security Philosophy: Security programs should be built on developer empathy and innovative solutions, scaling with context and automation.

• Pragmatic Protection: Focus on incremental, practical improvements (like WAF rules) to secure the enterprise immediately, instead of letting the pursuit of perfection delay necessary defenses; security by obscurity is not always bad.

• Flawed Prioritization: Prioritization frameworks are often flawed because they lack organizational and business context, which security tools fail to provide.

• AI and Code Fixes: AI is changing the application security field by reducing the cognitive load on engineers and making it easier for security teams to propose vulnerability fixes (PRs).

• The Authorization Dilemma: The biggest novel threat introduced by LLMs is the complexity of identity and authorization, as agents require delegate access and dynamically determine business logic.

Tune in for a deep dive!

Connect with Teja:

LinkedIn: myneedu

Company Website: https://www.navan.com

Connect with Anshuman:

LinkedIn: anshumanbhartiya

X: https://x.com/anshuman_bh

Website: https://anshumanbhartiya.com/

Instagram: anshuman.bhartiya

Connect with Sandesh:

LinkedIn: anandsandesh

X: https://x.com/JubbaOnJeans

Ep 29: Architecting AI Security: Standards and Agentic Systems with Ken Huang

Sandesh Mysore Anand — Tue, 25 Nov 2025 11:31:00 GMT

In this episode, we sit down with Ken Huang, a core architect behind modern AI security standards, to discuss the revolutionary challenges posed by agentic AI systems. Ken, who chairs the OWASP AIVSS project and co-chairs the AI safety working groups at the Cloud Security Alliance, breaks down how security professionals are writing the rulebook for a future driven by autonomous agents.

Key Takeaways

• AIVSS for Non-Deterministic Risk: The OWASP AIVSS project aims to provide a quantitative measure for core agent AI risks by applying an agent AI risk factor on top of CVSS, specifically addressing the autonomy and non-deterministic nature of AI agents.

• Need for Task-Scoped IAM: Traditional OAuth and SAML are inadequate for agentic systems because they provide coarse-grained, session-scoped access control. New authentication standards must be task-scoped, dynamically removing access once a specific task is complete, and driven by verifying the agent’s intent.

• A2A Security Requires New Protocols: Agent-to-Agent communication (A2A) introduces security issues beyond traditional API security (like BOLA). New systems must utilize protocols for Agent Capability Discovery and Negotiation—validated by digital signatures—to ensure the trustworthiness and promised quality of service from interacting agents.

• Goal Manipulation is a Critical Threat: Sophisticated attacks often utilize context engineering to execute goal manipulation against agents. These attacks include gradually shifting an agent’s objective (crescendo attack), using prompt injection to force the agent to expose secrets (malicious goal expansion), and forcing endless processing loops (exhaustion loop/denial of wallet).

Tune in for a deep dive!

Connect with Ken:

LinkedIn: kenhuang8

Company Website: https://distributedapps.ai/

Substack: https://kenhuangus.substack.com/

Paper (Agent Capability Negotiation and Binding Protocol): https://arxiv.org/abs/2506.13590

Book (Securing AI Agents): https://www.amazon.com/Securing-AI-Agents

AIVSS: https://aivss.owasp.org/

Connect with Anshuman:

LinkedIn: anshumanbhartiya

X: https://x.com/anshuman_bh

Website: https://anshumanbhartiya.com/

Instagram: anshuman.bhartiya

Connect with Sandesh:

LinkedIn: anandsandesh

X: https://x.com/JubbaOnJeans

Edition #1: AI for Offense Is Here. Defenders Aren’t Ready.

Anshuman Bhartiya — Mon, 17 Nov 2025 16:00:00 GMT

A Chinese state-sponsored group, GTG-1002, ran a full offensive campaign using Claude Code sub-agents with MCP, automating 80–90% of the kill chain across ~30 targets. This isn’t a red-team exercise; it’s the first real glimpse of what AI-native offensive ops look like in the wild.

The bad actors automated the entire cyber kill chain: Recon -> Attack Surface Mapping -> Vulnerability Detection -> Vulnerability Validation -> Credentials Harvesting -> Lateral Movement -> Data Collection

Each phase likely had a sub-agent working semi-autonomously, with human operators only stepping in for approvals and course corrections.

If you’re a defender still treating coding agents as toys, this is your wake-up call.

I read the full report from Anthropic; you can find it here. Below are my takeaways as someone who’s been building similar AI-native systems (like SecureVibes) AND who has built bug hunting machines in the past (like BountyMachine).

What is a traditional web hacking campaign?

Before I start talking about the report itself, I want to take a slight detour first and walk you through how I would try to hack into any organization using open source tools (taking a hypothetical web based example below). It would look something like below:

Perform reconnaissance on my target’s domain and gather all sub-domains using tools like amass, subfinder, etc.
Port scan all domains and sub-domains using tools like nmap, masscan, etc.
For any domain/sub-domain that has, let’s say, a web console exposed on port 80/443: run a directory bruteforcer like wfuzz, ffuf, etc., run a screenshot utility to take screenshots using tools like gowitness, probe the web console further to detect things like tech stack, versions, etc. using tools like httpx, so on and so forth.
Triage all the results obtained
Research if there are any known exploits/CVEs against the gathered stack
Try to exploit them against the target assets and see if I can laterally move in their environment and gain a persistent foothold, eventually exfiltrating keys to their kingdom.

You get the idea. There is nothing novel about any of this. These techniques and tools have been known to hackers for years. Some of the steps of the workflow above could definitely be automated and chained together.

As a matter of fact, we presented BountyMachine - a system that I, along with a couple of other hacker friends, built to automate our bug bounty hunting workflow. We presented this system in a talk titled “Bug Bounty Hunting on Steroids” at Defcon Recon Village 7 years ago. You can watch that video here.

Building this system involved building custom orchestration, glue code, and bespoke infra using technologies like Kubernetes and Argo, which were green field back then. Stitching tools together, handling weird output formats, and managing state across stages was half the work. We ( Glenn Grant , Mohammed Diaa and I ) spent months building it out and it got pretty complex. The progress was slow, the incentives weren’t there and we soon gave up on it.

The point being that this is how I’d have traditionally conducted automated web hacking campaigns, before I knew anything about AI. The ROI just wasn’t there for hobbyists / white hats like me. Maybe, state sponsored groups already had such systems operating at scale, but what do I know!

Why AI-orchestrated operation is different from traditional campaigns

Anthropic called the attack a “sophisticated cyber espionage operation”. There are folks in the security industry debating whether the attack was sophisticated or not, and whether there was anything new/novel in this attack as compared to traditional campaigns.

The report points to techniques/tools similar to those mentioned in the previous section, so I agree that there is nothing novel or sophisticated about them as such.

Rather, the novelty/sophistication is in the technologies (sub-agents, MCP, coding agent acting as an orchestrator, etc.) used to carry out the techniques/tools and the simplicity of building such systems these days in a fraction of time using AI by anybody, including script kiddies.

GTG-1002 might just be a bunch of script kiddies hacking in a basement. I guess we will never know?

MCP, in particular, is getting widely adopted by organizations trying to implement AI, yet security teams are struggling to wrap their heads around the new attack surface it has opened up. This also likely explains why this campaign went undetected for a long time. The lack of MCP observability is a real risk enterprises are facing today.

If I read between the lines in the report, the bad actors (I am speculating):

used sub-agents for specific objectives leveraging their individual context windows,
orchestrated by Claude Code abstracting away all the glue code work and bespoke infra required to build such autonomous hacking systems,
leveraged the MCP protocol to call open source tools that performed vulnerability scanning and piped data from one tool to another; using LLM as the intelligence layer to deal with interoperability of the tools,
likely used Claude hooks to build human in the loop workflows and kept the campaign well directed

Basically, they were able to automate the entire cyber kill chain using just a general purpose coding agent and its native constructs, MCP and open source tools. If this is not novel, I don’t know what is!

Why I’m not surprised this worked

If you’ve been following me and my LinkedIn posts, you might already know that I have been exploring building AI native systems for a while now, using coding agents like Claude Code, Codex and Droid. I’ve blogged about my technical research on my website here.

I even built an AI native security system for vibecoded applications called SecureVibes in ~2 weeks working solo nights and weekends. It can already find real security vulnerabilities in a codebase using a multi-agent flow (uses Claude sub-agents) and simple prompts following a methodology that a human security professional would.

If I can do that as one person, it’s obvious that motivated, well-resourced bad actors can build something like what Anthropic described, and then push it much further.

The exciting part is what this unlocks for offensive security: automating large chunks of recon, vulnerability discovery, and exploit validation. The nerve-racking part is that this campaign went undetected for a while and successfully hit real targets, which is a clear sign:

The gap between what attackers can do with AI and what defenders are prepared for is getting wider, fast.

What can / should you do about it?

This report is going to be turned into slideware by a lot of security vendors and VCs alike. “AI-native” everything. “Our platform will save you”. “This is why we invested in this startup”. Sure. They are not wrong! But let’s bring our attention back to what really matters for organizations who might be experiencing similar attack campaigns right now and have no idea about it.

If you’re a defender in an organization, the main lesson isn’t “buy more tools”. It is:

You need to up-skill yourself in using AI as a defensive operator.

No product/platform will save you if:

You don’t understand how these agents actually work under the hood
You don’t know how to integrate them into your existing infra and workflows
You don’t understand the new risks they introduce

If you are fighting against AI Offense, you’ve gotta know how AI works in the first place before you can start leveraging it as a defender to fight against it. You’ve gotta use a machine to fight against a machine.

Also, you, as a defender, know more about your organization than any security product can. I am referring to the institutional knowledge about how tools work and integrate with each other, how software gets built and deployed, the SDLC and the security integration points, etc. So, here are a few things (not an exhaustive list by any means, just some off the top of my head) you can do as defenders in your organizations:

Run safe experiments with agents in your own environment - Give a coding agent access to a staging environment and see how far it can get automating recon or vulnerability validation. Use MCP for tool calling. Make your SOC watch that run and notice the gaps. Close the gaps. Rinse and repeat.
Design for “AI operator” skills on your team - Someone needs to own wiring agents into SOC automation, IR, and vulnerability management, instead of leaving them as side projects.
Invest in sandboxed execution by default - Any agent that can run code or touch internal systems should be executed in a tightly controlled sandbox (Cloudflare’s Claude Code sandbox model is a good reference). This lets agents operate freely, with agency, inside the sandbox instead of being over-restricted.
Try automating parts of your job that still require critical thinking - I will take vulnerability triaging as an example here, since that is something I have tried to automate myself with reasonable success and I have some data points to back my advice. Triaging vulnerabilities generally requires human critical thinking and reasoning because there are a lot of factors at play - environment, risk appetite, business impact, technical skills of the triager, etc. Automating it, pre AI, has been difficult because deterministic systems/scripts cannot reason through all of these factors the way an AI agent can. Also, LLMs are trained on internet data, so an AI agent can reason about a vulnerability from more perspectives than a single human typically would. The process of automating triaging has taught me how to bake determinism into AI native systems, within a reasonable range of variation, by building evals. This is a fundamental skill to learn if you want to build reliable AI systems. If you’re interested, I can cover how I think about this in a separate post. Let me know!
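One way to picture “building evals” for triage is to run the same triage step several times on labeled cases and measure both how often it agrees with itself and how often it matches the expected label. The sketch below is only an illustration under assumptions: `triage` is a hypothetical stand-in for an agent call, and the case format, function names, and toy triager are made up for this example.

```python
from collections import Counter
from typing import Callable


def evaluate_triage(triage: Callable[[str], str],
                    cases: list[tuple[str, str]],
                    runs: int = 5) -> dict[str, float]:
    """Run `triage` several times per case; report accuracy and consistency.

    `triage` is a hypothetical stand-in for an agent call that maps a finding
    description to a severity label. `cases` holds (finding, expected_label).
    """
    correct = 0
    agreement = 0.0
    for finding, expected in cases:
        labels = [triage(finding) for _ in range(runs)]
        majority, count = Counter(labels).most_common(1)[0]
        correct += majority == expected   # majority vote vs. the expected label
        agreement += count / runs         # share of runs agreeing with the majority
    n = len(cases)
    return {"accuracy": correct / n, "consistency": agreement / n}


# Deterministic toy triager so the example runs without any model access.
def toy_triage(finding: str) -> str:
    return "high" if "auth" in finding else "low"


report = evaluate_triage(
    toy_triage,
    [("auth bypass in admin API", "high"), ("verbose error message", "low")],
)
```

With a real agent behind `triage`, a consistency score well below 1.0 would be a signal that the prompt or workflow needs tightening before the output can be trusted.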

To sum it up, you need to learn how to use AI as a force multiplier to build better defensive capabilities. This is just the beginning. The attackers are already moving - you don’t have the luxury of waiting.

Excerpts that stood out (and what they mean)

Claude maintained persistent operational context across sessions spanning multiple days

Coding agents like Claude Code, Codex, Droid can operate for multiple days with minimal human intervention. The operators here pushed that to the edge: long-running agents, consistent context, and steady progress toward objectives. This is a concrete signal that fully autonomous agents aren’t far off; the timelines are shrinking fast.

the human operators claimed that they were employees of legitimate cybersecurity firms and convinced Claude that it was being used in defensive cybersecurity testing.

They evaded Claude’s safety controls with role-play: give the model a narrow, “defensive testing” story, assign it a persona, and frame harmful actions as part of that persona’s job. That’s one of the scariest patterns with AI systems today. As defenders, we need to explicitly think about how to detect and block role-play prompts that reframe abuse as “defense”.

Claude independently determined which credentials provided access to which services, mapping privilege levels and access boundaries without human direction

Given the right tools and environment, coding agents can reason like a capable security operator. The key is agency inside a controlled sandbox: don’t over-constrain them, but don’t let them loose on prod either.

For exploit verification and vulnerability validation: arm the agent with the right tools, let it try attacks in a contained environment, and adapt dynamically to runtime context. Claude skills fit nicely here - you provide example scripts for a skill, and Claude Code generates new code (via the bash tool) on the fly from those patterns.

In SecureVibes, this is how I dynamically test authorization vulnerabilities against a live app. Today it’s not sandboxed. But, if I ever host SecureVibes as a service, sandboxing would be non-negotiable. (The authorization skill in SecureVibes is defined here.)

Structured markdown files tracked discovered services, harvested credentials, extracted data, exploitation techniques, and complete attack progression. This documentation enabled seamless handoff between operators, facilitated campaign resumption after interruptions, and supported strategic decision-making about follow-on activities.

I’ve been beating the drum on Markdown as the memory layer for agents. The attackers did the same thing. SecureVibes also uses this pattern: each sub-agent writes a markdown file, and the next sub-agent consumes it. Claude Code handles the orchestration under the hood, and the flow is surprisingly robust.

Early on, I explored graph DBs and RAG DBs as the memory layer. They’re powerful but often overkill for this kind of workflow. For many use cases, you get 80–90% of the value by just letting agents read/write structured markdown files:

Models “like” markdown
Files are human-readable
You can diff and audit them easily
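The markdown-as-memory pattern above can be sketched in a few lines. This is a minimal illustration, not SecureVibes’ actual code: the file name `THREATS.md`, the section layout, and the helper names are all hypothetical.

```python
import difflib
import tempfile
from pathlib import Path


def write_handoff(workdir: Path, name: str, sections: dict[str, list[str]]) -> Path:
    """Write one agent's output as a structured markdown file."""
    lines = []
    for heading, items in sections.items():
        lines.append(f"## {heading}")
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    path = workdir / name
    path.write_text("\n".join(lines), encoding="utf-8")
    return path


def read_handoff(path: Path) -> dict[str, list[str]]:
    """Parse the file back into sections for the next agent to consume."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith("## "):
            current = line[3:]
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            sections[current].append(line[2:])
    return sections


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    first = write_handoff(workdir, "THREATS.md", {"Threats": ["path traversal"]})
    before = first.read_text(encoding="utf-8").splitlines()
    loaded = read_handoff(first)                 # the "next agent" picks up the file
    loaded["Threats"].append("symlink traversal")
    second = write_handoff(workdir, "THREATS.md", loaded)
    after = second.read_text(encoding="utf-8").splitlines()
    # Because the memory is plain text, changes between steps are easy to audit.
    changes = [line for line in difflib.unified_diff(before, after, lineterm="")
               if line.startswith("+") and not line.startswith("+++")]
```

Here `changes` holds only the line the second step added, which is the kind of audit trail that is harder to get out of a graph or vector store.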

An important limitation emerged during investigation: Claude frequently overstated findings and occasionally fabricated data during autonomous operations, claiming to have obtained credentials that didn’t work or identifying critical discoveries that proved to be publicly available information. This AI hallucination in offensive security contexts presented challenges for the actor’s operational effectiveness, requiring careful validation of all claimed results. This remains an obstacle to fully autonomous cyberattacks.

Some folks are reading this as Anthropic contradicting its own claims. On one hand, Anthropic claims bad actors used Claude to conduct a multi-entity campaign, fully orchestrated end to end and running over multiple days. On the other hand, they also claim that Claude fabricated details. I get it. It does look contradictory and confusing.

But, if you read between the lines, the part these folks are missing or overlooking is this: “requiring careful validation of all claimed results”. If you look at the CRSs (Cyber Reasoning Systems) built for AIxCC, all teams have a “verifier” component in their systems. This is precisely to address the hallucinations or fabrications that AI agents are prone to. It is the innate nature of non-deterministic AI systems. Anthropic is not contradicting its claims. It is simply stating the facts.

To me, the even more subtle takeaway here is this: if the bad actors got so far even with all the fabrication, just imagine how much more damage they can do if they build better verifiers to tackle the AI hallucinations. The race is on!

And, if you are still not convinced with this explanation, check the latest tweet from the OG Andrej Karpathy.

Summary

AI agents aren’t a demo anymore. They’re already running multi-day campaigns, chaining tools, and making decisions that used to require a room full of operators.

They’re dangerous if left unchecked, but useless if over-restricted. The real game is giving them agency inside tight, well-designed constraints: sandboxed environments, clear skills, human-in-the-loop checkpoints.

MCP is 100x’ing this on both sides. It makes it trivial for good and bad actors to wire agents into real systems and achieve security outcomes we haven’t seen at this scale before.

If attackers have AI-augmented teams and you don’t, you’re already behind. Time to wake up, my fellow defenders.

Appendix / Resources

If you want to go deeper into how I’ve been building AI-native security systems and learning more about this space, these might help:

SecureVibes architecture walkthrough – How SecureVibes works end-to-end and how I use the Claude Agent SDK to orchestrate multiple agents.

Vibecoding a DAST sub-agent – How I vibecoded a DAST-like sub-agent in SecureVibes and armed it with skills to test for authorization vulnerabilities.

AI & Security talk at 10x Genomics – A broader look at the AI + security landscape and how I’m thinking about it with real-world examples.

If this was useful, share it with someone on your security team and leave a comment with what you’re experimenting with on the defensive side to tackle AI offense. Also, don’t forget to like, follow and subscribe to my newsletter - AI Security Engineer so that you don’t miss the next edition!

Running SecureVibes on SecureVibes - Results & What’s Next (Part 3/3)

Anshuman Bhartiya — Tue, 14 Oct 2025 15:15:00 GMT

Testing a Security Scanner by Scanning Itself

The best way to test a security scanner? Run it on its own codebase.

I built SecureVibes to find vulnerabilities in vibecoded applications. But SecureVibes itself is vibecoded—I didn’t write a single line of code myself. I used AI agents to build an AI agent system.

This meta experiment would answer two questions:

Does the multi-agent approach actually work?
How does it compare to traditional tools and single-agent systems?

I figured this was the perfect test case. I am familiar with what the system is supposed to be doing. Even though I vibecoded the entire thing, I am aware of the design decisions I made. I used AI as a companion and guided it to build this thing, but I have no idea if it’s secure or not. This is exactly the problem I wanted to address in the first place.

The Experiment Design

I ran SecureVibes on itself using three different Claude models:

Haiku (fast/cheap)
Sonnet (balanced) - I also ran it twice with Sonnet to see the variance in results caused by the non-deterministic nature of SecureVibes
Opus (premium)

Then I compared results against:

Traditional SAST: Semgrep, Bandit
Single-agent systems: Claude Code, Codex, Droid
Custom Droid with security focus

All detailed reports are available at github.com/anshumanbh/securevibes/docs/example-reports.

Here’s what I found…

Results: Model Comparison

Haiku vs Sonnet vs Opus

Sonnet wins hands down. Not just subjectively, but objectively:

| Model | Vulnerabilities Found | Cost | Value Score |
| --- | --- | --- | --- |
| Haiku | 2 | $0.15 | Poor |
| Sonnet | 17 | $3.44 | Best |
| Opus | 12 | $7.64 | Good |

Sonnet found 17 vulnerabilities at $3.44, while Opus found only 12 at $7.64. Haiku’s $0.15 price tag is tempting, but catching only 2 issues means you’re flying blind.

The sweet spot for security scanning isn’t the cheapest or most expensive model—it’s the one that balances depth of analysis with practical cost constraints. Sonnet proves that the middle path can outperform the premium option. As to why Opus didn’t do well, I am curious about that as well. I don’t have a good answer unfortunately.

Multiple Runs of Sonnet

I ran Sonnet twice to see if results were consistent. About 12-13 vulnerabilities appeared in both reports (core issues like API keys, path traversal, JSON validation). But each run found 4-5 unique issues:

Unique to Run 1:

Race conditions in concurrent scans
Symlink traversal enabling infinite loops
Git commit protection warnings
Report authenticity verification

Unique to Run 2:

Prompt injection defense gaps
Model downgrade attacks via env vars
Hardcoded credentials exposure flow
Tool parameter validation

The union of both runs found ~21 distinct issue types.

This reveals a powerful insight: running the same scanner multiple times might actually increase coverage. For critical codebases, consider 2-3 runs despite added cost. The probabilistic nature of LLMs means different runs can catch different issues.
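The coverage argument is just set arithmetic, and a small sketch makes it concrete. The run-specific issues below are the ones listed above; the shared set is truncated to the three examples the post names (the real overlap was about 12-13), so the totals here are smaller than the ~21 reported. It also assumes issue titles have already been normalized so the same issue gets the same string in both runs, which real reports would not guarantee.

```python
# Shared issues: only the three named examples, not the full ~12-13 overlap.
shared = {"API keys", "path traversal", "JSON validation"}

run1 = shared | {
    "race conditions in concurrent scans",
    "symlink traversal enabling infinite loops",
    "git commit protection warnings",
    "report authenticity verification",
}
run2 = shared | {
    "prompt injection defense gaps",
    "model downgrade attacks via env vars",
    "hardcoded credentials exposure flow",
    "tool parameter validation",
}

in_both = run1 & run2      # issues every run agrees on
only_run1 = run1 - run2    # coverage gained only by the first run
only_run2 = run2 - run1    # coverage gained only by the second run
combined = run1 | run2     # what you get by merging both reports
```

Issues in `in_both` are the ones you might trust most, while `only_run1` and `only_run2` are where a second run pays for itself.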

Results: SecureVibes vs Everything Else

vs Traditional SAST

I ran two popular open-source SAST tools:

Semgrep - 0 findings
Bandit - 0 findings

Why zero findings? These tools look for syntactic patterns. They can’t detect architectural issues like “CLI bypass via symlink attack” or “insufficient permission validation in file operations”—exactly what SecureVibes found.

This is unfortunately the state of current open source code security scanners. They’re excellent at finding known patterns but terrible at understanding context.

vs Single-Agent Systems

Prompt - “perform a security review of the current codebase”

I ran the same security review task using coding agents without specialized multi-agent workflows:

| System with Model | Vulnerabilities Found |
| --- | --- |
| Claude Code with Sonnet 4.5 | 9 |
| Codex with GPT-5-codex | 4 |
| Droid with GLM 4.6 | 7 |
| SecureVibes with Sonnet | 16 |

SecureVibes crushed the coding agents in their default setting:

78% more issues than Claude Code (16 vs 9)
4x as many issues as Codex (16 vs 4)
2.3x as many issues as Droid (16 vs 7)

Why the difference? Single-agent systems lack structured workflow. They scan linearly. SecureVibes builds context (Phase 1), hypothesizes (Phase 2), then validates (Phase 3). This progressive refinement mirrors how human security teams work.
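To make the three phases concrete, here is a toy pipeline with the same shape. It is not SecureVibes’ implementation: in the real system each phase is an LLM sub-agent, whereas here simple string checks stand in for the reasoning, and every name (`Threat`, `build_context`, `hypothesize`, `validate`, the sample files) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    title: str
    file: str
    evidence: str = ""  # filled in only if validation finds support


def build_context(files: dict[str, str]) -> dict[str, list[str]]:
    """Phase 1: summarize what the code does (here: which files open files)."""
    return {"filesystem": [name for name, src in files.items() if "open(" in src]}


def hypothesize(context: dict[str, list[str]]) -> list[Threat]:
    """Phase 2: propose threats from the context; nothing is confirmed yet."""
    return [Threat("possible path traversal", name) for name in context["filesystem"]]


def validate(threats: list[Threat], files: dict[str, str]) -> list[Threat]:
    """Phase 3: keep only threats backed by concrete evidence in the source."""
    confirmed = []
    for threat in threats:
        for line in files[threat.file].splitlines():
            if "open(" in line and "user_path" in line:
                threat.evidence = line.strip()
                confirmed.append(threat)
                break
    return confirmed


files = {
    "cli.py": "def load(user_path):\n    return open(user_path).read()\n",
    "config.py": "def defaults():\n    return open('defaults.json').read()\n",
}
confirmed = validate(hypothesize(build_context(files)), files)
```

Phase 2 flags both files, but only `cli.py` survives Phase 3 because it is the only one where user-controlled input reaches `open`. That narrowing step is what a single linear pass tends to skip.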

vs Custom Security Droid

I also set up a custom droid specifically for security audits and ran it with Sonnet 4.5. The report is here.

Prompt - “security-audit: Review entire codebase for vulnerabilities”

Results: 23 vulnerabilities found

4 Critical (vs SecureVibes: 2-4)
9 High (vs SecureVibes: 6)
7 Medium (vs SecureVibes: 6-9)
3 Low (vs SecureVibes: 0)

The Custom Droid found 35-44% more vulnerabilities than SecureVibes using the same model. This taught me what I call “learning the bitter lesson”: with the same model, Sonnet 4.5, the custom Droid’s output is actually pretty good compared to SecureVibes’.
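For readers who want to check the numbers, the arithmetic can be reproduced as below. The post does not say which baselines produce the 35-44% range; using the two SecureVibes Sonnet counts reported earlier (17 and 16) is an assumption that happens to reproduce it.

```python
def percent_more(a: int, b: int) -> float:
    """How much larger a is than b, as a percentage of b."""
    return (a - b) / b * 100


droid_by_severity = {"critical": 4, "high": 9, "medium": 7, "low": 3}
droid_total = sum(droid_by_severity.values())   # 4 + 9 + 7 + 3 = 23

low = percent_more(droid_total, 17)    # assumed baseline: the 17-finding Sonnet run
high = percent_more(droid_total, 16)   # assumed baseline: the 16-finding Sonnet run
print(f"{low:.0f}% to {high:.0f}% more")   # prints "35% to 44% more"
```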

What this means: All the work I did over the past few days building a custom multi-agent system essentially got matched by a feature Factory released in their coding agent. If you’re using Claude Code, I believe the same outcome can be achieved by building your own suite of Claude Code subagents—very much like what I did with SecureVibes, but you’d have to know what you’re doing.

The quality difference: The Custom Droid found several unique vulnerabilities SecureVibes missed:

More granular categorization (Low severity tier)
Additional timeout and rate limiting issues
More comprehensive error handling gaps
Better detection of compliance-related issues (GDPR, SOC 2)

But this isn’t defeat; it’s validation. The multi-agent approach works so well that platforms are building it in as a native feature. There are still plenty of opportunities here. This is just the first iteration of SecureVibes, and I believe I can improve the results and get them on par with the custom Droid’s:

Domain expertise matters: Continue improving agents with security-specific knowledge
Privacy-first options: Build versions that work with local models to preserve IP
Accessibility: Non-technical users still need a UI, not command-line tools
SDLC integration: Build custom droids/agents for different security gates (PR review, pre-commit, pre-deploy)

Key Learnings

Filesystem Threat Boundary

Most vulnerabilities involved CLI ↔ filesystem interactions. That makes sense: the CLI is the product, so the AI understands the threat model and the boundary between the CLI program and the host machine’s file system really well.

Multi-agent > Single-agent

This was the biggest validation. The multi-agent approach consistently outperformed single-agent attempts. The progressive refinement (context → threats → validation) mirrors how human security teams work, and it shows in the quality of results.

The Claude Agent SDK is a game changer for building multi-agent systems. It handles orchestration, so you can focus on designing the workflow and prompts.

File-based Communication is Underrated

Early versions used in-memory state passing between agents. It was a nightmare to debug when something went wrong.

Switching to file-based communication (.md and .json files) made the system so much easier to understand, debug, and extend. I can inspect any phase’s output, replay phases, and even manually edit artifacts to test edge cases. Markdown surprisingly works great for both humans and machines.

Real-time Progress Streaming is Essential

Initially, SecureVibes used filesystem polling to detect phase completions. During 10-15 minute scans, users would see progress updates only every 30-60 seconds, leading to “is it frozen?” moments.

I rebuilt it using the Claude SDK’s hooks system (PreToolUse, PostToolUse, SubagentStop) for event-driven streaming. Now users see exactly what each agent is doing in real-time—which files it’s reading, what patterns it’s searching for. This dramatically improved UX.
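The difference between polling and event-driven updates can be shown with a tiny dispatcher. This is a generic sketch, not the Claude SDK’s hooks API: only the event names come from the text above, while the `ProgressBus` class and the payload fields (`agent`, `tool`, `target`) are hypothetical.

```python
from collections import defaultdict
from typing import Callable


class ProgressBus:
    """Tiny event dispatcher; a stand-in for a hooks mechanism, not the SDK's API."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def on(self, event: str, handler: Callable[[dict], None]) -> None:
        """Register a handler to run every time `event` fires."""
        self._handlers[event].append(handler)

    def emit(self, event: str, payload: dict) -> None:
        """Call handlers immediately, so the UI updates without waiting for a poll."""
        for handler in self._handlers[event]:
            handler(payload)


progress_log: list[str] = []
bus = ProgressBus()
bus.on("PreToolUse",
       lambda p: progress_log.append(f"{p['agent']}: starting {p['tool']} on {p['target']}"))
bus.on("SubagentStop", lambda p: progress_log.append(f"{p['agent']}: done"))

bus.emit("PreToolUse", {"agent": "threat-modeling", "tool": "Read", "target": "cli.py"})
bus.emit("SubagentStop", {"agent": "threat-modeling"})
```

Each message lands in `progress_log` the moment the event fires, instead of whenever the next 30-60 second poll happens to notice a changed file.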

STRIDE is Still Relevant

I was skeptical about using a traditional threat modeling framework (STRIDE) in an AI-driven system. But it turned out to be perfect.

It gives the Threat Modeling Agent a structured way to think about threats, ensuring comprehensive coverage across all categories. Without STRIDE, the agent would often focus too heavily on one vulnerability class (usually injection attacks) and ignore authorization or audit issues.
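A coverage check like the one below is one simple way to catch an agent that fixates on a single vulnerability class. The six category names are standard STRIDE; the threat dictionary format and the function name are assumptions made for this sketch.

```python
STRIDE = (
    "Spoofing",
    "Tampering",
    "Repudiation",
    "Information Disclosure",
    "Denial of Service",
    "Elevation of Privilege",
)


def missing_stride_categories(threats: list[dict]) -> list[str]:
    """Return STRIDE categories that no threat in the model covers."""
    covered = {threat["category"] for threat in threats}
    return [category for category in STRIDE if category not in covered]


# A threat model skewed toward injection-style issues.
threats = [
    {"title": "SQL injection in search", "category": "Tampering"},
    {"title": "Command injection in CLI args", "category": "Elevation of Privilege"},
]
gaps = missing_stride_categories(threats)
```

A non-empty `gaps` list could be fed back to the agent as a prompt to reconsider the categories it skipped.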

False Positives are the Enemy

Traditional SAST tools have terrible false positive rates. By using the three-phase approach where Phase 3 validates threats with concrete evidence, SecureVibes’ false positive rate is dramatically lower.

The agent must provide the exact line number, code snippet, and explanation of exploitability. This forces it to actually confirm the vulnerability exists rather than flagging suspicious-looking patterns.
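Part of that requirement can be checked mechanically: does the quoted snippet really appear on the line the agent cites? The sketch below assumes 1-indexed line numbers and a hypothetical helper name. It only verifies that the evidence exists; judging exploitability still takes reasoning.

```python
def evidence_matches(source: str, line_number: int, snippet: str) -> bool:
    """True if `snippet` appears on the claimed 1-indexed line of `source`."""
    lines = source.splitlines()
    if not 1 <= line_number <= len(lines):
        return False            # the agent cited a line that does not exist
    snippet = snippet.strip()
    return snippet != "" and snippet in lines[line_number - 1]


source = "import os\n\ndef read(path):\n    return open(path).read()\n"

real = evidence_matches(source, 4, "open(path)")              # snippet is on line 4
fabricated = evidence_matches(source, 2, "eval(user_input)")  # line 2 is blank
```

Findings that fail this check can be dropped or sent back for re-validation before a human ever sees them.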

Claude SDK Orchestration is Magical

I initially built a custom orchestrator agent to coordinate the workflow. Then I realized the SDK itself handles orchestration—you just define agents and Claude figures out when to invoke them.

This cut hundreds of lines of coordination code and made the system more reliable. The SDK handles error recovery, retries, and state management automatically.

AI Coding Agents Accelerate Development

I used Factory’s Droid and Claude Sonnet 4.5 for this project. I first used Claude Code along with GitHub MCP and Anthropic documentation to create a comprehensive guide on the Claude Agent SDK. You can find that here.

Then I had Droid reference that guide to build features. The combination of context-aware coding agents and good documentation dramatically sped up development.

NOTE: I can’t recommend Factory’s Droid enough. It is a game changer. There have been multiple instances where Claude Code, Codex and Cursor just failed to deliver and Droid was able to one-shot it. If you want to try it out, here is a referral code worth $40 credits - https://app.factory.ai/r/Z2B374AY. I promise you will not be disappointed!

Iterative refinement is key

Text extraction from different agent outputs (especially markdown), JSON parsing, and prompt engineering all required multiple iterations. The first version of any prompt never works perfectly. I learned to build in instrumentation early (debug modes, verbose logging) to understand what’s actually happening.
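A typical example of the extraction problem is pulling JSON out of a chatty markdown reply. The helper below is a minimal sketch under assumptions (the function name and the sample reply are made up): it tries fenced blocks first, then falls back to parsing the whole reply.

```python
import json
import re

FENCE = "`" * 3  # built programmatically so the fence marker never appears literally
JSON_BLOCK = re.compile(FENCE + r"(?:json)?\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_json(agent_output: str):
    """Return the first parseable JSON value from a fenced block, else the whole text."""
    candidates = JSON_BLOCK.findall(agent_output) + [agent_output]
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue   # not valid JSON, try the next candidate
    return None        # nothing parseable; the caller can log the raw output and retry


reply = (
    "Here are the findings:\n"
    + FENCE + "json\n"
    + '[{"id": 1, "severity": "high"}]\n'
    + FENCE + "\nLet me know!"
)
parsed = extract_json(reply)
```

Returning `None` instead of raising keeps the failure visible to the caller, which pairs well with the verbose logging mentioned above.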

Build First, Optimize Later

The current system is expensive. If you are on a Claude subscription plan, you don’t have to worry about this too much, but if you pay as you go for API requests, the costs can rack up really fast, especially if you run periodic scans on entire codebases. My focus for the first iteration wasn’t on building a cost-effective system. Now that I know it works, I will keep looking for ways to make it cheaper to run.

What’s Next: Building in Public

This is just the beginning. I’m committed to building this in public and inviting the community to join me on this journey. Here are some items on my wishlist:

1. Dashboard

Right now, SecureVibes outputs results to the terminal and in different file formats - JSON and Markdown. I want to build a web dashboard that provides:

Visual trend analysis (are vulnerabilities increasing or decreasing over time?)
Vulnerability timeline and history
Team collaboration features (assign findings, track remediation)
Integration with issue trackers (Jira, GitHub Issues, Linear)
Comparison between scans (what changed?)

2. Fixer Sub-Agent

Finding vulnerabilities is great, but fixing them is where the real value is. I want to build a Fixer Agent that:

Takes a vulnerability from VULNERABILITIES.json
Reads the vulnerable code in context
Generates a patch that fixes the issue
Explains what it changed and why
Creates a PR with the fix (optional)

This is tricky because the fix needs to actually work (not break functionality), preserve the original intent of the code, and consider the broader codebase context.

3. Evaluation Framework

The hardest problem in AI security tools: how do you know if it’s actually working?

I want to build a comprehensive evaluation framework:

Benchmark datasets - Known vulnerable applications (WebGoat, pygoat, NodeGoat, etc.)
Ground truth - Manually verified vulnerability sets for each benchmark
Metrics - Precision, recall, F1 score for each vulnerability class
Regression testing - Ensure updates don’t decrease detection quality
Comparison - How does SecureVibes compare to Semgrep, Snyk, etc.?

This is crucial for validating that improvements actually improve detection, building trust with users, identifying weak spots in detection, and benchmarking against other tools.
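The metrics themselves are simple once findings and ground truth are expressed as comparable identifiers. The sketch below assumes a hypothetical `class:location` identifier scheme and made-up sample data; matching real scanner output to ground truth is the hard part and is not shown.

```python
def score(reported: set[str], ground_truth: set[str]) -> dict[str, float]:
    """Precision, recall and F1 for one scan against a verified vulnerability set."""
    true_positives = len(reported & ground_truth)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical identifiers in the form "class:location".
truth = {"sqli:login", "xss:search", "idor:orders", "ssrf:fetch"}
reported = {"sqli:login", "xss:search", "xss:profile"}

metrics = score(reported, truth)
# 2 of 3 reported findings are real (precision 2/3); 2 of 4 real issues found (recall 1/2).
```

Tracking these numbers per release is what turns “regression testing” from a feeling into a gate.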

4. Context Engineering

via MCP

The Claude Agent SDK has MCP support. This is really exciting because it allows the subagents to bring in context from other services/systems.

This is essentially how AI native systems can be made smarter, more efficient and accurate. For example, if an app has an existing threat model saved in Jira, we could use the SDK to fetch that and use it with the subagents. The possibilities are endless!

Compacting / Pre and Post Processing

Right now, all agents get access to the entire repository. But for large codebases (10k+ files), this is inefficient and expensive. I want to build a Context Engineer that:

Analyzes the repository structure
Identifies high-risk files (auth, API endpoints, DB queries, file handling)
Creates a “security-relevant file subset”
Passes only this subset to downstream agents

This would dramatically reduce token usage and costs for large repositories, while focusing analysis on the code that actually matters from a security perspective.
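A first cut of that file selection could be as crude as a path filter. The sketch below is an assumption-heavy illustration: the hint keywords, the extension list, and the function name are invented here, and a real Context Engineer would presumably look at file contents and imports rather than names alone.

```python
from pathlib import PurePosixPath

# Hypothetical hints; substring matching on paths is deliberately naive.
HIGH_RISK_HINTS = ("auth", "login", "api", "route", "query", "sql", "upload")
SOURCE_SUFFIXES = {".py", ".js", ".ts"}


def security_relevant_subset(paths: list[str]) -> list[str]:
    """Keep source files whose path suggests auth, API, DB or file-handling code."""
    selected = []
    for path in paths:
        if PurePosixPath(path).suffix not in SOURCE_SUFFIXES:
            continue   # skip docs, assets, lockfiles, etc.
        if any(hint in path.lower() for hint in HIGH_RISK_HINTS):
            selected.append(path)
    return selected


repo = [
    "src/auth/session.py",
    "src/api/routes.py",
    "src/ui/theme.py",
    "docs/intro.md",
    "src/db/query_builder.py",
]
subset = security_relevant_subset(repo)
```

Even a rough filter like this shrinks what downstream agents have to read, though it will miss risky code that lives in innocuously named files.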

5. Make SecureVibes work with other models

Currently, since I am using the Claude Agent SDK, this works with Anthropic’s models only. And it’s not cheap by any means. A full comprehensive scan of a medium-sized codebase can cost anywhere between $2 and $5. Being able to use local models to achieve similar results will unlock a lot of new opportunities for this system to be used in regulated industries, where sending proprietary code (IP) to frontier model companies is prohibited. Not to mention, this will also help with cost savings.

6. Make SecureVibes into a Web Cyber Reasoning System (Web CRS)

Inspired by the AIxCC CRS (Cyber Reasoning Systems), I’d really like to emulate how those systems are designed with multiple layers of validation. If we can build such a CRS encapsulating the current SAST capabilities, along with DAST capabilities (in particular for web applications), I’d consider that a huge win! Imagine:

finding vulnerabilities via source code analysis -> validating them via dynamic analysis -> proposing a fix -> validating the fix works.

7. Make SecureVibes self-improving

The current scan results are good, but I don’t necessarily agree with all the severities. I also don’t want to fix a few of these yet because it’s a CLI tool at the end of the day that I am going to be running locally on my machine. But they could manifest into something bigger in the long term, so I want to triage all of these manually and provide a justification for each. I would then like SecureVibes to update its threat model and keep my preferences in mind so that it becomes smarter with every piece of feedback I provide.

How You Can Contribute

This is an open source project, and contributions are welcome! Here are ways you can help:

🔧 Contribute Code

Areas where help is especially welcome:

Improving prompts for specific vulnerability classes
Building the dashboard
Creating benchmark datasets
Everything mentioned in the wishlist above

🎤 Spread the Word

If you find SecureVibes useful, share it! Tweet about it, write about it, present it at meetups. The more people use it, the better it gets.

⭐ Star the Repo

GitHub stars help with visibility. If you think this project is interesting, give it a star!

Conclusion

Building SecureVibes has been one of the most rewarding projects I’ve worked on. It combines my passion for security with the exciting possibilities of AI agents. The multi-agent architecture proved that we can build AI security tools that are not just “smart pattern matchers” but systems that reason about security the way human experts do.

We’re at an inflection point with AI and security. LLMs are finally capable enough to handle complex security reasoning, but we’re still figuring out the right architectures and workflows. Context is key! I believe multi-agent systems like SecureVibes are the future—not because they’re trendy, but because they work.

The vibecoding era has democratized software development—anyone can build an app with AI assistance. But with that democratization comes risk. Many vibecoded applications are built by developers who aren’t familiar with security best practices, using unfamiliar tech stacks, and shipping to production quickly. SecureVibes aims to make security accessible to these developers, providing professional-grade vulnerability detection without requiring security expertise.

Try SecureVibes on your codebase today. Open an issue if you find bugs. Submit a PR if you have ideas. Let’s build the future of AI-native security together.

Follow Along

I’ll post about new features, challenges I’m facing, design decisions, and lessons learned. If you’re interested in AI agents, security tooling, or building in public, follow along!

LinkedIn: @anshumanbhartiya
GitHub: securevibes repository
Blog: anshumanbhartiya.com
Discord: https://discord.gg/9cYqTBdC9h

“I don’t know where I am going, but I know how to get there” - Boyd Varty