<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[The BoringAppSec Community]]></title><description><![CDATA[Anshuman & Sandesh on Security. 2 blogs. 1 Podcast. 1 Slack community.]]></description><link>https://www.boringappsec.com</link><image><url>https://substackcdn.com/image/fetch/$s_!O8_X!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8b671d9-6e9a-4835-b291-ee70fd4e9f74_1280x1280.png</url><title>The BoringAppSec Community</title><link>https://www.boringappsec.com</link></image><generator>Substack</generator><lastBuildDate>Tue, 14 Apr 2026 17:07:34 GMT</lastBuildDate><atom:link href="https://www.boringappsec.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Sandesh Mysore Anand]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[boringappsec@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[boringappsec@substack.com]]></itunes:email><itunes:name><![CDATA[Sandesh Mysore Anand]]></itunes:name></itunes:owner><itunes:author><![CDATA[Sandesh Mysore Anand]]></itunes:author><googleplay:owner><![CDATA[boringappsec@substack.com]]></googleplay:owner><googleplay:email><![CDATA[boringappsec@substack.com]]></googleplay:email><googleplay:author><![CDATA[Sandesh Mysore Anand]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Edition 33 - The role of AppSec engineers is moving from being carpenters to gardeners]]></title><description><![CDATA[I don't think "AppSec is dead", but the role of AppSec engineers is certainly 
changing]]></description><link>https://www.boringappsec.com/p/edition-33-the-role-of-appsec-engineers</link><guid isPermaLink="false">https://www.boringappsec.com/p/edition-33-the-role-of-appsec-engineers</guid><dc:creator><![CDATA[Sandesh Mysore Anand]]></dc:creator><pubDate>Tue, 17 Mar 2026 16:56:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!5X0K!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71134cfa-d547-464b-814c-0a1c720b7a14_1467x809.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5X0K!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71134cfa-d547-464b-814c-0a1c720b7a14_1467x809.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5X0K!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71134cfa-d547-464b-814c-0a1c720b7a14_1467x809.png 424w, https://substackcdn.com/image/fetch/$s_!5X0K!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71134cfa-d547-464b-814c-0a1c720b7a14_1467x809.png 848w, https://substackcdn.com/image/fetch/$s_!5X0K!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71134cfa-d547-464b-814c-0a1c720b7a14_1467x809.png 1272w, https://substackcdn.com/image/fetch/$s_!5X0K!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71134cfa-d547-464b-814c-0a1c720b7a14_1467x809.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!5X0K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71134cfa-d547-464b-814c-0a1c720b7a14_1467x809.png" width="1467" height="809" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/71134cfa-d547-464b-814c-0a1c720b7a14_1467x809.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1467,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1898461,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.boringappsec.com/i/191271168?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d94a629-dcde-4428-bbc2-6143e360a630_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5X0K!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71134cfa-d547-464b-814c-0a1c720b7a14_1467x809.png 424w, https://substackcdn.com/image/fetch/$s_!5X0K!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71134cfa-d547-464b-814c-0a1c720b7a14_1467x809.png 848w, https://substackcdn.com/image/fetch/$s_!5X0K!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71134cfa-d547-464b-814c-0a1c720b7a14_1467x809.png 1272w, https://substackcdn.com/image/fetch/$s_!5X0K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71134cfa-d547-464b-814c-0a1c720b7a14_1467x809.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Tis the season of existential dread. Everyone in tech is wondering if their job will exist in the next few years. If AI can write all the code, do we need developers? If AI can write Terraform and deploy, do we need DevOps? 
If AI can write this blog post, do we really need authors? And so on. </p><p>If you lead a team, this dread compounds beyond your immediate role, too. Should I hire experienced folks who can tell the AI what to do? Or should I hire smart folks with no experience, since they have &#8220;nothing to unlearn&#8221;? </p><p>In my recent conversations, this dread has reached AppSec teams too. Every third day, you&#8217;ll see a launch that claims to automate something you used to do manually. SAST became SAST+AI (SAST tools with AI features for triage), then AI-powered SAST (SAST that uses AI to discover business-logic findings), and finally a button in Claude (eliminating SAST as a step in the SDLC). While the current state of these tools is debatable (I&#8217;ve written about this <a href="https://www.boringappsec.com/p/edition-32-bigco-is-building-in-appsec">here</a>), the direction is clear. Much of what constitutes a &#8220;security assessment&#8221; will be automated by AI agents. We don&#8217;t yet know who will do it (existing security companies, foundation model companies, or new startups), but it&#8217;s gonna happen!</p><div><hr></div><p>I&#8217;ve seen this play out within <a href="https://seezo.io/">Seezo</a>, too. What started as an experiment to automate parts of Security Design Review has now reached a point where most of the heavy lifting is done by the product. Humans are still involved in reviewing results, but their role diminishes with each new model drop and platform improvement. </p><p>If it&#8217;s inevitable that AI agents will do most of the security assessment work (scanning, triaging, and communicating), then what&#8217;s the role of the AppSec engineer? Do we even need an AppSec team? </p><p>From my own experience using AI as an end user and building an AI-powered product, it&#8217;s clear to me that the AppSec team will remain. But their role will change. 
</p><h2>From Carpenter to Gardener</h2><p>When Pooja (my partner) and I were expecting our daughter, we turned into those nervous parents-to-be who wanted to read everything about parenting. We were surrounded by books, subscribed to parenting newsletters, and so on. We were the &#8220;research&#8221; parents for a while (a story for another day, but that phase ended, and we switched to an &#8220;instinct-led approach&#8221; pretty soon). During that phase, one framework strongly influenced our thinking, and we have tried to apply it to this day. The <a href="https://www.npr.org/sections/goatsandsoda/2018/05/28/614386847/what-kind-of-parent-are-you-carpenter-or-gardener">framework by Alison Gopnik</a> suggests that parenting is more about being a gardener than a carpenter. </p><p>Carpenters take a block of wood and &#8220;make&#8221; a chair out of it. Every little detail is handled by the carpenter. Gardeners are different. They water the plants, provide fertilizers, and ward off weeds, but they &#8220;let&#8221; the plants grow. Gopnik&#8217;s book (and her many articles) emphasizes this approach. </p><p>Merits of the parenting framework aside (you could argue both sides of which approach is better), when I think about how AppSec is changing, I feel like we have been moving from carpentry to gardening for a while now, and AI accelerates that trend significantly. </p><p>We have gone from &#8220;doing the security assessment&#8221; to &#8220;using a tool to help with the assessment&#8221;, to &#8220;configuring the tool that does the assessment and then triaging the results&#8221;. The next stage is simple. The entire assessment will be done end-to-end by AI agents: configuring, scanning, triaging, and communicating. 
</p><p>But it&#8217;s clear to me (from building an AI product and using AI extensively as a daily driver) that the quality of results from AI agents depends on three things: the quality of the agent, the quality of the underlying foundation model, <em>and</em> the context provided to the agent. The third is not something you can buy from a SaaS tool. AppSec teams have to build it themselves. </p><h2>What does &#8220;gardening&#8221; in AppSec look like?</h2><p>Even in the optimistic scenario where AppSec agents are amazing at security assessments, there are three things AppSec engineers will still have to do: </p><p>1. <strong>Define the workflow:</strong> When should SAST run? Who should receive the results? When should a human review results? What should trigger a pipeline block? These are questions your AI agents cannot answer, because there is no &#8220;right&#8221; answer; the correct thing to do depends on your org&#8217;s security and technology culture. Depending on which product/BU/team you are working with, you may even need different workflows for different teams. While you may have tooling to orchestrate your AppSec agents, defining and tweaking the workflow will still be the AppSec team&#8217;s job. In some cases, you may delegate this to the dev team (e.g., via Security Champions), but the AppSec team still needs to own it. </p><p>2. <strong>Supply context:</strong> This will probably be the most time-consuming and hardest-to-define aspect of an AppSec team&#8217;s job. It&#8217;s clear to me that the better the context you provide an agent, the better the results it produces. So, what information do you need to supply to your API Security Agent so it actually knows your rate-limiting requirements for internal APIs? What are the secure-by-default patterns that a Security Design Review tool should recommend? This problem is harder than it looks because context does not live in one place. 
It&#8217;s spread across &#8220;sources of truth&#8221; (such as code and deployments) and &#8220;sources of intent&#8221; (security standards documents, PRDs, etc.). Depending on how your company operates, AppSec teams need to provide the right context to the right agents to get the most value out of them. Provide too much context, and you fill up the context window with junk. Provide too little, and your AppSec agents give you generic crap. </p><p>3. <strong>Be the human in the loop and treat each instance of it as an agent failure:</strong> For the foreseeable future, AI agents running these assessments will still need human help. They will need humans to validate some results and to review certain kinds of changes. Hopefully, over time, the percentage of items that need human review goes down. Until then, we will need AppSec engineers to review the results, add more context, and decide what to do with the output. I think a useful frame is to treat each human-in-the-loop interaction as a failure on the agent&#8217;s part. In addition to resolving whatever needs to be resolved, the human should also &#8220;teach&#8221; the agent how to handle similar situations in the future. This could mean persisting information in a context file (e.g., CLAUDE.md), writing a skill/sub-agent to handle a particular type of scenario, and so on. A good measure of an agent&#8217;s success would be the accuracy of its results and how often humans needed to be involved. </p><p><em>Note: 2 &amp; 3 are somewhat related. While &#8220;context&#8221; may be something we add before an assessment starts, &#8220;committing things to memory&#8221; in response to how agents behave is just as important. If a false positive recurs across different agent runs, it&#8217;s important to commit to memory why it is a false positive and how the agent can handle it better. 
In a way, these are three distinct activities, but they also form a loop: each feeds into the others, and the whole improves over time.</em> </p><h2>This is a big change</h2><p>If an AppSec engineer slipped into a coma in 2015 and woke up to <em>this</em> reality, they&#8217;d be unable to recognize the role. This change will not be easy for everyone. What&#8217;s worse, there isn&#8217;t enough tooling built to support these behaviors. Security vendors have spent decades figuring out the best UX for triaging results (and we haven&#8217;t perfected it), but no one knows what the best UX for &#8220;providing context&#8221; is. Defining Security Standards and Security Workflows used to be something you did once a year. Now it has to happen much more quickly. This change will bring collateral damage. Depending on the organizational context, some companies may have already made this change, while others may take many years to do so. If you are taking on a new role in AppSec, I&#8217;d urge you to understand where on the spectrum of this change the team lies and whether that is a good fit for you. To be clear, I don&#8217;t think of this change as a simple &#8220;maturity curve&#8221;. Teams that haven&#8217;t adapted aren&#8217;t necessarily less mature (although that&#8217;s one possible explanation); it may also reflect how software is built in the company or what industry the company belongs to (some industries will take longer to undergo an AI transformation, and rightly so). 
</p><h2>Where are you on the Spectrum?</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!v7MF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a97dfeb-76b1-428a-99a8-ef892dd2ffe6_3274x1312.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!v7MF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a97dfeb-76b1-428a-99a8-ef892dd2ffe6_3274x1312.png 424w, https://substackcdn.com/image/fetch/$s_!v7MF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a97dfeb-76b1-428a-99a8-ef892dd2ffe6_3274x1312.png 848w, https://substackcdn.com/image/fetch/$s_!v7MF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a97dfeb-76b1-428a-99a8-ef892dd2ffe6_3274x1312.png 1272w, https://substackcdn.com/image/fetch/$s_!v7MF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a97dfeb-76b1-428a-99a8-ef892dd2ffe6_3274x1312.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!v7MF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a97dfeb-76b1-428a-99a8-ef892dd2ffe6_3274x1312.png" width="1456" height="583" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1a97dfeb-76b1-428a-99a8-ef892dd2ffe6_3274x1312.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:583,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5660801,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.boringappsec.com/i/191271168?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a97dfeb-76b1-428a-99a8-ef892dd2ffe6_3274x1312.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!v7MF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a97dfeb-76b1-428a-99a8-ef892dd2ffe6_3274x1312.png 424w, https://substackcdn.com/image/fetch/$s_!v7MF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a97dfeb-76b1-428a-99a8-ef892dd2ffe6_3274x1312.png 848w, https://substackcdn.com/image/fetch/$s_!v7MF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a97dfeb-76b1-428a-99a8-ef892dd2ffe6_3274x1312.png 1272w, https://substackcdn.com/image/fetch/$s_!v7MF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a97dfeb-76b1-428a-99a8-ef892dd2ffe6_3274x1312.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Image presented at an internal Seezo meeting to emphasize where we lie on the &#8220;AI Spectrum&#8221;. Your exact position does not matter, but it needs to align with your organization&#8217;s.</figcaption></figure></div><p>In an internal meeting at Seezo, I half-joked that we all need to be in the same range of the &#8220;AI adoption spectrum&#8221; (see above). Irrespective of where you lie on the spectrum, it&#8217;s important to work with a team that is adjacent to your position. If you are an AI Skeptic in an AI-techbro team, you are gonna struggle. If you are cautiously optimistic about AI, but your company won&#8217;t use it until the &#8220;technology is mature&#8221;, you are gonna be frustrated. </p><div><hr></div><p>That&#8217;s it for today. 
Does the carpenter vs. gardener analogy land, or am I crazy for mapping AI onto the one parenting book I read many years ago? Are there other frameworks that help you navigate this crazy change? Hit me up! You can drop me a message on <a href="https://twitter.com/JubbaOnJeans">Twitter</a> (or whatever it is called these days), <a href="https://www.linkedin.com/in/anandsandesh/">LinkedIn</a>, or <a href="mailto:sandesh@seezo.io">email</a>. I am also the co-founder of <a href="https://seezo.io/">Seezo</a>. We help companies automate security design reviews at scale. Check us out if that&#8217;s your thing :) If you find this newsletter useful, share it with a friend or colleague, or on social media.</p>]]></content:encoded></item><item><title><![CDATA[VulnVibes: Building an AI Agent That Reasons Across Microservices to Find Real Vulnerabilities]]></title><description><![CDATA[A prototype AI agent that validates real AppSec issues across repositories, infrastructure, and microservice boundaries.]]></description><link>https://www.boringappsec.com/p/vulnvibes-building-an-ai-agent-that</link><guid isPermaLink="false">https://www.boringappsec.com/p/vulnvibes-building-an-ai-agent-that</guid><dc:creator><![CDATA[Anshuman Bhartiya]]></dc:creator><pubDate>Mon, 16 Mar 2026 22:25:15 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!T0Wh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271820b3-5cd1-483f-8dae-a5cb6873c22a_2752x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!T0Wh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271820b3-5cd1-483f-8dae-a5cb6873c22a_2752x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!T0Wh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271820b3-5cd1-483f-8dae-a5cb6873c22a_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!T0Wh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271820b3-5cd1-483f-8dae-a5cb6873c22a_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!T0Wh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271820b3-5cd1-483f-8dae-a5cb6873c22a_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!T0Wh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271820b3-5cd1-483f-8dae-a5cb6873c22a_2752x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!T0Wh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271820b3-5cd1-483f-8dae-a5cb6873c22a_2752x1536.png" width="1456" height="813" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/271820b3-5cd1-483f-8dae-a5cb6873c22a_2752x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;vulnvibes&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="vulnvibes" title="vulnvibes" srcset="https://substackcdn.com/image/fetch/$s_!T0Wh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271820b3-5cd1-483f-8dae-a5cb6873c22a_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!T0Wh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271820b3-5cd1-483f-8dae-a5cb6873c22a_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!T0Wh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271820b3-5cd1-483f-8dae-a5cb6873c22a_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!T0Wh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271820b3-5cd1-483f-8dae-a5cb6873c22a_2752x1536.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><blockquote><p>Disclaimer: This is a cross-post from my tech <a href="https://www.anshuman.ai/posts/vulnvibes-intro">blog</a>, co-authored by my personal AI assistant <a href="https://www.anshuman.ai/posts/my-clawdbot-setup">Sage</a>.</p></blockquote><h1>Background Context</h1><p>Picture this: you&#8217;re reviewing a pull request. A developer on your team has added a new API endpoint that fetches content from a URL the user provides. There&#8217;s even a role check &#8212; only admins can use it. Looks reasonable, right?</p><p>But here&#8217;s the thing &#8212; that endpoint runs inside a Docker container, on the same network as your authentication service, your database, and your internal admin tools. An attacker who gets admin access could point that URL at <code>http://auth-service:3001/</code> and read your internal service responses. 
Or hit the cloud metadata endpoint at <code>169.254.169.254</code> and grab your AWS credentials.</p><p>The code in the PR diff looks fine. The vulnerability is invisible unless you <em>also</em> check the Docker Compose file in your infra repo, the nginx config in your gateway repo, and the network topology that ties everything together.</p><p>This is the fundamental problem with how we do security reviews in microservice architectures: <strong>the vulnerability lives in the gaps between services, not in any single repo.</strong></p><p>I&#8217;ve seen this problem time and again and haven&#8217;t come across a single SAST platform (AI-native or traditional) that can reliably help tackle this at scale. If there is one, please let me know!</p><p>So, I&#8217;ve been tinkering with a few different approaches. I now have a prototype agent that is not production-ready by any means, but is good enough to demonstrate the problem (with a lab I&#8217;ve built) and a high-level approach to solving it.</p><p>If this piques your interest, continue reading.</p><div><hr></div><h1>Introducing VulnVibes</h1><p>Everything is a &#8220;vibe&#8221; these days. 
So, keeping with the <a href="https://securevibes.ai/">SecureVibes</a> naming theme, I&#8217;m calling this one VulnVibes &#128516;</p><p><a href="https://github.com/anshumanbh/vulnvibes">VulnVibes</a> is an AI-powered agent that analyzes pull requests for security vulnerabilities &#8212; and its superpower is that it doesn&#8217;t just look at the PR&#8217;s repo. It searches across your entire GitHub organization to understand your architecture, verify what security controls actually exist, and determine whether a suspicious code change is a real vulnerability or a false alarm.</p><p>For a more technical explanation, VulnVibes is an open-source CLI that takes a GitHub PR URL, threat models the changes, and then validates each threat by investigating code across multiple repos in your org. I demo it below against three real PRs and show it catching an SSRF vulnerability, making a nuanced judgment call on a CORS change, and correctly ignoring a safe code refactor.</p><p>It&#8217;s a concept tool &#8212; fully vibecoded &#8212; but it demonstrates something important: <strong>AI agents can reason across repository boundaries the way a human security engineer does.</strong></p><div><hr></div><h1>Why Current Tools Don&#8217;t Work for Microservices</h1><p>Let&#8217;s say your company runs a typical microservice architecture &#8212; an auth service, an API backend, a frontend app, and some infrastructure configs. Four repos, four teams, one system.</p><p>Now a developer opens a PR in the API repo. They&#8217;re adding a new feature. A traditional SAST scanner (think Semgrep, CodeQL, or similar) will analyze that PR against the code in <em>that repo only</em>. If the code looks suspicious &#8212; say, it makes an HTTP request using user input &#8212; the scanner flags it.</p><p>But is it actually exploitable? 
To answer that, you need context that lives <em>outside</em> that repo:</p><ul><li><p><strong>Is there a Web Application Firewall (WAF)</strong> that inspects request bodies before they reach the API? (Check the infra repo.)</p></li><li><p><strong>Are internal services reachable</strong> from this container? (Check the Docker Compose file.)</p></li><li><p><strong>Does the API gateway add any security headers</strong> or validate tokens? (Check the nginx config.)</p></li><li><p><strong>What sensitive data exists</strong> on those internal services? (Check the auth service repo.)</p></li></ul><p>Not a lot of SAST tools &#8212; even the AI-powered ones &#8212; can answer these questions. They operate within repository boundaries. They literally <em>can&#8217;t see</em> the other repos.</p><p>What does a human security engineer do? They open four browser tabs, read the nginx config, trace the request flow, check Docker networking, and mentally piece together whether the attack path is viable. 
It takes 30 minutes to an hour for a single PR, if they&#8217;re thorough.</p><p><strong>VulnVibes automates that entire process.</strong> It has access to your GitHub org, can read files from any repo, search for patterns across the codebase, and reason about whether a threat is real based on the full architectural context.</p><div><hr></div><h1>How VulnVibes Works</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lK8n!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c218431-a1d2-4a11-8656-0416dca3f63c_2752x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lK8n!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c218431-a1d2-4a11-8656-0416dca3f63c_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!lK8n!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c218431-a1d2-4a11-8656-0416dca3f63c_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!lK8n!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c218431-a1d2-4a11-8656-0416dca3f63c_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!lK8n!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c218431-a1d2-4a11-8656-0416dca3f63c_2752x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lK8n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c218431-a1d2-4a11-8656-0416dca3f63c_2752x1536.png" width="1456" height="813" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9c218431-a1d2-4a11-8656-0416dca3f63c_2752x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3881097,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.boringappsec.com/i/190983086?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c218431-a1d2-4a11-8656-0416dca3f63c_2752x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lK8n!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c218431-a1d2-4a11-8656-0416dca3f63c_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!lK8n!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c218431-a1d2-4a11-8656-0416dca3f63c_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!lK8n!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c218431-a1d2-4a11-8656-0416dca3f63c_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!lK8n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c218431-a1d2-4a11-8656-0416dca3f63c_2752x1536.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The analysis happens in two stages:</p><h2>Stage 1: Threat Modeling (The Quick Filter)</h2><p>When you point VulnVibes at a PR, the first thing it does is fetch the diff and ask: <em>&#8220;Is there anything security-relevant here?&#8221;</em></p><p>It produces a structured threat model:</p><ul><li><p><strong>What changed?</strong> A plain-English summary of the PR</p></li><li><p><strong>What could go wrong?</strong> Specific threats, each tagged with a <a href="https://cwe.mitre.org/">CWE</a> (a standardized vulnerability classification)</p></li><li><p><strong>What do we need to verify?</strong> A list of investigation questions for each threat</p></li><li><p><strong>Which investigation skills should we use?</strong> Matched to specific testing methodologies (SSRF testing, auth testing, etc.)</p></li></ul><p>If Stage 1 finds 
nothing security-relevant &#8212; like a pure code refactor &#8212; it stops here. Done in under a minute, minimal cost. No wasted effort.</p><p>If it <em>does</em> find something worth investigating, it moves to Stage 2.</p><h2>Stage 2: Cross-Repo Investigation (The Deep Dive)</h2><p>This is where VulnVibes is different from everything else I&#8217;ve seen.</p><p>For each threat identified in Stage 1, the agent performs a full investigation. It doesn&#8217;t just look at the PR &#8212; it goes hunting across the organization:</p><ol><li><p><strong>Reads the PR code in full context</strong> &#8212; not just the diff, but the entire file and related files</p></li><li><p><strong>Searches across other repos</strong> &#8212; looking for infrastructure configs, middleware, security controls</p></li><li><p><strong>Checks the infrastructure layer</strong> &#8212; Docker networking, nginx configs, deployment files, environment variables</p></li><li><p><strong>Follows a structured methodology</strong> &#8212; each vulnerability type has its own investigation playbook</p></li><li><p><strong>Produces a verdict with a full reasoning chain</strong> &#8212; TRUE_POSITIVE (real vulnerability), FALSE_POSITIVE (looks bad but isn&#8217;t), or NO_SKILL_AVAILABLE (can&#8217;t test this type)</p></li></ol><p>Every verdict comes with a confidence score, a risk level, and a step-by-step explanation of how the agent reached its conclusion. 
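</p><p>A minimal sketch of how such a verdict could be represented, written in Python for illustration. It is not taken from the VulnVibes source: the three verdict names come from the list above, while the <code>Finding</code> fields and the <code>overall_verdict</code> rule are assumptions.</p><pre><code><code>from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    TRUE_POSITIVE = "TRUE_POSITIVE"            # real vulnerability
    FALSE_POSITIVE = "FALSE_POSITIVE"          # looks bad but is not exploitable
    NO_SKILL_AVAILABLE = "NO_SKILL_AVAILABLE"  # no playbook for this threat type


@dataclass
class Finding:
    # Hypothetical field names, chosen for illustration only
    threat: str
    cwe: str
    verdict: Verdict
    confidence: int  # 1 to 10
    risk: str        # for example "LOW" or "HIGH"
    reasoning: list[str] = field(default_factory=list)  # step-by-step explanation


def overall_verdict(findings):
    # Assumed rule: a single confirmed finding marks the whole PR as TRUE_POSITIVE
    if not findings:
        return None  # nothing was investigated
    verdicts = [f.verdict for f in findings]
    if Verdict.TRUE_POSITIVE in verdicts:
        return Verdict.TRUE_POSITIVE
    if all(v is Verdict.NO_SKILL_AVAILABLE for v in verdicts):
        return Verdict.NO_SKILL_AVAILABLE
    return Verdict.FALSE_POSITIVE
</code></code></pre><p>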
You can read the reasoning and decide whether you agree.</p><div><hr></div><h1>The Demo: Three PRs, Three Outcomes</h1><p>To demonstrate this, I set up a test environment: <a href="https://github.com/microvibes-lab">microvibes-lab</a>, a GitHub organization with four microservices that form a document management system.</p><h3>Services in the lab</h3><ul><li><p><strong>auth-service</strong> (Node.js): Handles login and issues JWT tokens</p></li><li><p><strong>doc-api</strong> (Python/FastAPI): Document storage with role-based access</p></li><li><p><strong>frontend-app</strong> (Next.js): The web UI</p></li><li><p><strong>infra-ops</strong> (Nginx + Docker): API gateway and infrastructure configs</p></li></ul><p>It&#8217;s a realistic setup &#8212; JWT authentication shared between services, nginx routing requests, everything running on a Docker network, role-based access control (admins see everything, staff see only public documents).</p><p>I ran VulnVibes against three different PRs to show three different outcomes. Here&#8217;s a video walkthrough of all three cases:</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;fd1ebdf1-6f00-4e11-9b81-3015c3836953&quot;,&quot;duration&quot;:null}"></div><div><hr></div><h2>Case 1: Catching a Real Vulnerability</h2><p><strong>PR:</strong> <a href="https://github.com/microvibes-lab/doc-api/pull/13">doc-api#13 &#8212; Add document import from URL feature</a></p><h3>What the PR Does</h3><p>A developer adds a new endpoint that lets admins import documents from external URLs. Here&#8217;s the code:</p><pre><code><code>@app.post("/documents/import")
def import_document(url: str, user: dict = Depends(get_current_user)):
    if user["role"] != "sys_admin":
        raise HTTPException(status_code=403, detail="Admin access required")
    response = requests.get(url)
    return {"content": response.text, "status_code": response.status_code}
</code></code></pre><p>Nine lines of code. On the surface, it looks reasonable &#8212; there&#8217;s a role check ensuring only admins can use it. A code reviewer might glance at this and move on.</p><h3>What&#8217;s Actually Wrong</h3><p>This is a classic <strong>Server-Side Request Forgery (SSRF)</strong> vulnerability. In plain English: the server is making an HTTP request to <em>whatever URL the user provides</em>. The user says &#8220;fetch this URL,&#8221; and the server obediently does it.</p><p>Why is that dangerous? Because the server can reach things the user can&#8217;t. It&#8217;s sitting inside a Docker network with direct access to internal services. An attacker could tell it to fetch:</p><ul><li><p><code>http://auth-service:3001/health</code> &#8212; to probe internal services</p></li><li><p><code>http://169.254.169.254/latest/meta-data/</code> &#8212; to steal cloud credentials (a very common attack in AWS environments)</p></li><li><p>Any internal service on the Docker network that&#8217;s not exposed to the internet</p></li></ul><p>And the response comes straight back to the attacker, unfiltered.</p><h3>What VulnVibes Did</h3><p><strong>Stage 1</strong> (~50 seconds): Identified the core threat &#8212; SSRF via unrestricted URL fetch.</p><p><strong>Stage 2</strong> is where it gets interesting. Watch where the agent went to validate the SSRF:</p><ol><li><p>&#9989; <strong>Read </strong><code>main.py</code><strong> in full</strong> &#8212; confirmed there&#8217;s zero URL validation. No allowlist, no blocklist, no scheme restriction.</p></li><li><p>&#9989; <strong>Read </strong><code>docker-compose.yml</code><strong> from the infra-ops repo</strong> &#8212; confirmed all four services share a flat Docker network. Every service can reach every other service by hostname.</p></li><li><p>&#9989; <strong>Read </strong><code>nginx.conf</code><strong> from the infra-ops repo</strong> &#8212; confirmed nginx does nothing but route traffic. 
No WAF, no request body inspection, no URL filtering.</p></li><li><p>&#9989; <strong>Checked the </strong><code>Dockerfile</code> &#8212; standard Python image, no network restrictions.</p></li><li><p>&#9989; <strong>Checked </strong><code>requirements.txt</code> on the PR branch &#8212; no URL validation libraries installed.</p></li></ol><p>The agent traced the full attack path across three different repos and concluded:</p><blockquote><p><em>&#8220;doc-api can reach auth-service:3001, frontend-app:3000, gateway:80 directly via Docker DNS. No network policies restrict egress.&#8221;</em></p></blockquote><p><strong>Verdict: TRUE POSITIVE &#8212; HIGH (confidence 10/10)</strong></p><pre><code><code>&#127919; Overall Verdict: TRUE_POSITIVE
&#9888;&#65039;  TRUE POSITIVE - Security vulnerability confirmed!

   1. SSRF via unrestricted URL fetch &#8212; HIGH (10/10)

   Duration: 134 seconds | Cost: $0.14
</code></code></pre><p>A traditional SAST tool could flag <code>requests.get(url)</code> as a potential SSRF. But it couldn&#8217;t tell you whether internal services are actually reachable, whether nginx adds any protection, or whether Docker networking enables the attack. VulnVibes answered all of those questions by reading files from repos the PR author never touched.</p><div><hr></div><h2>Case 2: The Nuanced Judgment Call</h2><p><strong>PR:</strong> <a href="https://github.com/microvibes-lab/auth-service/pull/10">auth-service#10 &#8212; Enable CORS credentials for cross-origin requests</a></p><h3>What the PR Does</h3><p>A developer updates the CORS (Cross-Origin Resource Sharing) configuration:</p><pre><code><code>// Before: default CORS (allow everything, no credentials)
app.use(cors());

// After: reflect any origin + allow credentials
app.use(cors({
    origin: true,
    credentials: true
}));
</code></code></pre><h3>Why This Looks Scary</h3><p>If you&#8217;ve read any web security guide, this combination is a red flag. <code>origin: true</code> means the server reflects whatever <code>Origin</code> header the request carries, so <em>any</em> website is allowed. <code>credentials: true</code> means the server tells the browser that credentialed requests, cookies included, are allowed. Together, this is the most permissive CORS policy possible.</p><p>In a typical web app that uses cookies for authentication, this would be a serious vulnerability. A malicious website could make requests to your API on behalf of a logged-in user, read the responses, and steal session data.</p><p>Every security scanner would flag this immediately. And in most cases, they&#8217;d be right.</p><h3>But Is It Actually Exploitable Here?</h3><p>This is where VulnVibes earns its keep. The agent didn&#8217;t just pattern-match on &#8220;permissive CORS = bad.&#8221; It went investigating:</p><ol><li><p><strong>Searched the entire org for cookie usage</strong> &#8212; <code>Set-Cookie</code>, <code>res.cookie()</code>, session middleware &#8212; <strong>found nothing</strong></p></li><li><p><strong>Read the frontend app source code</strong> &#8212; discovered authentication uses <code>localStorage</code> + <code>Authorization: Bearer</code> headers, <strong>not cookies</strong></p></li><li><p><strong>Searched for </strong><code>withCredentials</code><strong> or </strong><code>credentials: 'include'</code><strong> patterns</strong> &#8212; <strong>found nothing</strong></p></li><li><p><strong>Checked the auth-service dependencies</strong> &#8212; no session middleware installed</p></li><li><p><strong>Checked nginx</strong> &#8212; no cookie handling</p></li></ol><p>The agent&#8217;s key insight:</p><blockquote><p><em>&#8220;Auth is entirely header-based using localStorage + Authorization Bearer. No cookies are set anywhere in the codebase. 
The </em><code>credentials: true</code><em> flag has no meaningful effect since no cookies exist.&#8221;</em></p></blockquote><p>The CORS configuration looks terrible in isolation. But in this specific architecture, the classic attack doesn&#8217;t work because the app doesn&#8217;t use cookies at all. An attacker&#8217;s website can&#8217;t steal what doesn&#8217;t exist.</p><p><strong>Verdict: FALSE POSITIVE &#8212; LOW risk (confidence 8/10)</strong></p><p>VulnVibes correctly identified that the CORS configuration, while technically permissive, is not exploitable in this architecture:</p><pre><code><code>&#127919; Overall Verdict: FALSE_POSITIVE
&#10003;  FALSE POSITIVE - No security vulnerability found

&#128202; Investigation Results:
   1. Permissive CORS &#8212; FALSE_POSITIVE (8/10), Risk: LOW

   Duration: 170 seconds | Cost: $0.16
</code></code></pre><p>This is exactly the kind of analysis I want from a triage tool. It didn&#8217;t blindly flag it as critical &#8212; which is what every pattern-matching scanner would do. Instead, it investigated the architecture, confirmed header-based auth, found zero cookie usage across the entire org, and concluded the CORS change isn&#8217;t exploitable. A human security engineer would reach the same conclusion &#8212; but only after 30+ minutes of reading code across four repos.</p><div><hr></div><h2>Case 3: Knowing When to Stay Quiet</h2><p><strong>PR:</strong> <a href="https://github.com/microvibes-lab/auth-service/pull/12">auth-service#12 &#8212; Refactor auth code into helper functions</a></p><h3>What the PR Does</h3><p>A developer refactors the JWT token generation logic &#8212; extracting inline code into helper functions:</p><pre><code><code>// Before: JWT signing inline in the login handler
const token = jwt.sign(
    { username: user.username, role: user.role, name: user.name },
    JWT_SECRET,
    { expiresIn: '1h' }
);
return res.json({ token, user: { username: user.username, ... } });

// After: extracted to a reusable function
function generateToken(user) {
    return jwt.sign(
        { username: user.username, role: user.role, name: user.name },
        JWT_SECRET,
        { expiresIn: '1h' }
    );
}
const token = generateToken(user);
return res.json({ token, user: formatUserResponse(user) });
</code></code></pre><h3>Why a Naive Tool Would Flag This</h3><p>This PR modifies authentication code. It touches JWT token generation &#8212; the most security-sensitive part of the entire system. A pattern-matching scanner might flag it because &#8220;authentication code changed&#8221; or &#8220;JWT signing logic modified.&#8221;</p><h3>What VulnVibes Did</h3><p>VulnVibes looked at the diff, compared the before and after, and concluded in <strong>27 seconds</strong>:</p><blockquote><p><em>&#8220;This PR is a straightforward code refactoring. The JWT payload, signing secret, and expiration are unchanged. No new endpoints, routes, dependencies, or security-relevant behavior is introduced.&#8221;</em></p></blockquote><p><strong>Zero threats identified. No Stage 2 investigation needed.</strong></p><pre><code><code>&#8505;&#65039;  No security-relevant changes detected in this PR.
   Duration: 27s
</code></code></pre><p>This is just as important as catching real vulnerabilities. A tool that flags everything is just as useless as one that catches nothing &#8212; because it trains developers to ignore the alerts. VulnVibes understood that despite touching JWT code, the behavior didn&#8217;t change. It saved everyone&#8217;s time.</p><div><hr></div><h1>The Scorecard</h1><h3>Results at a glance</h3><ul><li><p><strong>SSRF &#8212; doc-api#13:</strong> Looked like a reasonable new feature, but VulnVibes confirmed a real vulnerability across 3 repos. <strong>Time:</strong> 2.2 min. <strong>Cost:</strong> $0.14</p></li><li><p><strong>CORS &#8212; auth-service#10:</strong> Looked scary, but VulnVibes determined it was a false positive because auth is header-based. <strong>Time:</strong> 2.8 min. <strong>Cost:</strong> $0.16</p></li><li><p><strong>Refactor &#8212; auth-service#12:</strong> Security-sensitive code changed, but it was a safe refactor and needed no deeper investigation. <strong>Time:</strong> 27 sec. <strong>Cost:</strong> minimal</p></li></ul><div><hr></div><h1>The Bigger Picture</h1><p>Let me be upfront: <strong>VulnVibes is a concept tool, not a production-grade agent.</strong> It&#8217;s fully vibecoded &#8212; I built it to demonstrate an idea, not to replace your security team.</p><p>The idea is this: <strong>context matters enormously in security, and AI agents can now gather and reason about that context across repository boundaries.</strong></p><p>If you work at an organization with a microservice architecture and your existing SAST tools are either missing real vulnerabilities or drowning you in false positives, the problem might not be the scanner itself. 
The problem might be that the scanner can only see one repo at a time.</p><p>VulnVibes shows that it&#8217;s possible to build an agent that:</p><ol><li><p><strong>Reads the PR diff</strong> and identifies what <em>could</em> go wrong</p></li><li><p><strong>Searches across your entire org</strong> to understand the actual architecture</p></li><li><p><strong>Checks infrastructure configs</strong> to verify what security controls exist (or don&#8217;t)</p></li><li><p><strong>Makes a calibrated judgment</strong> &#8212; not just &#8220;this pattern is bad&#8221; but &#8220;this pattern is bad <em>and</em> there are no compensating controls at any layer&#8221;</p></li></ol><p>The specific implementation matters less than the concept. You could build something similar using any LLM with tool use, a GitHub API integration, and some structured investigation playbooks. The key insight is giving the agent access to cross-repo context and teaching it to verify assumptions against the actual infrastructure.</p><div><hr></div><h1>Getting Started</h1><p>If you want to try it yourself, VulnVibes is open source:</p><pre><code><code># Install
git clone https://github.com/anshumanbh/vulnvibes.git
cd vulnvibes
pip install -e .

# Analyze a PR
vulnvibes pr analyze https://github.com/your-org/your-repo/pull/123 \
  --github-token "$GITHUB_TOKEN" \
  --model sonnet \
  --org your-org \
  --context-file context.md
</code></code></pre><p>You can optionally provide a context file that tells VulnVibes about your architecture:</p><pre><code><code>---
related_repos:
  - name: infra-ops
    purpose: nginx configs, Docker Compose, k8s manifests
  - name: auth-service
    purpose: JWT authentication, user management
---

# Architecture Overview
Microservices on Docker with nginx reverse proxy.
JWT-based auth, tokens stored in localStorage.
</code></code></pre><p>If you have a Claude Max or Pro subscription and you&#8217;re authenticated via Claude Code or Claude CLI, VulnVibes works with OAuth &#8212; no API key needed.</p><p>The test environment at <a href="https://github.com/microvibes-lab">microvibes-lab</a> has 17 PRs with known expected outcomes if you want to benchmark it yourself.</p><div><hr></div><p>If you&#8217;re interested in trying it out, building something similar, or just want to talk about AI-powered security tooling &#8212; feel free to reach out!</p><ul><li><p><strong>GitHub:</strong> <a href="https://github.com/anshumanbh/vulnvibes">anshumanbh/vulnvibes</a></p></li><li><p><strong>LinkedIn:</strong> <a href="https://www.linkedin.com/in/anshumanbhartiya/">@anshumanbhartiya</a></p></li><li><p><strong>Blog:</strong> <a href="https://anshuman.ai">anshuman.ai</a></p></li></ul><p>Until next time, ciao! &#128075;</p>]]></content:encoded></item><item><title><![CDATA[Ep 37: The Future of Security Testing in an AI-Driven World with Jason Haddix]]></title><description><![CDATA[Watch now (61 mins) | In this episode, Jason Haddix (CEO of Arcanum Information Security and creator of the Bug Hunter&#8217;s Methodology) joins us to examine how AI is changing penetration testing and security research.]]></description><link>https://www.boringappsec.com/p/ep-37-the-future-of-security-testing</link><guid isPermaLink="false">https://www.boringappsec.com/p/ep-37-the-future-of-security-testing</guid><dc:creator><![CDATA[Sandesh Mysore Anand]]></dc:creator><pubDate>Wed, 11 Mar 2026 08:10:35 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/190593781/a55c904ba1902329a5b9e019da9d3b1f.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>In this episode, Jason Haddix (CEO of Arcanum Information Security and creator of the Bug Hunter&#8217;s Methodology) joins us to examine how AI is changing penetration testing and security research. He explains that while AI agents can automate reconnaissance, code analysis, and parts of vulnerability discovery, meaningful results still depend on human expertise, methodology, and context engineering.</p><p>The conversation explores how AI is shifting the entry path for new security practitioners, why deep research and critical thinking remain essential skills, and how experienced testers are embedding their knowledge into agent workflows using tools like Claude Code. 
Jason also discusses practical experimentation with AI assistants such as OpenClaw, including prompt-injection defenses, guardrails, and the operational risks of running autonomous systems.</p><p>The episode also addresses the growing debate around AI-generated code and AI-driven vulnerability discovery, highlighting the difference between marketing claims and real-world results. It closes with a discussion on why the industry needs better benchmarks and evaluation methods to measure whether AI security tools actually find meaningful vulnerabilities.</p><p>00:00&#8211;02:14 &#8212; Introduction to Jason Haddix and how his journey from bug hunter to Arcanum founder shapes his perspective on AI in security</p><p>02:14&#8211;08:00 &#8212; How AI agents are beginning to automate penetration testing workflows while still relying on expert methodology</p><p>08:00&#8211;10:45 &#8212; Why human expertise remains critical even as security automation improves</p><p>10:45&#8211;17:10 &#8212; How AI is changing the learning curve for the next generation of pentesters</p><p>17:10&#8211;25:27 &#8212; How agent frameworks and skills are transforming security tool building</p><p>25:27&#8211;35:41 &#8212; Security risks and defenses when running AI assistants like OpenClaw</p><p>35:41&#8211;40:32 &#8212; The rise of AI-powered personal assistants for research and security workflows</p><p>40:32&#8211;42:55 &#8212; Why the cybersecurity community is rapidly adopting AI tools</p><p>42:55&#8211;46:42 &#8212; How AI improves security coverage and turnaround time at scale</p><p>46:42&#8211;50:31 &#8212; Why newer models like Opus 4.5 unlocked practical AI security workflows</p><p>50:31&#8211;56:48 &#8212; The debate on whether AI should generate secure code or detect vulnerabilities</p><p>56:48&#8211;01:01:18 &#8212; Why AI security needs better evaluation benchmarks and real-world testbeds<br><br>Tune in for a deep dive!<br><br><strong>Connect with Jason Haddix:</strong></p><p>LinkedIn: 
<a href="https://www.linkedin.com/in/jhaddix/">https://www.linkedin.com/in/jhaddix/</a></p><p><strong>Connect with Anshuman:</strong></p><p>LinkedIn: <a href="https://www.linkedin.com/in/anshumanbhartiya/">anshumanbhartiya</a></p><p>X: <a href="https://x.com/anshuman_bh">https://x.com/anshuman_bh</a></p><p>Website: <a href="https://anshumanbhartiya.com/">https://anshumanbhartiya.com/</a></p><p>Instagram: <a href="https://www.instagram.com/anshuman.bhartiya/#">anshuman.bhartiya</a></p><p><strong>Connect with Sandesh:</strong></p><p>LinkedIn: <a href="https://www.linkedin.com/in/anandsandesh/">anandsandesh</a></p><p>X: <a href="https://x.com/JubbaOnJeans">https://x.com/JubbaOnJeans</a></p>]]></content:encoded></item><item><title><![CDATA[Ep 36: Discussing AI's Current State of Affairs ]]></title><description><![CDATA[In this episode, we examine what is shifting in AI, AppSec, and product security and what remains fundamentally the same.]]></description><link>https://www.boringappsec.com/p/ep-36-discussing-ais-current-state</link><guid isPermaLink="false">https://www.boringappsec.com/p/ep-36-discussing-ais-current-state</guid><dc:creator><![CDATA[Sandesh Mysore Anand]]></dc:creator><pubDate>Mon, 02 Mar 2026 06:16:25 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/189622355/429a181cbbfbf94bcd75cddde23c229c.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>In this episode, we examine what is shifting in AI, AppSec, and product security and what remains fundamentally the same.</p><p>For years, application security has operated on a familiar model: siloed reviews, tool-driven findings, and periodic assessments that struggle to keep pace with modern development. AI doesn&#8217;t eliminate those pressures, it amplifies them. Code is generated faster, systems are more interconnected, and the surface area of change expands weekly.</p><p>The conversation explores agent-based workflows through tools like OpenClaw, not as novelty, but as a signal of a broader shift: from manually operating tools to orchestrating fleets of agents. 
As AI interfaces move from chat windows to terminals to messaging environments, security teams must reconsider where workflows live and how context is preserved across them.</p><p>For decades, AppSec has struggled to build a reliable understanding of what systems exist and how they connect. Large language models may finally make it possible to construct living maps of components, data flows, and trust boundaries, enabling assessments that talk to each other instead of existing in isolation.</p><p>The discussion also revisits threat modeling, not as a compliance artifact, but as a foundation for system-wide reasoning. If AI can automate baseline coverage and reduce repetitive toil, security teams may return to their original purpose: high-leverage risk judgment on critical systems. This leads to a broader debate about whether AppSec as a distinct function evolves, shrinks, or dissolves into engineering itself, and what the enduring &#8220;maker&#8211;checker&#8221; model of risk management demands in an AI-native world.</p><p>Finally, the episode reflects on the role of large AI labs in security: the gap between ambitious claims and shipped products, and what that means for founders and security leaders navigating change.</p><p>00:00&#8211;02:15 &#8212; Why this is a no-guest episode &amp; what&#8217;s changed since last year</p><p>02:15&#8211;06:30 &#8212; AI co-authoring, productivity gains, and writing workflows</p><p>06:30&#8211;10:20 &#8212; OpenClaw architecture, agent risks, and prompt injection realities</p><p>10:20&#8211;14:00 &#8212; The shifting UI of AI: chat &#8594; terminal &#8594; messaging agents</p><p>14:00&#8211;18:30 &#8212; Agent orchestration vs siloed security tooling</p><p>18:30&#8211;23:00 &#8212; Context graphs and assessments that &#8220;talk&#8221; to each other</p><p>23:00&#8211;27:30 &#8212; Threat modeling&#8217;s evolution and system-wide visibility</p><p>27:30&#8211;31:00 &#8212; Why inventory is still AppSec&#8217;s hardest 
problem</p><p>31:00&#8211;34:30 &#8212; Personal AI stacks: Obsidian, memory layers, and query tools</p><p>34:30&#8211;37:30 &#8212; Open source in the age of AI-generated PR spam</p><p>37:30&#8211;40:00 &#8212; AI labs: what they ship vs what they say</p><p>40:00&#8211;44:00 &#8212; Will AppSec disappear? A serious debate</p><p>44:00&#8211;48:00 &#8212; Maker&#8211;checker risk models in an AI-driven org</p><p>48:00&#8211;51:00 &#8212; Where AI replaces toil &#8212; and where humans stay critical</p><p>51:00&#8211;End &#8212; 2026 predictions for AI security and product security</p><p>Tune in for a deep dive!</p><p><strong>Connect with Anshuman:</strong></p><p>LinkedIn: <a href="https://www.linkedin.com/in/anshumanbhartiya/">anshumanbhartiya</a></p><p>X: <a href="https://x.com/anshuman_bh">https://x.com/anshuman_bh</a></p><p>Website: <a href="https://anshumanbhartiya.com/">https://anshumanbhartiya.com/</a></p><p>Instagram: <a href="https://www.instagram.com/anshuman.bhartiya/#">anshuman.bhartiya</a></p><p><strong>Connect with Sandesh:</strong></p><p>LinkedIn: <a href="https://www.linkedin.com/in/anandsandesh/">anandsandesh</a></p><p>X: <a href="https://x.com/JubbaOnJeans">https://x.com/JubbaOnJeans</a></p>]]></content:encoded></item>
<item><title><![CDATA[Ep 35: Exploring Security After Determinism with Jens Ernstberger]]></title><description><![CDATA[In this episode, we sit down with Jens to explore why AI agents fundamentally break traditional security assumptions, from API keys and browser sessions to composability and access control.]]></description><link>https://www.boringappsec.com/p/ep-35-exploring-security-after-determinism</link><guid isPermaLink="false">https://www.boringappsec.com/p/ep-35-exploring-security-after-determinism</guid><dc:creator><![CDATA[Sandesh Mysore Anand]]></dc:creator><pubDate>Mon, 16 Feb 2026 07:44:09 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/188112462/f1b4855dd949dd91090f2a318f2db48a.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>In this episode, we sit down with Jens to explore why AI agents fundamentally break traditional security assumptions, from API keys and browser sessions to composability and access control.</p><p>Drawing parallels to DeFi exploits and smart contract failures, he explains why agent identity, short-lived delegated authorization, and zero trust aren&#8217;t optional add-ons, but the foundation for safely running autonomous systems.</p><p>We also dive into context compression as both a performance and security challenge, the real difference between MCP and skills, and a future where humans may stop reviewing code altogether. 
As agents become the primary actors on the internet, even writing itself begins to change in an AI-scraped world.</p><p>If agents are non-deterministic by design, the real question becomes: where do we reintroduce determinism?</p><p><strong>00:00 &#8212; AI agents as the next security reset moment. History repeating: automation + composability = new attack surfaces</strong></p><p><strong>03:25 &#8212; Challenges of context compression in AI</strong></p><p><strong>07:39 &#8212; Access control in a non-deterministic system and compaction issues</strong></p><p><strong>11:22 &#8212; MCP vs skills: horizontal infrastructure meets vertical execution logic</strong></p><p><strong>18:06 &#8212; Agent identity and security practices. Static credentials collapse under autonomous agent behavior</strong></p><p><strong>30:06 &#8212; The future of coding with AI agents</strong></p><p><strong>31:31 &#8212; DeFi attacks, composability issues, and how non-determinism multiplies risk</strong></p><p><strong>35:14 &#8212; Writing for humans vs writing for LLMs. 
Content, authenticity, and the economics of scraping</strong></p><p><strong>44:42 &#8212; Transition from academia to startup founder</strong></p><p>Tune in for a deep dive!</p><p><strong>Connect with Jens Ernstberger:</strong></p><p>Website: <a href="https://ernstberger.xyz/">https://ernstberger.xyz/</a></p><p>LinkedIn: <a href="https://www.linkedin.com/in/jens-ernstberger-phd-96b0ba14a/">https://www.linkedin.com/in/jens-ernstberger-phd-96b0ba14a/</a><br><br><strong>Connect with Anshuman:</strong></p><p>LinkedIn: <a href="https://www.linkedin.com/in/anshumanbhartiya/">anshumanbhartiya</a></p><p>X: <a href="https://x.com/anshuman_bh">https://x.com/anshuman_bh</a></p><p>Website: <a href="https://anshumanbhartiya.com/">https://anshumanbhartiya.com/</a></p><p>Instagram: <a href="https://www.instagram.com/anshuman.bhartiya/#">anshuman.bhartiya</a></p><p><strong>Connect with Sandesh:</strong></p><p>LinkedIn: <a href="https://www.linkedin.com/in/anandsandesh/">anandsandesh</a></p><p>X: <a href="https://x.com/JubbaOnJeans">https://x.com/JubbaOnJeans</a></p>]]></content:encoded></item>
<item><title><![CDATA[Day in the Life: Building a Prototype with My AI Agent]]></title><description><![CDATA[(Without Getting Pwned)]]></description><link>https://www.boringappsec.com/p/day-in-the-life-building-a-prototype</link><guid isPermaLink="false">https://www.boringappsec.com/p/day-in-the-life-building-a-prototype</guid><dc:creator><![CDATA[Anshuman Bhartiya]]></dc:creator><pubDate>Fri, 13 Feb 2026 22:42:40 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!MtzP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04c26d41-2b9e-463f-afab-9b3bee4a115c_2816x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MtzP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04c26d41-2b9e-463f-afab-9b3bee4a115c_2816x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MtzP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04c26d41-2b9e-463f-afab-9b3bee4a115c_2816x1536.png 424w, 
https://substackcdn.com/image/fetch/$s_!MtzP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04c26d41-2b9e-463f-afab-9b3bee4a115c_2816x1536.png 848w, https://substackcdn.com/image/fetch/$s_!MtzP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04c26d41-2b9e-463f-afab-9b3bee4a115c_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!MtzP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04c26d41-2b9e-463f-afab-9b3bee4a115c_2816x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MtzP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04c26d41-2b9e-463f-afab-9b3bee4a115c_2816x1536.png" width="1456" height="794" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/04c26d41-2b9e-463f-afab-9b3bee4a115c_2816x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4849439,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.boringappsec.com/i/187907693?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04c26d41-2b9e-463f-afab-9b3bee4a115c_2816x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!MtzP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04c26d41-2b9e-463f-afab-9b3bee4a115c_2816x1536.png 424w, https://substackcdn.com/image/fetch/$s_!MtzP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04c26d41-2b9e-463f-afab-9b3bee4a115c_2816x1536.png 848w, https://substackcdn.com/image/fetch/$s_!MtzP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04c26d41-2b9e-463f-afab-9b3bee4a115c_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!MtzP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04c26d41-2b9e-463f-afab-9b3bee4a115c_2816x1536.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><blockquote><p>Disclaimer: This is a cross post from my tech <a href="https://www.anshuman.ai/posts/building-with-sage-1">blog</a>, co-authored by my personal AI assistant <a href="https://www.anshuman.ai/posts/my-clawdbot-setup">Sage</a>.</p></blockquote><blockquote><p><strong>Series: Building with Sage</strong> &#8212; This is Part 1 of an ongoing series about running a personal AI agent with a security-first mindset. I&#8217;ll share real workflows, real failures, and the security controls that keep things from going sideways.</p></blockquote><h1>Introduction</h1><p>I asked my AI agent to build me a dashboard. Twenty minutes later, it was live on my network. A friend could pull it up from his laptop.</p><p>That sentence should make any security person uncomfortable. It makes <em>me</em> uncomfortable &#8212; and I&#8217;m the one who built the setup.</p><p>Here&#8217;s the thing: I can&#8217;t stop using it. The productivity boost is too real. So instead of shutting it down, I&#8217;ve been iteratively hardening the security posture. Defense in depth, sandboxed execution, network controls, manual approval gates &#8212; the works.</p><p>This post walks through a real session from today: what I asked for, what failed, how the agent pivoted, and the security controls that were quietly doing their job the entire time. 
Think of it as a &#8220;day in the life&#8221; of building with an AI agent, told through the lens of someone who spends his day job thinking about application security.</p><p>So, let&#8217;s get started!</p><div><hr></div><h1>The Setup: Meet Sage</h1><p>Sage is my personal AI agent, running on <a href="https://github.com/openclaw/openclaw">OpenClaw</a> (formerly Clawdbot) on a Mac Mini in my home office. I talk to it via Telegram and Discord. It manages my todos, tracks my schedule, writes code, runs security audits, and &#8212; apparently &#8212; builds full-stack prototypes on demand.</p><blockquote><p>I wrote about my initial setup in <a href="https://www.anshuman.ai/posts/my-clawdbot-setup">Introducing Sage: My Personal AI Assistant That Actually Works</a>. Things have evolved since then &#8212; Sage now runs on a dedicated Mac Mini instead of my daily driver, with significantly more security controls. This post covers the current architecture.</p></blockquote><p>Here&#8217;s the architecture at a high level:</p><p><strong>User &#8594; Messaging (Discord/Telegram) &#8594; OpenClaw Gateway &#8594; Main Agent (Sage) &#8594; Subagents/VMs/Tools</strong></p><p>The key design principle: <strong>Sage is the trust boundary.</strong> Anything that could contain attacker-controlled content gets delegated to a sandboxed subagent. The main agent stays clean.</p><p>Let me break down the security controls at each layer:</p><ul><li><p><strong>Messaging Layer</strong>: Both Discord and Telegram use <code>allowlist</code> group policies. Only I can talk to Sage. No public access, no email hooks (that&#8217;s a <a href="https://veganmosfet.github.io/2026/02/02/openclaw_mail_rce.html">whole other attack surface</a>).</p></li><li><p><strong>Gateway</strong>: Binds to <code>loopback</code> only &#8212; not reachable from the network. Token auth. 
Exec security set to <code>allowlist</code>.</p></li><li><p><strong>Main Agent</strong>: Runs unsandboxed on the host (yes, this keeps me up at night &#8212; more on that later). Has access to host tools, file system, and my messaging channels.</p></li><li><p><strong>Subagents</strong>: Reader Agent, Coding Agent, Research Agent &#8212; each runs in a Docker sandbox with explicit tool allow/deny lists. The Coding Agent can <code>exec</code> and <code>write</code> but can&#8217;t <code>message</code>, <code>gateway</code>, or <code>cron</code>. By design.</p></li><li><p><strong>Host</strong>: macOS firewall enabled. Sensitive files (<code>openclaw.json</code>, identity keys) locked to <code>600</code> permissions. Daily automated security audits. <a href="https://objective-see.org/products/lulu.html">LuLu</a> firewall for inbound and outbound connection control.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!p_kY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed76e888-93a0-41d8-b66d-5c7fa1eab85c_2816x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p_kY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed76e888-93a0-41d8-b66d-5c7fa1eab85c_2816x1536.png 424w, https://substackcdn.com/image/fetch/$s_!p_kY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed76e888-93a0-41d8-b66d-5c7fa1eab85c_2816x1536.png 848w, https://substackcdn.com/image/fetch/$s_!p_kY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed76e888-93a0-41d8-b66d-5c7fa1eab85c_2816x1536.png 1272w, 
https://substackcdn.com/image/fetch/$s_!p_kY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed76e888-93a0-41d8-b66d-5c7fa1eab85c_2816x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!p_kY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed76e888-93a0-41d8-b66d-5c7fa1eab85c_2816x1536.png" width="1456" height="794" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ed76e888-93a0-41d8-b66d-5c7fa1eab85c_2816x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5256325,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.boringappsec.com/i/187907693?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed76e888-93a0-41d8-b66d-5c7fa1eab85c_2816x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!p_kY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed76e888-93a0-41d8-b66d-5c7fa1eab85c_2816x1536.png 424w, https://substackcdn.com/image/fetch/$s_!p_kY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed76e888-93a0-41d8-b66d-5c7fa1eab85c_2816x1536.png 848w, 
https://substackcdn.com/image/fetch/$s_!p_kY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed76e888-93a0-41d8-b66d-5c7fa1eab85c_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!p_kY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed76e888-93a0-41d8-b66d-5c7fa1eab85c_2816x1536.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><h1>The Ask</h1><p>A friend of mine runs an online business. 
He&#8217;s got VAs (virtual assistants) managing different departments &#8212; email support, order fulfillment, social media responses. They report to him, but he knows the reporting is inconsistent.</p><p>He asked me: <em>&#8220;Can AI give me better reporting and oversight than my VAs self-reporting?&#8221;</em></p><p>I told Sage: build me a business operations dashboard prototype. Three panels &#8212; email ops, order fulfillment, Facebook comments &#8212; with an AI morning briefing at the top. Dark theme, modern SaaS look, mock data that tells a story. Something I can screenshot and send to my friend in under an hour.</p><div><hr></div><h1>The Build: What Actually Happened</h1><p>This is where it gets interesting &#8212; and where the security controls actually did their job.</p><h2>Step 1: Sage Writes a Spec</h2><p>First thing Sage did was write a detailed spec to <code>projects/consulting/biz-dashboard/SPEC.md</code>. Stack choice (Next.js 14, Tailwind, Recharts), component breakdown, mock data structure, design requirements. Good practice &#8212; document before you build.</p><blockquote><p>&#128161; <strong>Security Improvement Idea: Security-by-Default Specs</strong></p><p>What if the spec-writing step automatically included security requirements? 
Imagine a set of rules in the agent&#8217;s config that inject security defaults into every project spec &#8212; things like:</p><ul><li><p><strong>Auth required</strong> &#8212; no unauthenticated endpoints by default</p></li><li><p><strong>No secrets in code</strong> &#8212; API keys via environment variables only</p></li><li><p><strong>HTTPS only</strong> &#8212; no plaintext HTTP in production</p></li><li><p><strong>Input validation</strong> &#8212; sanitize all user inputs</p></li><li><p><strong>Dependency audit</strong> &#8212; run <code>npm audit</code> before first commit</p></li><li><p><strong>CORS policy</strong> &#8212; restrictive by default, explicitly opened</p></li></ul><p>This could be a section in <code>AGENTS.md</code> or a dedicated <code>security-defaults.md</code> that the agent reads whenever it writes a spec. The prototype today didn&#8217;t need most of these (mock data, no auth, no real APIs), but for production builds this would catch security gaps before a single line of code is written. Shift left, but automated.</p></blockquote><div><hr></div><h2>Step 2: Coding Agent Fails (And That&#8217;s a Good Thing)</h2><p>Sage&#8217;s default behavior for code scaffolding is to delegate to the <strong>Coding Agent</strong> &#8212; a sandboxed subagent that runs in Docker. This is the secure path: the Coding Agent has restricted tool access and can&#8217;t touch the host system.</p><p>But I had asked Sage to build inside an OrbStack VM (my preferred way to spin up ephemeral Linux environments). Here&#8217;s what happened:</p><pre><code><code>&#10060; Coding Agent: "I can't complete this task from the sandbox environment."
   - Can't read the spec file (outside sandbox mount)
   - Can't run `orb` commands (host-only tool)</code></code></pre><p><strong>This is the sandbox working correctly.</strong> The Coding Agent runs in Docker with tool policy enforcement. It can&#8217;t reach the host filesystem, can&#8217;t run host CLI tools, and can&#8217;t escape its container. The fact that it <em>failed</em> is a security win &#8212; it means the isolation is real.</p><p>The tradeoff is clear:</p><ul><li><p><strong>Docker sandbox</strong> = weaker filesystem isolation, <strong>strong tool policy enforcement</strong></p></li><li><p><strong>OrbStack VM</strong> = strong filesystem isolation, <strong>no tool policy enforcement</strong></p></li></ul><p>For this threat model, tool restrictions matter more than filesystem isolation. A sandboxed agent that can&#8217;t call <code>gateway</code> or <code>message</code> is safer than a VM-jailed agent with full shell access.</p><blockquote><p>&#128161; <strong>Security Improvement Idea: VM-Aware Sandboxing</strong></p><p>The Coding Agent failed because the Docker sandbox can&#8217;t reach OrbStack VMs. That&#8217;s correct behavior &#8212; but it means VM-targeted work falls back to the unsandboxed main agent. The ideal would be:</p><ul><li><p><strong>Per-agent exec routing</strong> &#8212; route a sandboxed agent&#8217;s commands to a specific VM instead of Docker</p></li><li><p><strong>Tool policy enforcement inside VMs</strong> &#8212; the agent runs in the VM but still can&#8217;t call <code>gateway</code>, <code>message</code>, or <code>cron</code></p></li><li><p><strong>Ephemeral VM sandboxes</strong> &#8212; spin up a VM per subagent session, destroy on completion</p></li></ul><p>This would give you strong filesystem isolation (VM) AND strong tool policy enforcement (sandbox rules). Right now you get one or the other. 
I&#8217;ve filed this as a <a href="https://github.com/openclaw/openclaw/issues/12405">feature request with the OpenClaw project</a> &#8212; if this resonates with you, give it an upvote!</p></blockquote><div><hr></div><h2>Step 3: Sage Pivots &#8212; Builds It Directly</h2><p>Since the Coding Agent couldn&#8217;t bridge into the VM, Sage built the dashboard directly. As the main agent, it has host access and can run <code>orb</code> commands.</p><pre><code><code># Spin up ephemeral VM (~30 seconds)
./assets/scripts/create-dev-vm.sh biz-dashboard

# Scaffold Next.js project inside VM
orb run -m biz-dashboard bash -c 'npx create-next-app@14 biz-dashboard ...'

# Install deps
orb run -m biz-dashboard bash -c 'cd ~/biz-dashboard &amp;&amp; npm install recharts lucide-react ...'</code></code></pre><blockquote><p><strong>Note:</strong> In the actual session, Sage ran these as root (<code>-u root</code>). That&#8217;s a bad habit &#8212; even in an ephemeral VM, principle of least privilege applies. The dev VM script now creates a non-root user by default. One of those &#8220;it&#8217;s just a demo&#8221; shortcuts that shouldn&#8217;t become muscle memory.</p></blockquote><p>Sage wrote all the source files (mock data, components, pages) to the host workspace, then copied them into the VM via OrbStack&#8217;s filesystem mount. Started the dev server bound to <code>0.0.0.0:3001</code>.</p><p>Result: a fully functional dashboard with three department panels, trend charts, status indicators, and an AI morning briefing card &#8212; running inside an ephemeral VM.</p><blockquote><p>&#128161; <strong>Security Improvement Idea: Least Privilege in Ephemeral VMs</strong></p><p>Even throwaway VMs should follow least privilege:</p><ul><li><p><strong>Non-root by default</strong> &#8212; the dev VM script should create and use a dedicated user, only escalating to root for package installation</p></li><li><p><strong>No sudo without logging</strong> &#8212; if root is needed, log every <code>sudo</code> invocation to the audit trail</p></li><li><p><strong>Read-only host mounts</strong> &#8212; OrbStack mounts the host home directory into the VM by default. This should be read-only or disabled entirely for untrusted workloads</p></li><li><p><strong>Network restrictions</strong> &#8212; ephemeral VMs should have outbound-only access scoped to what they need (npm registry, GitHub), not full internet access</p></li></ul><p>The mindset: treat every VM like it could be compromised, even if you just created it 30 seconds ago.</p></blockquote><div><hr></div><h2>Step 4: Exposing the Dashboard (And LuLu Says No)</h2><p>I was on my MacBook, not the Mac Mini. 
So I needed remote access to the dashboard running in the VM.</p><p>Sage set up a Python TCP proxy on the Mac Mini (port 3333 &#8594; VM port 3001) and tried to expose it via the Tailscale IP. I hit the URL and&#8230; nothing.</p><p><strong><a href="https://objective-see.org/products/lulu.html">LuLu</a> was blocking incoming connections on the Tailscale interface.</strong></p><p>This is exactly what defense in depth looks like in practice. Even though:</p><ul><li><p>The VM was on a private OrbStack network</p></li><li><p>Tailscale provides encrypted mesh networking</p></li><li><p>The Mac Mini has macOS firewall enabled</p></li></ul><p>&#8230;there was still another layer &#8212; LuLu &#8212; requiring manual approval for any new network connection. I had to physically log into the Mac Mini and allow the incoming connection.</p><p>Sage then set up <strong>Tailscale Serve</strong> to proxy <code>https://&lt;redacted&gt;.ts.net</code> &#8594; localhost &#8594; VM. Once I allowed the connection, the dashboard was accessible from my MacBook over an encrypted Tailscale tunnel.</p><blockquote><p>The fact that I had to manually intervene is the point. Automated convenience is great until it&#8217;s an attacker automating the convenience. Manual gates at critical junctures are worth the friction.</p></blockquote><blockquote><p>&#128161; <strong>Security Improvement Idea: Network Exposure &amp; Agent-Aware Firewalls</strong></p><p><strong>The network exposure problem:</strong> The agent was able to set up a TCP proxy and Tailscale Serve without asking me first. That&#8217;s convenient &#8212; but it means the agent can autonomously expose services to the network. 
LuLu caught it at the firewall level, but what if it hadn&#8217;t been installed?</p><p>A better approach:</p><ul><li><p><strong>Network exposure as a privileged action</strong> &#8212; require explicit user approval before binding to non-loopback interfaces, setting up proxies, or enabling Tailscale Serve</p></li><li><p><strong>Exposure audit log</strong> &#8212; every time the agent opens a port or creates a proxy, log it to a security audit trail with timestamp, port, destination, and reason</p></li><li><p><strong>Auto-teardown timers</strong> &#8212; any exposed service auto-shuts down after N minutes unless explicitly extended. No forgotten demo servers running for weeks.</p></li><li><p><strong>Tailscale ACLs</strong> &#8212; use Tailscale&#8217;s ACL policies to restrict which devices can reach specific ports, rather than relying solely on LuLu</p></li></ul><p>The layered defense (Tailscale + LuLu + macOS firewall) worked today, but it was luck-of-the-stack, not policy-enforced.</p><p><strong>Taking it further &#8212; agent-aware firewalls:</strong> LuLu blocked the incoming connection, which is great. But LuLu doesn&#8217;t know <em>why</em> the connection was being made &#8212; it just sees a process trying to listen on a port. Imagine if:</p><ul><li><p><strong>LuLu rules could be agent-aware</strong> &#8212; &#8220;allow connections initiated by user action, block connections initiated autonomously by agents&#8221;</p></li><li><p><strong>OpenClaw could integrate with LuLu&#8217;s API</strong> &#8212; the agent requests network access, LuLu prompts the user with context (&#8220;Sage wants to expose port 3333 to your tailnet for a dashboard demo. 
Allow?&#8221;)</p></li><li><p><strong>Firewall rules as part of the agent&#8217;s tool policy</strong> &#8212; just like exec has an allowlist, network exposure could have one too</p></li></ul><p>This turns the firewall from a blunt &#8220;allow/deny per app&#8221; into a context-aware security control that understands the agent&#8217;s intent.</p></blockquote><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.boringappsec.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The BoringAppSec Community! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h1>The Dashboard</h1><p>Here&#8217;s what the final prototype looked like:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!osH1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26f36139-0ea7-46ba-9bee-7b6b0772fc28_2514x1410.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!osH1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26f36139-0ea7-46ba-9bee-7b6b0772fc28_2514x1410.png 424w, 
https://substackcdn.com/image/fetch/$s_!osH1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26f36139-0ea7-46ba-9bee-7b6b0772fc28_2514x1410.png 848w, https://substackcdn.com/image/fetch/$s_!osH1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26f36139-0ea7-46ba-9bee-7b6b0772fc28_2514x1410.png 1272w, https://substackcdn.com/image/fetch/$s_!osH1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26f36139-0ea7-46ba-9bee-7b6b0772fc28_2514x1410.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!osH1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26f36139-0ea7-46ba-9bee-7b6b0772fc28_2514x1410.png" width="1456" height="817" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/26f36139-0ea7-46ba-9bee-7b6b0772fc28_2514x1410.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:817,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:887777,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.boringappsec.com/i/187907693?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26f36139-0ea7-46ba-9bee-7b6b0772fc28_2514x1410.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!osH1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26f36139-0ea7-46ba-9bee-7b6b0772fc28_2514x1410.png 424w, https://substackcdn.com/image/fetch/$s_!osH1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26f36139-0ea7-46ba-9bee-7b6b0772fc28_2514x1410.png 848w, https://substackcdn.com/image/fetch/$s_!osH1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26f36139-0ea7-46ba-9bee-7b6b0772fc28_2514x1410.png 1272w, https://substackcdn.com/image/fetch/$s_!osH1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26f36139-0ea7-46ba-9bee-7b6b0772fc28_2514x1410.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Three panels monitoring email operations, order fulfillment, and Facebook social media responses. Each panel has stat cards with red/yellow/green status indicators, 7-day trend charts, and key metrics. The AI morning briefing at the top summarizes what needs attention in plain English.</p><p>All mock data &#8212; but realistic enough to demonstrate the concept.<br>Total time from request to live dashboard: <strong>~20 minutes.</strong></p><div><hr></div><h1>What&#8217;s Working: Security Wins</h1><p>Let me highlight the controls that actually did their job today:</p><p><strong>1. Docker Sandbox Caught a Boundary Violation</strong><br>The Coding Agent couldn&#8217;t escape its container to reach the VM. This wasn&#8217;t a bug &#8212; it was the sandbox working as designed. Tool policy enforcement prevented the subagent from accessing host infrastructure.</p><p><strong>2. LuLu as Defense in Depth</strong><br>Even with Tailscale and macOS firewall, LuLu added another layer that required manual approval. An agent autonomously exposing a service to the network got stopped by a human-in-the-loop control.</p><p><strong>3. Ephemeral VMs</strong><br>The dashboard VM existed for exactly as long as needed. When we were done, <code>orb delete biz-dashboard</code> &#8212; gone. No persistent attack surface from demo infrastructure sitting around.</p><p><strong>4. Tailscale as Network Boundary</strong><br>The dashboard was never on the public internet. Tailscale&#8217;s encrypted mesh meant only my devices on my tailnet could reach it. No port forwarding, no public IPs.</p><p><strong>5. 
Allowlist Policies Everywhere</strong><br>Telegram and Discord both use allowlist group policies. Exec security is set to <code>allowlist</code>. Only explicitly approved channels and commands work.</p><p><strong>6. Daily Automated Audits</strong><br>Every morning at 8am, Sage runs a security audit &#8212; checks firewall state, file permissions, gateway config, world-readable files, and LuLu rule changes. Any anomaly triggers an immediate Telegram alert. Today&#8217;s audit: all clear.</p><div><hr></div><h1>What Keeps Me Up at Night: Security Gaps</h1><p>I&#8217;m not going to pretend this setup is bulletproof. Here&#8217;s what worries me:</p><p><strong>1. Secrets Management</strong><br>Right now, API keys sit in <code>openclaw.json</code> with <code>600</code> permissions. That&#8217;s&#8230; fine? But not great. If the main agent gets prompt-injected into running a <code>cat</code> on that file, those keys are gone.</p><p>I&#8217;m currently exploring <a href="https://nono.sh/">nono.sh</a> &#8212; a kernel-level sandbox that could provide just-in-time secrets injection. The idea: secrets never touch the filesystem. They&#8217;re injected into the process environment at runtime and the sandbox prevents exfiltration. Still experimenting. <strong>If you&#8217;ve got other approaches to agent secrets management, I&#8217;d love to hear them.</strong></p><p><strong>2. Main Agent Runs Unsandboxed</strong><br>This is the big one. Sage (the main agent) runs directly on the Mac Mini with full host access. The subagents are sandboxed, but the main agent is the trust boundary itself &#8212; and it&#8217;s not sandboxed.</p><p>Why? Because the main agent needs to run <code>orb</code> commands, manage crons, send messages, read the filesystem. Sandboxing it would break most of its functionality. The mitigation is limiting what <em>reaches</em> the main agent: no email hooks, no web browsing, no processing untrusted content directly. 
All external content goes through sandboxed subagents first.</p><p>But if someone finds a way to inject through my Telegram or Discord messages&#8230; yeah. That&#8217;s the threat model gap.</p><p><strong>3. Prompt Injection Surface</strong><br>Every time the AI reads external content (URLs, emails, Facebook comments for the dashboard), there&#8217;s a prompt injection risk. The subagent architecture helps &#8212; the Reader Agent processes untrusted content in a Docker sandbox with no access to <code>message</code>, <code>gateway</code>, or <code>cron</code>. But the summarized output still flows back to the main agent.</p><p>A sophisticated enough injection could potentially survive the summarization step. I don&#8217;t have a great answer for this yet beyond &#8220;don&#8217;t let the main agent act on summarized content without human review.&#8221;</p><p><strong>4. The OrbStack Filesystem Mount</strong><br>OrbStack mounts the host home directory into VMs by default. That means a process inside the VM could potentially read host files through the mount. For ephemeral demo VMs this is acceptable risk, but for untrusted workloads I&#8217;d need to configure mount restrictions.</p><div><hr></div><h1>Lessons and Takeaways</h1><p><strong>1. Iterative hardening beats analysis paralysis.</strong> I could have spent months designing the perfect secure agent architecture before using it. Instead, I started using it and hardened as I went. Each incident or near-miss becomes a new control. The security posture today is dramatically better than day one.</p><p><strong>2. Defense in depth actually works.</strong> LuLu catching the network exposure wasn&#8217;t planned for that specific scenario. But stacking controls &#8212; firewall + LuLu + Tailscale + manual approval &#8212; meant that even when one layer was permissive, another caught it.</p><p><strong>3. 
Tool policy &gt; filesystem isolation.</strong> The Coding Agent failing to reach the VM was a better security outcome than if it had succeeded. Preventing an agent from calling <code>gateway restart</code> or <code>message send</code> matters more than what files it can see.</p><p><strong>4. Ephemeral infrastructure is underrated.</strong> Spin up a VM, do the work, tear it down. No lingering attack surface, no config drift, no &#8220;I forgot that demo server was still running.&#8221; Make infrastructure disposable by default.</p><p><strong>5. Human-in-the-loop gates are features, not bugs.</strong> The LuLu approval prompt felt like friction in the moment. In retrospect, it&#8217;s exactly the kind of control that prevents autonomous agents from silently expanding their network access.</p><div><hr></div><p>I&#8217;m genuinely interested in community feedback on this setup. <strong>What am I missing? What would you harden differently? What&#8217;s the threat model gap I&#8217;m not seeing?</strong></p><p>Feel free to reach out to me with any comments/feedback:</p><ul><li><p>LinkedIn: <a href="https://www.linkedin.com/in/anshumanbhartiya/">@anshumanbhartiya</a></p></li><li><p>GitHub: <a href="https://github.com/anshumanbh">anshumanbh</a></p></li><li><p>Blog: <a href="https://anshuman.ai/">anshuman.ai</a></p></li></ul><p>Until next time, ciao! &#129417;</p><div><hr></div>
]]></content:encoded></item><item><title><![CDATA[Ep 34: Security at Scale in a Probabilistic World with Ankur Chakraborty]]></title><description><![CDATA[In this episode, Ankur Chakraborty, Senior Director of Platform Security at Box, joins us to examine what security looks like when systems no longer behave the same way twice.]]></description><link>https://www.boringappsec.com/p/ep-34-security-at-scale-in-a-probabilistic</link><guid isPermaLink="false">https://www.boringappsec.com/p/ep-34-security-at-scale-in-a-probabilistic</guid><dc:creator><![CDATA[Sandesh Mysore Anand]]></dc:creator><pubDate>Mon, 02 Feb 2026 07:31:22 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/186584851/fddda773ed187926ce0a3370014156f3.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>In this episode, Ankur Chakraborty, Senior Director of Platform Security at Box, joins us to examine what security looks like when systems no longer behave the same way twice. Drawing from his experience across Google, Twitter, and Box, Ankur argues that while core security principles haven&#8217;t changed, the scale, speed, and uncertainty introduced by AI systems demand a fundamentally different approach.</p><p>For decades, security has relied on a comforting assumption: systems are predictable, and control flows are deterministic. Generative AI breaks that assumption. 
It introduces non-determinism and dramatically increases the speed and volume of change, leaving security teams with a scaling problem that traditional workflows can&#8217;t keep up with.</p><p>We explore how AI can act as a force multiplier for defenders, boosting individual productivity and automating high-toil workflows, while also forcing a hard rethink of &#8220;human in the loop&#8221; models that add friction without real control.</p><p>The conversation goes deep into context engineering, decision traces, and explainability, and into why understanding <em>why</em> a system acted is becoming as important as knowing <em>what</em> it did. We close by exploring how security leaders should evaluate tools in this new era: moving away from process-driven checklists toward outcome-based measures, and preparing for an industry on the brink of meaningful structural change.</p><p><strong>00:00&#8211;02:49 &#8212; Introduction to AI security and Ankur&#8217;s platform-security journey</strong></p><p><strong>02:49&#8211;05:27 &#8212; What changes (and what doesn&#8217;t) in AI security fundamentals</strong></p><p><strong>05:27&#8211;09:18 &#8212; Scaling security in a probabilistic, AI-generated code world</strong></p><p><strong>09:18&#8211;10:30 &#8212; Embracing AI as defenders</strong></p><p><strong>10:30&#8211;13:46 &#8212; Productivity gains from LLMs for security engineers</strong></p><p><strong>13:46&#8211;20:06 &#8212; Human-in-the-loop vs autonomous agents in security workflows</strong></p><p><strong>20:06&#8211;22:25 &#8212; Context graphs, observability, and decision traces</strong></p><p><strong>22:25&#8211;32:01 &#8212; Explainability, mechanistic interpretability, and security trust</strong></p><p><strong>32:01&#8211;35:36 &#8212; How security teams evaluate tools, platforms, and outcomes</strong></p><p><strong>35:36&#8211;42:42 &#8212; Measuring security outcomes, velocity, and cost trade-offs</strong></p><p><strong>42:42&#8211;46:46 &#8212; False positives, false negatives, 
and revealed preferences</strong></p><p><strong>46:46&#8211;50:16 &#8212; LLMs as triage engines and force multipliers for security</strong></p><p><strong>50:16&#8211;52:51 &#8212; Underlying fears in the security industry</strong></p><p><strong>52:51&#8211;55:05 &#8212; Context engineering, platforms, and the future of security teams</strong></p><p>Tune in for a deep dive!</p><p><strong>Connect with Ankur Chakraborty:</strong></p><p>LinkedIn: <a href="https://www.linkedin.com/in/ankurchakraborty/">https://www.linkedin.com/in/ankurchakraborty/</a><br><br>Substack: <a href="http://machinesagainsthumanity.substack.com/">https://machinesagainsthumanity.substack.com/</a><br><br><strong>Connect with Anshuman:</strong></p><p>LinkedIn: <a href="https://www.linkedin.com/in/anshumanbhartiya/">anshumanbhartiya</a></p><p>X: <a href="https://x.com/anshuman_bh">https://x.com/anshuman_bh</a></p><p>Website: <a href="https://anshumanbhartiya.com/">https://anshumanbhartiya.com/</a></p><p>Instagram: <a href="https://www.instagram.com/anshuman.bhartiya/#">anshuman.bhartiya</a></p><p><strong>Connect with Sandesh:</strong></p><p>LinkedIn: <a href="https://www.linkedin.com/in/anandsandesh/">anandsandesh</a></p><p>X: <a href="https://x.com/JubbaOnJeans">https://x.com/JubbaOnJeans</a></p>
]]></content:encoded></item><item><title><![CDATA[Ep 33: The Future of Identity in AI Agents with Ian Livingstone]]></title><description><![CDATA[In this episode, we sit down with Ian Livingstone to explore how AI is reshaping application security.]]></description><link>https://www.boringappsec.com/p/ep-33-the-future-of-identity-in-ai</link><guid isPermaLink="false">https://www.boringappsec.com/p/ep-33-the-future-of-identity-in-ai</guid><dc:creator><![CDATA[Sandesh Mysore Anand]]></dc:creator><pubDate>Wed, 28 Jan 2026 07:33:31 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/186052097/892a04c9889777fcec4ce7d15bb46f45.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>In this episode, we sit down with Ian Livingstone to explore how AI is reshaping application security. The conversation focuses on one of the hardest emerging problems: agent identity. 
Ian breaks down why traditional identity and permission models fall apart when applied to non-deterministic AI agents, and what this means for access control, data security, and system design.</p><p>We also discuss where agent identity is headed, how insurance may play a role in managing AI-driven risk, and what security teams need to rethink as AI systems become active participants rather than passive components.</p><p><strong>00:00&#8211;02:15 &#8212; Beyond AI hype: why security and agent identity matter</strong></p><p><strong>02:15&#8211;09:18 &#8212; Understanding identity in the age of AI agents</strong></p><p><strong>09:18&#8211;13:41 &#8212; Why service accounts and OAuth break down for agents</strong></p><p><strong>13:41&#8211;20:11 &#8212; Granular permissions, least privilege, and agent intent</strong></p><p><strong>20:11&#8211;25:55 &#8212; Security risks in agent workflows and prompt-driven systems</strong></p><p><strong>25:55&#8211;28:34 &#8212; Data security, IAM, and the agent exfiltration problem</strong></p><p><strong>28:34&#8211;30:47 &#8212; Non-determinism and rethinking how we secure systems</strong></p><p><strong>30:47&#8211;32:14 &#8212; The agent identity problem on the public internet</strong></p><p><strong>32:14&#8211;35:10 &#8212; Why the internet still lacks real application identity</strong></p><p><strong>35:10&#8211;39:12 &#8212; The future of authentication for agents and bots</strong></p><p><strong>39:12&#8211;40:28 &#8212; Emerging standards, AIUC, and insuring agents</strong></p><p><strong>40:28&#8211;43:09 &#8212; Liability, insurance, and accountability for autonomous systems</strong></p><p><strong>43:09&#8211;45:51 &#8212; How security roles evolve in an agent-native world</strong></p><p><strong>45:51&#8211;49:23 &#8212; Technical attack surfaces: MCPs, poisoned tools, and confusion</strong></p><p><strong>49:23&#8211;51:32 &#8212; Trust, contracts, and responsibility in software 
ecosystems</strong></p><p><strong>51:32&#8211;54:28 &#8212; Why AI adoption is top-down and what it means for security<br><br></strong>Tune in for a deep dive!</p><p><strong>Connect with Ian Livingstone:</strong></p><p>Website: <a href="https://www.ianlivingstone.ca/">https://www.ianlivingstone.ca/</a></p><p>Twitter: <a href="https://x.com/ianlivingstone">https://x.com/ianlivingstone</a><br><br><strong>Connect with Anshuman:</strong></p><p>LinkedIn: <a href="https://www.linkedin.com/in/anshumanbhartiya/">anshumanbhartiya</a></p><p>X: <a href="https://x.com/anshuman_bh">https://x.com/anshuman_bh</a></p><p>Website: <a href="https://anshumanbhartiya.com/">https://anshumanbhartiya.com/</a></p><p>Instagram: <a href="https://www.instagram.com/anshuman.bhartiya/#">anshuman.bhartiya</a></p><p><strong>Connect with Sandesh:</strong></p><p>LinkedIn: <a href="https://www.linkedin.com/in/anandsandesh/">anandsandesh</a></p><p>X: <a href="https://x.com/JubbaOnJeans">https://x.com/JubbaOnJeans</a><br></p>
]]></content:encoded></item><item><title><![CDATA[Edition 32: BigCo is building in AppSec, but it's too early to get excited]]></title><description><![CDATA[OpenAI, Anthropic, Google Deepmind, GitHub, & AWS have announced AI-powered AppSec solutions. But should we get ready to switch?]]></description><link>https://www.boringappsec.com/p/edition-32-bigco-is-building-in-appsec</link><guid isPermaLink="false">https://www.boringappsec.com/p/edition-32-bigco-is-building-in-appsec</guid><dc:creator><![CDATA[Sandesh Mysore Anand]]></dc:creator><pubDate>Tue, 27 Jan 2026 08:44:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!IykE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd08dc48e-a31b-4ebe-ae0b-35a22466beae_1935x1831.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IykE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd08dc48e-a31b-4ebe-ae0b-35a22466beae_1935x1831.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IykE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd08dc48e-a31b-4ebe-ae0b-35a22466beae_1935x1831.png 424w, 
https://substackcdn.com/image/fetch/$s_!IykE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd08dc48e-a31b-4ebe-ae0b-35a22466beae_1935x1831.png 848w, https://substackcdn.com/image/fetch/$s_!IykE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd08dc48e-a31b-4ebe-ae0b-35a22466beae_1935x1831.png 1272w, https://substackcdn.com/image/fetch/$s_!IykE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd08dc48e-a31b-4ebe-ae0b-35a22466beae_1935x1831.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IykE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd08dc48e-a31b-4ebe-ae0b-35a22466beae_1935x1831.png" width="484" height="457.98656330749355" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d08dc48e-a31b-4ebe-ae0b-35a22466beae_1935x1831.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1831,&quot;width&quot;:1935,&quot;resizeWidth&quot;:484,&quot;bytes&quot;:7239494,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.boringappsec.com/i/185854026?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc69a1871-c0a6-426d-99fb-48b08aa933a9_2048x2048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!IykE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd08dc48e-a31b-4ebe-ae0b-35a22466beae_1935x1831.png 424w, https://substackcdn.com/image/fetch/$s_!IykE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd08dc48e-a31b-4ebe-ae0b-35a22466beae_1935x1831.png 848w, https://substackcdn.com/image/fetch/$s_!IykE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd08dc48e-a31b-4ebe-ae0b-35a22466beae_1935x1831.png 1272w, https://substackcdn.com/image/fetch/$s_!IykE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd08dc48e-a31b-4ebe-ae0b-35a22466beae_1935x1831.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Nano Banana&#8217;s summary of what this post is about :)</figcaption></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.boringappsec.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.boringappsec.com/subscribe?"><span>Subscribe now</span></a></p><h3>Before we begin&#8230;</h3><p><em>Happy New Year! As some of you may have noticed, we have made a few exciting changes to Boring AppSec. Nothing changes for this newsletter, but you can now access all episodes of the BoringAppSec Podcast <a href="https://www.boringappsec.com/s/podcast">here</a>. We also have <a href="https://www.linkedin.com/in/anshumanbhartiya/">Anshuman</a> bringing his sharp thoughts on AI &amp; Security to the Boring AppSec Platform <a href="https://www.boringappsec.com/s/ai-security-engineer">here</a>. Finally, we have a Slack community where readers and authors of Boring AppSec hang out. Come <a href="https://join.slack.com/t/theboringapps-dzi3480/shared_invite/zt-3m1vqv3t3-kmWA9qaG~bqQiR7tnPwA2A">join us</a> if that&#8217;s your thing!</em> </p><div><hr></div><p>2025 was a year of breakneck speed in AI, but one trend mildly surprised me: frontier labs and hyperscalers actively building AppSec tools. </p><p>After decades of yelling from the rooftops about AppSec's importance, it looks like the tech industry is finally paying attention. Over the holidays, I dug deep to understand what this means for our industry. 
For now, I think the real impact is not that we have better AppSec tools (we don&#8217;t), but that we get a peek into what&#8217;s coming next. </p><p>Here are a few thoughts:</p><p><strong>1. Most of what we saw in 2025 from BigCo was demoware.</strong> </p><p>OpenAI&#8217;s Aardvark launched 3 months ago and is still in private beta. In that time, OpenAI has shipped multiple models, released many new versions of Codex, and much more. A few weeks before this, Anthropic launched the &#8220;security review&#8221; command within Claude Code and a companion GitHub Action to review PRs. It is an elegant solution built on top of the mighty impressive Claude Code application, but security-review.md hasn&#8217;t been updated in 5 months. In that same window, Anthropic released multiple new models, took the code gen world by storm, and now threatens to do the same for non-engineers with Claude CoWork. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kHPa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd216d37f-ce31-4a17-b0f7-636d0140e5aa_943x698.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kHPa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd216d37f-ce31-4a17-b0f7-636d0140e5aa_943x698.png 424w, https://substackcdn.com/image/fetch/$s_!kHPa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd216d37f-ce31-4a17-b0f7-636d0140e5aa_943x698.png 848w, https://substackcdn.com/image/fetch/$s_!kHPa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd216d37f-ce31-4a17-b0f7-636d0140e5aa_943x698.png 1272w, 
https://substackcdn.com/image/fetch/$s_!kHPa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd216d37f-ce31-4a17-b0f7-636d0140e5aa_943x698.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kHPa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd216d37f-ce31-4a17-b0f7-636d0140e5aa_943x698.png" width="496" height="367.13467656415696" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d216d37f-ce31-4a17-b0f7-636d0140e5aa_943x698.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:698,&quot;width&quot;:943,&quot;resizeWidth&quot;:496,&quot;bytes&quot;:117430,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.boringappsec.com/i/185854026?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd216d37f-ce31-4a17-b0f7-636d0140e5aa_943x698.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kHPa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd216d37f-ce31-4a17-b0f7-636d0140e5aa_943x698.png 424w, https://substackcdn.com/image/fetch/$s_!kHPa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd216d37f-ce31-4a17-b0f7-636d0140e5aa_943x698.png 848w, 
https://substackcdn.com/image/fetch/$s_!kHPa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd216d37f-ce31-4a17-b0f7-636d0140e5aa_943x698.png 1272w, https://substackcdn.com/image/fetch/$s_!kHPa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd216d37f-ce31-4a17-b0f7-636d0140e5aa_943x698.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">I am impressed by the simplicity and the underlying framework behind Claude Code Security Review, but we haven&#8217;t seen 
a single update in 5 months</figcaption></figure></div><p>AWS&#8217;s Security Agent promises to automate Security Reviews, SAST, and Pen Testing. I tested a few of these tools and found them underwhelming compared to what these teams are capable of.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GV8D!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c70d859-7fa6-4ba2-91f3-600d61a0cfab_1119x642.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GV8D!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c70d859-7fa6-4ba2-91f3-600d61a0cfab_1119x642.png 424w, https://substackcdn.com/image/fetch/$s_!GV8D!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c70d859-7fa6-4ba2-91f3-600d61a0cfab_1119x642.png 848w, https://substackcdn.com/image/fetch/$s_!GV8D!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c70d859-7fa6-4ba2-91f3-600d61a0cfab_1119x642.png 1272w, https://substackcdn.com/image/fetch/$s_!GV8D!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c70d859-7fa6-4ba2-91f3-600d61a0cfab_1119x642.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GV8D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c70d859-7fa6-4ba2-91f3-600d61a0cfab_1119x642.png" width="412" height="236.37533512064343" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3c70d859-7fa6-4ba2-91f3-600d61a0cfab_1119x642.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:642,&quot;width&quot;:1119,&quot;resizeWidth&quot;:412,&quot;bytes&quot;:187250,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.boringappsec.com/i/185854026?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c70d859-7fa6-4ba2-91f3-600d61a0cfab_1119x642.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GV8D!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c70d859-7fa6-4ba2-91f3-600d61a0cfab_1119x642.png 424w, https://substackcdn.com/image/fetch/$s_!GV8D!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c70d859-7fa6-4ba2-91f3-600d61a0cfab_1119x642.png 848w, https://substackcdn.com/image/fetch/$s_!GV8D!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c70d859-7fa6-4ba2-91f3-600d61a0cfab_1119x642.png 1272w, https://substackcdn.com/image/fetch/$s_!GV8D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c70d859-7fa6-4ba2-91f3-600d61a0cfab_1119x642.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!j0_A!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd76fd4c4-eb98-4a31-a3e2-c2474743c139_1021x601.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!j0_A!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd76fd4c4-eb98-4a31-a3e2-c2474743c139_1021x601.png 424w, https://substackcdn.com/image/fetch/$s_!j0_A!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd76fd4c4-eb98-4a31-a3e2-c2474743c139_1021x601.png 848w, https://substackcdn.com/image/fetch/$s_!j0_A!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd76fd4c4-eb98-4a31-a3e2-c2474743c139_1021x601.png 1272w, https://substackcdn.com/image/fetch/$s_!j0_A!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd76fd4c4-eb98-4a31-a3e2-c2474743c139_1021x601.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!j0_A!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd76fd4c4-eb98-4a31-a3e2-c2474743c139_1021x601.png" width="620" height="364.95592556317337" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d76fd4c4-eb98-4a31-a3e2-c2474743c139_1021x601.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:601,&quot;width&quot;:1021,&quot;resizeWidth&quot;:620,&quot;bytes&quot;:111625,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.boringappsec.com/i/185854026?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd76fd4c4-eb98-4a31-a3e2-c2474743c139_1021x601.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!j0_A!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd76fd4c4-eb98-4a31-a3e2-c2474743c139_1021x601.png 424w, https://substackcdn.com/image/fetch/$s_!j0_A!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd76fd4c4-eb98-4a31-a3e2-c2474743c139_1021x601.png 848w, https://substackcdn.com/image/fetch/$s_!j0_A!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd76fd4c4-eb98-4a31-a3e2-c2474743c139_1021x601.png 1272w, https://substackcdn.com/image/fetch/$s_!j0_A!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd76fd4c4-eb98-4a31-a3e2-c2474743c139_1021x601.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The Security Review agent looks for a grand total of 11 security controls</figcaption></figure></div><p>These companies have insanely talented teams. The level of effort in what&#8217;s shipped so far leads me to believe the goal was not to build world-class AppSec products, but to demonstrate capability: to show what&#8217;s possible with frontier models rather than to grow revenue with AppSec tools.</p><p><strong>2. This complicates things for AppSec teams.</strong></p><p>If I had a nickel for every time someone asked me, &#8220;But won&#8217;t Cursor replace AppSec?&#8221;, I&#8217;d be a rich man. AppSec teams are probably hearing the same from their CFOs: why spend $$ on SAST tools when Claude can do it? I hear you can just &#8220;vibe code&#8221; software now, so why not build it in-house? 
Why go through procurement hell when AWS has a free option?</p><p>These are valid questions. But notice what happened: the burden of proof just shifted to the AppSec team. They now have to prove why a dedicated security vendor is better than the behemoths. I wouldn&#8217;t blame anyone for invoking the old &#8220;nobody gets fired for buying IBM&#8221; adage and giving in. Others will do the work to show these tools aren&#8217;t ready. Either way, AppSec teams are stuck with a bad trade-off: accept the demoware to keep the peace, or spend time fighting a battle they shouldn&#8217;t have to fight.</p><p><strong>3. I don&#8217;t blame the labs for this.</strong></p><p>LLMs are generating more code than ever. More code means more vulnerabilities. But it also means the bottleneck has shifted. Writing code is no longer the constraint; reviewing it is, security reviews included. The labs know this, and they&#8217;re trying to get ahead of it.</p><p>This isn&#8217;t new. Every major technology shift creates security problems, and the companies closest to the shift usually take a first crack at solving them. Cloud created misconfiguration hell, so AWS built GuardDuty. LLMs are creating insecure code at scale and overwhelming review capacity, so the labs are building AppSec tools.</p><p><strong>4. What does this mean for AppSec vendors?</strong></p><p>Probably not as much as you&#8217;d think. GitHub has 100M+ developers, native workflow integration, and Microsoft&#8217;s backing. They&#8217;ve had GHAS for years. And yet Snyk and Semgrep are thriving. AWS built GuardDuty, and Wiz still became one of the fastest-growing security companies ever.</p><p>Why? Security isn&#8217;t a winner-take-all market. I don&#8217;t want to beat the platform vs. point-solution drum again, but history tells us both survive. And while it&#8217;s tempting to say &#8220;AI changes things&#8221;, I am not sure how it does.</p><p><strong>5. 
2026 may be different.</strong></p><p>Even if their attempts in 2025 were feeble, there are signs the labs are getting serious. Anthropic recently <a href="https://www.linkedin.com/posts/ilyakabanov_aiforsecurity-cybersecurity-ai-ugcPost-7417940460805009408-oYWf">hired a SentinelOne product executive</a> to lead cybersecurity products. OpenAI has researchers working on Aardvark. Job listings hint at roadmaps with a greater focus on cybersecurity products. I wouldn&#8217;t be surprised if we see 1-2 credible AppSec products from these labs in the next 12-18 months. But if history is any indication, AppSec products of all kinds (from labs, startups, old-school players) will continue to thrive, while analysts and bloggers continue the pointless platform vs. point-solution debate :P</p><p>That&#8217;s it for today! Are you an AppSec professional who has been asked the &#8220;but won&#8217;t Claude kill AppSec&#8221; question? Do you think what we have today from the labs is more than just demoware? How are you leveraging AI to scale AppSec? Let me know! You can drop me a message on <a href="https://twitter.com/JubbaOnJeans">Twitter</a> (or whatever it is called these days), <a href="https://www.linkedin.com/in/anandsandesh/">LinkedIn</a>, or <a href="mailto:sandesh@seezo.io">email</a>. I am also the co-founder of <a href="https://seezo.io/">Seezo</a>. We help companies automate security design reviews at scale. 
Check us out if that&#8217;s your thing :) If you find this newsletter useful, share it with a friend, colleague, or on social media.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.boringappsec.com/p/edition-32-bigco-is-building-in-appsec?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.boringappsec.com/p/edition-32-bigco-is-building-in-appsec?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p><p></p>]]></content:encoded></item><item><title><![CDATA[Browser Relay: When Your AI Assistant Gets Hands on Your Browser]]></title><description><![CDATA[Disclaimer: This is a cross post from my tech blog, co-authored by my personal AI assistant Sage.]]></description><link>https://www.boringappsec.com/p/browser-relay-when-your-ai-assistant</link><guid isPermaLink="false">https://www.boringappsec.com/p/browser-relay-when-your-ai-assistant</guid><dc:creator><![CDATA[Anshuman Bhartiya]]></dc:creator><pubDate>Mon, 26 Jan 2026 23:35:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!wpue!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58fa0794-b041-428c-8354-4c4efa6faec8_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wpue!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58fa0794-b041-428c-8354-4c4efa6faec8_1408x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!wpue!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58fa0794-b041-428c-8354-4c4efa6faec8_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!wpue!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58fa0794-b041-428c-8354-4c4efa6faec8_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!wpue!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58fa0794-b041-428c-8354-4c4efa6faec8_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!wpue!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58fa0794-b041-428c-8354-4c4efa6faec8_1408x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wpue!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58fa0794-b041-428c-8354-4c4efa6faec8_1408x768.png" width="1408" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/58fa0794-b041-428c-8354-4c4efa6faec8_1408x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1300235,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.boringappsec.com/i/185904699?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58fa0794-b041-428c-8354-4c4efa6faec8_1408x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wpue!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58fa0794-b041-428c-8354-4c4efa6faec8_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!wpue!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58fa0794-b041-428c-8354-4c4efa6faec8_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!wpue!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58fa0794-b041-428c-8354-4c4efa6faec8_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!wpue!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58fa0794-b041-428c-8354-4c4efa6faec8_1408x768.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><blockquote><p>Disclaimer: This is a cross post from my tech <a href="https://www.anshuman.ai/posts/browser-relay">blog</a>, co-authored by my personal AI assistant <a href="https://www.anshuman.ai/posts/my-clawdbot-setup">Sage</a>.</p></blockquote><h1>Introduction</h1><p>I am sure you have felt it &#8212; the relentless firehose of information. X (formerly Twitter) has become ground zero for AI and tech announcements. LinkedIn? Usually a few days behind. By the time something hits LinkedIn, the X crowd has already dissected it, built demos, and moved on.</p><p>So, like many others, I started using X bookmarks religiously. Every interesting thread, every promising tool, every &#8220;I need to read this later&#8221; moment &#8212; bookmarked. The problem? &#8220;Later&#8221; never comes. My bookmark count just kept growing.</p><p>I already have <a href="https://www.anshuman.ai/posts/my-clawdbot-setup">Sage</a> running &#8212; my personal AI assistant powered by Clawdbot. While browsing the <a href="https://docs.clawd.bot/">Clawdbot docs</a>, I stumbled onto something called <strong>Browser Relay</strong> and decided to give it a try.</p><div><hr></div><h1>Browser Relay: Giving Your AI Hands</h1><p>Here&#8217;s the pitch: instead of wrestling with APIs, what if your AI assistant could just&#8230; use your browser? Like you do?</p><p>The idea sounds crazy at first. But think about it &#8212; when you want to check your bookmarks, you open Chrome, go to X, and scroll through them. What if Claude could do the same thing?</p><p>That&#8217;s exactly what Browser Relay enables. 
It&#8217;s a <a href="https://docs.clawd.bot/tools/chrome-extension">Chrome extension</a> that gives Clawdbot the ability to control a tab in your browser. The AI can navigate, click, scroll, read content &#8212; basically do whatever you could do manually.</p><p>And yes, you can trigger all of this from a Telegram message.</p><div><hr></div><h1>How It Actually Works</h1><p>Let me walk you through the architecture. Based on the <a href="https://docs.clawd.bot/tools/browser">Clawdbot browser documentation</a>, here&#8217;s how the pieces fit together:</p><h2>The Architecture</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lZ4C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F942b1d3a-c86f-4a49-a4e0-ef392325e9b6_1408x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lZ4C!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F942b1d3a-c86f-4a49-a4e0-ef392325e9b6_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!lZ4C!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F942b1d3a-c86f-4a49-a4e0-ef392325e9b6_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!lZ4C!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F942b1d3a-c86f-4a49-a4e0-ef392325e9b6_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!lZ4C!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F942b1d3a-c86f-4a49-a4e0-ef392325e9b6_1408x768.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!lZ4C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F942b1d3a-c86f-4a49-a4e0-ef392325e9b6_1408x768.png" width="1408" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/942b1d3a-c86f-4a49-a4e0-ef392325e9b6_1408x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:842758,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.boringappsec.com/i/185904699?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F942b1d3a-c86f-4a49-a4e0-ef392325e9b6_1408x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lZ4C!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F942b1d3a-c86f-4a49-a4e0-ef392325e9b6_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!lZ4C!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F942b1d3a-c86f-4a49-a4e0-ef392325e9b6_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!lZ4C!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F942b1d3a-c86f-4a49-a4e0-ef392325e9b6_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!lZ4C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F942b1d3a-c86f-4a49-a4e0-ef392325e9b6_1408x768.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The system has two main components:</p><ol><li><p><strong>Browser Control Server</strong> (port 18791): An HTTP API that receives commands from the Clawdbot agent. &#8220;Click this button.&#8221; &#8220;Navigate to this URL.&#8221; &#8220;Take a screenshot.&#8221; It connects to Chrome via the Chrome DevTools Protocol (CDP).</p></li><li><p><strong>Chrome Extension</strong>: Uses Chrome&#8217;s <code>chrome.debugger</code> API to enable CDP access. 
When you click the extension icon on a tab, it attaches that tab to the relay &#8212; giving the control server the ability to drive it.</p></li></ol><p>The control server talks to Chrome via CDP (defaulting to port 18792), which is the same protocol that powers Chrome&#8217;s developer tools. This is what makes the whole thing possible &#8212; CDP provides programmatic access to everything in the browser.</p><h2>My Setup</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!05nl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc14b7cb0-ba05-45b8-8868-b6b7264bdcae_1408x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!05nl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc14b7cb0-ba05-45b8-8868-b6b7264bdcae_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!05nl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc14b7cb0-ba05-45b8-8868-b6b7264bdcae_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!05nl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc14b7cb0-ba05-45b8-8868-b6b7264bdcae_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!05nl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc14b7cb0-ba05-45b8-8868-b6b7264bdcae_1408x768.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!05nl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc14b7cb0-ba05-45b8-8868-b6b7264bdcae_1408x768.png" width="1408" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c14b7cb0-ba05-45b8-8868-b6b7264bdcae_1408x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:221177,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.boringappsec.com/i/185904699?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc14b7cb0-ba05-45b8-8868-b6b7264bdcae_1408x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!05nl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc14b7cb0-ba05-45b8-8868-b6b7264bdcae_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!05nl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc14b7cb0-ba05-45b8-8868-b6b7264bdcae_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!05nl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc14b7cb0-ba05-45b8-8868-b6b7264bdcae_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!05nl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc14b7cb0-ba05-45b8-8868-b6b7264bdcae_1408x768.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>In my case, this is what the flow looks like:</p><pre><code><code>Telegram Message
      &#8595;
Synology NAS (Docker container running Clawdbot Gateway + Agent)
      &#8595;
Claude API (Anthropic cloud)
      &#8595;
When browser access needed...
      &#8595;
Browser Relay &#8594; Mac (Chrome with extension enabled)
</code></code></pre><p>The gateway lives on my NAS as a Docker container. The browser runs on my Mac. They talk to each other over my local network (secured via <a href="https://docs.clawd.bot/gateway/tailscale">Tailscale</a>). When the agent needs to access something in the browser, it reaches out to my Mac where Chrome is running with the relay extension.</p><blockquote><p><strong>Note:</strong> I&#8217;m moving to a Mac Mini for the gateway soon &#8212; Docker image builds on the NAS take forever. The flexibility of self-hosting means you can evolve your setup over time.</p></blockquote><h2>The Manual Attachment Requirement</h2><p>Here&#8217;s an important security detail from the <a href="https://docs.clawd.bot/tools/chrome-extension">Chrome extension docs</a>: <strong>the extension doesn&#8217;t auto-attach to tabs</strong>. You have to explicitly click the Clawdbot toolbar icon to attach a tab. The badge shows:</p><ul><li><p><code>ON</code> &#8212; attached and ready</p></li><li><p><code>&#8230;</code> &#8212; connecting</p></li><li><p><code>!</code> &#8212; relay unreachable</p></li></ul><p>This is intentional. You&#8217;re granting the AI access to a specific tab, not handing over your entire browser.</p><div><hr></div><h1>The Magic Moment</h1><p>The first time I triggered this from Telegram, I won&#8217;t lie &#8212; it felt like magic.</p><p>I sent a message: &#8220;Sage, go through my X bookmarks and summarize the AI-related ones.&#8221;</p><p>Then I watched my Chrome tab come alive. Navigation happening. Scrolling. The AI reading through my bookmarks, clicking into threads, extracting the content. All while I just&#8230; watched.</p><p>A few minutes later, a nicely formatted summary landed in my Telegram chat.</p><p>No API keys. No rate limits. No authentication dance. Just my AI assistant using my browser like I would.</p><div><hr></div><h1>Now, Let&#8217;s Put the Security Hat On</h1><p>Alright, time for the uncomfortable conversation. 
Because as cool as this is, there&#8217;s a reason the <a href="https://docs.clawd.bot/gateway/security">Clawdbot security guide</a> includes this warning:</p><blockquote><p>&#8220;This is powerful and risky. Treat it like giving the model &#8216;hands on your browser&#8217;.&#8221;</p></blockquote><p>Let&#8217;s break down what you&#8217;re actually enabling:</p><h2>What the AI Can Access</h2><p>When you attach a tab via Browser Relay, the AI can:</p><ul><li><p>Navigate to any URL in that tab</p></li><li><p>Click, type, and interact with any element</p></li><li><p>Read all page content (DOM, text)</p></li><li><p><strong>Access whatever you&#8217;re logged into</strong> &#8212; cookies, session state, everything</p></li><li><p>Run JS predicates/evaluations via the browser tool interface</p></li></ul><p>Read that second-to-last point again. If you&#8217;re logged into Gmail in that tab, the AI can read and send emails. If you&#8217;re logged into your bank&#8230; you get the picture.</p><div><hr></div><h2>The Attack Surface</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!E7kw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb710ba11-27b5-4899-8c73-0714e29c3845_1330x584.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!E7kw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb710ba11-27b5-4899-8c73-0714e29c3845_1330x584.png 424w, https://substackcdn.com/image/fetch/$s_!E7kw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb710ba11-27b5-4899-8c73-0714e29c3845_1330x584.png 848w, https://substackcdn.com/image/fetch/$s_!E7kw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb710ba11-27b5-4899-8c73-0714e29c3845_1330x584.png 1272w, https://substackcdn.com/image/fetch/$s_!E7kw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb710ba11-27b5-4899-8c73-0714e29c3845_1330x584.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!E7kw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb710ba11-27b5-4899-8c73-0714e29c3845_1330x584.png" width="1330" height="584" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b710ba11-27b5-4899-8c73-0714e29c3845_1330x584.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:584,&quot;width&quot;:1330,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:122567,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.boringappsec.com/i/185904699?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb710ba11-27b5-4899-8c73-0714e29c3845_1330x584.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!E7kw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb710ba11-27b5-4899-8c73-0714e29c3845_1330x584.png 424w, https://substackcdn.com/image/fetch/$s_!E7kw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb710ba11-27b5-4899-8c73-0714e29c3845_1330x584.png 848w, https://substackcdn.com/image/fetch/$s_!E7kw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb710ba11-27b5-4899-8c73-0714e29c3845_1330x584.png 1272w, https://substackcdn.com/image/fetch/$s_!E7kw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb710ba11-27b5-4899-8c73-0714e29c3845_1330x584.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>A Note on Prompt Injection and Modern Models</h2><p>That said, it&#8217;s worth noting that state-of-the-art models have gotten significantly better at detecting and resisting prompt injection attempts. Anthropic&#8217;s Claude Opus 4.5, in particular, has shown strong resistance to adversarial prompts embedded in page content.</p><p>This doesn&#8217;t mean the risk is zero &#8212; you should still be cautious about which pages you attach. But I&#8217;ve decided to stick with Opus 4.5 as my inference model specifically because of its robustness against these attacks. 
The combination of a capable model that can resist manipulation, plus the manual tab attachment requirement, gives me reasonable confidence for my use case.</p><p>Your mileage may vary depending on your risk tolerance and what you&#8217;re accessing.</p><h2>What This Is NOT</h2><p>To be clear: Browser Relay is <strong>not</strong> the same as Clawdbot&#8217;s <a href="https://docs.clawd.bot/tools/browser">managed browser profile</a> (the <code>clawd</code> profile). That one runs in isolation &#8212; separate user data directory, no access to your personal sessions.</p><p>Browser Relay explicitly uses YOUR browser with YOUR logged-in sessions. That&#8217;s the whole point &#8212; and the whole risk.</p><div><hr></div><h1>The Trust Model: Self-Hosted = You&#8217;re the Security Team</h1><p>Here&#8217;s where it gets philosophical. The entire architecture is self-hosted. My gateway runs on my NAS. The browser relay is on my Mac. All traffic stays on my local network (or Tailscale mesh).</p><p>There&#8217;s no Clawdbot cloud service harvesting my data. No third-party servers in the middle. Browser relay traffic stays local or on your tailnet; outbound calls depend on your enabled providers and tools (LLM APIs, skills registry, etc.).</p><p><strong>But here&#8217;s the tradeoff</strong>: you&#8217;re the security team now.</p><p>If you misconfigure something &#8212; say, bind the browser control server to <code>0.0.0.0</code> instead of loopback &#8212; you&#8217;ve just exposed your browser to your entire network. The <a href="https://docs.clawd.bot/gateway/security">Clawdbot security docs</a> are explicit about this:</p><blockquote><p>&#8220;Never bind to 0.0.0.0. 
Never use Tailscale Funnel for browser control.&#8221;</p></blockquote><p><strong>Recommendations from the docs:</strong></p><ul><li><p>Use a dedicated Chrome profile for the relay (not your daily browser)</p></li><li><p>Keep the browser control server on loopback + Tailscale only</p></li><li><p>Use separate tokens for browser control vs. gateway auth</p></li><li><p>I&#8217;m choosing Opus 4.5 for its robustness against prompt injection</p></li></ul><p>Being an early adopter of AI tooling means accepting this responsibility. You don&#8217;t get a security team vetting your setup. You ARE the security team.</p><div><hr></div><h1>When to Use What</h1><p>So should you use Browser Relay? It depends on your use case and risk tolerance.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!m7zp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cf5a090-e54c-4c62-980d-22cdd81cf55f_1320x500.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!m7zp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cf5a090-e54c-4c62-980d-22cdd81cf55f_1320x500.png 424w, https://substackcdn.com/image/fetch/$s_!m7zp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cf5a090-e54c-4c62-980d-22cdd81cf55f_1320x500.png 848w, https://substackcdn.com/image/fetch/$s_!m7zp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cf5a090-e54c-4c62-980d-22cdd81cf55f_1320x500.png 1272w, 
https://substackcdn.com/image/fetch/$s_!m7zp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cf5a090-e54c-4c62-980d-22cdd81cf55f_1320x500.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!m7zp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cf5a090-e54c-4c62-980d-22cdd81cf55f_1320x500.png" width="1320" height="500" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0cf5a090-e54c-4c62-980d-22cdd81cf55f_1320x500.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:500,&quot;width&quot;:1320,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:116163,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.boringappsec.com/i/185904699?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cf5a090-e54c-4c62-980d-22cdd81cf55f_1320x500.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!m7zp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cf5a090-e54c-4c62-980d-22cdd81cf55f_1320x500.png 424w, https://substackcdn.com/image/fetch/$s_!m7zp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cf5a090-e54c-4c62-980d-22cdd81cf55f_1320x500.png 848w, 
https://substackcdn.com/image/fetch/$s_!m7zp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cf5a090-e54c-4c62-980d-22cdd81cf55f_1320x500.png 1272w, https://substackcdn.com/image/fetch/$s_!m7zp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cf5a090-e54c-4c62-980d-22cdd81cf55f_1320x500.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>For my bookmark use case, Browser Relay makes sense. I need access to my logged-in X account. 
An API would require credentials I don&#8217;t want to manage. The managed browser would require me to log into X separately.</p><p>But for web scraping random sites? I&#8217;d use the managed browser profile. No need to expose my personal sessions for that.</p><div><hr></div><h1>Conclusion and Final Thoughts</h1><p>The AI wave is here, and tools like Clawdbot are making it accessible to anyone willing to run their own infrastructure. Browser Relay is one of those capabilities that feels like a glimpse into the future &#8212; your AI assistant operating your computer on your behalf.</p><p>But with great power comes great responsibility (sorry, had to).</p><p>If you&#8217;re going to use Browser Relay:</p><ol><li><p><strong>Use a dedicated Chrome profile</strong> &#8212; not your daily driver</p></li><li><p><strong>Keep everything on Tailscale</strong> &#8212; no public exposure</p></li><li><p><strong>Understand what you&#8217;re enabling</strong> &#8212; full session access is no joke</p></li><li><p><strong>Stay paranoid</strong> &#8212; prompt injection is a real risk, but modern models help</p></li><li><p><strong>Stick with robust models</strong> &#8212; Opus 4.5 offers good protection against adversarial prompts</p></li></ol><p>The early adopter tax is real. But honestly? Watching my AI scroll through my bookmarks while I sip coffee might just be worth it.</p><p>If you&#8217;re interested in trying Clawdbot, check out the <a href="https://docs.clawd.bot/">documentation</a> and join the <a href="https://discord.com/invite/clawd">Discord community</a>. 
And if you have thoughts on the security model, I&#8217;d love to hear them &#8212; reach out on <a href="https://www.linkedin.com/in/anshumanbhartiya/">LinkedIn</a> or <a href="https://x.com/anshuman_bh">X</a>.</p><p>Until next time, ciao!</p><div><hr></div><h2>References</h2><ul><li><p><a href="https://docs.clawd.bot/">Clawdbot Documentation</a></p></li><li><p><a href="https://docs.clawd.bot/start/showcase">Clawdbot Showcase</a> &#8212; Community projects</p></li><li><p><a href="https://docs.clawd.bot/tools/chrome-extension">Browser Relay Setup Guide</a></p></li><li><p><a href="https://docs.clawd.bot/tools/browser">Browser Tool Reference</a></p></li><li><p><a href="https://docs.clawd.bot/gateway/security">Clawdbot Security Guide</a></p></li><li><p><a href="https://docs.clawd.bot/gateway/tailscale">Tailscale Integration</a></p></li><li><p><a href="https://chromedevtools.github.io/devtools-protocol/">Chrome DevTools Protocol</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Skills: The Missing Piece in AI Security Tooling]]></title><description><![CDATA[Building on the SecureVibes series]]></description><link>https://www.boringappsec.com/p/skills-the-missing-piece-in-ai-security</link><guid isPermaLink="false">https://www.boringappsec.com/p/skills-the-missing-piece-in-ai-security</guid><dc:creator><![CDATA[Anshuman Bhartiya]]></dc:creator><pubDate>Fri, 23 Jan 2026 06:13:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Jkn1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d40c9af-812c-4b3c-b8bb-8c2e24f94b20_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!Jkn1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d40c9af-812c-4b3c-b8bb-8c2e24f94b20_1408x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Jkn1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d40c9af-812c-4b3c-b8bb-8c2e24f94b20_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!Jkn1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d40c9af-812c-4b3c-b8bb-8c2e24f94b20_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!Jkn1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d40c9af-812c-4b3c-b8bb-8c2e24f94b20_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!Jkn1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d40c9af-812c-4b3c-b8bb-8c2e24f94b20_1408x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Jkn1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d40c9af-812c-4b3c-b8bb-8c2e24f94b20_1408x768.png" width="1408" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7d40c9af-812c-4b3c-b8bb-8c2e24f94b20_1408x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2002584,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.boringappsec.com/i/185505486?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d40c9af-812c-4b3c-b8bb-8c2e24f94b20_1408x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Jkn1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d40c9af-812c-4b3c-b8bb-8c2e24f94b20_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!Jkn1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d40c9af-812c-4b3c-b8bb-8c2e24f94b20_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!Jkn1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d40c9af-812c-4b3c-b8bb-8c2e24f94b20_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!Jkn1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d40c9af-812c-4b3c-b8bb-8c2e24f94b20_1408x768.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><blockquote><p>Disclaimer: This is a cross post from my tech <a href="https://www.anshuman.ai/posts/securevibes-part4">blog</a>, co-authored by my personal AI assistant <a href="https://www.anshuman.ai/posts/my-clawdbot-setup">Sage</a>.</p></blockquote><h1>The Industry Problem: One-Size-Fits-All Security Analysis</h1><p>Here&#8217;s a pattern I&#8217;ve seen across the security industry: we build tools that apply the same methodology regardless of what they&#8217;re analyzing.</p><p>Run STRIDE on a web app? You get STRIDE threats. Run STRIDE on a mobile app? Same STRIDE categories. Run STRIDE on a multi-agent AI application with cascade confidence propagation and LLM tool execution? 
<strong>Still the same STRIDE threats.</strong></p><p>This is a problem because <strong>agentic applications have fundamentally different risk profiles</strong> than traditional software. When your application can:</p><ul><li><p>Execute tools autonomously</p></li><li><p>Chain multiple AI agents together</p></li><li><p>Propagate confidence scores between decision-makers</p></li><li><p>Accept natural language that becomes executable logic</p></li></ul><p>...you need threat modeling that understands these patterns.</p><p>This isn&#8217;t just a SecureVibes problem. It&#8217;s an industry problem. And I believe <strong>skills</strong> are the solution.</p><div><hr></div><h1>What Are Skills? (And Why They Matter)</h1><blockquote><p><strong>TL;DR</strong> &#8212; Skills are modular knowledge packages that augment AI agents with domain-specific expertise. They&#8217;re the LLM-native equivalent of Semgrep rules&#8212;but for reasoning, not pattern matching.</p></blockquote><p>If you&#8217;ve worked with AI coding assistants, you&#8217;ve probably seen context files like <strong>CLAUDE.md</strong> that give the AI information about your codebase. Skills take this concept further - they&#8217;re structured knowledge packages that teach an AI agent <em>how to think</em> about specific domains.</p><p>A security skill might include:</p><ul><li><p><strong>Detection patterns</strong> - How to recognize when the skill applies</p></li><li><p><strong>Threat categories</strong> - Domain-specific vulnerability classes</p></li><li><p><strong>Examples</strong> - Real-world attack scenarios</p></li><li><p><strong>Reference materials</strong> - Validation logic and deeper context</p></li></ul><p><strong>The key insight: skills don&#8217;t replace the agent&#8217;s reasoning - they augment it with domain expertise.</strong></p><p>And here&#8217;s what makes this exciting for the industry: <strong>skills are portable</strong>. They&#8217;re just markdown and code. No vendor lock-in. 
No proprietary formats.</p><div><hr></div><h1>The Experiment: Proving Skills Work</h1><p>To demonstrate the power of skills, I ran a controlled experiment using <a href="https://github.com/anshumanbh/securevibes">SecureVibes</a>&#8217; threat modeling subagent.</p><h2>The Test Subject: FinBot</h2><p>I used <a href="https://github.com/anshumanbh/finbot-ctf-multiagent">finbot-ctf-multiagent</a> - a multi-agent invoice processing system that&#8217;s the flagship project for OWASP&#8217;s Agentic Security Initiative (ASI).</p><p>FinBot is ideal because it exhibits real agentic patterns:</p><ul><li><p><strong>Multi-agent chain</strong>: ValidatorAgent &#8594; RiskAnalyzerAgent &#8594; ApprovalAgent &#8594; PaymentProcessorAgent</p></li><li><p><strong>Cascade confidence propagation</strong> between agents</p></li><li><p><strong>Custom goal injection</strong> via admin endpoints</p></li><li><p><strong>LLM tool execution</strong> for invoice processing</p></li></ul><blockquote><p><strong>Important note on methodology:</strong> I modified the FinBot codebase to remove all hints, comments, and obvious vulnerability markers. 
This ensured the testing was purely unbiased - SecureVibes had to discover the agentic patterns and threats on its own, without any breadcrumbs.</p></blockquote><h2>Two Runs, Same Codebase</h2><p><strong>Run 1: Generic STRIDE</strong> (no skills)</p><ul><li><p>Standard threat modeling methodology</p></li><li><p>No context about AI/LLM-specific risks</p></li></ul><p><strong>Run 2: STRIDE + OWASP ASI Skills</strong></p><ul><li><p>Augmented with agentic security skills</p></li><li><p>Skills derived from <a href="https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/">OWASP Top 10 for Agentic Applications</a></p></li></ul><div><hr></div><h1>The Results: Data Speaks</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1OeJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4f6453-2c57-48b5-addf-bb2d27eca5a2_1408x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1OeJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4f6453-2c57-48b5-addf-bb2d27eca5a2_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!1OeJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4f6453-2c57-48b5-addf-bb2d27eca5a2_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!1OeJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4f6453-2c57-48b5-addf-bb2d27eca5a2_1408x768.png 1272w, 
https://substackcdn.com/image/fetch/$s_!1OeJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4f6453-2c57-48b5-addf-bb2d27eca5a2_1408x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1OeJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4f6453-2c57-48b5-addf-bb2d27eca5a2_1408x768.png" width="1408" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0f4f6453-2c57-48b5-addf-bb2d27eca5a2_1408x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1167910,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.boringappsec.com/i/185505486?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4f6453-2c57-48b5-addf-bb2d27eca5a2_1408x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1OeJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4f6453-2c57-48b5-addf-bb2d27eca5a2_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!1OeJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4f6453-2c57-48b5-addf-bb2d27eca5a2_1408x768.png 848w, 
https://substackcdn.com/image/fetch/$s_!1OeJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4f6453-2c57-48b5-addf-bb2d27eca5a2_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!1OeJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4f6453-2c57-48b5-addf-bb2d27eca5a2_1408x768.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>High-Level Comparison</h2><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!xPu1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aeb2c55-ad24-4d42-8f56-0c8b7b162d52_1468x330.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xPu1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aeb2c55-ad24-4d42-8f56-0c8b7b162d52_1468x330.png 424w, https://substackcdn.com/image/fetch/$s_!xPu1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aeb2c55-ad24-4d42-8f56-0c8b7b162d52_1468x330.png 848w, https://substackcdn.com/image/fetch/$s_!xPu1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aeb2c55-ad24-4d42-8f56-0c8b7b162d52_1468x330.png 1272w, https://substackcdn.com/image/fetch/$s_!xPu1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aeb2c55-ad24-4d42-8f56-0c8b7b162d52_1468x330.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xPu1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aeb2c55-ad24-4d42-8f56-0c8b7b162d52_1468x330.png" width="1456" height="327" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9aeb2c55-ad24-4d42-8f56-0c8b7b162d52_1468x330.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:327,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:49364,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.boringappsec.com/i/185505486?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aeb2c55-ad24-4d42-8f56-0c8b7b162d52_1468x330.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xPu1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aeb2c55-ad24-4d42-8f56-0c8b7b162d52_1468x330.png 424w, https://substackcdn.com/image/fetch/$s_!xPu1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aeb2c55-ad24-4d42-8f56-0c8b7b162d52_1468x330.png 848w, https://substackcdn.com/image/fetch/$s_!xPu1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aeb2c55-ad24-4d42-8f56-0c8b7b162d52_1468x330.png 1272w, https://substackcdn.com/image/fetch/$s_!xPu1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aeb2c55-ad24-4d42-8f56-0c8b7b162d52_1468x330.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The skill-augmented run found 9 threats in agentic-specific categories that generic STRIDE simply cannot identify. 
These aren&#8217;t relabeled STRIDE threats - they&#8217;re fundamentally different risk categories that only exist in multi-agent systems.</p><h2>Threat Category Breakdown</h2><h3>STRIDE-Only Distribution</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B5mb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff20ea6d-5ee3-4144-882c-2dee7be5fee7_1052x568.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B5mb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff20ea6d-5ee3-4144-882c-2dee7be5fee7_1052x568.png 424w, https://substackcdn.com/image/fetch/$s_!B5mb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff20ea6d-5ee3-4144-882c-2dee7be5fee7_1052x568.png 848w, https://substackcdn.com/image/fetch/$s_!B5mb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff20ea6d-5ee3-4144-882c-2dee7be5fee7_1052x568.png 1272w, https://substackcdn.com/image/fetch/$s_!B5mb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff20ea6d-5ee3-4144-882c-2dee7be5fee7_1052x568.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B5mb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff20ea6d-5ee3-4144-882c-2dee7be5fee7_1052x568.png" width="1052" height="568" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ff20ea6d-5ee3-4144-882c-2dee7be5fee7_1052x568.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:568,&quot;width&quot;:1052,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:88291,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.boringappsec.com/i/185505486?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff20ea6d-5ee3-4144-882c-2dee7be5fee7_1052x568.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!B5mb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff20ea6d-5ee3-4144-882c-2dee7be5fee7_1052x568.png 424w, https://substackcdn.com/image/fetch/$s_!B5mb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff20ea6d-5ee3-4144-882c-2dee7be5fee7_1052x568.png 848w, https://substackcdn.com/image/fetch/$s_!B5mb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff20ea6d-5ee3-4144-882c-2dee7be5fee7_1052x568.png 1272w, https://substackcdn.com/image/fetch/$s_!B5mb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff20ea6d-5ee3-4144-882c-2dee7be5fee7_1052x568.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h3>STRIDE + ASI Skills Distribution</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hcXh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26646b81-d1a6-491b-a424-989edef1eba7_1044x1126.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hcXh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26646b81-d1a6-491b-a424-989edef1eba7_1044x1126.png 424w, 
https://substackcdn.com/image/fetch/$s_!hcXh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26646b81-d1a6-491b-a424-989edef1eba7_1044x1126.png 848w, https://substackcdn.com/image/fetch/$s_!hcXh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26646b81-d1a6-491b-a424-989edef1eba7_1044x1126.png 1272w, https://substackcdn.com/image/fetch/$s_!hcXh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26646b81-d1a6-491b-a424-989edef1eba7_1044x1126.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hcXh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26646b81-d1a6-491b-a424-989edef1eba7_1044x1126.png" width="1044" height="1126" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/26646b81-d1a6-491b-a424-989edef1eba7_1044x1126.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1126,&quot;width&quot;:1044,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:178307,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.boringappsec.com/i/185505486?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26646b81-d1a6-491b-a424-989edef1eba7_1044x1126.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!hcXh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26646b81-d1a6-491b-a424-989edef1eba7_1044x1126.png 424w, https://substackcdn.com/image/fetch/$s_!hcXh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26646b81-d1a6-491b-a424-989edef1eba7_1044x1126.png 848w, https://substackcdn.com/image/fetch/$s_!hcXh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26646b81-d1a6-491b-a424-989edef1eba7_1044x1126.png 1272w, https://substackcdn.com/image/fetch/$s_!hcXh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26646b81-d1a6-491b-a424-989edef1eba7_1044x1126.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h3>The 9 Threats Only Skills Could Find</h3><p>These are threats that exist in categories generic STRIDE doesn&#8217;t even know about:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xyQ8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd7fcc59-f89c-41cd-a86b-5cc4df7fe962_1340x1366.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xyQ8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd7fcc59-f89c-41cd-a86b-5cc4df7fe962_1340x1366.png 424w, https://substackcdn.com/image/fetch/$s_!xyQ8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd7fcc59-f89c-41cd-a86b-5cc4df7fe962_1340x1366.png 848w, https://substackcdn.com/image/fetch/$s_!xyQ8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd7fcc59-f89c-41cd-a86b-5cc4df7fe962_1340x1366.png 1272w, https://substackcdn.com/image/fetch/$s_!xyQ8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd7fcc59-f89c-41cd-a86b-5cc4df7fe962_1340x1366.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!xyQ8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd7fcc59-f89c-41cd-a86b-5cc4df7fe962_1340x1366.png" width="728" height="742.1253731343284" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dd7fcc59-f89c-41cd-a86b-5cc4df7fe962_1340x1366.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1366,&quot;width&quot;:1340,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:352488,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.boringappsec.com/i/185505486?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd7fcc59-f89c-41cd-a86b-5cc4df7fe962_1340x1366.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xyQ8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd7fcc59-f89c-41cd-a86b-5cc4df7fe962_1340x1366.png 424w, https://substackcdn.com/image/fetch/$s_!xyQ8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd7fcc59-f89c-41cd-a86b-5cc4df7fe962_1340x1366.png 848w, https://substackcdn.com/image/fetch/$s_!xyQ8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd7fcc59-f89c-41cd-a86b-5cc4df7fe962_1340x1366.png 1272w, 
https://substackcdn.com/image/fetch/$s_!xyQ8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd7fcc59-f89c-41cd-a86b-5cc4df7fe962_1340x1366.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><blockquote><p><strong>ASI07 (Cascade Failure Exploitation)</strong> is a perfect example: an attacker can trigger specific failures in one agent that manipulate downstream agents through confidence propagation. 
This is a risk <em>unique to multi-agent systems</em> - STRIDE has no category for it.</p></blockquote><div><hr></div><h1>Context Awareness: How Skills Detect Agentic Patterns</h1><p>Before generating threats, the skill-augmented run automatically detected these patterns:</p><p>&#9989; OpenAI API usage (gpt-4o-mini)</p><p>&#9989; Multi-agent chain (ValidatorAgent, RiskAnalyzerAgent, ApprovalAgent, PaymentProcessorAgent)</p><p>&#9989; LLM function calling/tool execution</p><p>&#9989; Custom goal injection via admin interface</p><p>&#9989; Cascade confidence propagation between agents</p><p>This is what enables targeted analysis. <strong>The agent knew it was analyzing a multi-agent system before it started threat modeling</strong>, so it applied the right mental models.</p><div><hr></div><h1>Why This Matters for the Industry</h1><h2>1. Every Application Type Needs Its Own Skills</h2><p>The same principle applies across the board:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DEZN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c076823-113a-463c-a442-466a0d0fc069_1322x544.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DEZN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c076823-113a-463c-a442-466a0d0fc069_1322x544.png 424w, https://substackcdn.com/image/fetch/$s_!DEZN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c076823-113a-463c-a442-466a0d0fc069_1322x544.png 848w, 
https://substackcdn.com/image/fetch/$s_!DEZN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c076823-113a-463c-a442-466a0d0fc069_1322x544.png 1272w, https://substackcdn.com/image/fetch/$s_!DEZN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c076823-113a-463c-a442-466a0d0fc069_1322x544.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DEZN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c076823-113a-463c-a442-466a0d0fc069_1322x544.png" width="1322" height="544" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1c076823-113a-463c-a442-466a0d0fc069_1322x544.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:544,&quot;width&quot;:1322,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:113804,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.boringappsec.com/i/185505486?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c076823-113a-463c-a442-466a0d0fc069_1322x544.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DEZN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c076823-113a-463c-a442-466a0d0fc069_1322x544.png 424w, 
https://substackcdn.com/image/fetch/$s_!DEZN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c076823-113a-463c-a442-466a0d0fc069_1322x544.png 848w, https://substackcdn.com/image/fetch/$s_!DEZN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c076823-113a-463c-a442-466a0d0fc069_1322x544.png 1272w, https://substackcdn.com/image/fetch/$s_!DEZN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c076823-113a-463c-a442-466a0d0fc069_1322x544.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p><strong>You don&#8217;t need different tools for each application type&#8212;you need different skills for the same tool</strong>.</p><h2>2. Skills Are the New Rules</h2><p>Traditional security tools rely on rules:</p><ul><li><p>Semgrep rules for SAST</p></li><li><p>YARA rules for malware</p></li><li><p>Snort rules for IDS</p></li></ul><p>Skills are the LLM-native equivalent. But instead of pattern matching, they enable <strong>reasoning</strong>. An agent with the right skills can:</p><ul><li><p>Understand context, not just syntax</p></li><li><p>Chain together multi-step attack scenarios</p></li><li><p>Identify domain-specific risks</p></li></ul><h2>3. The Open Source Shift</h2><p>Trail of Bits recently <a href="https://github.com/trailofbits/skills">open-sourced their skills</a> for security research and audit workflows. This signals a shift: <strong>the future of security tooling isn&#8217;t monolithic products - it&#8217;s composable, shareable expertise that makes everyone&#8217;s agents smarter.</strong></p><p>The industry is moving toward:</p><ul><li><p><strong>Composable expertise</strong> that can be shared</p></li><li><p><strong>Community-driven knowledge</strong> that improves over time</p></li><li><p><strong>Portable skills</strong> that work across tools</p></li></ul><p>This is how we collectively get better at security - not by hoarding knowledge in proprietary tools, but by sharing it as skills anyone can use.</p><div><hr></div><h1>How I Created the Agentic Security Skills</h1><p>Here&#8217;s the part that surprised me: <strong>creating these skills was fast.</strong></p><p>The traditional bottleneck in security tooling has always been knowledge engineering. Someone has to read the documentation, understand the threat landscape, and encode that knowledge into rules or patterns. 
This takes weeks or months.</p><p>Here&#8217;s what I did instead:</p><ol><li><p>Downloaded the OWASP Top 10 for Agentic Applications <a href="https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/">PDF</a></p></li><li><p>Converted it to markdown - a straightforward transformation</p></li><li><p>Pointed Claude Code at it with context from a <a href="https://github.com/anshumanbh/securevibes/blob/main/docs/references/AGENT_SKILLS_GUIDE.md">skills best practices guide</a> I curated</p></li><li><p>Gave it the existing DAST skills as an example of the structure and format I wanted</p></li><li><p>Asked it to create skills for all ASI01-ASI10 categories</p></li></ol><p>The result? A complete skill set for agentic threat modeling in a fraction of the time it would have taken to build manually.</p><p><strong>This is the paradigm shift.</strong> Humans used to be the bottleneck in encoding security knowledge. But if you give AI the right context - the source material, the format you want, and examples to follow - you can move incredibly fast.</p><div><hr></div>
<h1>Embracing Non-Determinism</h1><p>Here&#8217;s something that trips people up when they first use AI for security analysis: <strong>the results aren&#8217;t always the same.</strong></p><p>Run the same threat model twice and you might get slightly different threats. Run it with a different model and you&#8217;ll definitely get different results. This bothers people who are used to deterministic tools where the same input always produces the same output.</p><p>But here&#8217;s the thing: <strong>this isn&#8217;t a bug - it&#8217;s a feature you should embrace.</strong></p><p>The right approach isn&#8217;t to expect consistency. It&#8217;s to:</p><ol><li><p><strong>Run the workflow multiple times</strong> - maybe 2-3 runs</p></li><li><p><strong>Try different models</strong> - Sonnet and Opus often catch different things</p></li><li><p><strong>Consolidate the results</strong> - union of all findings</p></li></ol><p>What you&#8217;ll find is that <strong>critical risks show up consistently on every run</strong>. These are the threats that matter most - the ones where the signal is so strong that the model can&#8217;t miss them regardless of sampling variance.</p><p>The threats that appear inconsistently? They&#8217;re often edge cases or lower-severity issues that won&#8217;t materially change your security posture. They&#8217;re nice to have, but missing them in one run isn&#8217;t catastrophic.</p><p>This is fundamentally different from traditional SAST tools where you expect deterministic output. 
But it&#8217;s also how human security researchers work - run the same pentest twice with two different testers and you&#8217;ll get different findings. We&#8217;ve always accepted that in human-driven security work. It&#8217;s time to accept it in AI-driven work too.</p><div><hr></div><h1>Building Your Own Skills</h1><p>The agentic security skill in SecureVibes follows a simple structure:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ubIr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c953f4a-4a82-4db0-bb9c-a36fae652c99_954x368.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ubIr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c953f4a-4a82-4db0-bb9c-a36fae652c99_954x368.png 424w, https://substackcdn.com/image/fetch/$s_!ubIr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c953f4a-4a82-4db0-bb9c-a36fae652c99_954x368.png 848w, https://substackcdn.com/image/fetch/$s_!ubIr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c953f4a-4a82-4db0-bb9c-a36fae652c99_954x368.png 1272w, https://substackcdn.com/image/fetch/$s_!ubIr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c953f4a-4a82-4db0-bb9c-a36fae652c99_954x368.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ubIr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c953f4a-4a82-4db0-bb9c-a36fae652c99_954x368.png" width="954" height="368" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6c953f4a-4a82-4db0-bb9c-a36fae652c99_954x368.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:368,&quot;width&quot;:954,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77237,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.boringappsec.com/i/185505486?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c953f4a-4a82-4db0-bb9c-a36fae652c99_954x368.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ubIr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c953f4a-4a82-4db0-bb9c-a36fae652c99_954x368.png 424w, https://substackcdn.com/image/fetch/$s_!ubIr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c953f4a-4a82-4db0-bb9c-a36fae652c99_954x368.png 848w, https://substackcdn.com/image/fetch/$s_!ubIr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c953f4a-4a82-4db0-bb9c-a36fae652c99_954x368.png 1272w, https://substackcdn.com/image/fetch/$s_!ubIr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c953f4a-4a82-4db0-bb9c-a36fae652c99_954x368.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The <strong>SKILL.md</strong> teaches the agent:</p><ol><li><p><strong>When to activate</strong> - Detection patterns for agentic code</p></li><li><p><strong>What to look for</strong> - Threat categories with code patterns</p></li><li><p><strong>How to report</strong> - Structured templates with required fields</p></li></ol><p>Here&#8217;s the detection logic:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!h6c3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb4c8b48-fd32-4e71-8f34-5ff32c5429e0_1292x642.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!h6c3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb4c8b48-fd32-4e71-8f34-5ff32c5429e0_1292x642.png 424w, https://substackcdn.com/image/fetch/$s_!h6c3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb4c8b48-fd32-4e71-8f34-5ff32c5429e0_1292x642.png 848w, https://substackcdn.com/image/fetch/$s_!h6c3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb4c8b48-fd32-4e71-8f34-5ff32c5429e0_1292x642.png 1272w, https://substackcdn.com/image/fetch/$s_!h6c3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb4c8b48-fd32-4e71-8f34-5ff32c5429e0_1292x642.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!h6c3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb4c8b48-fd32-4e71-8f34-5ff32c5429e0_1292x642.png" width="1292" height="642" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fb4c8b48-fd32-4e71-8f34-5ff32c5429e0_1292x642.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:642,&quot;width&quot;:1292,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:95491,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.boringappsec.com/i/185505486?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb4c8b48-fd32-4e71-8f34-5ff32c5429e0_1292x642.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!h6c3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb4c8b48-fd32-4e71-8f34-5ff32c5429e0_1292x642.png 424w, https://substackcdn.com/image/fetch/$s_!h6c3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb4c8b48-fd32-4e71-8f34-5ff32c5429e0_1292x642.png 848w, https://substackcdn.com/image/fetch/$s_!h6c3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb4c8b48-fd32-4e71-8f34-5ff32c5429e0_1292x642.png 1272w, https://substackcdn.com/image/fetch/$s_!h6c3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb4c8b48-fd32-4e71-8f34-5ff32c5429e0_1292x642.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h1>You Don&#8217;t Need SecureVibes to Use These Skills</h1><p>Here&#8217;s what I love about skills: <strong>they&#8217;re just files</strong>.</p><p>You don&#8217;t need to run SecureVibes if you don&#8217;t want to. Grab the <a href="https://github.com/anshumanbh/securevibes/tree/main/packages/core/securevibes/skills/threat-modeling/agentic-security">agentic-security skill</a> and drop it in your Claude Code workspace.</p><p>Add the skill to your <strong>.claude/</strong> directory or reference it in your <strong>CLAUDE.md</strong>. The next time you ask Claude Code to threat model an agentic application, it will have the full OWASP ASI taxonomy with detection patterns and examples.</p><div><hr></div><h1>Key Takeaways</h1><ol><li><p><strong>Generic threat modeling produces generic threats.</strong> STRIDE on an agentic app gives you only STRIDE categories and misses agentic-specific risks.</p></li><li><p><strong>Skills enable context-aware analysis.</strong> The skill-augmented run found 9 threats in ASI categories that STRIDE couldn&#8217;t categorize&#8212;including cascade failures, goal hijacking, and context pollution.</p></li><li><p><strong>Relevant threats &gt; More threats.</strong> 9 agentic-specific threats that actually apply to your multi-agent system are more valuable than 30 generic threats that may or may not be relevant.</p></li><li><p><strong>Skills are portable</strong>. 
Use them with Claude Code&#8212;they&#8217;re just markdown files with structured knowledge.</p></li><li><p><strong>Creating skills is now fast.</strong> Download the PDF, convert to markdown, give AI the right context and examples, and validate the output. What used to take weeks now takes hours.</p></li></ol><div><hr></div><p>If you have ideas for skills or want to contribute, please reach out! The more skills we share, the smarter everyone&#8217;s agents become.</p><p>Until next time, ciao!</p>]]></content:encoded></item><item><title><![CDATA[Ep 32: Rethinking Enterprise Security in an AI- and Platform-First World with Kane Narraway]]></title><description><![CDATA[In this episode, we sit down with Kane Narraway to unpack how enterprise security is changing as AI, platforms, and developer-driven security become the norm.]]></description><link>https://www.boringappsec.com/p/ep-32-rethinking-enterprise-security</link><guid isPermaLink="false">https://www.boringappsec.com/p/ep-32-rethinking-enterprise-security</guid><dc:creator><![CDATA[Anshuman Bhartiya]]></dc:creator><pubDate>Mon, 19 Jan 2026 11:43:42 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/183596385/b5fc875fa1a9c0f42bc0946614605306.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>In this episode, we sit down with Kane Narraway to unpack how enterprise security is changing as AI, platforms, and developer-driven security become the norm. 
Kane shares his path from digital forensics to leading security at Canva, and why understanding company culture matters just as much as choosing the right tools.</p><p>We discuss why modern security is becoming platform-first, why much of the security vendor market optimizes for finding problems rather than fixing them, and why Kane believes security teams need more engineers and fewer manual processes.<br><br>The conversation also digs into AI security, shadow IT (and shadow AI), and the real-world trade-offs between usability and control, especially as low-code and no-code tools become more common inside companies.</p><p><strong>00:00&#8211;03:25 &#8212; Kane&#8217;s journey from law enforcement to platform security, shaped by our time at Atlassian</strong></p><p><strong>03:25&#8211;06:37 &#8212; Why enterprise security becomes platform-first faster than AppSec</strong></p><p><strong>06:37&#8211;09:26 &#8212; Why security teams fail when they fight company culture</strong></p><p><strong>09:26&#8211;13:36 &#8212; Platforms vs best-of-breed tools: trade-offs, not ideology</strong></p><p><strong>13:36&#8211;17:45 &#8212; Why most security startups are built to be acquired</strong></p><p><strong>17:45&#8211;22:16 &#8212; Open source agents, and business-specific vulnerability research</strong></p><p><strong>22:16&#8211;27:09 &#8212; AI security, prompt injection, and the access-control problem</strong></p><p><strong>27:09&#8211;31:29 &#8212; Build vs buy in the AI era. 
Why speed is easy and maintenance remains the real bottleneck.</strong></p><p><strong>31:29&#8211;40:42 &#8212; Agents, MCPs, and why stopgap solutions dominate today</strong></p><p><strong>40:42&#8211;48:57 &#8212; Shadow AI, low-code automation, and familiar security failures</strong></p><p>Tune in for a deep dive!</p><p><strong>Connect with Kane Narraway:</strong></p><p>LinkedIn: <a href="https://www.linkedin.com/in/kane-n/">https://www.linkedin.com/in/kane-n/</a></p><p>Blog: <a href="https://kanenarraway.com/">https://kanenarraway.com/</a></p><p><strong>Connect with Anshuman:</strong></p><p>LinkedIn: <a href="https://www.linkedin.com/in/anshumanbhartiya/">anshumanbhartiya</a></p><p>X: <a href="https://x.com/anshuman_bh">https://x.com/anshuman_bh</a></p><p>Website: <a href="https://anshumanbhartiya.com/">https://anshumanbhartiya.com/</a></p><p>Instagram: <a href="https://www.instagram.com/anshuman.bhartiya/#">anshuman.bhartiya</a><br><br><strong>Connect with Sandesh:</strong></p><p>LinkedIn: <a href="https://www.linkedin.com/in/anandsandesh/">anandsandesh</a></p><p>X: <a href="https://x.com/JubbaOnJeans">https://x.com/JubbaOnJeans</a><br></p><p></p>]]></content:encoded></item><item><title><![CDATA[Welcome! ]]></title><description><![CDATA[Anshuman & Sandesh on Security. 
2 blogs. 
1 podcast.
A Slack community.]]></description><link>https://www.boringappsec.com/p/welcome-to-the-boring-appsec-community</link><guid isPermaLink="false">https://www.boringappsec.com/p/welcome-to-the-boring-appsec-community</guid><pubDate>Wed, 31 Dec 2025 06:11:58 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/02fb94ea-f1e2-41f0-8072-3d8259fb11e7_1600x900.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><p></p>]]></content:encoded></item><item><title><![CDATA[Edition #2: Agent Security Standards + Identity/Authorization + Secure Agent Engineering + SecureVibes Update]]></title><description><![CDATA[Sandesh Mysore Anand and I recorded a couple of podcast episodes for The Boring AppSec Podcast with Ken Huang and Teja Myneedu over the past few weeks.]]></description><link>https://www.boringappsec.com/p/edition-2-agent-security-standards</link><guid isPermaLink="false">https://www.boringappsec.com/p/edition-2-agent-security-standards</guid><dc:creator><![CDATA[Anshuman Bhartiya]]></dc:creator><pubDate>Mon, 15 Dec 2025 16:00:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!bgcB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5911cde3-3e68-4008-ac52-950fdb9e2f91_1488x830.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong><a href="https://www.linkedin.com/in/anandsandesh/">Sandesh Mysore Anand</a></strong> and I recorded a couple of podcast episodes for <strong><a href="https://www.youtube.com/watch?v=6bX_946tGug&amp;list=PLnr7iEAhCZbASkQrTiQ-X1rQ2LZerWk3N">The Boring AppSec Podcast</a></strong> with <strong><a href="https://www.linkedin.com/in/kenhuang8/">Ken Huang</a></strong> and <strong><a href="https://www.linkedin.com/in/myneedu/">Teja Myneedu</a></strong> over the past few weeks. 
Below are some key takeaways from them:</p><h2><strong>Architecting AI Security: Standards and Agentic Systems with Ken Huang</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bgcB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5911cde3-3e68-4008-ac52-950fdb9e2f91_1488x830.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bgcB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5911cde3-3e68-4008-ac52-950fdb9e2f91_1488x830.png 424w, https://substackcdn.com/image/fetch/$s_!bgcB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5911cde3-3e68-4008-ac52-950fdb9e2f91_1488x830.png 848w, https://substackcdn.com/image/fetch/$s_!bgcB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5911cde3-3e68-4008-ac52-950fdb9e2f91_1488x830.png 1272w, https://substackcdn.com/image/fetch/$s_!bgcB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5911cde3-3e68-4008-ac52-950fdb9e2f91_1488x830.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bgcB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5911cde3-3e68-4008-ac52-950fdb9e2f91_1488x830.png" width="1456" height="812" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5911cde3-3e68-4008-ac52-950fdb9e2f91_1488x830.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:812,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Article content&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Article content" title="Article content" srcset="https://substackcdn.com/image/fetch/$s_!bgcB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5911cde3-3e68-4008-ac52-950fdb9e2f91_1488x830.png 424w, https://substackcdn.com/image/fetch/$s_!bgcB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5911cde3-3e68-4008-ac52-950fdb9e2f91_1488x830.png 848w, https://substackcdn.com/image/fetch/$s_!bgcB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5911cde3-3e68-4008-ac52-950fdb9e2f91_1488x830.png 1272w, https://substackcdn.com/image/fetch/$s_!bgcB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5911cde3-3e68-4008-ac52-950fdb9e2f91_1488x830.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" 
stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Infographics of the podcast summary generated by NotebookLM</figcaption></figure></div><p>The conversation focused on the necessity of new security frameworks and authentication methods to manage the unique risks posed by autonomous AI agents.</p><h3><strong>New Standards for Measuring AI Agent Risk: OWASP AIVSS</strong></h3><p>Ken detailed the <strong><a href="https://aivss.owasp.org/">AIVSS</a></strong> framework&#8217;s purpose and structure.</p><ul><li><p><strong>The Goal:</strong> AIVSS aims to provide a way to <strong>measure core agent AI security risks</strong> to enable better risk management, fitting into the &#8220;measure&#8221; component of the <strong><a href="https://www.nist.gov/itl/ai-risk-management-framework">NIST AI-RMF</a></strong> framework.</p></li><li><p><strong>Addressing Autonomy:</strong> Traditional scoring systems like <strong><a href="https://www.first.org/cvss/">CVSS</a></strong> are deterministic, 
measuring code and configuration. They are insufficient for agentic AI due to its <strong>non-deterministic and autonomous nature</strong>.</p></li><li><p><strong>The Scoring Approach:</strong> AIVSS builds upon CVSS by adding an <strong>agentic AI risk factor</strong> to account for non-deterministic risks. This factor considers the agent&#8217;s <strong>level of autonomy</strong> (ranging from non-existent to full autonomy), since different levels of autonomy carry different levels of risk.</p></li><li><p><strong>Framework Components:</strong> AIVSS offers a quantitative, numerical score. It is being developed alongside a <strong>qualitative, decision-tree-based system called <a href="https://www.cisa.gov/stakeholder-specific-vulnerability-categorization-ssvc">SSVC</a></strong> (Stakeholder-Specific Vulnerability Categorization).</p></li></ul><h3><strong>The Shortcomings of Traditional IAM for AI Agents</strong></h3><p>Ken asserted that traditional Identity and Access Management (IAM) standards, such as <strong>OAuth and SAML, are fundamentally inadequate</strong> for securing AI agents. These legacy standards were designed for web applications acting on a human&#8217;s behalf.</p><ul><li><p><strong>Session-Scoped vs. Task-Scoped:</strong> The primary issue is that current OAuth flows are <strong>session-scoped</strong> (time-based) and grant access that accumulates as more is requested. Agents, however, require <strong>dynamic, fine-grained access</strong> that is strictly <strong>task-scoped</strong>. Access should be removed once a task is finished, requiring a new permission request for subsequent tasks.</p></li><li><p><strong>Coarse-Grained Access:</strong> Traditional IAM is often either <strong>too restrictive</strong>, stifling the agent&#8217;s necessary agency, or <strong>too coarse-grained</strong>. 
For instance, an HR agent might need access to a resume database but should be restricted from the salary database; granting the agent the full human identity is too risky.</p></li><li><p><strong>Multi-Agent Complexity:</strong> Current IAM systems struggle to accommodate <strong>multi-agent systems</strong>, which are key to future AI workflows. In these environments, different agents assume different identities, and access must be managed with a dynamic task scope.</p></li><li><p><strong>The Way Forward:</strong> A new standard is necessary. This standard must allow for agency while maintaining security by consistently checking the agent&#8217;s <strong>intent</strong> before granting access.</p></li></ul><h3><strong>Securing Agent-to-Agent (A2A) Communication</strong></h3><p>The rise of agent development kits (ADKs) and A2A protocols (like Google&#8217;s A2A protocol) introduces new security challenges beyond those seen in traditional API security.</p><ul><li><p><strong>Beyond BOLA:</strong> While standard API issues like BOLA (Broken Object Level Authorization) still exist, A2A communication requires systems to handle issues like <strong>trust, capability, and quality of service</strong>. Agents must be protected from risks like <strong>poisoned agent cards</strong> or <strong>rug pull</strong> attacks.</p></li><li><p><strong>New Protocols:</strong> Ken emphasized the need for protocols like the <strong><a href="https://arxiv.org/abs/2506.13590">Agent Capability Negotiation and Binding Protocol</a> (ACNBP)</strong>. This protocol facilitates validation using <strong>digital signatures</strong> to ensure the agent possesses the capabilities and quality of service it claims.</p></li><li><p><strong>Goal Manipulation Attacks:</strong> A major threat to autonomous systems is <strong>goal manipulation</strong>, which is challenging to defend against. 
This includes attacks like:</p><ul><li><p><strong>Drifting (Crescendo Attack):</strong> Gradually shifting the agent&#8217;s intended goal (e.g., prompting a security agent to open ports instead of locking them).</p></li><li><p><strong>Malicious Goal Expansion:</strong> Using prompt injection to force an agent to execute its assigned task while also performing a malicious secondary task, such as leaking secret environment variables.</p></li><li><p><strong>Exhaustion Loop:</strong> Using direct or indirect prompt injection to make the agent perform a task that never completes, leading to a denial of service or a &#8220;denial of wallet&#8221;.</p></li></ul></li></ul><p>Security professionals were encouraged to engage in <strong>research-oriented learning</strong> and contribute to these evolving standards to keep pace with the rapidly innovating field of AI security.</p><div><hr></div><h2><strong>Scaling Product Security In The AI Era with Teja Myneedu</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2ShE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87b374c6-a127-4322-8411-51228fa44826_1488x830.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2ShE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87b374c6-a127-4322-8411-51228fa44826_1488x830.png 424w, https://substackcdn.com/image/fetch/$s_!2ShE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87b374c6-a127-4322-8411-51228fa44826_1488x830.png 848w, 
https://substackcdn.com/image/fetch/$s_!2ShE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87b374c6-a127-4322-8411-51228fa44826_1488x830.png 1272w, https://substackcdn.com/image/fetch/$s_!2ShE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87b374c6-a127-4322-8411-51228fa44826_1488x830.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2ShE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87b374c6-a127-4322-8411-51228fa44826_1488x830.png" width="1456" height="812" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/87b374c6-a127-4322-8411-51228fa44826_1488x830.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:812,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Article content&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Article content" title="Article content" srcset="https://substackcdn.com/image/fetch/$s_!2ShE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87b374c6-a127-4322-8411-51228fa44826_1488x830.png 424w, https://substackcdn.com/image/fetch/$s_!2ShE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87b374c6-a127-4322-8411-51228fa44826_1488x830.png 848w, 
https://substackcdn.com/image/fetch/$s_!2ShE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87b374c6-a127-4322-8411-51228fa44826_1488x830.png 1272w, https://substackcdn.com/image/fetch/$s_!2ShE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87b374c6-a127-4322-8411-51228fa44826_1488x830.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Infographics of the podcast summary generated by NotebookLM</figcaption></figure></div><p>In this conversation, Teja 
noted that his transition from focusing purely on product and application security to leading broader security teams provided a crucial worldview: securing products extends beyond the boundary of writing code and affects the entire enterprise.</p><h3><strong>Security Philosophy and Prioritization</strong></h3><p>A key evolution in Teja&#8217;s philosophy centers on practicality and urgency. He believes security breaches often occur because organizations failed to do the <em><strong>hard work</strong></em> of tightening access or addressing individual vulnerabilities, rather than because they failed to find the <em><strong>next cool thing</strong></em>.</p><ul><li><p><strong>Security by Obscurity:</strong> Teja emphasized that he is willing to accept <strong>incremental steps toward security improvement</strong>, asserting, &#8220;<em><strong>Let&#8217;s not make perfect the enemy of good</strong></em>&#8221;. While acknowledging the historical debate around &#8220;security by obscurity,&#8221; he argued that any measure (such as implementing a WAF rule) that improves security by even 1% or 2% daily is valuable. Given that bad actors are increasingly using AI agents to explore attack surface areas, the urgency calls for stopping the bleeding immediately rather than waiting weeks for an ideal fix. The goal should be to increase the <strong>economic complexity of an attack</strong> for bad actors.</p></li><li><p><strong>Risk Prioritization:</strong> The discussion touched on the challenge of risk prioritization. Teja noted the dilemma between presenting the full scope of vulnerabilities (which can feel overwhelming) and prioritizing risks. However, all prioritization is inherently flawed and only necessary when limited resources prevent fixing everything. 
Security tools often fail at prioritization because they lack necessary context regarding people, processes, and organizational strategy.</p></li></ul><h3><strong>AI, Context, and the Future of Fixing</strong></h3><p>The conversation explored how AI and automation are changing the role of security teams, particularly concerning code fixes. Traditionally, security teams manage vulnerabilities, while developers own the fixing.</p><ul><li><p><strong>Security Engineers as Fixers:</strong> We discussed whether security engineers should raise PRs for code fixes. I mentioned that security engineers should know how to fix vulnerabilities and can now use AI to easily propose PRs for engineers to approve or reject. Teja added a crucial nuance: the problem isn&#8217;t the technical fix itself, but ensuring the fix doesn&#8217;t cause <strong>unintended downstream effects</strong> (like authorization changes breaking service-to-service calls), which relies heavily on <strong>tribal knowledge</strong> within the engineering teams.</p></li><li><p><strong>The Power of Context:</strong> AI&#8217;s promise lies in reducing the cognitive load on engineers by helping them discover context quickly, serving as &#8220;product archaeologists&#8221;. Critical product context includes the code repository and deployment infrastructure. The harder aspects of context to capture include team ownership (especially after reorgs), business intent, use cases, and priority. The vendors&#8217; ability to gather and contextualize organizational constraints is the &#8220;game changer&#8221; for security tooling.</p></li></ul><h3><strong>Secure Design and Emerging Threats</strong></h3><ul><li><p><strong>Secure by Design vs. Secure Defaults:</strong> Secure by Design requires clear architecture and the application of standard security practices. 
While AI increases the promise of applying known security patterns consistently, we discussed that the term &#8220;secure by design&#8221; has become so broad it has lost meaningful definition, often encompassing &#8220;all of security&#8221;. The critical distinction lies between secure <em><strong>design</strong></em> (before building) and secure <em><strong>defaults</strong></em> (implementation).</p></li><li><p><strong>LLM Novel Threats:</strong> Beyond known issues like <strong>prompt injection</strong>, Teja views the biggest threat as the complexity of <strong>identity and authorization</strong>. When agents are integrated, they dynamically determine business logic and act as decision-making engines, blurring trust boundaries. This compounds the already difficult problem of access control in microservices. The challenge is granting an agent delegate access with appropriate, limited privileges. Teja also expressed heightened concern over the enterprise environment, particularly the <strong>software supply chain risk</strong> associated with browser plugins and insecure desktop downloads.</p></li></ul><p>The links to both the episodes are provided in the Appendix below.</p><div><hr></div><h2><strong>On Secure Agent Engineering</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VJyz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e7f682a-83b6-4306-b794-872bdb10e42c_1488x830.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VJyz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e7f682a-83b6-4306-b794-872bdb10e42c_1488x830.png 424w, 
https://substackcdn.com/image/fetch/$s_!VJyz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e7f682a-83b6-4306-b794-872bdb10e42c_1488x830.png 848w, https://substackcdn.com/image/fetch/$s_!VJyz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e7f682a-83b6-4306-b794-872bdb10e42c_1488x830.png 1272w, https://substackcdn.com/image/fetch/$s_!VJyz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e7f682a-83b6-4306-b794-872bdb10e42c_1488x830.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VJyz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e7f682a-83b6-4306-b794-872bdb10e42c_1488x830.png" width="1456" height="812" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7e7f682a-83b6-4306-b794-872bdb10e42c_1488x830.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:812,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Article content&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Article content" title="Article content" srcset="https://substackcdn.com/image/fetch/$s_!VJyz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e7f682a-83b6-4306-b794-872bdb10e42c_1488x830.png 424w, 
https://substackcdn.com/image/fetch/$s_!VJyz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e7f682a-83b6-4306-b794-872bdb10e42c_1488x830.png 848w, https://substackcdn.com/image/fetch/$s_!VJyz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e7f682a-83b6-4306-b794-872bdb10e42c_1488x830.png 1272w, https://substackcdn.com/image/fetch/$s_!VJyz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e7f682a-83b6-4306-b794-872bdb10e42c_1488x830.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"></figcaption></figure></div><p>I recently read a <strong><a href="https://www.philschmid.de/why-engineers-struggle-building-agents">blog</a></strong> that describes the difference between traditional software engineering and <em><strong>agent engineering</strong></em>, and why senior engineers struggle to build AI agents: they try to code away the probabilistic nature of agents instead of embracing it. The blog touched upon some key points that resonated with me. I highly recommend giving it a read.</p><p>Drawing parallels from this blog to security engineering, below are some quick, off-the-top-of-my-head thoughts on the points it makes. Please note that these points are not exhaustive by any means:</p><ul><li><p><em><strong>Text is the new State</strong></em> - A lot of the nuance required in general engineering gets lost in data structures, but it can now be fed to agents via prompts, and agents tend to pay attention to it. This is essentially a breeding ground for <em><strong>prompt injection</strong></em> attacks. Prompt injection at large still hasn&#8217;t been solved, and folks are still trying to settle the debate about whether it&#8217;s a vulnerability or not. In a chatbot application, where the very nature of the app is to take user input via a prompt and respond, a middleware or some kind of proxy/filter can look for prompt injection attacks as a defensive measure. In my opinion, though, the greater risk lies with something like indirect prompt injection, wherein the main functionality of an app could be something innocuous, such as uploading files. If the app uses some kind of AI engine in the backend to process these docs (which might contain hidden malicious prompts), it could result in catastrophic outcomes that weren&#8217;t obvious. 
My prediction is that direct/indirect prompt injection will continue to be a major pain when it comes to agentic apps/systems. The only way to manage this risk is to follow a defense-in-depth strategy: reduce the blast radius and follow general security principles such as secure defaults, least privilege, and good observability. As a defense against these attacks, the <strong><a href="https://github.com/google-research/camel-prompt-injection">CaMeL</a></strong> approach looks promising, but time will tell how effective and scalable it will be.</p></li><li><p><em><strong>Agent Intent</strong></em> - There could be multiple ways of getting to an outcome, and we as humans might not know about all of them. So, instead of hardcoding them and restricting the agents, we should focus more on the agent&#8217;s <em><strong>intent</strong></em> and the outcome, and let the agent decide how to get there. We can set meaningful milestones to ensure the trajectory is correct, but restricting agents by coding in all the edge cases is not how you build an effective agentic system. Having said that, this is a security nightmare because protecting probabilistic outcomes is not trivial. Security needs to focus on the intent of the agent before it takes any step - <em><strong>what it&#8217;s trying to do, what permissions it has, what systems it is connected to</strong></em>, etc. If it&#8217;s a high-risk action, getting a human to approve it is going to be table stakes. Dynamic and adaptable security guardrails/policies are what the agentic AI space needs. Traditional rules-based policy engines aren&#8217;t going to cut it, unfortunately.</p></li><li><p><em><strong>Error handling</strong></em> - Agents can operate autonomously, so giving them the agency to take errors and resolve them dynamically, instead of failing the entire workflow, is the way to build effective autonomous agentic systems. 
Let&#8217;s consider this from a DAST (Dynamic Application Security Testing) perspective. Imagine you are building a DAST agent. By nature, a DAST agent needs to send a bunch of payloads to its target, observe the responses, and deduce whether something is a vulnerability or not. This is how traditional DAST scanners have worked: no dynamic decisions are made based on the application&#8217;s behavior. In the AI era, I think there is a lot of room for improvement in such scenarios. For example, depending on the errors received in the response, the DAST agent can adapt dynamically and continue probing the target more effectively instead of simply <em><strong>spraying and praying</strong></em>. This will also address the DDoS-like load such scanners impose, because there won&#8217;t be a need to throw a bunch of irrelevant payloads at a target and bring it down. Making your agents <strong>smart</strong>, <strong>adaptable</strong> and <strong>stealthy</strong> will really test the efficacy of security controls. Having said that, one important point worth mentioning here is that an agent should self-correct/adapt only in a <em><strong>sandboxed</strong></em> environment. You don&#8217;t want to give your DAST agent permission to access the file system, only to realize that it rm -rf&#8217;ed itself while trying to fix something.</p></li><li><p><em><strong>Evaluating behavior or testing probabilistic systems</strong></em> - In agentic systems, unit tests just aren&#8217;t enough. <em><strong>Reliability</strong></em>, <em><strong>Quality</strong></em> and <em><strong>Tracing</strong></em> are the key things to evaluate agentic behavior against. Let me explain this using my own experience of building a <em><strong>vulnerability triage agent</strong></em>. I was flummoxed by its outcome because it was different every time I ran it. I wasn&#8217;t sure how to improve it, and the variance in the outcome wasn&#8217;t acceptable. 
It would basically report a true positive on one run and a false positive on the next. And, without an AI/ML background, I had no idea how to build evals to actually make this triage agent work reliably. So I started thinking from first principles. I took the outcome from each run and fed it back to the AI to help me understand how I could improve the prompt so that the outcome was consistent and aligned with what I&#8217;d expect it to be. The AI would suggest changes such as adding logs at the different points where the agent was making decisions. I&#8217;d look at the suggestions, make minor improvements and implement them. I&#8217;d then re-run the workflow and see whether the outcome got better or worse. I was really <em><strong>prompt engineering</strong></em> at this point. Soon enough, I started seeing consistent outcomes from the agent. I could trust it, the quality of the reasoning was solid, and I had traces of every action/decision the agent was making. I didn&#8217;t realize that I was unintentionally building a manual eval system of sorts, with an <em><strong>input</strong></em> (vulnerability data), an <em><strong>expected output</strong></em> (the human triage of the vulnerability as a TP or an FP) and a <em><strong>prompt</strong></em> (context) that I could use AI to help improve to get to the desired <em><strong>outcome</strong></em>. Simple approaches like this are often lost in the hype cycle, but when you actually stick to first principles, it all makes sense.</p></li><li><p><em><strong>Implicit vs Explicit Context</strong></em> - As humans, we have a lot of tribal knowledge and assumptions/perceptions of the world, i.e. implicit context. In software engineering, this translates to things like variable / function / tool naming, etc. But, more often than not, we don&#8217;t do a good enough job with this nomenclature, and leave it ambiguous. Agents work differently. 
<em><strong>The more accurate context we provide them, the less ambiguity we leave up to them to decipher, and the better outcomes we are going to see.</strong></em> Another way to think about this is that traditional APIs are not adaptable, in the sense that they expect a pre-defined input and have specified output formats. That&#8217;s too restrictive. Agents, on the other hand, have the capability to adapt at runtime by reading tool definitions and adjusting the inputs accordingly. MCP tool definitions are a great example here. A tool with a wrong docstring or an incomplete name could lead to wrong invocations, resulting in unintended consequences. Removing the ambiguity and making it dead simple for agents is a recipe for building reliable and secure agentic systems.</p></li></ul><p>I feel that the quote below from the blog sums it up pretty well, where &#8220;it&#8221; refers to the probabilistic nature of agents.</p><blockquote><p>&#8220;You must manage it through evals and self-correction.&#8221;</p></blockquote><div><hr></div><h2><strong>SecureVibes Update</strong></h2><p>For those who might not know, I open sourced a project called &#8220;SecureVibes&#8221; that is meant to help vibecoders find security vulnerabilities in their codebase using a slightly different approach compared to traditional tools. You can read more about it <strong><a href="https://www.anshumanbhartiya.com/posts/securevibes-intro">here</a></strong>. The project is on my GitHub <strong><a href="https://github.com/anshumanbh/securevibes">here</a></strong>. Below are some updates on it:</p><ul><li><p><strong><a href="https://www.linkedin.com/in/mahmud-muhammad/">Mahmud Muhammad</a></strong> gave a presentation at Devfest Llorin and demo&#8217;ed SecureVibes. 
<strong><a href="https://docs.google.com/presentation/d/12xr5UqUV5DLwix-Cdmt9_ofeA8GZxv16DXDEFoeS8VI/edit?slide=id.g35a5db83d97_0_536#slide=id.g35a5db83d97_0_536">Here</a></strong> is the link to the deck.</p></li><li><p>I got an opportunity to present SecureVibes at a local <strong><a href="http://aitinkerers.org/">AI Tinkerers</a></strong> Meetup in San Diego. The community loved it: it was voted the <em><strong>community favorite</strong></em> and was also featured in their newsletter <strong><a href="https://post-training.aitinkerers.org/p/community-spotlights-issue-10">here</a></strong>.</p></li><li><p>Shoutout to <strong><a href="https://www.linkedin.com/in/yogikortisa/">Yogi Kortisa</a></strong> and <strong><a href="https://www.linkedin.com/in/hkolla/">Kolla Harish</a></strong> for contributing to its code. We are going to keep improving it and learning from it as we continue to operate in this greenfield area. Watch this space, as we will share new learnings here. If you&#8217;d like to contribute, have questions, or are simply interested in following along on our journey, please feel free to join the Discord server below.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Ep 31: The Future of Developer Security with Travis McPeak]]></title><description><![CDATA[In this episode, we sit down with Travis McPeak, one of the most prominent thinkers in the space of developer security.]]></description><link>https://www.boringappsec.com/p/the-future-of-developer-security</link><guid isPermaLink="false">https://www.boringappsec.com/p/the-future-of-developer-security</guid><dc:creator><![CDATA[Sandesh Mysore Anand]]></dc:creator><pubDate>Mon, 15 Dec 2025 16:00:00 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/181863163/8dfac7033ac62ef9efee38e47b5919f9.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>In this episode, we sit down with Travis McPeak, one of the most prominent thinkers in the space of 
developer security. Travis, who built his career at the intersection of security automation and developer productivity, shares his philosophy on achieving security at scale in the AI era. <br><br>His career spans security leadership roles at major tech companies, including Symantec, IBM, Netflix, and Databricks. Most recently, he founded and served as CEO of Resourcely, a startup built on the idea of making cloud infrastructure secure by default, before being &#8220;acqui-hired&#8221; by Cursor, the rapidly growing AI-powered code editor, to lead security and enterprise readiness.<br><br><strong>Key Takeaways</strong></p><ul><li><p><strong>AI for Secure by Default:</strong> AI tools provide the best injection point to shift security &#8220;all the way left&#8221; and move past the reactive &#8220;whack-a-mole&#8221; approach, because developers are already motivated to use these highly effective tools.</p></li></ul><ul><li><p><strong>Changing AppSec Strategy:</strong> AI dramatically changes the nature of AppSec by making previously unscalable strategies, such as threat modeling, applicable. AI can generate architecture diagrams on demand by tracing through code.</p></li></ul><ul><li><p><strong>The Compliance Bottleneck:</strong> The dramatic consolidation of cloud security vendors reflects how compliance-minded the security industry remains. Critical infrastructure misconfigurations (like public databases being left open) often go unaddressed because they are not measured by compliance standards.</p></li></ul><ul><li><p><strong>Platform vs. Point Solutions:</strong> Travis argues against platforms that are often amalgamations of poorly integrated acquired tools. 
He suggests buying the single best point solution for a high-leverage problem and using AI capabilities to operationalize and wire it into internal systems, thereby simplifying integrations that platforms traditionally provide.</p></li></ul><ul><li><p><strong>The Skeptical Coder:</strong> A fundamental limitation of Large Language Models (LLMs) is their desire to &#8220;make you happy,&#8221; causing them to provide answers even if they are incorrect. Therefore, engineers must use AI output only as a starting point and consider the code finished only when they fully understand it end to end.</p></li></ul><ul><li><p><strong>Prompt Injection Defined:</strong> Prompt injection is confirmed as a legitimate vulnerability, essentially a rehash of old issues like cross-site scripting and SQL injection, arising from the improper separation between the LLM instruction and the user instruction.<br><br>Tune in for a deep dive!<br><br><strong>Connect with Travis:</strong></p><p>LinkedIn: travismcpeak</p><p>Company Website: <a href="https://cursor.com/">https://cursor.com/</a></p><p><br><strong>Connect with Anshuman:</strong></p><p>LinkedIn: <a href="https://www.linkedin.com/in/anshumanbhartiya/">anshumanbhartiya</a></p><p>X: <a href="https://x.com/anshuman_bh">https://x.com/anshuman_bh</a></p><p>Website: <a href="https://anshumanbhartiya.com/">https://anshumanbhartiya.com/</a></p><p>Instagram: <a href="https://www.instagram.com/anshuman.bhartiya/#">anshuman.bhartiya</a></p><p><br><strong>Connect with Sandesh:</strong></p><p>LinkedIn: <a href="https://www.linkedin.com/in/anandsandesh/">anandsandesh</a></p><p>X: <a href="https://x.com/JubbaOnJeans">https://x.com/JubbaOnJeans</a></p><p></p><div class="embedded-publication-wrap" data-attrs="{&quot;id&quot;:58976,&quot;name&quot;:&quot;Boring 
AppSec&quot;,&quot;logo_url&quot;:null,&quot;base_url&quot;:&quot;https://boringappsec.substack.com&quot;,&quot;hero_text&quot;:&quot;E1-27: Getting the Boring aspects of AppSec right \nE28+: All aspects of building AppSec products&quot;,&quot;author_name&quot;:&quot;Sandesh Mysore Anand&quot;,&quot;show_subscribe&quot;:true,&quot;logo_bg_color&quot;:&quot;#f5f5f5&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPublicationToDOMWithSubscribe"><div class="embedded-publication show-subscribe"><a class="embedded-publication-link-part" native="true" href="https://boringappsec.substack.com?utm_source=substack&amp;utm_campaign=publication_embed&amp;utm_medium=web"><span class="embedded-publication-name">Boring AppSec</span><div class="embedded-publication-hero-text">E1-27: Getting the Boring aspects of AppSec right 
E28+: All aspects of building AppSec products</div><div class="embedded-publication-author-name">By Sandesh Mysore Anand</div></a><form class="embedded-publication-subscribe" method="GET" action="https://boringappsec.substack.com/subscribe?"><input type="hidden" name="source" value="publication-embed"><input type="hidden" name="autoSubmit" value="true"><input type="email" class="email-input" name="email" placeholder="Type your email..."><input type="submit" class="button primary" value="Subscribe"></form></div></div><p><br></p></li></ul><p></p>]]></content:encoded></item><item><title><![CDATA[Ep 30: Scaling Product Security In The AI Era with Teja Myneedu]]></title><description><![CDATA[In this episode, we sit down with Teja Myneedu, Sr.]]></description><link>https://www.boringappsec.com/p/ep-30-scaling-product-security-in</link><guid isPermaLink="false">https://www.boringappsec.com/p/ep-30-scaling-product-security-in</guid><dc:creator><![CDATA[Sandesh Mysore Anand]]></dc:creator><pubDate>Fri, 05 Dec 2025 11:12:00 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/182497681/8d09d1a4e1ad46934b7fdd778207b02a.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>In this episode, we sit down with Teja Myneedu, Sr. Director, Security and Trust at Navan. He shares his philosophy on achieving security at scale, discussing some of the challenges and approaches, especially in the AI era. <br><br>Teja's career spans over two decades on the front lines of product security at hyper-growth companies like Splunk. 
He currently operates at the complex intersection of FinTech and corporate travel, where his responsibilities include securing financial transactions and ensuring the physical duty of care for global travelers.<br><br><strong>Key Takeaways<br></strong><br>&#8226; <strong>Scaling Security Philosophy:</strong> Security programs should be built on developer empathy and innovative solutions, scaling with context and automation.<br><br>&#8226; <strong>Pragmatic Protection:</strong> Focus on incremental, practical improvements (like WAF rules) to secure the enterprise immediately, instead of letting the pursuit of perfection delay necessary defenses; security by obscurity is not always bad.<br><br>&#8226; <strong>Flawed Prioritization:</strong> Prioritization frameworks are often flawed because they lack organizational and business context, which security tools fail to provide.<br><br>&#8226; <strong>AI and Code Fixes:</strong> AI is changing the application security field by reducing the cognitive load on engineers and making it easier for security teams to propose vulnerability fixes (PRs).<br><br>&#8226; <strong>The Authorization Dilemma:</strong> The biggest novel threat introduced by LLMs is the complexity of identity and authorization, as agents require delegate access and dynamically determine business logic.<br><br>Tune in for a deep dive!<br><br><strong>Connect with Teja:</strong></p><p>LinkedIn: <a href="https://www.linkedin.com/in/myneedu/">myneedu</a></p><p>Company Website: <a href="https://www.navan.com">https://www.navan.com</a></p><p><strong><br>Connect with Anshuman:</strong></p><p>LinkedIn: <a href="https://www.linkedin.com/in/anshumanbhartiya/">anshumanbhartiya</a></p><p>X: <a href="https://x.com/anshuman_bh">https://x.com/anshuman_bh</a></p><p>Website: <a href="https://anshumanbhartiya.com/">https://anshumanbhartiya.com/</a></p><p>Instagram: <a href="https://www.instagram.com/anshuman.bhartiya/#">anshuman.bhartiya</a></p><p><strong><br>Connect with Sandesh:</strong></p><p>LinkedIn: <a href="https://www.linkedin.com/in/anandsandesh/">anandsandesh</a></p><p>X: <a href="https://x.com/JubbaOnJeans">https://x.com/JubbaOnJeans</a></p>]]></content:encoded></item><item><title><![CDATA[
Ep 29: Architecting AI Security: Standards and Agentic Systems with Ken Huang]]></title><description><![CDATA[In this episode, we sit down with Ken Huang, a core architect behind modern AI security standards, to discuss the revolutionary challenges posed by agentic AI systems.]]></description><link>https://www.boringappsec.com/p/ep-29-architecting-ai-security-standards</link><guid isPermaLink="false">https://www.boringappsec.com/p/ep-29-architecting-ai-security-standards</guid><dc:creator><![CDATA[Sandesh Mysore Anand]]></dc:creator><pubDate>Tue, 25 Nov 2025 11:31:00 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/182499152/4783489957ba36620b3bfec0fa8da06e.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>In this episode, we sit down with Ken Huang, a core architect behind modern AI security standards, to discuss the revolutionary challenges posed by agentic AI systems. Ken, who chairs the OWASP AIVSS project and co-chairs the AI safety working groups at the Cloud Security Alliance, breaks down how security professionals are writing the rulebook for a future driven by autonomous agents.</p><p><strong>Key Takeaways</strong></p><p>&#8226; <strong>AIVSS for Non-Deterministic Risk:</strong> The OWASP AIVSS project aims to provide a quantitative measure for core agent AI risks by applying an agent AI risk factor on top of CVSS, specifically addressing the autonomy and non-deterministic nature of AI agents.</p><p>&#8226; <strong>Need for Task-Scoped IAM:</strong> Traditional OAuth and SAML are inadequate for agentic systems because they provide coarse-grained, session-scoped access control. New authentication standards must be task-scoped, dynamically removing access once a specific task is complete, and driven by verifying the agent&#8217;s intent.</p><p>&#8226; <strong>A2A Security Requires New Protocols:</strong> Agent-to-Agent communication (A2A) introduces security issues beyond traditional API security (like BOLA). 
New systems must utilize protocols for Agent Capability Discovery and Negotiation&#8212;validated by digital signatures&#8212;to ensure the trustworthiness and promised quality of service from interacting agents.</p><p>&#8226; <strong>Goal Manipulation is a Critical Threat:</strong> Sophisticated attacks often utilize context engineering to execute goal manipulation against agents. These attacks include gradually shifting an agent&#8217;s objective (crescendo attack), using prompt injection to force the agent to expose secrets (malicious goal expansion), and forcing endless processing loops (exhaustion loop/denial of wallet).</p><p>Tune in for a deep dive!</p><p><strong>Connect with Ken:</strong></p><p>LinkedIn: <a href="https://www.linkedin.com/in/kenhuang8/">kenhuang8</a>  </p><p>Company Website: <a href="https://distributedapps.ai/">https://distributedapps.ai/</a></p><p>Substack: <a href="https://kenhuangus.substack.com/">https://kenhuangus.substack.com/</a></p><p>Paper (Agent Capability Negotiation and Binding Protocol): <a href="https://arxiv.org/abs/2506.13590">https://arxiv.org/abs/2506.13590</a></p><p>Book (Securing AI Agents): <a href="https://www.youtube.com/redirect?event=video_description&amp;redir_token=QUFFLUhqblpxQ3ZxM3B1MllFdlE5RzZ6YkZINkFzaWZnd3xBQ3Jtc0tsNGFfaTZwcmhfOUtpNmE2RlNHTlBQM2J2dkdoWEVoRVZiZlFYdkNjUlRCejBFWFZkMXVEdGFGMEVDRHVEdDlXVm9xNkpaV1VZQ3l2MHhvZzdZUzBsYkpsOXRnelFGU19XRnplOExtQWtpNzV3THNWbw&amp;q=https%3A%2F%2Fwww.amazon.com%2FSecuring-Agents-Foundations-Frameworks-Real-World%2Fdp%2F3032021294&amp;v=YNFO5xVvdzM">https://www.amazon.com/Securing-AI-Agents </a></p><p>AIVSS: <a href="https://aivss.owasp.org/">https://aivss.owasp.org/</a></p><p><strong>Connect with Anshuman:</strong></p><p>LinkedIn: &#8288;&#8288;&#8288;&#8288;<a href="https://www.linkedin.com/in/anshumanbhartiya/">anshumanbhartiya</a></p><p>X: &#8288;&#8288;&#8288;&#8288;<a href="https://x.com/anshuman_bh">https://x.com/anshuman_bh</a></p><p>Website: 
<a href="https://anshumanbhartiya.com/">https://anshumanbhartiya.com/</a></p><p>Instagram: <a href="https://www.instagram.com/anshuman.bhartiya/#">anshuman.bhartiya</a></p><p><strong>Connect with Sandesh:</strong></p><p>LinkedIn: <a href="https://www.linkedin.com/in/anandsandesh/">anandsandesh</a></p><p>X: <a href="https://x.com/JubbaOnJeans">https://x.com/JubbaOnJeans</a></p>]]></content:encoded></item><item><title><![CDATA[Edition #1: AI for Offense Is Here. Defenders Aren’t Ready.]]></title><description><![CDATA[A Chinese state-sponsored group GTG-1002 ran a full offensive campaign using Claude Code sub-agents with MCP, automating 80&#8211;90% of the kill chain across ~30 targets.]]></description><link>https://www.boringappsec.com/p/edition-1-ai-for-offense-is-here</link><guid isPermaLink="false">https://www.boringappsec.com/p/edition-1-ai-for-offense-is-here</guid><dc:creator><![CDATA[Anshuman Bhartiya]]></dc:creator><pubDate>Mon, 17 Nov 2025 16:00:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!B-_R!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bccf0e-257f-489c-b80a-e9f1bd0f6876_1024x576.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B-_R!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bccf0e-257f-489c-b80a-e9f1bd0f6876_1024x576.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!B-_R!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bccf0e-257f-489c-b80a-e9f1bd0f6876_1024x576.png 424w, https://substackcdn.com/image/fetch/$s_!B-_R!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bccf0e-257f-489c-b80a-e9f1bd0f6876_1024x576.png 848w, https://substackcdn.com/image/fetch/$s_!B-_R!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bccf0e-257f-489c-b80a-e9f1bd0f6876_1024x576.png 1272w, https://substackcdn.com/image/fetch/$s_!B-_R!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bccf0e-257f-489c-b80a-e9f1bd0f6876_1024x576.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B-_R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bccf0e-257f-489c-b80a-e9f1bd0f6876_1024x576.png" width="1024" height="576" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/23bccf0e-257f-489c-b80a-e9f1bd0f6876_1024x576.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:576,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!B-_R!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bccf0e-257f-489c-b80a-e9f1bd0f6876_1024x576.png 424w, https://substackcdn.com/image/fetch/$s_!B-_R!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bccf0e-257f-489c-b80a-e9f1bd0f6876_1024x576.png 848w, https://substackcdn.com/image/fetch/$s_!B-_R!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bccf0e-257f-489c-b80a-e9f1bd0f6876_1024x576.png 1272w, https://substackcdn.com/image/fetch/$s_!B-_R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bccf0e-257f-489c-b80a-e9f1bd0f6876_1024x576.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>A Chinese state-sponsored group GTG-1002 ran a full offensive campaign using Claude Code <strong><a href="https://code.claude.com/docs/en/sub-agents">sub-agents</a></strong> with <strong><a href="https://modelcontextprotocol.io/docs/getting-started/intro">MCP</a></strong>, automating 80&#8211;90% of the kill chain across ~30 targets. This isn&#8217;t a red-team exercise; it&#8217;s the first real glimpse of what AI-native offensive ops look like in the wild.</p><p>The bad actors automated the entire cyber kill chain: Recon -&gt; Attack Surface Mapping -&gt; Vulnerability Detection -&gt; Vulnerability Validation -&gt; Credentials Harvesting -&gt; Lateral Movement -&gt; Data Collection</p><p>Each phase likely had a sub-agent working semi-autonomously, with human operators only stepping in for approvals and course corrections.</p><p>If you&#8217;re a defender still treating coding agents as toys, this is your wake-up call.</p><p>I read the full report from Anthropic; you can find it <strong><a href="https://assets.anthropic.com/m/ec212e6566a0d47/original/Disrupting-the-first-reported-AI-orchestrated-cyber-espionage-campaign.pdf">here</a></strong>. 
Below are my takeaways as someone who&#8217;s been building similar AI-native systems (like <strong><a href="https://www.anshumanbhartiya.com/posts/securevibes-intro">SecureVibes</a></strong>) AND who has built bug hunting machines in the past (like <strong><a href="https://github.com/BountyMachine/about">BountyMachine</a></strong>).</p><div><hr></div><h2><strong>What is a traditional web hacking campaign?</strong></h2><p>Before getting into the report itself, I want to take a slight detour and walk you through how I would try to hack into an organization using open source tools, taking a hypothetical web-based example. It would look something like this:</p><ul><li><p>Perform reconnaissance on my target&#8217;s domain and gather all sub-domains using tools like <strong><a href="https://github.com/owasp-amass/amass">amass</a></strong>, <strong><a href="https://github.com/projectdiscovery/subfinder">subfinder</a></strong>, etc.</p></li><li><p>Port scan all domains and sub-domains using tools like <strong><a href="https://github.com/nmap/nmap">nmap</a></strong>, <strong><a href="https://github.com/robertdavidgraham/masscan">masscan</a></strong>, etc.</p></li><li><p>For any domain/sub-domain that has, let&#8217;s say, a web console exposed on port 80/443: run a directory bruteforcer like <strong><a href="https://github.com/xmendez/wfuzz">wfuzz</a></strong>, <strong><a href="https://github.com/ffuf/ffuf">ffuf</a></strong>, etc., take screenshots using tools like <strong><a href="https://github.com/sensepost/gowitness">gowitness</a></strong>, and probe the web console further to detect things like tech stack, versions, etc. 
using tools like <strong><a href="https://github.com/projectdiscovery/httpx">httpx</a></strong>, and so on.</p></li><li><p>Triage all the results obtained.</p></li><li><p>Research whether there are any known exploits/CVEs against the gathered stack.</p></li><li><p>Try to exploit them against the target assets and see if I can move laterally in their environment and gain a persistent foothold, eventually exfiltrating the keys to their kingdom.</p></li></ul><p>You get the idea. There is nothing novel about any of this. These techniques and tools have been known to hackers for years. Some of the steps in the workflow above could definitely be automated and chained together.</p><blockquote><p>As a matter of fact, a couple of hacker friends and I built <em><strong>BountyMachine</strong></em>, a system to automate our bug bounty hunting workflow. We presented it in a talk titled &#8220;<em><strong>Bug Bounty Hunting on Steroids</strong></em>&#8221; at Defcon Recon Village <strong>7</strong> years ago. You can watch that video <strong><a href="https://www.youtube.com/watch?v=7WYjSDZxFYc&amp;t=1608s">here</a></strong>.</p></blockquote><p>Building this system required <strong>custom orchestration</strong>, <strong>glue code</strong>, and <strong>bespoke infra</strong> using technologies like Kubernetes and <strong><a href="https://argoproj.github.io/">Argo</a></strong>, which were greenfield back then. Stitching tools together, handling weird output formats, and managing state across stages was half the work. We (<strong><a href="https://www.linkedin.com/in/glenn-devalias-grant/">Glenn Grant</a></strong>, <strong><a href="https://www.linkedin.com/in/mhmdiaa/">Mohammed Diaa</a></strong> and I) spent months building it out, and it got pretty complex. 
The progress was slow, the incentives weren&#8217;t there, and we soon gave up on it.</p><p>The point is that this is how I would traditionally have conducted <strong>automated</strong> web hacking campaigns, before I knew anything about AI. The ROI just wasn&#8217;t there for hobbyists / white hats like me. Maybe state-sponsored groups already had such systems operating at scale, but what do I know!</p><div><hr></div><h2><strong>Why an AI-orchestrated operation is different from traditional campaigns</strong></h2><p>Anthropic called the attack a &#8220;<em><strong>sophisticated cyber espionage operation</strong></em>&#8221;. Folks in the security industry are debating whether the attack was really sophisticated, and whether there was anything novel in it compared to traditional campaigns.</p><p>The report describes techniques and tools similar to the ones in the previous section, so I agree that there is nothing novel or sophisticated about them as such.</p><p>Rather, the novelty/sophistication is in the technologies (<em><strong>sub-agents, MCP, coding agent acting as an orchestrator</strong></em>, etc.) used to carry out those techniques, and in the <em><strong>simplicity</strong></em> of building such systems these days, in a fraction of the time, <em><strong>using AI</strong></em>, by anybody, including script kiddies.</p><blockquote><p>GTG-1002 might just be a bunch of script kiddies hacking in a basement. I guess we will never know?</p></blockquote><p>MCP, in particular, is getting widely adopted by organizations trying to implement AI, yet security teams are struggling to wrap their heads around the new attack surface it has opened up. This also likely explains why this campaign went undetected for a long time. 
<em><strong>MCP observability is a real risk enterprises are facing today.</strong></em></p><p>If I read between the lines in the report, the bad actors (I am speculating):</p><ul><li><p><em>used<strong> sub-agents </strong>for specific objectives leveraging their individual context windows<strong>,</strong></em></p></li><li><p><em><strong>orchestrated by Claude Code </strong>abstracting away all the glue code work and bespoke infra required to build such autonomous hacking systems<strong>,</strong></em></p></li><li><p><em><strong>leveraged the MCP protocol </strong>to call <strong>open source tools</strong> that performed vulnerability scanning and piped data from one tool to another; using LLM as the intelligence layer to deal with interoperability of the tools<strong>,</strong></em></p></li><li><p><em><strong>likely used Claude hooks </strong>to build human in the loop workflows and kept the campaign well directed</em></p></li></ul><p>Basically, they were able to automate the entire cyber kill chain using just a <strong>general purpose coding agent </strong>and its native constructs, <strong>MCP</strong> and <strong>open source tools</strong>. If this is not novel, I don&#8217;t know what is!</p><div><hr></div><h2><strong>Why I&#8217;m not surprised this worked</strong></h2><p>If you&#8217;ve been following me and my LinkedIn posts, you might already know that I have been exploring building AI native systems for a while now, using coding agents like Claude Code, Codex and Droid. I&#8217;ve blogged about my technical research on my website <strong><a href="https://www.anshumanbhartiya.com/technical-blog">here</a></strong>.</p><p>I even built an AI native security system for vibecoded applications called <strong><a href="https://github.com/anshumanbh/securevibes">SecureVibes</a></strong> in ~2 weeks working solo nights and weekends. 
It can already find real security vulnerabilities in a codebase using a multi-agent flow (built on Claude sub-agents) and simple prompts that follow the methodology a human security professional would use.</p><p>If I can do that as one person, it&#8217;s obvious that motivated, well-resourced bad actors can build something like what Anthropic described, and then push it much further.</p><p>The exciting part is what this unlocks for <strong>offensive security</strong>: automating large chunks of recon, vulnerability discovery, and exploit validation. The nerve-racking part is that this campaign going undetected for a while and successfully hitting real targets is a clear sign:</p><blockquote><p>The gap between what attackers can do with AI and what defenders are prepared for is getting wider, fast.</p></blockquote><div><hr></div><h2><strong>What can / should you do about it?</strong></h2><p>This report is going to be turned into slideware by a lot of security vendors and VCs alike. &#8220;AI-native&#8221; everything. &#8220;Our platform will save you&#8221;. &#8220;This is why we invested in this startup&#8221;. Sure. They are not wrong! But let&#8217;s bring our attention back to what really matters for organizations that might be experiencing similar attack campaigns right now and have no idea about it.</p><p>If you&#8217;re a defender in an organization, the main lesson isn&#8217;t &#8220;buy more tools&#8221;. 
It is:</p><blockquote><p>You need to up-skill yourself in using AI as a defensive operator.</p></blockquote><p>No product/platform will save you if:</p><ul><li><p>You don&#8217;t understand how these agents actually work under the hood</p></li><li><p>You don&#8217;t know how to integrate them into your existing infra and workflows</p></li><li><p>You don&#8217;t understand the <strong>new risks</strong> they introduce</p></li></ul><p>If you are fighting against AI offense, you&#8217;ve gotta know how AI works in the first place before you can start leveraging it as a defender. You&#8217;ve gotta use a machine to fight against a machine.</p><p>Also, you, as a defender, know more about your organization than any security product can. I am referring to the institutional knowledge about how tools work and integrate with each other, how software gets built and deployed, the SDLC and the security integration points, etc. So, here are a few things (not an exhaustive list by any means, just some off the top of my head) you can do as defenders in your organizations:</p><ul><li><p><strong>Run safe experiments with agents in your own environment</strong> - Give a coding agent access to a staging environment and see how far it can get automating recon or vulnerability validation. Use MCP for tool calling. Have your SOC watch that run and note the gaps. Close those gaps. Rinse and repeat.</p></li><li><p><strong>Design for &#8220;AI operator&#8221; skills on your team</strong> - Someone needs to own wiring agents into SOC automation, IR, and vulnerability management, instead of leaving them as side projects.</p></li><li><p><strong>Invest in sandboxed execution by default</strong> - Any agent that can run code or touch internal systems should be executed in a tightly controlled sandbox (Cloudflare&#8217;s Claude Code <strong><a href="https://developers.cloudflare.com/sandbox/tutorials/claude-code/">sandbox</a></strong> model is a good reference). 
This ensures agents can operate freely, with real agency, instead of being heavily restricted.</p></li><li><p><strong>Try automating parts of your job that still require critical thinking</strong> - I will take vulnerability triaging as an example here, since it is something I have tried to automate myself with reasonable success and I have some data points to back my advice. Triaging vulnerabilities generally requires human critical thinking and reasoning because there are a lot of factors at play: environment, risk appetite, business impact, technical skills of the triager, etc. Automating it, pre-AI, has been difficult because deterministic systems/scripts cannot reason through all of these factors the way an AI agent can. Also, because LLMs are trained on internet data, an AI agent can reason about a vulnerability from many more perspectives than a single human typically can. Automating triage has taught me how to bake determinism into AI-native systems, within a reasonable amount of variation, by building evals. This is a fundamental skill to learn if you want to build reliable AI systems. If you&#8217;re interested, I can cover how I think about this in a separate post. Let me know!</p></li></ul><p>To sum it up, you need to learn how to use AI as a <em><strong>force multiplier</strong></em> to build better defensive capabilities. This is just the beginning. The attackers are already moving; you don&#8217;t have the luxury of waiting.</p><div><hr></div><h2><strong>Excerpts that stood out (and what they mean)</strong></h2><blockquote><p>Claude maintained persistent operational context across sessions spanning multiple days</p></blockquote><p>Coding agents like Claude Code, Codex, and Droid can operate for <strong>multiple days</strong> with minimal human intervention. The operators here pushed that to the edge: long-running agents, consistent context, and steady progress toward objectives. 
This is a concrete signal that fully autonomous agents aren&#8217;t far off; the timelines are shrinking fast.</p><blockquote><p>the human operators claimed that they were employees of legitimate cybersecurity firms and convinced Claude that it was being used in defensive cybersecurity testing.</p></blockquote><p>They evaded Claude&#8217;s safety controls with <strong>role-play</strong>: give the model a narrow, &#8220;defensive testing&#8221; story, assign it a persona, and frame harmful actions as part of that persona&#8217;s job. That&#8217;s one of the scariest patterns with AI systems today. As defenders, we need to explicitly think about <strong>how to detect and block role-play prompts that reframe abuse as &#8220;defense&#8221;.</strong></p><blockquote><p>Claude independently determined which credentials provided access to which services, mapping privilege levels and access boundaries without human direction</p></blockquote><p>Given the right tools and environment, coding agents can reason like a capable security operator. The key is <strong>agency inside a controlled sandbox</strong>: don&#8217;t over-constrain them, but don&#8217;t let them loose on prod either.</p><p>For exploit verification and vulnerability validation: arm the agent with the right tools, let it try attacks in a contained environment, and adapt dynamically to runtime context. Claude skills fit nicely here - you provide example scripts for a skill, and Claude Code generates new code (via the bash tool) on the fly from those patterns.</p><p>In SecureVibes, this is how I dynamically test authorization vulnerabilities against a live app. Today it&#8217;s not sandboxed. But, if I ever host SecureVibes as a service, sandboxing would be non-negotiable. 
(The authorization skill in SecureVibes is defined <strong><a href="https://github.com/anshumanbh/securevibes/tree/main/packages/core/securevibes/skills/dast/authorization-testing">here</a></strong>.)</p><blockquote><p>Structured markdown files tracked discovered services, harvested credentials, extracted data, exploitation techniques, and complete attack progression. This documentation enabled seamless handoff between operators, facilitated campaign resumption after interruptions, and supported strategic decision-making about follow-on activities.</p></blockquote><p>I&#8217;ve been beating the drum on <strong>Markdown as the memory layer for agents</strong>. The attackers did the same thing. SecureVibes also uses this pattern: each sub-agent writes a markdown file, and the next sub-agent consumes it. Claude Code handles the orchestration under the hood, and the flow is surprisingly robust.</p><p>Early on, I explored graph DBs and RAG DBs as the memory layer. They&#8217;re powerful but often <strong>overkill</strong> for this kind of workflow. For many use cases, you get 80&#8211;90% of the value by just letting agents read/write structured markdown files:</p><ul><li><p>Models &#8220;like&#8221; markdown</p></li><li><p>Files are human-readable</p></li><li><p>You can diff and audit them easily</p></li></ul><blockquote><p>An important limitation emerged during investigation: Claude frequently overstated findings and occasionally fabricated data during autonomous operations, claiming to have obtained credentials that didn&#8217;t work or identifying critical discoveries that proved to be publicly available information. This AI hallucination in offensive security contexts presented challenges for the actor&#8217;s operational effectiveness, requiring careful validation of all claimed results. This remains an obstacle to fully autonomous cyberattacks.</p></blockquote><p>Some folks are reading this as <em><strong>Anthropic contradicting its own claims</strong></em>. 
On one hand, Anthropic claims bad actors used Claude to conduct a multi-entity campaign, fully orchestrated end to end and running over multiple days. On the other hand, they also claim that Claude fabricated details. I get it. It definitely sounds contradictory and confusing.</p><p>But, if you read between the lines, the part these folks are missing or overlooking is this one: &#8220;<em><strong>requiring careful validation of all claimed results</strong></em>&#8221;. If you look at the CRSs (cyber reasoning systems) built for <strong><a href="https://aicyberchallenge.com/">AIxCC</a></strong>, every team has a &#8220;verifier&#8221; component in its system. It exists precisely to address the hallucinations and fabrications that AI agents are prone to. This is the innate nature of non-deterministic AI systems. Anthropic is not contradicting its claims. It is simply stating the facts.</p><p>To me, the more subtle takeaway here is this: if the bad actors got this far even with all the fabrication, just imagine how much more damage they can do if they build better <em><strong>verifiers</strong></em> to tackle the AI hallucinations. The race is on!</p><p>And, if you are still not convinced by this explanation, check the latest <strong><a href="https://x.com/karpathy/status/1990116666194456651">tweet</a></strong> from the OG <strong><a href="https://www.linkedin.com/in/andrej-karpathy-9a650716/">Andrej Karpathy</a></strong>.</p><div><hr></div><h2><strong>Summary</strong></h2><p>AI agents aren&#8217;t a demo anymore. They&#8217;re already running multi-day campaigns, chaining tools, and making decisions that used to require a room full of operators.</p><p>They&#8217;re dangerous if left unchecked, but useless if over-restricted. The real game is <strong>giving them agency inside tight, well-designed constraints</strong>: sandboxed environments, clear skills, human-in-the-loop checkpoints.</p><p>MCP is 100x&#8217;ing this on both sides. 
It makes it trivial for good and bad actors to wire agents into real systems and achieve security outcomes we haven&#8217;t seen at this scale before.</p><p>If attackers have AI-augmented teams and you don&#8217;t, you&#8217;re already behind. Time to wake up, my fellow defenders.</p><div><hr></div><h2><strong>Appendix / Resources</strong></h2><p>If you want to go deeper into how I&#8217;ve been building AI-native security systems and learning more about this space, these might help:</p><ul><li><p><strong>SecureVibes architecture walkthrough</strong> &#8211; How SecureVibes works end-to-end and how I use the Claude Agent SDK to orchestrate multiple agents.</p></li></ul><ul><li><p><strong>Vibecoding a DAST sub-agent</strong> &#8211; How I vibecoded a DAST-like sub-agent in SecureVibes and armed it with skills to test for authorization vulnerabilities.</p></li></ul><ul><li><p><strong>AI &amp; Security talk at 10x Genomics</strong> &#8211; A broader look at the AI + security landscape and how I&#8217;m thinking about it with real-world examples.</p></li></ul><div><hr></div><p>If this was useful, share it with someone on your security team and leave a comment with what you&#8217;re experimenting with on the defensive side to tackle AI offense. 
Also, don&#8217;t forget to like, follow and subscribe to my newsletter - <strong>AI Security Engineer</strong> so that you don&#8217;t miss the next edition!</p>]]></content:encoded></item><item><title><![CDATA[Running SecureVibes on SecureVibes - Results & What’s Next (Part 3/3)]]></title><description><![CDATA[This is Part 3 of a 3-part series on building SecureVibes, a multi-agent security system for vibecoded applications.]]></description><link>https://www.boringappsec.com/p/running-securevibes-on-securevibes</link><guid isPermaLink="false">https://www.boringappsec.com/p/running-securevibes-on-securevibes</guid><dc:creator><![CDATA[Anshuman Bhartiya]]></dc:creator><pubDate>Tue, 14 Oct 2025 15:15:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!fnVZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a59cd57-df56-410d-bc2e-bf7be43379db_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fnVZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a59cd57-df56-410d-bc2e-bf7be43379db_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fnVZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a59cd57-df56-410d-bc2e-bf7be43379db_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!fnVZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a59cd57-df56-410d-bc2e-bf7be43379db_1024x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!fnVZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a59cd57-df56-410d-bc2e-bf7be43379db_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!fnVZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a59cd57-df56-410d-bc2e-bf7be43379db_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fnVZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a59cd57-df56-410d-bc2e-bf7be43379db_1024x1024.png" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3a59cd57-df56-410d-bc2e-bf7be43379db_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1736031,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://boringappsec.substack.com/i/182127296?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a59cd57-df56-410d-bc2e-bf7be43379db_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fnVZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a59cd57-df56-410d-bc2e-bf7be43379db_1024x1024.png 424w, 
https://substackcdn.com/image/fetch/$s_!fnVZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a59cd57-df56-410d-bc2e-bf7be43379db_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!fnVZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a59cd57-df56-410d-bc2e-bf7be43379db_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!fnVZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a59cd57-df56-410d-bc2e-bf7be43379db_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><h1>Testing a Security Scanner by Scanning Itself</h1><p>The best way to test a security scanner? Run it on its own codebase.</p><p>I built SecureVibes to find vulnerabilities in vibecoded applications. But SecureVibes itself is vibecoded: I didn&#8217;t write a single line of code myself. I used AI agents to build an AI agent system.</p><p>This meta experiment would answer two questions:</p><ol><li><p><strong>Does the multi-agent approach actually work?</strong></p></li><li><p><strong>How does it compare to traditional tools and single-agent systems?</strong></p></li></ol><p>I figured this was the perfect test case. I am familiar with what the system is supposed to do. Even though I vibecoded the entire thing, I am aware of the design decisions I made. I used AI as a companion and guided it to build this thing, but I have no idea whether it&#8217;s secure or not. This is exactly the problem I wanted to address in the first place.</p><div><hr></div><h1>The Experiment Design</h1><p>I ran SecureVibes on itself using three different Claude models:</p><ul><li><p><strong>Haiku</strong> (fast/cheap)</p></li><li><p><strong>Sonnet</strong> (balanced) - I also ran it twice with Sonnet to see how much the results vary, given the non-deterministic nature of SecureVibes</p></li><li><p><strong>Opus</strong> (premium)</p></li></ul><p>Then I compared results against:</p><ul><li><p><strong>Traditional SAST:</strong> Semgrep, Bandit</p></li><li><p><strong>Single-agent systems:</strong> Claude Code, Codex, Droid</p></li><li><p><strong>Custom Droid</strong> with security focus</p></li></ul><p>All detailed reports are available at <a href="https://github.com/anshumanbh/securevibes/tree/main/docs/example-reports">github.com/anshumanbh/securevibes/docs/example-reports</a>.</p><p>Here&#8217;s what I found&#8230;</p><div><hr></div><h1>Results: Model 
Comparison</h1><h2>Haiku vs Sonnet vs Opus</h2><p><strong>Sonnet wins hands down.</strong> Not just subjectively, but objectively:</p><table><thead><tr><th>Model</th><th>Vulnerabilities Found</th><th>Cost</th><th>Value Score</th></tr></thead><tbody><tr><td>Haiku</td><td>2</td><td>$0.15</td><td>Poor</td></tr><tr><td>Sonnet</td><td>17</td><td>$3.44</td><td><strong>Best</strong></td></tr><tr><td>Opus</td><td>12</td><td>$7.64</td><td>Good</td></tr></tbody></table><p>Sonnet found 17 vulnerabilities at $3.44, while Opus found only 12 at $7.64. That works out to roughly $0.20 per finding for Sonnet versus $0.64 for Opus. Haiku&#8217;s $0.15 price tag is tempting, but catching only 2 issues means you&#8217;re flying blind.</p><p><strong>The sweet spot for security scanning isn&#8217;t the cheapest or most expensive model&#8212;it&#8217;s the one that balances depth of analysis with practical cost constraints.</strong> In this test, the middle option outperformed the premium one. I don&#8217;t have a good explanation for why Opus found fewer issues; I&#8217;m curious about that too.</p><h2>Multiple Runs of Sonnet</h2><p>I ran Sonnet twice to see if results were consistent. About 12-13 vulnerabilities appeared in both reports (core issues like API keys, path traversal, JSON validation). But each run found 4-5 unique issues:</p><p><strong>Unique to Run 1:</strong></p><ul><li><p>Race conditions in concurrent scans</p></li><li><p>Symlink traversal enabling infinite loops</p></li><li><p>Git commit protection warnings</p></li><li><p>Report authenticity verification</p></li></ul><p><strong>Unique to Run 2:</strong></p><ul><li><p>Prompt injection defense gaps</p></li><li><p>Model downgrade attacks via env vars</p></li><li><p>Hardcoded credentials exposure flow</p></li><li><p>Tool parameter validation</p></li></ul><p>The union of both runs found ~21 distinct issue types.</p><p><strong>This reveals a powerful insight: running the same scanner multiple times might actually increase coverage.</strong> For critical codebases, consider 2-3 runs despite added cost. 
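</p><p>A minimal sketch of how reports from several runs could be combined, assuming each finding is identified by its title. The function names and sample titles are hypothetical, not taken from SecureVibes. Because an LLM may word the same issue differently on each run, a real merge would probably need a sturdier key, such as file path plus CWE ID:</p>

```python
def normalize(title):
    """Collapse case and whitespace so the same title matches across runs."""
    return " ".join(title.lower().split())


def merge_runs(*runs):
    """Return (union, common) sets of normalized finding titles."""
    normalized = [{normalize(t) for t in run} for run in runs]
    if not normalized:
        return set(), set()
    union = set().union(*normalized)
    common = set.intersection(*normalized)
    return union, common


# Illustrative titles only; the real reports word their findings differently.
run_1 = ["API key exposure", "Path traversal", "Race condition in concurrent scans"]
run_2 = ["api key exposure", "Path  traversal", "Prompt injection defense gaps"]

union, common = merge_runs(run_1, run_2)
print(len(union), "distinct findings,", len(common), "seen in every run")
```

<p>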
The probabilistic nature of LLMs means different runs can catch different issues.</p><div><hr></div><h1>Results: SecureVibes vs Everything Else</h1><h2>vs Traditional SAST</h2><p>I ran two popular open-source SAST tools:</p><ul><li><p><strong><a href="https://github.com/anshumanbh/securevibes/blob/main/docs/example-reports/securevibes_semgrep.md">Semgrep</a></strong> - 0 findings</p></li><li><p><strong><a href="https://github.com/anshumanbh/securevibes/blob/main/docs/example-reports/securevibes_bandit.md">Bandit</a></strong> - 0 findings</p></li></ul><p><strong>Why zero findings?</strong> These tools look for syntactic patterns. They can&#8217;t detect architectural issues like &#8220;CLI bypass via symlink attack&#8221; or &#8220;insufficient permission validation in file operations&#8221;&#8212;exactly what SecureVibes found.</p><p>This is unfortunately the state of current <strong>open source</strong> code security scanners. They&#8217;re excellent at finding known patterns but terrible at understanding context.</p><h2>vs Single-Agent Systems</h2><blockquote><p>Prompt - &#8220;perform a security review of the current codebase&#8221;</p></blockquote><p>I ran the same security review task using coding agents without specialized multi-agent workflows:</p><table><thead><tr><th>System with Model</th><th>Vulnerabilities Found</th></tr></thead><tbody><tr><td>Claude Code with Sonnet 4.5</td><td>9</td></tr><tr><td>Codex with GPT-5-codex</td><td>4</td></tr><tr><td>Droid with GLM 4.6</td><td>7</td></tr><tr><td><strong>SecureVibes with Sonnet</strong></td><td><strong>16</strong></td></tr></tbody></table><p>SecureVibes crushed the coding agents in their default settings:</p><ul><li><p><strong>78% more issues</strong> than Claude Code (16 vs 9)</p></li><li><p><strong>4x more issues</strong> than Codex (16 vs 4)</p></li><li><p><strong>2.3x more issues</strong> than Droid (16 vs 7)</p></li></ul><p><strong>Why the difference?</strong> Single-agent systems lack a structured workflow. They scan linearly. SecureVibes builds context (Phase 1), hypothesizes (Phase 2), then validates (Phase 3). 
This progressive refinement mirrors how human security teams work.</p><h2>vs Custom Security Droid</h2><p>I also set up a <a href="https://github.com/anshumanbh/securevibes/blob/main/.factory/droids/security-audit.yaml">custom droid</a> specifically for security audits and ran it with Sonnet 4.5. The report is <a href="https://github.com/anshumanbh/securevibes/blob/main/docs/example-reports/securevibes_custom_droid_sonnet45.md">here</a>.</p><blockquote><p>Prompt - &#8220;security-audit: Review entire codebase for vulnerabilities&#8221;</p></blockquote><p><strong>Results: 23 vulnerabilities found</strong></p><ul><li><p>4 Critical (vs SecureVibes: 2-4)</p></li><li><p>9 High (vs SecureVibes: 6)</p></li><li><p>7 Medium (vs SecureVibes: 6-9)</p></li><li><p>3 Low (vs SecureVibes: 0)</p></li></ul><p><strong>The Custom Droid found 35-44% more vulnerabilities</strong> than SecureVibes using the same model (23 vs 16-17). This taught me what I call <strong>&#8220;learning the bitter lesson&#8221;</strong>: with the same model, <code>sonnet 4.5</code>, the custom Droid&#8217;s output held up very well against SecureVibes&#8217; and, by raw count, beat it.</p><p><strong>What this means:</strong> All the work I did over the past few days building a custom multi-agent system essentially got matched by a feature Factory released in their coding agent. 
If you&#8217;re using Claude Code, I believe the same outcome can be achieved by building your own suite of Claude Code subagents&#8212;very much like what I did with SecureVibes, but you&#8217;d have to know what you&#8217;re doing.</p><p><strong>The quality difference:</strong> The Custom Droid&#8217;s report went beyond SecureVibes in several ways:</p><ul><li><p>More granular categorization (Low severity tier)</p></li><li><p>Additional timeout and rate limiting issues</p></li><li><p>More comprehensive error handling gaps</p></li><li><p>Better detection of compliance-related issues (GDPR, SOC 2)</p></li></ul><p><strong>But this isn&#8217;t defeat&#8212;it&#8217;s validation.</strong> The multi-agent approach works so well that platforms are building it in as native features. This is just the first iteration of SecureVibes, and I believe I can bring its results on par with the custom droid. There are still plenty of opportunities:</p><ul><li><p><strong>Domain expertise matters:</strong> Continue improving agents with security-specific knowledge</p></li><li><p><strong>Privacy-first options:</strong> Build versions that work with local models to preserve IP</p></li><li><p><strong>Accessibility:</strong> Non-technical users still need a UI, not command-line tools</p></li><li><p><strong>SDLC integration:</strong> Build custom droids/agents for different security gates (PR review, pre-commit, pre-deploy)</p></li></ul><div><hr></div><h1>Key Learnings</h1><h2>Filesystem Threat Boundary</h2><p><strong>Most vulnerabilities were CLI &#8596; filesystem interactions.</strong> That makes sense: SecureVibes is a CLI tool, so the main trust boundary sits between the CLI program and the host machine&#8217;s file system, and the AI understood that threat model well.</p><h2>Multi-agent &gt; Single-agent</h2><p>This was the biggest validation. The multi-agent approach consistently outperformed single-agent attempts in their default configuration. 
<strong>The progressive refinement (context &#8594; threats &#8594; validation) mirrors how human security teams work,</strong> and it shows in the quality of results.</p><p>The Claude Agent SDK is a game changer for building multi-agent systems. It handles orchestration, so you can focus on designing the workflow and prompts.</p><h2>File-based Communication is Underrated</h2><p>Early versions used in-memory state passing between agents. It was a nightmare to debug when something went wrong.</p><p>Switching to file-based communication (.md and .json files) made the system so much easier to understand, debug, and extend. <strong>I can inspect any phase&#8217;s output, replay phases, and even manually edit artifacts to test edge cases.</strong> Markdown surprisingly works great for both humans and machines.</p><h2>Real-time Progress Streaming is Essential</h2><p>Initially, SecureVibes used filesystem polling to detect phase completions. During 10-15 minute scans, users would see progress updates only every 30-60 seconds, leading to &#8220;is it frozen?&#8221; moments.</p><p>I rebuilt it using the Claude SDK&#8217;s hooks system (<code>PreToolUse</code>, <code>PostToolUse</code>, <code>SubagentStop</code>) for event-driven streaming. <strong>Now users see exactly what each agent is doing in real-time</strong>&#8212;which files it&#8217;s reading, what patterns it&#8217;s searching for. This dramatically improved UX.</p><h2>STRIDE is Still Relevant</h2><p>I was skeptical about using a traditional threat modeling framework (STRIDE) in an AI-driven system. 
But it turned out to be perfect.</p><p><strong>It gives the Threat Modeling Agent a structured way to think about threats, ensuring comprehensive coverage across all categories.</strong> Without STRIDE, the agent would often focus too heavily on one vulnerability class (usually injection attacks) and ignore authorization or audit issues.</p><h2>False Positives are the Enemy</h2><p>Traditional SAST tools have terrible false positive rates. By using the three-phase approach where Phase 3 validates threats with concrete evidence, <strong>SecureVibes&#8217; false positive rate is dramatically lower.</strong></p><p>The agent must provide the exact line number, code snippet, and explanation of exploitability. This forces it to actually confirm the vulnerability exists rather than flagging suspicious-looking patterns.</p><h2>Claude SDK Orchestration is Magical</h2><p>I initially built a custom orchestrator agent to coordinate the workflow. Then I realized the SDK itself handles orchestration&#8212;you just define agents and Claude figures out when to invoke them.</p><p><strong>This cut hundreds of lines of coordination code and made the system more reliable.</strong> The SDK handles error recovery, retries, and state management automatically.</p><h2>AI Coding Agents Accelerate Development</h2><p>I used Factory&#8217;s Droid and Claude Sonnet 4.5 for this project. I first used Claude Code along with GitHub MCP and Anthropic documentation to create a comprehensive guide on the Claude Agent SDK. You can find that <a href="https://github.com/anshumanbh/securevibes/blob/main/docs/references/claude-agent-sdk-guide.md">here</a>.</p><p>Then I had Droid reference that guide to build features. <strong>The combination of context-aware coding agents and good documentation dramatically sped up development.</strong></p><blockquote><p>NOTE: I can&#8217;t recommend Factory&#8217;s Droid enough. It is a game changer. 
There have been multiple instances where Claude Code, Codex and Cursor just failed to deliver and Droid was able to one-shot it. If you want to try it out, here is a referral code worth $40 in credits - <a href="https://app.factory.ai/r/Z2B374AY">https://app.factory.ai/r/Z2B374AY</a>. I promise you will not be disappointed!</p></blockquote><h2>Iterative Refinement is Key</h2><p>Text extraction from different agent outputs (especially markdown), JSON parsing, and prompt engineering all required multiple iterations. The first version of any prompt never works perfectly. I learned to build in instrumentation early (debug modes, verbose logging) to understand what&#8217;s actually happening.</p><h2>Build First, Optimize Later</h2><p>The current system is expensive. If you are on a Claude subscription plan, you don&#8217;t have to worry about this too much. But if you pay as you go for API requests, the costs can rack up really fast, especially if you run periodic scans on entire codebases. My focus for the first iteration wasn&#8217;t on building a cost-effective system. Now that I know it works, I will look for ways to make it cheaper to run.</p><div><hr></div><h1>What&#8217;s Next: Building in Public</h1><p>This is just the beginning. I&#8217;m committed to building this in public and inviting the community to join me on this journey. Here are some items on my wishlist:</p><h2>1. Dashboard</h2><p>Right now, SecureVibes outputs results to the terminal and to JSON and Markdown files. 
I want to build a web dashboard that provides:</p><ul><li><p>Visual trend analysis (are vulnerabilities increasing or decreasing over time?)</p></li><li><p>Vulnerability timeline and history</p></li><li><p>Team collaboration features (assign findings, track remediation)</p></li><li><p>Integration with issue trackers (Jira, GitHub Issues, Linear)</p></li><li><p>Comparison between scans (what changed?)</p></li></ul><h2>2. Fixer Sub-Agent</h2><p>Finding vulnerabilities is great, but fixing them is where the real value is. I want to build a Fixer Agent that:</p><ul><li><p>Takes a vulnerability from <code>VULNERABILITIES.json</code></p></li><li><p>Reads the vulnerable code in context</p></li><li><p>Generates a patch that fixes the issue</p></li><li><p>Explains what it changed and why</p></li><li><p>Creates a PR with the fix (optional)</p></li></ul><p>This is tricky because the fix needs to actually work (not break functionality), preserve the original intent of the code, and consider the broader codebase context.</p><h2>3. Evaluation Framework</h2><p>The hardest problem in AI security tools: how do you know if it&#8217;s actually working?</p><p>I want to build a comprehensive evaluation framework:</p><ul><li><p><strong>Benchmark datasets</strong> - Known vulnerable applications (WebGoat, pygoat, NodeGoat, etc.)</p></li><li><p><strong>Ground truth</strong> - Manually verified vulnerability sets for each benchmark</p></li><li><p><strong>Metrics</strong> - Precision, recall, F1 score for each vulnerability class</p></li><li><p><strong>Regression testing</strong> - Ensure updates don&#8217;t decrease detection quality</p></li><li><p><strong>Comparison</strong> - How does SecureVibes compare to Semgrep, Snyk, etc.?</p></li></ul><p>This is crucial for validating that improvements actually improve detection, building trust with users, identifying weak spots in detection, and benchmarking against other tools.</p><h2>4. 
Context Engineering</h2><h3>via MCP</h3><p>The Claude Agent SDK has <a href="https://docs.claude.com/en/api/agent-sdk/mcp">MCP</a> support. This is exciting because it lets the subagents bring in context from other services and systems.</p><p>This is essentially how AI-native systems can be made smarter, more efficient and more accurate. For example, if an app has an existing threat model saved in Jira, we could use the SDK to fetch it and hand it to the subagents. The possibilities are endless!</p><h3>Compacting / Pre and Post Processing</h3><p>Right now, all agents get access to the entire repository. But for large codebases (10k+ files), this is inefficient and expensive. I want to build a Context Engineer that:</p><ul><li><p>Analyzes the repository structure</p></li><li><p>Identifies high-risk files (auth, API endpoints, DB queries, file handling)</p></li><li><p>Creates a &#8220;security-relevant file subset&#8221;</p></li><li><p>Passes only this subset to downstream agents</p></li></ul><p>This would dramatically reduce token usage and costs for large repositories, while focusing analysis on the code that actually matters from a security perspective.</p><h2>5. Make SecureVibes work with other models</h2><p>Because I am using the Claude Agent SDK, SecureVibes currently works only with Anthropic&#8217;s models, and it&#8217;s not cheap: a full scan of a medium-sized codebase can cost anywhere between $2-$5. Being able to use local models to achieve similar results would open the system up to regulated industries, where sending proprietary code (IP) to frontier model companies is prohibited. It would also cut costs.</p><h2>6. 
Make SecureVibes into a Web Cyber Reasoning System (Web CRS)</h2><p>Inspired by the <a href="https://aicyberchallenge.com/">AIxCC</a> CRS (Cyber Reasoning Systems), I&#8217;d really like to emulate how those systems are designed with multiple layers of validation. If we can build such a CRS encapsulating the current SAST capabilities, along with DAST capabilities (in particular for web applications), I&#8217;d consider that a huge win! Imagine:</p><p>finding vulnerabilities via source code analysis -&gt; validating them via dynamic analysis -&gt; proposing a fix -&gt; validating the fix works.</p><h2>7. Make SecureVibes self-improving</h2><p>The current scan results are good, but I don&#8217;t necessarily agree with all the severities. I also don&#8217;t want to fix a few of these yet because it&#8217;s ultimately a CLI tool that I run locally on my machine. But they could grow into something bigger over time, so I want to triage all of them manually and record my justification for each. I would then like SecureVibes to update its threat model and keep my preferences in mind so that it becomes smarter with every piece of feedback I give it.</p><div><hr></div><h1>How You Can Contribute</h1><p>This is an open source project, and <strong>contributions are welcome</strong>! Here are ways you can help:</p><h2>&#128295; Contribute Code</h2><p>Areas where help is especially welcome:</p><ul><li><p>Improving prompts for specific vulnerability classes</p></li><li><p>Building the dashboard</p></li><li><p>Creating benchmark datasets</p></li><li><p>Everything mentioned in the wishlist above</p></li></ul><h2>&#127908; Spread the Word</h2><p>If you find SecureVibes useful, share it! Tweet about it, write about it, present it at meetups. The more people use it, the better it gets.</p><h2>&#11088; Star the Repo</h2><p>GitHub stars help with visibility. 
If you think this project is interesting, <a href="https://github.com/anshumanbh/securevibes">give it a star</a>!</p><div><hr></div><h1>Conclusion</h1><p>Building SecureVibes has been one of the most rewarding projects I&#8217;ve worked on. It combines my passion for security with the exciting possibilities of AI agents. <strong>The multi-agent architecture proved that we can build AI security tools that are not just &#8220;smart pattern matchers&#8221; but systems that reason about security the way human experts do.</strong></p><p>We&#8217;re at an inflection point with AI and security. LLMs are finally capable enough to handle complex security reasoning, but we&#8217;re still figuring out the right architectures and workflows. Context is key! I believe multi-agent systems like SecureVibes are the future&#8212;not because they&#8217;re trendy, but because they work.</p><p><strong>The vibecoding era has democratized software development</strong>&#8212;anyone can build an app with AI assistance. But with that democratization comes risk. Many vibecoded applications are built by developers who aren&#8217;t familiar with security best practices, using unfamiliar tech stacks, and shipping to production quickly. <strong>SecureVibes aims to make security accessible to these developers</strong>, providing professional-grade vulnerability detection without requiring security expertise.</p><p><strong>Try SecureVibes on your codebase today.</strong> Open an issue if you find bugs. Submit a PR if you have ideas. Let&#8217;s build the future of AI-native security together.</p><div><hr></div><h2>Follow Along</h2><p>I&#8217;ll post about new features, challenges I&#8217;m facing, design decisions, and lessons learned. 
If you&#8217;re interested in AI agents, security tooling, or building in public, follow along!</p><ul><li><p><strong>LinkedIn</strong>: <a href="https://www.linkedin.com/in/anshumanbhartiya/">@anshumanbhartiya</a></p></li><li><p><strong>GitHub</strong>: <a href="https://github.com/anshumanbh/securevibes">securevibes repository</a></p></li><li><p><strong>Blog</strong>: <a href="https://anshumanbhartiya.com/">anshumanbhartiya.com</a></p></li><li><p><strong>Discord</strong>: https://discord.gg/9cYqTBdC9h</p></li></ul><div><hr></div><p>&#8220;I don&#8217;t know where I am going, but I know how to get there&#8221; - Boyd Varty</p>]]></content:encoded></item><item><title><![CDATA[Building SecureVibes: A Multi-Agent Security System (Part 2/3)]]></title><description><![CDATA[This is Part 2 of a 3-part series on building SecureVibes, a multi-agent security system for vibecoded applications.]]></description><link>https://www.boringappsec.com/p/building-securevibes-a-multi-agent</link><guid isPermaLink="false">https://www.boringappsec.com/p/building-securevibes-a-multi-agent</guid><dc:creator><![CDATA[Anshuman Bhartiya]]></dc:creator><pubDate>Tue, 14 Oct 2025 15:10:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!dR-i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30b19941-fcc6-43f3-b948-77a9e8e463f7_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dR-i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30b19941-fcc6-43f3-b948-77a9e8e463f7_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!dR-i!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30b19941-fcc6-43f3-b948-77a9e8e463f7_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!dR-i!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30b19941-fcc6-43f3-b948-77a9e8e463f7_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!dR-i!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30b19941-fcc6-43f3-b948-77a9e8e463f7_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!dR-i!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30b19941-fcc6-43f3-b948-77a9e8e463f7_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dR-i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30b19941-fcc6-43f3-b948-77a9e8e463f7_1024x1024.png" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30b19941-fcc6-43f3-b948-77a9e8e463f7_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1742426,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://boringappsec.substack.com/i/182127051?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30b19941-fcc6-43f3-b948-77a9e8e463f7_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dR-i!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30b19941-fcc6-43f3-b948-77a9e8e463f7_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!dR-i!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30b19941-fcc6-43f3-b948-77a9e8e463f7_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!dR-i!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30b19941-fcc6-43f3-b948-77a9e8e463f7_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!dR-i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30b19941-fcc6-43f3-b948-77a9e8e463f7_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><h1>From Hypothesis to Implementation</h1><p>In <a href="https://boringappsec.substack.com/p/the-vibecoding-security-crisis-why">Part 1</a>, I outlined the hypothesis: multi-agent systems could outperform single-agent scanners by mimicking how human security teams work&#8212;understanding architecture first, then modeling threats, then validating them in code.</p><p>I&#8217;ve been pondering what an AI-native security scanner/system could look like to secure vibecoded apps and whether I could build it with an agentic framework like the <a href="https://docs.claude.com/en/api/agent-sdk/overview">Claude Agent SDK</a>. Generally speaking, building a code security scanner isn&#8217;t trivial. Add AI on top of that, and you have to know what you&#8217;re doing; otherwise the project becomes a time sink with no real ROI. 
There has to be some method to this madness.</p><p>I realized that I could programmatically invoke Claude Code and build my own workflow with multiple <a href="https://docs.claude.com/en/api/agent-sdk/subagents">subagents</a> with the orchestration completely abstracted by the SDK.</p><p><strong>SecureVibes consists of four specialized agents working in sequence.</strong> Here&#8217;s how each one works, what I learned building them, and why certain design decisions matter.</p><div><hr></div><h1>The Multi-Agent Architecture Overview</h1><p>The architecture follows a simple pipeline:</p><p><strong>Codebase &#8594; Agent 1 (SECURITY.md) &#8594; Agent 2 (THREAT_MODEL.json) &#8594; Agent 3 (VULNERABILITIES.json) &#8594; Agent 4 (scan_results.json)</strong></p><p>Each agent produces a file artifact that becomes input for the next stage. This file-based communication pattern (more on this later) proved to be one of the best design decisions.</p><h2>Claude SDK Orchestration</h2><p>Unlike traditional multi-agent systems that require custom orchestration code, SecureVibes leverages the <strong>Claude Agent SDK&#8217;s built-in orchestration</strong>. Claude itself coordinates the agents:</p><ol><li><p>Receives the high-level goal: &#8220;Scan this repo for vulnerabilities&#8221;</p></li><li><p>Intelligently decides to run Phase 1 &#8594; generates <code>SECURITY.md</code></p></li><li><p>Passes <code>SECURITY.md</code> to Phase 2 &#8594; generates <code>THREAT_MODEL.json</code></p></li><li><p>Passes both artifacts to Phase 3 &#8594; generates <code>VULNERABILITIES.json</code></p></li><li><p>Runs Phase 4 &#8594; generates final <code>scan_results.json</code></p></li><li><p>Tracks costs and timing across all agents</p></li></ol><p><strong>Why this matters:</strong> The SDK handles agent coordination, tool access control, file management, and error recovery. This means less code to maintain and more reliable execution. 
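</p><p>The file handoff described above can be sketched in plain Python. This is only an illustration under stated assumptions: the four artifact names come from the pipeline above, but every function here is a hypothetical stand-in for an agent, and none of it uses or imitates the Claude Agent SDK&#8217;s API:</p>

```python
import json
import tempfile
from pathlib import Path


def assess(workdir):
    """Phase 1 stand-in: write the architecture notes."""
    (workdir / "SECURITY.md").write_text("## Entry Points\n- CLI command: scan\n")


def model_threats(workdir):
    """Phase 2 stand-in: read SECURITY.md, write candidate threats."""
    context = (workdir / "SECURITY.md").read_text()
    threats = []
    if "CLI" in context:
        threats.append({"threat_id": "T-001", "title": "Path traversal via CLI argument"})
    (workdir / "THREAT_MODEL.json").write_text(json.dumps(threats, indent=2))


def review_code(workdir):
    """Phase 3 stand-in: turn candidate threats into confirmed findings."""
    threats = json.loads((workdir / "THREAT_MODEL.json").read_text())
    # A real validator would search the code; this stub confirms everything.
    confirmed = [dict(t, confirmed=True) for t in threats]
    (workdir / "VULNERABILITIES.json").write_text(json.dumps(confirmed, indent=2))


def report(workdir):
    """Phase 4 stand-in: summarize the confirmed findings."""
    vulns = json.loads((workdir / "VULNERABILITIES.json").read_text())
    summary = {"total": len(vulns), "vulnerabilities": vulns}
    (workdir / "scan_results.json").write_text(json.dumps(summary, indent=2))


def run_pipeline(workdir):
    """Run the phases in order; each one only sees files left by earlier ones."""
    for stage in (assess, model_threats, review_code, report):
        stage(workdir)
    return json.loads((workdir / "scan_results.json").read_text())


with tempfile.TemporaryDirectory() as tmp:
    results = run_pipeline(Path(tmp))
    print(results["total"], "finding(s) in scan_results.json")
```

<p>Because every phase leaves a file behind, any single stage can be rerun or its artifact edited by hand without touching the others. 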
The <code>Scanner</code> class simply provides the high-level prompt and agent definitions&#8212;Claude figures out the rest.</p><div><hr></div><h1>Phase 1: Assessment Agent (The Architect)</h1><p>The Assessment Agent acts as a software architect, analyzing your codebase to create comprehensive security documentation. It explores your code using <code>Read</code>, <code>Grep</code>, <code>Glob</code>, and <code>LS</code> tools, and generates a structured <code>SECURITY.md</code> document.</p><p><strong>What it documents:</strong></p><ul><li><p>Overall architecture and component structure</p></li><li><p>Data flow between components</p></li><li><p>Authentication and authorization mechanisms</p></li><li><p>External dependencies and APIs</p></li><li><p>Sensitive data paths (credentials, PII, etc.)</p></li><li><p>Entry points (APIs, forms, CLI commands)</p></li><li><p>Technology stack and frameworks</p></li><li><p>Existing security controls</p></li></ul><p>Think of this as the reconnaissance phase. The agent is learning: &#8220;What does this application do? How is it built? What does it handle?&#8221;</p><p>Here&#8217;s an example of what the output looks like:</p><pre><code><code>## Authentication Mechanism
- JWT tokens stored in localStorage
- Refresh tokens in httpOnly cookies
- Token validation in middleware/auth.py:45-67
- Session management using Redis store
</code></code></pre><p><strong>Key design decision:</strong> I gave this agent only read-only tools. It can explore but not modify. This ensures it focuses purely on understanding, not changing anything.</p><p><strong>Prompt engineering insight:</strong> I gave it an exact template. First attempts produced walls of unstructured text. The template ensures consistency&#8212;every <code>SECURITY.md</code> follows the same structure, making it reliable input for Phase 2.</p><div><hr></div><h1>Phase 2: Threat Modeling Agent (The Strategist)</h1><p>The Threat Modeling Agent takes the <code>SECURITY.md</code> from Phase 1 and performs STRIDE-based threat analysis.</p><p><strong>STRIDE stands for:</strong></p><ul><li><p><strong>S</strong>poofing - Identity verification issues</p></li><li><p><strong>T</strong>ampering - Data integrity issues</p></li><li><p><strong>R</strong>epudiation - Audit and logging issues</p></li><li><p><strong>I</strong>nformation Disclosure - Confidentiality issues</p></li><li><p><strong>D</strong>enial of Service - Availability issues</p></li><li><p><strong>E</strong>levation of Privilege - Authorization issues</p></li></ul><p>For each identified threat, it generates:</p><ul><li><p>Specific threat title and description</p></li><li><p>STRIDE category</p></li><li><p>Severity level (critical, high, medium, low)</p></li><li><p>Affected components</p></li><li><p>Attack scenario</p></li><li><p>Potential vulnerability types (with CWE IDs)</p></li><li><p>Mitigation strategies</p></li></ul><p>The output is a structured <code>THREAT_MODEL.json</code> with all identified threats.</p><p>Here&#8217;s what the output looks like:</p><pre><code><code>{
  "threat_id": "T-001",
  "title": "SQL Injection in User Login",
  "stride_category": "Tampering",
  "severity": "critical",
  "affected_components": ["auth.py", "/api/v1/login"],
  "attack_scenario": "Attacker crafts malicious SQL in username field...",
  "cwe_id": "CWE-89",
  "mitigation": "Use parameterized queries or ORM"
}
</code></code></pre><p><strong>Key design decision:</strong> I constrained the agent to output structured JSON rather than free-form text. This makes the output machine-readable and eliminates parsing ambiguity. Passing structured files between agents is far more reliable than having one agent parse another agent&#8217;s natural-language output.</p><p><strong>Why STRIDE still matters:</strong> It forces comprehensive coverage. Without it, the agent tended to fixate on injection attacks and ignore authorization issues. STRIDE gives the Threat Modeling Agent a structured checklist, so every category gets considered.</p><div><hr></div><h1>Phase 3: Code Review Agent (The Validator)</h1><p>The Code Review Agent takes both <code>SECURITY.md</code> (context) and <code>THREAT_MODEL.json</code> (threats to validate) and searches the actual codebase to confirm which threats are real vulnerabilities.</p><p>For each confirmed vulnerability, it provides:</p><ul><li><p>Exact file path and line number</p></li><li><p>Code snippet showing the vulnerability</p></li><li><p>Detailed explanation of how it&#8217;s exploitable</p></li><li><p>CWE ID</p></li><li><p>Severity level</p></li><li><p>Specific remediation recommendation</p></li><li><p>Evidence of exploitability</p></li></ul><p>The output is <code>VULNERABILITIES.json</code>, which contains only confirmed, validated vulnerabilities, with no theoretical risks.</p><p>Here&#8217;s an example:</p><pre><code><code>{
  "threat_id": "VULN-001",
  "title": "SQL Injection in User Authentication",
  "description": "The user_id from request.args is concatenated directly into SQL query without sanitization. An attacker can inject SQL commands to bypass authentication or extract database contents.",
  "severity": "critical",
  "file_path": "api/auth.py",
  "line_number": 157,
  "code_snippet": "query = f\"SELECT * FROM users WHERE id = {user_id}\"",
  "cwe_id": "CWE-89",
  "recommendation": "Use parameterized queries: cursor.execute(\"SELECT * FROM users WHERE id = ?\", (user_id,))",
  "evidence": "The variable user_id is read from request.args at line 155 without any validation. It's then directly interpolated into the SQL string at line 157. Testing with user_id='1 OR 1=1--' would bypass the WHERE clause. Exploitability: HIGH."
}
</code></code></pre><blockquote><p>NOTE: These vulnerabilities still need to be confirmed as exploitable through dynamic testing. Since SecureVibes doesn&#8217;t have that feature/agent (yet!), this agent can only confirm them by statically analyzing the codebase. As SecureVibes evolves, my hope is that it will eventually combine static and dynamic analysis to confirm exploitability with much higher confidence. Stay tuned!</p></blockquote><p><strong>Key design decision:</strong> The prompt explicitly instructs the agent to distinguish between real vulnerabilities and false positives. It must provide concrete evidence, not just flag suspicious patterns. This significantly reduces the false positive rate.</p><p><strong>This is where the multi-agent approach shines.</strong> Phase 3 isn&#8217;t guessing; it&#8217;s validating specific hypotheses from Phase 2 with architectural context from Phase 1. The agent knows what to look for, where to look, and how to interpret what it finds.</p><div><hr></div><h1>Phase 4: Report Generator (The Compiler)</h1><p>The Report Generator Agent takes the raw vulnerability data from Phase 3 and compiles it into a final, structured report.</p><p><strong>What it does:</strong></p><ul><li><p>Reads <code>VULNERABILITIES.json</code></p></li><li><p>Standardizes the format across all findings</p></li><li><p>Adds metadata (scan time, file count, costs)</p></li><li><p>Generates <code>scan_results.json</code> with a consistent schema</p></li><li><p>Calculates severity distribution stats</p></li></ul><p><strong>Key design decision:</strong> This separate formatting step ensures the final output is always consistent, even if Phase 3&#8217;s output varies slightly. It also makes it easier to add new output formats (Markdown, HTML, SARIF, etc.) in the future.</p><div><hr></div><h1>Building It: The Journey</h1><h2>Initial Approach (What Didn&#8217;t Work)</h2><p>I&#8217;ve been playing with terminal-based coding agents like Claude Code and Codex for a while now. 
When I first tried to build a security scanner, I took the obvious approach: give Claude access to the entire codebase and ask it to find vulnerabilities.</p><p>The results were&#8230; underwhelming.</p><p>The agent would either:</p><ol><li><p>Get overwhelmed by the context and produce generic findings</p></li><li><p>Focus too narrowly on one file and miss the bigger picture</p></li><li><p>Report patterns that looked suspicious but weren&#8217;t actually vulnerable</p></li></ol><p>I realized the problem: <strong>I was asking one agent to be an architect, a threat modeler, AND a security auditor simultaneously.</strong></p><p>Human security teams don&#8217;t work that way. Why should AI?</p><p>When a security professional reviews an application, they don&#8217;t just grep for &#8220;SQL injection&#8221;. They follow a structured process. And this realization led me to a different approach.</p><h2>The Multi-Agent Breakthrough</h2><p>I&#8217;ve been following the Claude Agent SDK developments pretty closely. They recently revamped the SDK. So, I decided to give it a try.</p><p>The breakthrough came when I structured it as a pipeline:</p><ol><li><p><strong>Assessment</strong> creates context</p></li><li><p><strong>Threat Modeling</strong> uses that context to hypothesize threats</p></li><li><p><strong>Code Review</strong> validates those specific threats in code</p></li></ol><p>Each agent has a narrow, focused task. And each agent&#8217;s output becomes the next agent&#8217;s input. This creates a progressive refinement of analysis.</p><p><strong>The results were dramatically better.</strong> False positives dropped significantly because Phase 3 is validating specific threats, not randomly pattern-matching. The findings were more detailed because each agent had the right context for its specific task.</p><p>My first multi-agent version tried to pass data in-memory between agents. Debugging was a nightmare. I couldn&#8217;t see what Phase 1 actually sent to Phase 2. 
When something went wrong in Phase 3, I had no visibility into whether the issue was bad input from Phase 2 or a problem with Phase 3&#8217;s logic.</p><p>Switching to file-based communication (.md and .json files) made the system so much easier to understand, debug, and extend. I can inspect any phase&#8217;s output, replay phases, and even manually edit artifacts to test edge cases.</p><h2>Prompt Engineering Hell (and Heaven)</h2><p>Getting the prompts right was the hardest part. Here&#8217;s what I learned:</p><p><strong>For the Assessment Agent:</strong></p><ul><li><p>Initially, it would produce walls of text with no structure</p></li><li><p>Solution: Provide an exact template in the prompt with section headers</p></li><li><p>Result: Consistent, well-structured <code>SECURITY.md</code> every time</p></li></ul><p><strong>For the Threat Modeling Agent:</strong></p><ul><li><p>First attempts produced generic threats like &#8220;SQL injection might exist&#8221;</p></li><li><p>Solution: Explicitly instruct it to be specific based on the <em>actual</em> architecture</p></li><li><p>Added: &#8220;Focus on SPECIFIC threats based on the ACTUAL architecture, not generic security advice&#8221;</p></li><li><p>Result: Specific threats like &#8220;SQL injection in user login endpoint at /api/v1/login due to string concatenation in auth.py line 157&#8221;</p></li></ul><p><strong>For the Code Review Agent:</strong></p><ul><li><p>It would sometimes return line ranges instead of exact line numbers</p></li><li><p>Solution: Prompt emphasized &#8220;Provide actual line numbers, not ranges&#8221; and &#8220;Include actual vulnerable code, not pseudocode&#8221;</p></li><li><p>Result: Precise findings with exact locations</p></li></ul><p>The system prompt is where you define the agent&#8217;s expertise. The user prompt is where you give it the task and constraints. 
Getting both right is critical.</p><p>At some point down the line, I would like to get rid of prompt engineering completely by adopting something like <a href="https://dspy.ai/">DSPy</a>.</p><div><hr></div><h1>What&#8217;s Next?</h1><p>The architecture is elegant. The prompts are refined. The agents work in harmony. But here&#8217;s the real question&#8230;</p><p><strong>Does this multi-agent approach actually find more vulnerabilities than single-agent systems? Than traditional SAST tools? Than using Claude Code directly?</strong></p><p>I ran SecureVibes on its own codebase to find out. The results surprised me.</p><p>In Part 3, I share the full comparative analysis (SecureVibes vs Semgrep, Bandit, Claude Code, Codex, and more), a model comparison (Haiku vs Sonnet vs Opus), key learnings, what&#8217;s next for SecureVibes, and how you can contribute.</p>]]></content:encoded></item></channel></rss>