Published on 2026-06-05 • 8 min read

Ensuring Safe AI Pen Testing: A Chat with Team Shinobi

Learn how AI can safely conduct penetration testing with expert insights on guardrails, risk management, and more.

Author

Varun Uppal

Founder


Transcript: Ensuring Safe AI Pen Testing: A Chat with Team Shinobi

Hey guys. We get a lot of questions around how safe Shinobi is, or how safe AI is at pentesting, so I thought it would be a good idea for us all to get together and do a Q&A. We've got Dave on the call. Dave, you trained human pentesters in your previous life; now you train AIs, specifically Shinobi. And Abhishek, you're ultimately responsible for making sure our system does pentesting safely, so you create the guardrails in the system. Between the two of you, we have the best people we could hope for on the call to answer some of the questions we typically get asked by customers.

Dave, a few months ago we were having this conversation and you had this profound insight where you said, "Look, the question is not going to be whether AI can find vulnerabilities or whether it can hack. The question is going to be how safely it can do it." What led you to that insight?

Yeah, I guess what led me to that is I found that the overriding question people seemed to be asking was whether AI can find vulnerabilities. Through all of the work we've done with our customers, with PoCs, with our testing, we already knew that yes, it can find vulnerabilities. For me, the question was more: can it find them safely? Can it operate as a pentester? Having trained human pentesters, one of the big things instilled in us is the frameworks that we operate to. When you release a pentester into a customer's environment, they're given a lot of offensive security autonomy within that environment, and we need to know they're going to be safe. They're handling a lot of sensitive data within those client environments. We need to make sure they can highlight the risk without causing undue or unnecessary harm within that environment.
So how do we transfer that over to AI is, I think, the more pertinent question. I think the safety of AI pentesting is the next frontier because, as I said, everyone knows that AI can find vulnerabilities. This is the next stage on from that.

Awesome, because we like to do things at the frontier. I want to ask Abhishek: what sort of guardrails do we have built in that ensure the safety of pentesting?

I would start by saying that autonomous AI agents are generally non-deterministic, and even more so when they are operating in high-risk, high-responsibility activities such as the pentesting that Shinobi agents do. So we can never rely on an agent simply behaving. That's why every test runs inside a set of independent guardrails that constrain what agents can physically do, regardless of what the model decides, what its mood is today, or what an attacker is trying to influence the agents into doing. Our guardrails work on two levels. One is the agent layer, which is where the behavioral contract and the dual-LLM system come in. The other is the platform level, which sits outside the agents as the deterministic rules and guards surrounding the whole system.

Awesome. So we've got this concept of guardrails that are deterministic, implemented at the network level, and then we've got behavioral guardrails that work at the agentic level.

Right, I'm going to go through the questions we are typically asked. The first one is one of my favorites: "Sure, you're asking us to run pentests in development or staging. It's less risky than doing it in production, but most dev and staging environments are still shared between teams. If you knock one over, downtime in dev and staging means our development and test teams can't do their work."
So what controls do we have in Shinobi so that even when it operates in a dev or staging environment, it's mindful and doesn't cause an outage?

Most of my answers will follow the same structure of agent layer and platform layer. On the agent layer we have environment awareness: when a scope is created for a test, Shinobi is aware of which environment it is testing. That doesn't just mean production is kept safe and dev is open for bazookas. It means the agents are aware that things identified as sensitive or fragile, like bulk deletes, destructive writes, major replaces, stateful workflows, or anything else with a high blast radius, are treated with caution. Those actions are taken with more safety in mind by the agents on a behavioral level.

Then, on the network level, in the platform controls, we have two major things for this. One is the DoS controls: any and all activity that could lead to denial of service in any manner is controllable from the scope. So as long as it is known that the system cannot take it, that activity will never touch the target system. The second, more controllable layer is rate limits on all the network traffic hitting the target. That is controlled by a transparent proxy that sees each and every bit and byte of the traffic: everything that goes from Shinobi's browser, the terminal, and every tool it runs passes through the same proxy.
So requests are always controlled and moderated, or throttled, so that the traffic never affects the target in that way either.

Right, and I think that's the crucial point: all the parameters you describe are controllable by the user via the scope.

Yes.

So it is entirely up to the user's risk appetite. If they want to do riskier testing in dev and stage, they have the option of doing it, or if they want to keep dev and stage testing as safe as production, they can do that as well via the scope.

The next one is domain control. A couple of weeks ago I was doing a small test against one of the apps in our dojo. I was using Claude Code to see how it would perform out of the box. I told it specifically, "This is the target application. Test this application to find vulnerabilities," and gave it a bunch of skills. At one point it identified a chatbot in the application. The chatbot was hosted on a completely different domain, but out of the box, the default model in Claude Code took the liberty of going and testing that chatbot. If this were a production test or a customer's test, that would be a complete scope violation. So can you talk a little about how we make sure that Shinobi only tests the domains the user wants tested?

That again comes back to the agent level, where the behavioral guard, the rules of engagement, the contract layer is set. When filling in the scope for the test, or even mid-test, the user is always in control of deciding which subdomains and which parts are in scope for testing, and which are, let's say, dependencies of the application that the agents can still reach and use for understanding and testing the application but that are out of scope for testing.
Then there is a third layer that is totally out of scope. That is blacklisted, and agents cannot even touch it: all network traffic going to it is dropped by the proxy. In the dual-LLM architecture, the test marshals act as gates or wardens for all the activity that is happening. They make sure that all testing activity is confined to in-scope targets and nothing touching anything out of scope is allowed to pass, and even if it does, it gets dropped at the network layer. That brings us to this figure, which shows how far a base model setup like Claude Code used for a pentest can stray out of scope, while Shinobi's guardrails drop those numbers to zero on average over multiple tests.

Nice one. So the network guardrails, the deterministic guardrails, are enforced based on the user's scope as well, aren't they? Whatever domains they specify in scope, only those domains will be contacted or attacked as part of the test. And then there is the reconnaissance and application mapping stage, because we see this quite often: the user enters one domain, but ultimately their application relies on a bunch of extra domains as well. So Shinobi gives them a chance to select which ones are relevant, which ones can be attacked, and which ones can only be communicated with just so that the application works, without being attacked. Ultimately, which domains are attackable and which can only be communicated with is up to the user to select. Once the user has selected them, they get enforced at the proxy level.

Yeah. And going one step beyond that, if some new domain or some new portion is discovered during the testing process, even then the agents are designed to ask the user for permission first, before deciding which bucket to put it into.
So it even goes to that extent.

For the next part, I'll start with one of my favorite horror stories. Sometimes even intended functionality can have undesired consequences. It was one of my first application pentests ever, and I decided to crawl an insurance policy administration application with an admin credential. The crawler ended up deleting all of the policies in the platform. Crawlers are dumb, right? It kept calling the delete endpoint, iterating through the various IDs, and ended up deleting every policy on the platform. Obviously I was excused at that point, since it was one of my first pentests ever. But how do we get Shinobi... and I'm sure, Dave, you've got some horror stories to share on this topic as well.

Very similar ones, yeah.

Yeah, where the functionality is totally legit, but a human pentester wouldn't ever use it, or if they did, they would use it in a safe way that doesn't cause an outage or accidentally delete data. So the question is: how do we get our agents to recognize functionality like that and either avoid it completely or use it in a safe way?

It again comes back to the mere difference between crawlers being dumb and agents being smart. Agents are capable enough to identify what should be kept safe and what should not be rained fire on, so that you don't lose data, you don't drop databases, you don't take the whole application down, or any such scenario. And then, following the same line that we can't totally depend on the agents' moods, the test marshals enforce that, and the other scope guards also build the contract in such a way that such actions are always kept out of the activity.
Even if the agent somehow ended up trying to do it, it does not happen and does not go through.

Also, is there any data related to this that you can share, something related to the behavioral guardrails?

Yeah, so this is a comparison on the same baseline again, Claude Code versus Shinobi, on multiple test applications in our dojo. Restricted payloads here means anything that went into destructive actions, anything that came anywhere near deleting user data or dropping entire storefront policies or something like that. Claude Code, even after getting the same scope and the same instructions, still ends up doing that because it is not designed to be mindful of the rules of engagement in a pentest scope. It will still end up writing code, commands, and scripts that crawl to those pages, and even if it is using a browser, it uses the browser in a way that does not differentiate, so it just ends up trying everything. When it tries everything on an admin console, it ends up trying to delete the users, delete the database, clear it up, reset all the passwords, and everything else it could do. That's where Shinobi's guardrails play a role: any restricted payload that could have any sort of adverse impact on the target system, which is bad even for testing purposes, gets dropped, and that keeps the application and the environment safe.

Yeah. And I might sound like a broken record, but because I just finished documenting our scoping process, I wanted to give a shout-out to the part of the scoping process where the user is again fully in control of what parts of the application they want the agents to focus on, or not focus on.
So if there's a part of the admin portal where you know data can be deleted, you can state that in the scope itself: "There's this part of the application that will delete all the policies. Don't go there, don't test that." The agents are trained, like you said, to behave and operate safely anyway, but this is for added comfort or extra peace of mind. If you know explicitly that there are parts of the application you don't want the agents touching or interacting with, you can share that in the scope as well.

Cool. Then traceability. One of the questions we often get asked is how customers can identify the attack traffic, or Shinobi's activity, when it's doing a pentest. What do we have built in that makes it easy for a customer's SOC team or development teams to investigate Shinobi's activities?

This is a very basic platform-level capability: you can designate custom headers or user agents in your scope, and all the traffic originating from Shinobi then carries those headers. So you can easily monitor and attribute traffic on your network, WAF, and SIEM by filtering on the Shinobi headers. On top of that, all the request data or application data that Shinobi creates is also attributed with Shinobi labels. So outside of traffic, even on the application surface, you can identify the accounts, the data, the pages, or whatever else was created by Shinobi during testing.

Awesome. And obviously I've got to give a shout-out to the kill switch as well, right?
Inherently, pentesting is a risky thing. If for whatever reason you attribute something to Shinobi, or, as we sometimes notice, an application has downtime or experiences an outage and you want to stop all pentesting activity, there's a kill switch in the platform itself. You can hit it and all testing activity stops immediately.

Yep. I think that's the best thing, because pentesters are always going to get the blame. No matter what happens on the network during a pentest, it's the pentest that gets the blame.

Oh, yeah. Exactly. Dave, do you want to talk a little about the intrusiveness of Shinobi's exploits and its tactics? How intrusive is it?

Yeah. Obviously a core part of pentesting is being able to show the impact of the issues we're finding. During a pentest, you want to go as far as you can with what you believe to be exploitable in order to show the full, true impact and the true risk to that organization. Now, within the bounds of agentic testing, you don't just want to release your agent into the network to go off wild. So what we've done is say that any infrastructure that supports the running of that application is within scope for Shinobi. If it can identify that it's got an RCE and it knows it's in a Docker container, it will try Docker breakouts. If it understands that it's in an AWS environment, it will try to identify any areas of that AWS environment it can reach, but it won't go any further than the infrastructure that's supporting that application. So it's not going to start Nmapping your internal infrastructure and then going off towards your AD controllers and stuff like that.
It's purely focused on this: if this infrastructure is supporting this application, I need to know about it, and so does the organization. I think that's the best way to go about it until we start layering on our internal network pentesting. Within the scope and bounds of that application test, that is the boundary for the agents: where they draw the line, how far they can go, and what they can do.

Exactly. And that's the thing about pentesting, right? It has to strike that balance. It has to be more than just a vulnerability scan because it has to demonstrate impact, and to demonstrate impact you have to be a bit more intrusive than a vulnerability scan would be. But you can't be as intrusive as, say, a full-on red team, can you?

Yeah. Correct.

Awesome. That's pretty much it. Abhishek, if any of our customers or anyone who's interested wants to see receipts on all the testing we do internally to prove the agents are safe and to ensure the guardrails are working, they can reach out to us, right?

Awesome. Anything else, guys? Or is it a wrap?

In the world of cybersecurity, one question is increasingly critical: "How safe is AI when it comes to penetration testing?" As AI technologies like Shinobi evolve, understanding their safety protocols becomes essential for organizations looking to leverage these tools. In this post, we'll break down insights from experts in the field, including how AI can conduct tests safely while managing risks effectively.

Understanding AI in Penetration Testing

The conversation about AI in penetration testing has shifted from whether it can identify vulnerabilities to how safely it can do so. As David, who trained human pen testers before training Shinobi, highlighted, the core question isn't whether AI can find vulnerabilities; it's how safely it can operate within sensitive environments. This shift in perspective is crucial for organizations aiming to protect their data while utilizing cutting-edge technology.

The Importance of Safety in AI Pen Testing

The safety of AI in penetration testing hinges on several factors:

Risk Management: AI must be able to highlight risks without causing undue harm to the client's environment.
Frameworks and Protocols: AI should operate within established frameworks similar to human pen testers, ensuring it adheres to safety guidelines.
Guardrails: Implementing strong guardrails helps manage the AI's autonomy, keeping operations within safe boundaries.

Implementing Guardrails for Safe Testing

Abhishek, responsible for creating safety protocols for Shinobi, explains two levels of guardrails: agent-level and platform-level.

Agent-Level Guardrails: These include behavioral contracts and a dual-LLM system in which test marshals act as gates on agent activity. For instance, agents are designed to recognize sensitive operations, such as bulk deletes or destructive writes, and treat them with caution.
Platform-Level Controls: These are deterministic rules that sit outside the agents and apply regardless of what the model decides. All traffic passes through a transparent proxy that keeps activity within the defined scope, thus preventing accidental damage to critical data.
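
To make the agent-level idea concrete, here is a minimal Python sketch of flagging high-blast-radius requests before they are sent. It is an illustration only, not Shinobi's implementation: `PlannedRequest`, `needs_caution`, and the keyword list are hypothetical names and heuristics chosen for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical heuristics for this sketch; a real system would derive
# its rules from the test scope rather than from a fixed keyword list.
DESTRUCTIVE_METHODS = {"DELETE"}
STATE_CHANGING_METHODS = {"POST", "PUT", "PATCH"}
RISKY_PATH = re.compile(r"bulk|purge|reset|drop|delete[-_]?all", re.IGNORECASE)


@dataclass(frozen=True)
class PlannedRequest:
    """A request the agent intends to send (illustrative type)."""
    method: str
    path: str


def needs_caution(req: PlannedRequest) -> bool:
    """Return True when a planned request looks like a high-blast-radius action."""
    method = req.method.upper()
    if method in DESTRUCTIVE_METHODS:
        return True
    # State-changing methods aimed at paths that hint at bulk or destructive work.
    return method in STATE_CHANGING_METHODS and bool(RISKY_PATH.search(req.path))
```

A keyword heuristic like this will over- and under-match, which is one reason the speakers describe a second, deterministic layer outside the agent rather than relying on agent behavior alone.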

Ensuring Safe Operations in Development Environments

Testing in development or staging environments presents unique challenges. Even though these settings are less risky than production, they often involve shared resources. Abhishek shared that Shinobi is designed to be aware of its testing environment, ensuring that sensitive operations are handled with care. This includes:

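Environment Awareness: Shinobi knows which environment it is testing, and a development or staging label does not make destructive actions acceptable; fragile, high-blast-radius operations are still treated with caution.
Scope Control: Users define what is in scope for testing, including denial-of-service controls and rate limits on traffic to the target, so testing can be as cautious or as aggressive as their risk appetite allows.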
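
In the conversation, Abhishek describes rate limits applied to all traffic at a transparent proxy. The post does not say which algorithm is used, so the following is only a sketch of one common approach, a token bucket. The class name, parameters, and injectable clock are assumptions made for the example.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket limiter: allows short bursts, then a steady request rate.

    Illustrative only; rate and burst would come from the user's scope.
    """

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = float(burst)
        self.tokens = float(burst)   # start with a full bucket
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        """Return True if one more request may be forwarded right now."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A proxy built this way would call `allow()` once per outbound request and delay or drop the request when it returns False. Passing the clock in as a parameter keeps the limiter easy to test without sleeping.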

Handling Domain Control and Scope Violations

When conducting tests, it's vital that AI only attacks the specified domains to prevent scope violations. Abhishek described how Shinobi combines a dual-LLM architecture with network-level enforcement:

Behavioral Contracts: Users specify which domains are in scope, which are dependencies that may be reached but not tested, and which are off limits. Test marshals gate agent activity against these rules of engagement.
Network Traffic Management: All requests pass through a proxy, so any out-of-scope attempt that slips past the agent layer is dropped at the network layer.

The Role of User Control in AI Testing

One of the critical aspects of Shinobi's operations is user control. Users can designate specific parts of an application they wish to test or avoid, adding an extra layer of safety. Abhishek mentioned that the AI is also designed to request permission if it encounters new domains during testing, ensuring that user choices are respected.
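
The scope model discussed in the conversation has three buckets (attackable, reachable only, and blocked) plus a prompt to the user for newly discovered domains. The Python sketch below shows how such a decision could be expressed. All names here (`Scope`, `Verdict`, `proxy_allows`, the `attacking` flag, and the example hosts) are hypothetical, and how a real proxy tells attack traffic from ordinary application traffic is not described in the source.

```python
from enum import Enum
from typing import Iterable


class Verdict(Enum):
    ATTACK = "attack"          # in scope: may be tested
    REACH_ONLY = "reach_only"  # dependency: may be contacted, never tested
    DROP = "drop"              # blocked: traffic is dropped
    ASK_USER = "ask_user"      # unknown: user must classify it first


class Scope:
    """User-defined scope with three host buckets (illustrative sketch)."""

    def __init__(self, in_scope: Iterable[str], dependencies: Iterable[str] = (),
                 blocked: Iterable[str] = ()) -> None:
        self.in_scope = {h.lower() for h in in_scope}
        self.dependencies = {h.lower() for h in dependencies}
        self.blocked = {h.lower() for h in blocked}

    def verdict(self, host: str) -> Verdict:
        host = host.lower().rstrip(".")
        if host in self.blocked:       # an explicit block always wins
            return Verdict.DROP
        if host in self.in_scope:
            return Verdict.ATTACK
        if host in self.dependencies:
            return Verdict.REACH_ONLY
        return Verdict.ASK_USER        # newly discovered host


def proxy_allows(scope: Scope, host: str, attacking: bool) -> bool:
    """Deterministic check: forward the request or drop it."""
    v = scope.verdict(host)
    if v is Verdict.ATTACK:
        return True
    if v is Verdict.REACH_ONLY:
        return not attacking
    return False  # DROP and ASK_USER both block until the user decides
```

Treating unknown hosts as blocked until the user classifies them is a default-deny choice made for this sketch; it matches the spirit of asking permission first, but the actual behavior of the platform may differ.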

Lessons from Real-World Testing

In a discussion about potential pitfalls, Varun shared a cautionary tale from his early pen testing days, where a crawler running with admin credentials repeatedly called a delete endpoint and deleted all policies in an application. This underscores the importance of ensuring AI systems can differentiate between functionality that is safe to exercise and legitimate functionality that is destructive. Abhishek explained that Shinobi's agents are designed to recognize such risks, and that test marshals and scope guards block destructive actions even if an agent attempts them.

Traceability and Accountability in Testing

For organizations concerned about traceability, Shinobi includes features that allow users to monitor AI activity during tests. Custom headers and user agents can be designated in the scope and are carried by all Shinobi traffic, making it easier to attribute actions to Shinobi in network, WAF, and SIEM logs. Accounts and data that Shinobi creates in the application are labeled as well. This capability is essential for teams wanting to investigate any issues during or after testing.
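
As a rough illustration of header-based attribution, the Python sketch below tags outbound requests and then filters log entries by that tag. The header name `X-Pentest-Run`, the run ID format, and the log entry shape are all assumptions for the example; in practice the header is whatever the user designates in the scope, and the filtering would be done in the WAF or SIEM query language.

```python
from typing import Iterable, Mapping

# Hypothetical header name; the real one is whatever the scope designates.
TAG_HEADER = "X-Pentest-Run"


def tag_request(headers: Mapping[str, str], run_id: str) -> dict:
    """Return a copy of the headers with the attribution tag added."""
    tagged = dict(headers)
    tagged[TAG_HEADER] = run_id
    return tagged


def attributed(entries: Iterable[Mapping], run_id: str) -> list:
    """Keep only log entries whose headers carry the given run tag.

    Header names are compared case-insensitively, as HTTP requires.
    """
    wanted = TAG_HEADER.lower()
    matches = []
    for entry in entries:
        headers = {k.lower(): v for k, v in entry.get("headers", {}).items()}
        if headers.get(wanted) == run_id:
            matches.append(entry)
    return matches
```

With a tag like this in place, a SOC analyst can separate test traffic from everything else with a single filter instead of correlating source addresses and timestamps.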

Conclusion

AI is transforming penetration testing, but with great power comes great responsibility. By implementing robust guardrails, ensuring user control, and maintaining transparency, organizations can leverage AI safely and effectively. The insights shared by industry experts highlight that while AI can enhance pen testing capabilities, its safe operation is paramount to protect sensitive environments. If you're looking to explore AI-powered pen testing further, consider reaching out to experts in the field for guidance.

Frequently Asked Questions

What are the main safety concerns with AI in penetration testing?

AI must operate within defined scopes, recognize sensitive actions, and avoid causing harm to client environments.

How does Shinobi ensure it only tests specified domains?

Shinobi uses behavioral contracts and strict network traffic management to enforce user-defined scopes, preventing out-of-scope testing.

Can users control what Shinobi tests?

Yes, users can specify which parts of an application are in scope for testing, ensuring control over the testing process.
