Published on 2026-03-17 • 6 min read

A Pentest That Skips MFA Is Fiction

How Shinobi's AI agents handle email and SMS authentication without weakening your security posture.

Author

Priyank Gupta

Founding Software Engineer

A penetration test is only as valid as the environment it tests. If the environment is modified to accommodate the testing tool, the results describe a system that doesn't exist in production. They are, by definition, fiction.

Multi-factor authentication is the most common place this compromise occurs. AI-powered pentesting is a new field, and the tooling is still catching up to the complexity of real-world authentication. When an automated agent encounters MFA, the easiest answer is to ask the client to disable it on test accounts. This produces a test environment that no real user and no real attacker will ever encounter. Every finding downstream of that decision is tainted by a false premise.

Shinobi takes a different position: the agent must operate in the production-equivalent environment, MFA included. We engineered the infrastructure to make this possible.

The Dependency Problem

Autonomous pentesting requires zero human intervention during execution. MFA based on time-based one-time passwords (TOTP) satisfies this constraint: a TOTP code is a deterministic function of a shared secret and the current time, so an agent that holds the secret can compute the code programmatically. Shinobi supports TOTP-based MFA natively, and it has never been a blocker.
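To make the "deterministic function" point concrete, here is a minimal sketch of the standard TOTP computation from RFC 6238 (HMAC-SHA1 variant), using only the Python standard library. It is an illustration of the algorithm, not Shinobi's implementation; the function name and defaults are assumptions chosen for the example.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, timestamp: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code using HMAC-SHA1."""
    if timestamp is None:
        timestamp = time.time()
    # The moving factor is the number of whole time steps since the Unix epoch.
    counter = int(timestamp) // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks an offset.
    offset = mac[-1] & 0x0F
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


# RFC 6238 Appendix B test vector: ASCII secret, T = 59 s, 8 digits.
print(totp(b"12345678901234567890", timestamp=59, digits=8))  # 94287082
```

In practice the shared secret is usually distributed base32-encoded (for example inside an otpauth:// URI) and has to be decoded to bytes before it is passed to a function like this.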

Email OTPs, SMS codes, and magic links are a different problem. They require receiving a message on an external channel. When Shinobi's agent encountered these during an engagement, it had no channel on which to receive them. The OTP would arrive in the inbox or on the phone of someone on the client's appsec team. The agent would halt and wait for that person to retrieve the code and relay it back.

This introduced delays measured in hours when the person was unavailable, and days when engagements spanned weekends. A single authentication event could stall an entire engagement by a week or more.

Why Disabling MFA Invalidates the Test

The intuitive fix is to remove MFA from test accounts. This eliminates the dependency but introduces a more serious problem: it removes a real attack surface from the scope.

MFA implementations are themselves targets. They contain logic that governs OTP expiration, brute-force rate limiting, magic link reuse, session issuance post-authentication, and token validation across channels. These are common sources of vulnerabilities in production applications.
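As an illustration of the kind of logic in question, the sketch below models a toy server-side OTP check with three of the properties listed above: expiration, an attempt cap, and single use. Every name and default here (OtpChallenge, the 300-second lifetime, the 5-attempt limit) is hypothetical and not taken from any real application; the point is that each branch is a place where a production implementation can go wrong, and none of them is exercised when MFA is switched off.

```python
import hmac
from dataclasses import dataclass


@dataclass
class OtpChallenge:
    """Hypothetical server-side record for one issued OTP."""
    code: str
    issued_at: float
    ttl_seconds: float = 300.0   # assumed lifetime
    max_attempts: int = 5        # assumed brute-force cap
    attempts: int = 0
    consumed: bool = False

    def verify(self, submitted: str, now: float) -> bool:
        if self.consumed:                              # single use: no replay
            return False
        if now - self.issued_at > self.ttl_seconds:    # expiration
            return False
        if self.attempts >= self.max_attempts:         # attempt limiting
            return False
        self.attempts += 1
        # Constant-time comparison to avoid leaking matching prefixes.
        if hmac.compare_digest(submitted.encode(), self.code.encode()):
            self.consumed = True
            return True
        return False


challenge = OtpChallenge(code="123456", issued_at=0.0)
print(challenge.verify("123456", now=10.0))  # True: first valid use
print(challenge.verify("123456", now=11.0))  # False: code already consumed
```

A test that runs with MFA enabled can probe each of these branches from the outside, for example by replaying a used code or submitting guesses past the cap.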

A pentest conducted without MFA cannot evaluate any of this. It produces a report that is structurally incomplete: the client receives assurance about an authentication layer that was never tested. This is worse than a gap in coverage. It is a false negative at the architectural level.

The principle is straightforward. A security test that requires weakening the target's defenses to function is not a valid security test.

The Solution: Agent-Owned Communication Channels

Rather than asking clients to disable MFA, we inverted the problem. We gave the agent its own communication infrastructure (its own email addresses and phone numbers) so it could receive and process OTPs, verification emails, and magic links without human involvement.

The agent doesn't bypass MFA. It completes it.

How It Works

The implementation has two layers: infrastructure provisioning and agent integration.

On the infrastructure side, we provision dedicated email addresses through AWS SES and dedicated phone numbers through Twilio. These are not shared across engagements. Each test account gets its own channel, isolated to that engagement.

On the agent side, these channels are exposed as tool calls. When the agent triggers a login and the application sends an OTP to the provisioned email, the agent calls a tool to check that inbox, extracts the code, and submits it. The same mechanism handles SMS codes. For magic links, the agent opens the link from the message, follows the redirect, and continues. There is no polling delay, no human relay, no context switch. The authentication flow completes in seconds.
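The extraction step can be sketched as follows. This is a simplified illustration, not Shinobi's tool implementation: the function names, the assumption of a 6-digit numeric code, and the example hostnames are all hypothetical, and real messages often need per-application patterns.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumption: the OTP is a standalone run of exactly six digits.
OTP_PATTERN = re.compile(r"(?<!\d)(\d{6})(?!\d)")
URL_PATTERN = re.compile(r"https://[^\s<>\"']+")


def extract_otp(body: str) -> Optional[str]:
    """Return the first standalone 6-digit code in a message body, if any."""
    match = OTP_PATTERN.search(body)
    return match.group(1) if match else None


def extract_magic_link(body: str, expected_host: str) -> Optional[str]:
    """Return the first HTTPS link whose host matches the application under test."""
    for url in URL_PATTERN.findall(body):
        url = url.rstrip(".,)")  # drop trailing sentence punctuation
        if urlparse(url).hostname == expected_host:
            return url
    return None


email_body = "Your verification code is 482913. It expires in 10 minutes."
print(extract_otp(email_body))  # 482913

link_body = "Sign in here: https://app.example.com/login?token=abc123."
print(extract_magic_link(link_body, "app.example.com"))
# https://app.example.com/login?token=abc123
```

Restricting magic links to the expected host is a design choice in this sketch: it keeps the agent from following unrelated links that happen to appear in the same message.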

Figure: Agent logging in using the MFA tool

The scoping layer ties this together. During engagement setup on Shinobi, the client indicates their MFA type. If they select email- or SMS-based MFA, we provision the channels and the client creates test accounts in their application using the Shinobi-controlled email address or phone number. If the application requires email verification or a link-based confirmation at registration, the client selects that during scoping as well, and the same infrastructure handles it.
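One way to picture the scoping-to-provisioning step is as a small planning function that maps the client's MFA selection to one dedicated channel per test account. The sketch below is a hypothetical data model only: the names (MfaType, Channel, plan_channels), the example mail domain, and the phone pool are assumptions for illustration, and the actual provisioning calls to the email and telephony providers are deliberately left out.

```python
import secrets
from dataclasses import dataclass
from enum import Enum
from typing import List, Sequence


class MfaType(Enum):
    NONE = "none"
    TOTP = "totp"
    EMAIL = "email"
    SMS = "sms"


@dataclass(frozen=True)
class Channel:
    engagement_id: str
    account_label: str
    kind: MfaType
    address: str  # email address or phone number dedicated to this account


def plan_channels(engagement_id: str, mfa_type: MfaType,
                  account_labels: Sequence[str],
                  mail_domain: str = "inbox.example.test",
                  phone_pool: Sequence[str] = ()) -> List[Channel]:
    """Assign one dedicated channel per test account; nothing is shared."""
    if mfa_type in (MfaType.NONE, MfaType.TOTP):
        return []  # no external channel needed
    if mfa_type is MfaType.SMS and len(phone_pool) < len(account_labels):
        raise ValueError("not enough dedicated phone numbers for the test accounts")
    channels = []
    for index, label in enumerate(account_labels):
        if mfa_type is MfaType.EMAIL:
            # A random suffix keeps addresses unique and hard to guess.
            address = f"{engagement_id}-{label}-{secrets.token_hex(4)}@{mail_domain}"
        else:
            address = phone_pool[index]
        channels.append(Channel(engagement_id, label, mfa_type, address))
    return channels


for channel in plan_channels("eng42", MfaType.EMAIL, ["admin", "user"]):
    print(channel.account_label, channel.address)
```

The client would then register each test account in their application with the address from the corresponding channel.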

By the time the agent begins the engagement, every authentication flow in the target application is within its operational reach. No human touchpoint. No modified environment.

Conclusion

The validity of a penetration test is determined by the fidelity of the test environment to production. Every modification made to accommodate tooling is a deviation from reality, and every deviation introduces the possibility of false assurance.

MFA is the most visible example, but the principle extends to any human-dependent authentication flow: email verification, link-based login, SMS challenges. Shinobi's approach is to equip the agent with the infrastructure to operate within these flows natively, rather than asking the client to remove them.

The agent operates in the real environment. The findings reflect the real attack surface. The report describes the actual system.

A security test should never require weakening the thing being tested.

Want to see how Shinobi handles your authentication flows? Book a demo.

