Vexilium is a black-box AI pentester. Point it at any web app or API, and an ensemble of two engines (a deep single-session agent and a parallel red team) returns verified exploits. Every finding ships with a runnable curl. Source code never leaves your perimeter, because it's never touched.
… on the waitlist · No spam. We email once when your slot opens.
Real run, sped up 10×. Captions are diegetic: the agent's own reasoning, payloads, and verifier outcomes. Nothing is cherry-picked or replayed. This capture shows the deep single-engine run; the numbers below come from the full ensemble, which adds the parallel red team.
Where the coverage comes from
Neither engine alone finds the full set. The deep agent's persistent session catches stateful, chained bugs (server-side JS injection → RCE), while the red team's parallel specialists catch breadth (SSRF, NoSQL injection, broken access control on endpoints the deep agent never reached). The ensemble reports their deduplicated union.
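A minimal sketch of the deduplicated union, assuming each finding is identified by its vulnerability class, HTTP method, and endpoint path. The `Finding` type, its fields, and this particular dedup key are hypothetical illustrations, not Vexilium's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    # Hypothetical finding record; field names are illustrative only.
    vuln_class: str                            # e.g. "sqli", "ssrf", "idor"
    method: str                                # HTTP method of the exploit request
    path: str                                  # endpoint path, no query string
    engines: set = field(default_factory=set)  # engines that reported it


def dedup_key(f: Finding) -> tuple:
    # Same class + method + endpoint (ignoring a trailing slash) = duplicate.
    return (f.vuln_class, f.method.upper(), f.path.rstrip("/") or "/")


def ensemble_union(*engine_results):
    """Merge (engine_name, findings) pairs into one deduplicated list,
    tagging each merged finding with every engine that reported it."""
    merged = {}
    for engine, findings in engine_results:
        for f in findings:
            key = dedup_key(f)
            if key not in merged:
                merged[key] = Finding(*key)
            merged[key].engines.add(engine)
    return list(merged.values())


# Usage: both engines report the same login SQLi; only the red team finds SSRF.
deep = [Finding("sqli", "POST", "/rest/user/login")]
red_team = [Finding("ssrf", "GET", "/research"),
            Finding("sqli", "post", "/rest/user/login/")]
report = ensemble_union(("deep", deep), ("red_team", red_team))
for f in report:
    print(f.vuln_class, f.method, f.path, sorted(f.engines))
```

Keying on class plus endpoint is a deliberately coarse choice: two payloads for the same bug collapse into one row, while the `engines` tag preserves which engine (or both) found it.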
Same target. Same conditions. No human config.
| Tool | Target | Runtime | Verified active exploits |
|---|---|---|---|
| Vexilium · ensemble | OWASP Juice Shop | ~10 min · two engines | 11 |
| Vexilium · ensemble | OWASP NodeGoat | ~10 min · two engines | 8 |
| OWASP ZAP baseline | OWASP Juice Shop | ~5 min | 0 (10 passive findings) |
| Nuclei v3.8 · 6,618 templates | OWASP Juice Shop | 7m 32s | 0 (1 passive finding) |
Industry-standard scanners (same network, same machine, default config) found zero active exploits. Across 14,000+ requests and 6,618 templates, Nuclei matched a single passive metrics endpoint. Vexilium's ensemble landed 19 working exploits across two different applications, spanning auth bypass, SQLi, SSRF, NoSQL injection, IDOR, CSRF, and server-side JS injection → RCE.
Recon and auth run once and share a session. Then two engines attack the same surface from different angles, one for depth and one for breadth. A critic and a deterministic verifier filter their candidates, and what survives merges into a single deduplicated report.
A recon agent fingerprints the framework and enumerates the API
surface; an auth agent establishes one session and shares it on a
blackboard. Every request is scope-locked to the target's
host:port at the tool layer.
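A minimal sketch of a tool-layer scope lock, assuming every outbound request passes through one gate that compares the URL's host and effective port with the target's. `ScopeLock`, `ScopeError`, and `host_port` are hypothetical names for illustration.

```python
from urllib.parse import urlsplit


class ScopeError(Exception):
    """Raised when a tool tries to reach anything outside the target."""


_DEFAULT_PORTS = {"http": 80, "https": 443}


def host_port(url: str) -> tuple:
    # Normalize a URL to (lowercased host, explicit port).
    parts = urlsplit(url)
    if parts.scheme not in _DEFAULT_PORTS or not parts.hostname:
        raise ScopeError(f"unsupported or malformed URL: {url!r}")
    return parts.hostname.lower(), parts.port or _DEFAULT_PORTS[parts.scheme]


class ScopeLock:
    def __init__(self, target_url: str) -> None:
        self.allowed = host_port(target_url)

    def check(self, url: str) -> str:
        # The HTTP tool calls this before every request; agents get no
        # code path that skips it.
        if host_port(url) != self.allowed:
            host, port = self.allowed
            raise ScopeError(f"{url!r} is outside {host}:{port}")
        return url


lock = ScopeLock("http://localhost:3000")
lock.check("http://localhost:3000/rest/products/search?q=apple")  # allowed
for bad in ("http://localhost:8080/", "http://localhost:3000@evil.example/"):
    try:
        lock.check(bad)
    except ScopeError as exc:
        print("blocked:", exc)
```

Parsing with `urlsplit` rather than string prefix matching is what catches the userinfo trick in the last example, where the real host is `evil.example`. The lock constrains where requests go, not what they carry, so an SSRF payload in a query parameter still targets the in-scope host.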
A single deep-session agent chains multi-step attacks down one path; a red team of specialists sweeps the surface in parallel. Different angles, same target: the union is larger than either alone.
A critic adversarially challenges every candidate; a per-class verifier rejects anything without observable exploit evidence. Survivors merge into one report, tagged by which engine found each.
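A minimal sketch of a per-class verifier gate, assuming each candidate carries the raw evidence the engine observed. The class names, `Candidate` fields, and evidence keys are hypothetical; only the fail-closed shape (no verifier or no evidence means rejection) reflects the description above.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Candidate:
    vuln_class: str
    evidence: dict = field(default_factory=dict)  # raw observations from the engine


def jwt_claims(token: str) -> dict:
    # Decode (without verifying) the payload segment of a JWT.
    segment = token.split(".")[1]
    segment += "=" * (-len(segment) % 4)
    claims = json.loads(base64.urlsafe_b64decode(segment))
    return claims if isinstance(claims, dict) else {}


def verify_auth_bypass(ev: dict) -> bool:
    # Evidence: the exploit response returned a token whose claims say admin.
    try:
        return jwt_claims(ev.get("jwt", "")).get("role") == "admin"
    except (IndexError, ValueError):
        return False


def verify_ssrf(ev: dict) -> bool:
    # Evidence: an out-of-band canary saw the unique token from the payload.
    token = ev.get("canary_token")
    return bool(token) and any(token in hit for hit in ev.get("canary_hits", []))


def verify_csrf(ev: dict) -> bool:
    # Evidence: a request without a CSRF token succeeded and changed state.
    return (not ev.get("sent_csrf_token", True)
            and 200 <= ev.get("status", 0) < 400
            and bool(ev.get("state_changed")))


VERIFIERS: Dict[str, Callable[[dict], bool]] = {
    "sqli_auth_bypass": verify_auth_bypass,
    "ssrf": verify_ssrf,
    "csrf": verify_csrf,
}


def verify(candidate: Candidate) -> bool:
    # Fail closed: an unknown class, or missing evidence, is rejected.
    check = VERIFIERS.get(candidate.vuln_class)
    return check is not None and check(candidate.evidence)
```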
The moat is the data.
Every customer run logs (target_fingerprint, payload, verifier_outcome): a binary signal,
not noisy chat traces. That dataset trains specialist 7B–32B models that beat frontier models
on cost and latency for this narrow task. Open-source tools can replicate the agent;
they cannot replicate the dataset.
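A minimal sketch of that run log, assuming one JSON Lines record per attempted payload with the verifier outcome stored as a boolean. `RunRecord`, the helper functions, and the fingerprint strings are hypothetical.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List


@dataclass(frozen=True)
class RunRecord:
    target_fingerprint: str  # framework/stack string from recon (format assumed)
    payload: str             # exact payload the engine sent
    verifier_outcome: bool   # binary label: did the verifier confirm the exploit?


def append_record(log_path: Path, record: RunRecord) -> None:
    # JSON Lines: one object per line, so runs append without rewriting the file.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record)) + "\n")


def load_records(log_path: Path) -> List[RunRecord]:
    with log_path.open(encoding="utf-8") as fh:
        return [RunRecord(**json.loads(line)) for line in fh if line.strip()]


def success_rate(records: List[RunRecord], fingerprint: str) -> float:
    # Share of payloads the verifier confirmed against one fingerprint.
    outcomes = [r.verifier_outcome for r in records if r.target_fingerprint == fingerprint]
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "runs.jsonl"
    append_record(log, RunRecord("express+sqlite", "' OR 1=1--", True))
    append_record(log, RunRecord("express+sqlite", "' OR 'a'='b", False))
    records = load_records(log)
print(success_rate(records, "express+sqlite"))  # 0.5
```

Because each label is a single boolean, aggregating it per fingerprint is plain counting, with no need to score free-form transcripts.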
The deep agent drills one path to the core; the red team sweeps the perimeter in parallel. Together they leave no facet untested.
`' OR 1=1--` on `/rest/user/login` returns a valid admin JWT: role admin, full access.
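To see why a single quote is enough, here is a toy sqlite3 reproduction, assuming the login handler builds its SQL by string concatenation. The table, accounts, and the `vulnerable_login`/`safe_login` helpers are hypothetical stand-ins, not Juice Shop's source.

```python
import hashlib
import sqlite3


def md5(text: str) -> str:
    return hashlib.md5(text.encode()).hexdigest()


def make_db() -> sqlite3.Connection:
    # Toy Users table; the first row is the admin account.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY, email TEXT, password TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO Users (email, password, role) VALUES (?, ?, ?)",
        [("admin@example.test", md5("correct horse"), "admin"),
         ("jim@example.test", md5("hunter2"), "customer")],
    )
    return conn


def vulnerable_login(conn, email: str, password: str):
    # DELIBERATELY VULNERABLE: the email is pasted into the SQL text.
    query = (f"SELECT id, email, role FROM Users "
             f"WHERE email = '{email}' AND password = '{md5(password)}'")
    return conn.execute(query).fetchone()


def safe_login(conn, email: str, password: str):
    # Fix: bound parameters keep user input out of the SQL grammar.
    return conn.execute(
        "SELECT id, email, role FROM Users WHERE email = ? AND password = ?",
        (email, md5(password)),
    ).fetchone()


conn = make_db()
payload = "' OR 1=1--"
# Becomes: WHERE email = '' OR 1=1--' AND password = '...'
# The tautology matches every row and "--" comments out the password check.
print(vulnerable_login(conn, payload, "anything"))  # first row: the admin account
print(safe_login(conn, payload, "anything"))        # None
```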
`UNION SELECT` on `/rest/products/search` leaks the full Users table: emails, MD5 password hashes, and roles for all 23 accounts.
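The UNION technique can be reproduced the same way. This sqlite3 sketch assumes a search handler that pastes the term into a parenthesized `LIKE` clause; the schema and `vulnerable_search` are hypothetical. The injected SELECT must return as many columns as the original query, which is why email, hash, and role are packed into three columns.

```python
import hashlib
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Products (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
        CREATE TABLE Users (id INTEGER PRIMARY KEY, email TEXT, password TEXT, role TEXT);
        INSERT INTO Products (name, description) VALUES
            ('Apple Juice', 'Freshly pressed'), ('Banana Juice', 'Sweet');
    """)
    conn.executemany(
        "INSERT INTO Users (email, password, role) VALUES (?, ?, ?)",
        [("admin@example.test", hashlib.md5(b"pw1").hexdigest(), "admin"),
         ("jim@example.test", hashlib.md5(b"pw2").hexdigest(), "customer")],
    )
    return conn


def vulnerable_search(conn, q: str):
    # DELIBERATELY VULNERABLE: the search term is pasted into a LIKE clause.
    query = (f"SELECT id, name, description FROM Products "
             f"WHERE ((name LIKE '%{q}%' OR description LIKE '%{q}%'))")
    return conn.execute(query).fetchall()


conn = make_db()
# Close the string and both parens, then append a 3-column SELECT from Users;
# "--" comments out the rest of the original query.
payload = "zzz')) UNION SELECT id, email, password || ':' || role FROM Users--"
for row in vulnerable_search(conn, payload):
    print(row)
```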
`/research?url=` proxies any URL through the server, enabling internal recon, data exfiltration via internal endpoints, and a full blind-SSRF surface.
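Because SSRF can be blind, one common way to confirm it is out-of-band: point the `url` parameter at a canary listener with a unique token and check whether the canary was hit. This sketch runs entirely on localhost; `fetch_like_target` is a hypothetical stand-in for the vulnerable proxy, and in a real test the canary would have to be reachable from the target server.

```python
import secrets
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, List

HITS: List[str] = []  # request paths the canary has seen


class CanaryHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        HITS.append(self.path)  # record before replying, so the caller sees it
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args) -> None:
        pass  # silence default stderr logging


def fetch_like_target(url: str) -> bytes:
    # Hypothetical stand-in for a vulnerable /research?url= handler:
    # it fetches whatever URL it is given (no proxy, 5 s timeout).
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as resp:
        return resp.read()


def confirm_ssrf(fetch: Callable[[str], object]) -> bool:
    """Start a canary, make `fetch` request a unique URL, check the canary log."""
    server = HTTPServer(("127.0.0.1", 0), CanaryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        token = secrets.token_hex(8)
        host, port = server.server_address[:2]
        fetch(f"http://{host}:{port}/canary/{token}")
        return any(token in path for path in HITS)
    finally:
        server.shutdown()
        server.server_close()


print(confirm_ssrf(fetch_like_target))  # True: the "server" fetched our URL
print(confirm_ssrf(lambda url: b""))    # False: nothing reached the canary
```

The random token is what makes the evidence unambiguous: a hit on that exact path can only come from the request the payload triggered.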
`/contributions` accepts forged POSTs with an attacker-controlled Referer and no CSRF token. The agent quoted the commented-out fix from the page source as evidence.
Requires source-code ingestion. That rules out external pentests, vendor-risk reviews, bug-bounty research, and every regulated buyer that can't ship source code to a US AI vendor.
Enterprise SaaS sells reports to security teams. We ship verified exploits with curl reproducers into the developer workflow: PR-integrated, per-commit, continuous.
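A minimal sketch of turning a recorded exploit request into a runnable curl reproducer, assuming each finding stores method, URL, headers, and body. `to_curl` is a hypothetical helper; `shlex.join` does the POSIX-shell quoting, so payloads full of quotes survive copy-paste. The Juice Shop login request on its default port 3000 serves as the example.

```python
import json
import shlex
from typing import Dict, Optional


def to_curl(method: str, url: str,
            headers: Optional[Dict[str, str]] = None,
            body: Optional[str] = None) -> str:
    """Render one HTTP request as a shell-safe curl command line."""
    argv = ["curl", "-sS", "-X", method.upper(), url]
    for name, value in (headers or {}).items():
        argv += ["-H", f"{name}: {value}"]
    if body is not None:
        argv += ["--data-raw", body]  # send the body verbatim, no @file expansion
    return shlex.join(argv)  # quotes each argument for POSIX shells


body = json.dumps({"email": "' OR 1=1--", "password": "x"})
print(to_curl("POST", "http://localhost:3000/rest/user/login",
              {"Content-Type": "application/json"}, body))
```

Building an argument list and quoting at the end, rather than formatting one string, keeps the reproducer byte-for-byte faithful to what the verifier actually sent.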
Template-based scanners find missing headers and metrics endpoints. They cannot bypass auth, exfiltrate data, or chain attacks. We consistently do all three.
We're rolling out to a small set of design partners first. Join the waitlist and we'll reach out before public launch, usually with a personal walkthrough tailored to your stack.