ClaudeWave
Slash Command · 304 repo stars · updated 2d ago

mantishack

The `/mantishack` command runs a comprehensive autonomous penetration test. It combines deterministic static analysis (Semgrep, CodeQL, SCA, and dataflow across Stages 0 through F) with a parallel adversarial red-team war-game in which ten distinct attack personas validate findings and construct end-to-end kill chains. Run `/mantishack <repo-path | git-url | host>` against an authorized target to generate a complete Red Team Report; optional flags control scope, depth, and exploit generation, and execution against remote systems requires explicit authorization confirmation first.

Install in Claude Code
mkdir -p ~/.claude/commands && curl -fsSL https://raw.githubusercontent.com/deonmenezes/mantishack/HEAD/.claude/commands/mantishack.md -o ~/.claude/commands/mantishack.md
Then start a new Claude Code session; the slash command loads automatically.

mantishack.md

# /mantishack — Maximal Autonomous Red-Team Pentest

The single most powerful entrypoint in the framework. **One target in, a kill-chain Red Team
Report out.** It fuses the deterministic `/agentic` + `/validate` pipeline (Semgrep + CodeQL + SCA +
dataflow + Stages 0→F) with a **parallel red-team agent war-game**: ten adversarial personas,
spawned via the Task tool, each attack the target through a different lens. It then adversarially
verifies every finding to kill false positives and stitches the confirmed bugs into end-to-end attack
chains.

```
/mantishack <repo-path | git-url | host> [--scope "..."] [--authorized]
                                          [--deep] [--relentless] [--rounds N]
                                          [--model M ...] [--consensus M] [--judge M]
                                          [--binary] [--exploit] [--patch]
```

> **Load the `redteam-hunting` skill before Phase 1** — it is the continuous-loop engine that drives
> the war-game to *convergence* (keep attacking until consecutive rounds find nothing new AND the
> attack surface is fully covered), so nothing is left on the table.
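
The stopping rule above can be read as a two-part predicate over round results. A minimal sketch, assuming each round yields a set of finding IDs and that the coverage ledger (described in Phase 1) reports how many surface units are still unexplored; `ConvergenceTracker` and its `patience` parameter (how many consecutive quiet rounds count as converged) are hypothetical, not documented `redteam-hunting` settings.

```python
from dataclasses import dataclass, field


@dataclass
class ConvergenceTracker:
    """Hypothetical helper: decides whether the war-game has converged."""
    patience: int = 2  # assumed: consecutive no-new-finding rounds required
    seen: set[str] = field(default_factory=set)
    empty_streak: int = 0

    def record_round(self, finding_ids: set[str]) -> int:
        """Merge one round's finding IDs; return how many were new."""
        new = finding_ids - self.seen
        self.seen |= new
        # A round with nothing new extends the streak; any new finding resets it.
        self.empty_streak = 0 if new else self.empty_streak + 1
        return len(new)

    def converged(self, unexplored_units: int) -> bool:
        # Both conditions must hold: quiet rounds AND no unexplored surface left.
        return self.empty_streak >= self.patience and unexplored_units == 0


tracker = ConvergenceTracker(patience=2)
tracker.record_round({"F1", "F2"})
tracker.record_round({"F2"})   # duplicate only: streak becomes 1
tracker.record_round(set())    # nothing at all: streak becomes 2
print(tracker.converged(unexplored_units=3))  # False: surface not fully covered
print(tracker.converged(unexplored_units=0))  # True
```

Requiring both conditions guards against stopping early: quiet rounds alone may only mean the current lens is exhausted, while full coverage alone says nothing about whether further rounds still surface new bugs.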

Nothing is applied to the target — every artifact is generated under `out/`. Exploit PoCs and
patches are only produced with `--exploit` / `--patch`, and exploitation is never *run* without an
explicit confirmation.

---

## ⛔ Authorization gate (MANDATORY — do this first)

This command is offensive. Before Phase 0, confirm authorization **in this conversation**:

- The target must be one the user owns or is explicitly authorized to test (written scope, bug-bounty
  program, internal asset). If `--authorized` is absent and the target is a remote host/URL, **ask
  once** for confirmation and the scope, and proceed only after the user confirms.
- Record the scope string in the run header. Treat anything outside it as out-of-bounds and refuse to
  touch it. If the target is a local repo path, authorization is assumed (it's the user's code).
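
The gate's decision can be sketched as a pure function. This is a minimal illustration, not the command's actual implementation: it assumes the scope string is a comma-separated list of hostnames where a leading `*.` means "any subdomain", treats anything that is not an existing local directory as remote, and takes `confirmed` to be the user's one-time answer in the conversation. `authorize`, `host_in_scope`, and the scope syntax are hypothetical.

```python
import os
from urllib.parse import urlparse


def parse_scope(scope: str) -> list[str]:
    """Assumed scope syntax: comma-separated hostnames; '*.' prefix = any subdomain."""
    return [s.strip().lower() for s in scope.split(",") if s.strip()]


def target_host(target: str) -> str:
    """Hostname from a URL or a bare host[:port] string (urlparse lowercases it)."""
    parsed = urlparse(target if "://" in target else "//" + target)
    return (parsed.hostname or "").rstrip(".")


def host_in_scope(host: str, scope: list[str]) -> bool:
    for entry in scope:
        if entry.startswith("*."):
            # Wildcard covers subdomains only, not the apex domain itself.
            if host.endswith("." + entry[2:]):
                return True
        elif host == entry:
            return True
    return False


def authorize(target: str, scope: str, authorized_flag: bool, confirmed: bool) -> bool:
    """True only if the run may touch `target`."""
    if os.path.isdir(target):
        return True  # local repo path: authorization is assumed
    if not (authorized_flag or confirmed):
        return False  # remote target without --authorized: ask once first
    # Even when authorized, anything outside the recorded scope is out of bounds.
    return host_in_scope(target_host(target), parse_scope(scope))
```

For example, `authorize("https://api.example.com/v1", "api.example.com", True, False)` passes, while the same call against `https://evil.com` is refused because the host falls outside the recorded scope.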

---

## Phase 0 — Recon & seed corpus  *(mechanical, parallel)*

Build the attack-surface map and a high-recall seed-finding corpus that the red-team agents sharpen.

1. **Scan + audit** — run the deterministic pipeline; this owns scanner orchestration and selection:
   ```bash
   libexec/mantishack-agentic --repo "$TARGET" --understand            # Semgrep + CodeQL + auth/logging audit + dedup + prep
   ```
   `--understand` makes it emit `context-map.json` (entry points, trust boundaries, sinks) as a sibling
   run — both the agentic checklist and the Phase 2 validator pick it up via the bridge.
2. **SCA** (if a manifest exists): `/mantis-sca` for vulnerable/compromised dependencies.
3. **Web surface** (if target is a URL): `/mantis-web` to crawl links, forms, params, JS endpoints.
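
Steps 2 and 3 are conditional on the target. A minimal sketch of that selection, assuming the target's top-level file names are already known; the manifest list is an illustrative assumption rather than the set `/mantis-sca` actually recognizes, an `http(s)` target ending in `.git` is assumed to be a repository rather than a web surface, and `plan_phase0` with its step names is hypothetical. Like the list, it always runs step 1.

```python
from urllib.parse import urlparse

# Illustrative manifest names; the set /mantis-sca actually recognizes may differ.
MANIFESTS = {
    "package.json", "requirements.txt", "pyproject.toml", "go.mod",
    "Cargo.toml", "pom.xml", "build.gradle", "Gemfile", "composer.json",
}


def plan_phase0(target: str, top_level_files: set[str]) -> list[str]:
    """Ordered Phase 0 steps for a target (hypothetical step names)."""
    steps = ["scan-audit"]  # step 1: libexec/mantishack-agentic --understand
    if top_level_files & MANIFESTS:
        steps.append("sca")  # step 2: /mantis-sca, only if a manifest exists
    is_web = urlparse(target).scheme in ("http", "https") and not target.endswith(".git")
    if is_web:
        steps.append("web")  # step 3: /mantis-web, only for URL targets
    return steps
```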

Output of Phase 0: `autonomous_analysis_report.json` (seed findings with `code`, `dataflow`,
`feasibility`) + `context-map.json` (the map). Hand **both** to every red-team agent below — they are
the agents' starting corpus, **not their ceiling.**
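
A minimal sketch of that hand-off: load both Phase 0 artifacts and build one brief per persona. Only the two file names and the seed-finding fields `code`, `dataflow`, and `feasibility` come from the text above; the top-level `findings` key, the brief layout, and the helpers `load_phase0` and `build_briefs` are assumptions about the JSON shape, not the framework's real schema.

```python
import json
from pathlib import Path

# Fields the text says every seed finding carries.
REQUIRED_SEED_FIELDS = ("code", "dataflow", "feasibility")


def load_phase0(out_dir: Path) -> tuple[dict, dict]:
    """Read the two Phase 0 artifacts from a run's output directory."""
    report = json.loads((out_dir / "autonomous_analysis_report.json").read_text())
    context_map = json.loads((out_dir / "context-map.json").read_text())
    return report, context_map


def build_briefs(personas: list[str], target: str, report: dict, context_map: dict) -> list[dict]:
    """One brief per persona; every persona receives the same corpus and map."""
    seeds = report.get("findings", [])  # assumed top-level key
    for i, seed in enumerate(seeds):
        missing = [k for k in REQUIRED_SEED_FIELDS if k not in seed]
        if missing:
            raise ValueError(f"seed finding {i} is missing {missing}")
    # The seeds are a starting corpus, not a ceiling: agents hunt beyond them.
    return [
        {"persona": p, "target": target, "seed_findings": seeds, "context_map": context_map}
        for p in personas
    ]
```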

---

## Phase 1 — 🐲 RED-TEAM WAR GAME  *(parallel agent swarm — the core power feature)*

Spawn the **ten hunting** personas concurrently via the **Task tool**, in a single message (they run
in parallel). Each gets: the target path, the Phase 0 seed corpus, and `context-map.json`. Each is a
different attacker mindset and finds the bugs deterministic scanners structurally cannot (logic flaws,
broken authorization, trust-assumption breaks, multi-step chains). Skip any whose surface the target
lacks (e.g. no CI config → skip `supply-chain-saboteur`):

| Persona (Task subagent) | War-game lens | Primarily surfaces |
|---|---|---|
| `threat-actor-wargame` | "build the cheapest kill chain to the crown jewels" | initial-access → privesc → impact paths |
| `insider-betrayal-sim` | "a trusted user / dependency turns hostile" | IDOR / BOLA / BFLA, privesc, supply-chain hooks |
| `single-point-of-compromise` | "where does ONE bug = total compromise" | secret stores, auth middleware, deserializers, SSRF egress |
| `threat-landscape-shift` | "what emerging attack breaks today's defenses" | desync/smuggling, dep-confusion, prompt-injection & tool-abuse |
| `assumption-pressure-test` | "attack every implicit trust assumption" | confused-deputy, parser differentials, mass-assignment, 2nd-order injection |
| `llm-agent-abuse` | "coerce the AI/agent surface" | prompt injection (direct + indirect/RAG), tool-call hijack, model-output → eval/SQL/shell, secret leakage |
| `workflow-abuse-economist` | "abuse the business logic, not the bug" | price/coupon/quota/refund tampering, free-trial re-abuse, state-machine skips |
| `federated-identity-breaker` | "break the SSO handshake, not the JWT" | OAuth redirect_uri/state theft, PKCE downgrade, SAML XSW, account-linking takeover |
| `http-edge-desync` | "make two HTTP hops disagree" | request smuggling (CL.TE/TE.CL/CL.0), cache poisoning, cache deception |
| `supply-chain-saboteur` | "own the build, own everything" | poisoned-pipeline execution, runner secret exfil, dependency confusion, container escape |
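
The skip rule before the table amounts to a lookup from persona to the surface it needs. A sketch under stated assumptions: only `supply-chain-saboteur` needing CI config comes from the text; the other surface tags (`llm_integration`, `sso`, `http_proxy`) and which personas require them are illustrative guesses from each row's lens, and `select_personas` is hypothetical. Personas with no listed requirement always run; the sketch only decides which personas to spawn, not how the Task tool launches them.

```python
# Assumed surface requirements; only the CI-config rule is stated in the text.
REQUIRES = {
    "supply-chain-saboteur": {"ci_config"},
    "llm-agent-abuse": {"llm_integration"},
    "federated-identity-breaker": {"sso"},
    "http-edge-desync": {"http_proxy"},
}

# The ten hunting personas, in table order.
PERSONAS = [
    "threat-actor-wargame", "insider-betrayal-sim", "single-point-of-compromise",
    "threat-landscape-shift", "assumption-pressure-test", "llm-agent-abuse",
    "workflow-abuse-economist", "federated-identity-breaker", "http-edge-desync",
    "supply-chain-saboteur",
]


def select_personas(surface: set[str]) -> list[str]:
    """Keep personas whose required surface tags are all present in the target."""
    return [p for p in PERSONAS if REQUIRES.get(p, set()) <= surface]
```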

Each persona returns findings in the standard block (`## [SEVERITY] … Location / Type / Attack vector
/ Impact / PoC / Reachability / Remediation`).

**Continuous loop until converged** — run by the `redteam-hunting` skill (load it now). Maintain a
coverage ledger under `$OUTPUT_DIR/hunt/`: every attack-surface unit (source, sink, route,
trust-boundary, auth-check, deserializer, secret), seeded from `context-map.json`, tagged
`unexplored / explored / finding`. Each round:

1. Prioritize `unexplored` units (crown-jewel-adjacent first) and re-spawn the hunters against them,
   passing the `findings` + `dead_ends` ledgers as exclusion lists (never re-chase a disproven lead,
   never re-report a dup).
2. **Rotate the attack lens** each round (kill-chain → trust-flip → chokepoint → assumption →
   differential → variant → chain → emerging) so a single blind spot can't hide a bug all run.
3. Merge new findings (dedup
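
One round's bookkeeping can be sketched as a small data structure. This is a minimal illustration, assuming the ledger is an in-memory map from surface-unit ID to state and that crown-jewel adjacency is a precomputed boolean per unit; the lens order is taken from step 2, while `Ledger`, `Unit`, `next_targets`, and `record` are hypothetical names rather than the `redteam-hunting` skill's actual data model.

```python
from dataclasses import dataclass, field
from typing import Optional

# Lens rotation order, as listed in step 2.
LENSES = ["kill-chain", "trust-flip", "chokepoint", "assumption",
          "differential", "variant", "chain", "emerging"]


@dataclass
class Unit:
    kind: str                  # source, sink, route, trust-boundary, auth-check, ...
    crown_adjacent: bool       # assumed precomputed from context-map.json
    state: str = "unexplored"  # unexplored / explored / finding


@dataclass
class Ledger:
    units: dict[str, Unit]
    findings: set[str] = field(default_factory=set)   # dedup keys already reported
    dead_ends: set[str] = field(default_factory=set)  # leads already disproven

    def next_targets(self, limit: int) -> list[str]:
        """Unexplored units, crown-jewel-adjacent first (stable sort otherwise)."""
        todo = [uid for uid, u in self.units.items() if u.state == "unexplored"]
        todo.sort(key=lambda uid: not self.units[uid].crown_adjacent)
        return todo[:limit]

    def record(self, uid: str, finding_key: Optional[str]) -> bool:
        """Update one unit after a hunt; return True only for a new finding."""
        if finding_key is None or finding_key in self.dead_ends:
            self.units[uid].state = "explored"  # nothing found, or a disproven lead
            return False
        self.units[uid].state = "finding"
        if finding_key in self.findings:
            return False  # duplicate: never re-report
        self.findings.add(finding_key)
        return True


def lens_for_round(round_no: int) -> str:
    """Rotate the attack lens each round (round 0 uses kill-chain)."""
    return LENSES[round_no % len(LENSES)]
```

`findings` and `dead_ends` double as the exclusion lists handed to the hunters in step 1.
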

api-abuse-fuzzer · Subagent

Use this agent when the target is a LIVE REST or GraphQL API you are authorized to test and the question is "can I tamper request bodies, headers, ids, and tokens to read or act on data that isn't mine?" This is active, request-driven abuse of the API contract, not static code review. It drives REAL HTTP at the endpoints: BOLA/IDOR object-id enumeration (increment/swap/UUID-shuffle the id and diff the access decision), broken function-level authz (replay an admin verb/path with a low-priv token), mass-assignment (inject role/is_admin/is_verified/owner_id into the JSON body), excessive-data-exposure (the response over-returns fields the UI never shows), GraphQL introspection + alias/batch amplification + nested-query DoS, content-type and HTTP-verb tampering (POST→PUT/PATCH/DELETE, application/json→text/plain→x-www-form-urlencoded), JWT/session/token swap across two users, and rate-limit / idempotency-key bypass. It proves every finding with a behavioral oracle (a status/length/timing/field-set diff between the authorized baseline and the tampered request), never a guess. Prefer this agent over a code reader when you hold a base URL or a schema and want to mutate live traffic methodically.

<example>
Context: The user has a running API with numeric resource ids and two test accounts.
user: "Here's our staging API at https://api.staging.acme.test and tokens for user A and user B. Can user A read user B's orders?"
assistant: "That's textbook BOLA: same endpoint, swap the object id (or the bearer token) and diff the access decision. I'll use the Task tool to launch the api-abuse-fuzzer agent to enumerate /orders/{id} with A's token against B's ids and prove the cross-tenant read with a status + ownership-field oracle."
<agent_launch>
Delegating to api-abuse-fuzzer: a live authorized API + two tokens + object-id enumeration is its core BOLA/IDOR mission.
</agent_launch>
</example>

<example>
Context: The user exposes a GraphQL endpoint and isn't sure introspection or query batching is locked down.
user: "Our /graphql is behind auth but I want to know if a low-priv user can pull admin fields, brute force via aliases, or knock it over with a deep nested query."
assistant: "GraphQL abuse surface: introspect the schema, alias-batch a login/lookup to bypass per-request rate limits, and send a bounded cyclic nested query as a timing oracle. I'll launch the api-abuse-fuzzer agent to tamper the operation and measure the depth/timing oracle."
<agent_launch>
Delegating to api-abuse-fuzzer for GraphQL introspection, alias/batch amplification, and nested-query DoS against the live endpoint.
</agent_launch>
</example>

Proactively suggest using this agent when:
- A live base URL + an OpenAPI/Swagger/GraphQL schema (or a captured request) is in hand and the target is authorized in-scope
- Endpoints take a resource identifier in the path/query/body (/users/{id}, ?account=, {"order_id": ...}), i.e. BOLA/IDOR territory
- The user holds 2+ accounts or tokens (low-priv + high-priv, tenant A + tenant B) to run an authorization differential
- There are admin/privileged verbs (DELETE, PUT /admin/*, role-changing mutations) and you want to hit them as a non-admin
- A write endpoint accepts a JSON object (test mass-assignment of role/is_admin/verified/balance/owner_id)
- A /graphql endpoint exists (introspection, alias/batch abuse, nested-query DoS, field-level authz)
- The user mentions rate limiting, coupon/OTP brute force, idempotency keys, BOLA, BFLA, mass assignment, or "excessive data exposure"
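
The behavioral oracle described above is a diff between two observations of the same request. A minimal sketch, assuming the responses have already been captured (no HTTP is sent here) and reduced to status, body length, elapsed time, and top-level JSON field names; the `Observation` type, the 10% length tolerance, and the 3x timing factor are illustrative assumptions, not the agent's real thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    status: int
    length: int
    elapsed_ms: float
    fields: frozenset[str]  # top-level keys of a JSON response body


def diff(baseline: Observation, tampered: Observation,
         length_tol: float = 0.10, timing_factor: float = 3.0) -> dict[str, bool]:
    """Which oracle signals changed between the baseline and tampered request."""
    longest = max(baseline.length, tampered.length, 1)
    return {
        "status": baseline.status != tampered.status,
        "length": abs(baseline.length - tampered.length) / longest > length_tol,
        "timing": tampered.elapsed_ms > timing_factor * max(baseline.elapsed_ms, 1.0),
        "fields": baseline.fields != tampered.fields,
    }


def access_not_denied(baseline: Observation, tampered: Observation) -> bool:
    """Heuristic: the tampered request succeeded and returned the same field set."""
    return 200 <= tampered.status < 300 and tampered.fields == baseline.fields
```

In a cross-tenant read test, `baseline` would be the owner's authorized request and `tampered` the other tenant's token on the same object; `access_not_denied` returning `True` is a lead to confirm with ownership fields, not a verdict on its own.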

assumption-pressure-test · Subagent

Use this agent when a codebase, PR, or service needs its IMPLICIT TRUST ASSUMPTIONS enumerated and attacked: every place the code silently trusts a header, an "internal-only" route, an "already-validated-upstream" input, an ID that "belongs to the caller," or a value that "can't be null/negative." It specializes in confused-deputy (CWE-441), mass-assignment (CWE-915), input-validation gaps (CWE-20), and second-order/stored injection (CWE-89/CWE-79). This is the agent for trust-boundary audits, multi-service request paths, ORM/serializer review, and proxy/gateway/header-forwarding code.

<example>
Context: User has a multi-service backend where an edge proxy forwards auth context downstream.
user: "Our gateway sets X-User-Id from the JWT and the internal services read it to authorize. Can you check this?"
assistant: "This is a classic confused-deputy / trust-boundary surface: the downstream services trust a header they can't verify. I'll use the Task tool to launch the assumption-pressure-test agent to enumerate every internal endpoint that reads X-User-Id and prove whether an attacker can reach it directly and forge it."
<agent_launch>
Delegating to assumption-pressure-test: the request hinges on an implicit 'this header is trusted because the gateway set it' assumption that must be attacked.
</agent_launch>
</example>

<example>
Context: User just merged an ORM update endpoint.
user: "Here's the new profile update handler: User.objects.filter(id=request.user.id).update(**request.data)"
assistant: "That `**request.data` spread is a mass-assignment sink; it trusts that the request body only contains the fields you intended. I'll launch the assumption-pressure-test agent to map which model columns (is_admin, balance, role) become attacker-writable and confirm reachability."
<agent_launch>
Delegating to assumption-pressure-test for the CWE-915 mass-assignment and the implicit 'the body only has safe fields' assumption.
</agent_launch>
</example>

Proactively suggest using this agent when:
- Code reads request headers (X-Forwarded-For, X-User-Id, X-Real-IP, X-Internal-*, Host) for trust or authorization decisions
- A serializer/ORM uses bulk binding: `**req.body`, `Object.assign`, `ModelMapper`, `BeanUtils.copyProperties`, `update_attributes`, `params.permit!`
- Comments or names assert trust: "internal only", "already validated", "trusted", "comes from gateway", "sanitized upstream"
- Data is stored then later concatenated into SQL/HTML/shell (second-order injection)
- An endpoint takes an `id`/`uuid`/`account`/`order` param that maps to a resource (IDOR / object ownership)
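
The second example's sink, `update(**request.data)`, trusts every key in the body; the usual remediation for CWE-915 is an explicit allowlist of writable fields. A minimal framework-free sketch in plain Python, where a dict stands in for `request.data` and `ALLOWED_PROFILE_FIELDS` and `safe_update_fields` are hypothetical names.

```python
# Hypothetical allowlist: the only columns a user may set on their own profile.
ALLOWED_PROFILE_FIELDS = frozenset({"display_name", "bio", "avatar_url"})


def safe_update_fields(body: dict) -> tuple[dict, set]:
    """Split a request body into writable fields and rejected (unexpected) keys."""
    writable = {k: v for k, v in body.items() if k in ALLOWED_PROFILE_FIELDS}
    rejected = set(body) - ALLOWED_PROFILE_FIELDS
    return writable, rejected


body = {"bio": "hi", "is_admin": True, "balance": 10**6}
writable, rejected = safe_update_fields(body)
print(writable)          # {'bio': 'hi'}
print(sorted(rejected))  # ['balance', 'is_admin']
```

In the Django handler above, the filtered dict would replace the raw body, e.g. `User.objects.filter(id=request.user.id).update(**writable)`; a non-empty `rejected` set is also a useful signal that someone is probing for mass-assignment.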

coverage-analyzer · Subagent

Generate gcov coverage data for a code repository.

crash-analysis-agent · Subagent

Analyze security bugs from any C/C++ project with full root-cause tracing.

crash-analyzer · Subagent

Analyze crashes using rr recordings, function traces, and coverage data to produce root-cause analyses.

crash-analysis-checker · Subagent

Carefully analyze root-cause analysis reports for crashes to make sure they are correct.

exploitability-validator-agent · Subagent

Multi-stage pipeline to validate that vulnerability findings are real, reachable, and exploitable.

federated-identity-breaker · Subagent
