ClaudeWave
Slash Command · 304 repo stars · updated 2d ago

mantis-exploit

The mantis-exploit Claude Code command generates working proof-of-concept exploit code for identified security vulnerabilities by analyzing SARIF findings from the preceding scan. It produces exploit implementations in languages like Python and C, saving them to designated output directories. Use this command after running the scan step to generate concrete, executable demonstrations of discovered vulnerabilities without producing patches.

Install in Claude Code
mkdir -p ~/.claude/commands && curl -fsSL https://raw.githubusercontent.com/deonmenezes/mantishack/HEAD/.claude/commands/mantis-exploit.md -o ~/.claude/commands/mantis-exploit.md
Then start a new Claude Code session; the slash command loads automatically.

mantis-exploit.md

# /exploit - Generate Exploit PoCs (beta)

Generate working exploit proof-of-concepts for vulnerabilities.

**Requires:** SARIF file from previous /scan, or identified vulnerabilities

**What it does:**
- Analyzes findings with an LLM
- Generates working exploit code (Python, C, pwntools)
- Saves to out/*/exploits/
- Does NOT generate patches (use /patch for that)

**Run:** `python3 mantishack.py agentic --repo <path> --no-patches --max-findings <N>`

(An earlier version of this command listed `--sarif <sarif-file>`, but
`mantishack.py agentic` doesn't accept that flag: the
agentic flow always reads the SARIF emitted by the
embedded scan stage. The flag exists only on
`mantishack.py analyze`, a different subcommand. Operators
following the old example got "unrecognized arguments:
--sarif" from argparse.)
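
A minimal sketch of assembling that invocation programmatically, using only the flags shown in the **Run** line above. The helper name `build_agentic_command` is hypothetical and not part of mantishack; it only builds the argument list and does not launch anything.

```python
import shlex
from typing import List


def build_agentic_command(repo: str, max_findings: int) -> List[str]:
    """Build the argv list for the documented `agentic` invocation.

    Hypothetical helper: only the flags from the Run line are used.
    `--sarif` is deliberately absent because `agentic` rejects it.
    """
    return [
        "python3", "mantishack.py", "agentic",
        "--repo", repo,
        "--no-patches",
        "--max-findings", str(max_findings),
    ]


argv = build_agentic_command("test/", 5)
print(shlex.join(argv))  # shell-quoted form, suitable for logging
```

Building the command as a list rather than a string keeps a repository path containing spaces intact if the list is later passed to `subprocess.run`.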

**Example:**
```bash
/scan test/                    # First, find vulnerabilities
/exploit                       # Then, generate exploits for findings
```

## Pre-check: Existing Validation Data

Run this first to check if `/validate` has already produced feasibility data:

```python
import sys, os; sys.path.insert(0, os.environ["MANTISHACK_DIR"])
from packages.exploitation import exploit_bootstrap
import json

result = exploit_bootstrap(
    target_path=None,    # Source directory, if known
    finding_id=None,     # e.g., "FIND-0003", if user specified one
    binary_path=None,    # Binary path, if known
)
print(json.dumps(result, indent=2, default=str))
```

| status | Action |
|--------|--------|
| `valid` | Use `result["findings"]` directly. Each finding has `feasibility.exploitation_paths` ({technique, target} pairs), `feasibility.verdict`, `feasibility.impact`, and `feasibility.chain_breaks`. This is **richer** than fresh analysis — it includes per-finding paths mapped against binary constraints. Load binary context via `result["context_file"]` if needed. **Skip the fresh analysis below.** |
| `stale` | Source changed since validation. Warn user, offer to re-validate or proceed with existing data. |
| `not_found` / `partial` / `schema_error` | No usable data. **Run the fresh analysis below.** |
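
The table above can be read as a small dispatch on the bootstrap result. The sketch below assumes the result is a dict with a `"status"` key (inferred from the table header, not confirmed by the source); the function name and the returned action labels are hypothetical. Treating unrecognized statuses as "no usable data" is a conservative choice made here, not documented behavior.

```python
def next_action(result: dict) -> str:
    """Map a bootstrap result to the action described in the status table.

    Assumes `result["status"]` holds the status string; the action labels
    returned here are illustrative names only.
    """
    status = result.get("status")
    if status == "valid":
        # Existing validation data is usable: skip the fresh analysis.
        return "use_existing_findings"
    if status == "stale":
        # Source changed since validation: warn, then let the user choose.
        return "warn_and_offer_revalidation"
    # not_found / partial / schema_error, plus anything unexpected.
    return "run_fresh_analysis"


for status in ("valid", "stale", "not_found", "partial", "schema_error"):
    print(f"{status:>12} -> {next_action({'status': status})}")
```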

## MANDATORY: Run Feasibility Analysis First (Persisted)

**You MUST run this if the pre-check above did not return `valid`.**
**DO NOT use checksec, readelf, or other tools instead - they miss critical constraints.**

```python
import sys, os; sys.path.insert(0, os.environ["MANTISHACK_DIR"])
from packages.exploit_feasibility import save_exploit_context, print_exploit_context

# Run analysis and SAVE to persistent file (survives context compaction)
context_file = save_exploit_context('/path/to/target/binary')
print(f"\n[!] Context saved to: {context_file}")
print(f"[!] After context compaction, reload with: print_exploit_context('{context_file}')\n")

# Display the analysis
print(print_exploit_context(context_file))
```

**IMPORTANT: Context Persistence**
The `save_exploit_context()` function saves critical data to a JSON file that survives
context window compaction. If the conversation gets long and the context is compacted, reload the saved analysis:

```python
import sys, os; sys.path.insert(0, os.environ["MANTISHACK_DIR"])
from packages.exploit_feasibility import print_exploit_context, load_exploit_context

# Reload after compaction - use the path printed above
print(print_exploit_context('/path/to/binary_exploit_context.json'))

# Or load as dict for programmatic access
ctx = load_exploit_context('/path/to/binary_exploit_context.json')
```

**This analysis provides information that checksec does NOT:**
- Empirical %n verification (does it actually work on this glibc?)
- Null byte constraints from input handlers (strcpy can't write full addresses)
- ROP gadget quality (are there enough gadgets to build a chain?)
- Alternative write targets when GOT/hooks are blocked
- Honest difficulty assessment based on all constraints combined

**If you skip this step, you WILL suggest techniques that don't work.**
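
To illustrate the null-byte constraint from the list above, the sketch below shows why a `strcpy`-style copy cannot deliver a full 64-bit address: the copy stops at the first NUL byte of the little-endian encoding. The function name and both addresses are hypothetical, and this is not how the feasibility analysis itself is implemented.

```python
import struct


def address_bytes_delivered(address: int) -> bytes:
    """Return the little-endian address bytes a strcpy-style copy delivers.

    The copy stops at the first NUL byte, so every byte after it is lost.
    """
    packed = struct.pack("<Q", address)   # 8-byte little-endian encoding
    return packed.split(b"\x00", 1)[0]    # bytes before the first NUL


# Hypothetical addresses chosen only for illustration.
for addr in (0x00007FFFF7E1A2B0, 0x0000000000401000):
    delivered = address_bytes_delivered(addr)
    print(f"{addr:#018x}: {len(delivered)} of 8 bytes delivered")
```

The first address loses its two high bytes, and the second starts with a NUL in its lowest byte, so none of it survives the copy.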

## CRITICAL: Follow the Mitigation Analysis

The mitigation analysis output contains authoritative information about what works and what doesn't. You MUST:

1. **Check the verdict** - Exploitable, Likely exploitable, Difficult, or Unlikely
2. **Read the chain breaks** - These tell you exactly which techniques are blocked and why
3. **Check alternative targets** - If standard targets (GOT, hooks) are blocked, look at suggested alternatives
4. **Read the "Reality check"** - This gives an honest assessment of practical exploitation

**DO NOT suggest techniques that are listed as blocked.** For example:
- If "%n format specifier disabled" is listed, do NOT suggest format string writes
- If "Full RELRO" is listed, do NOT suggest GOT overwrites OR .fini_array overwrites
- If "hooks removed" is listed, do NOT suggest __malloc_hook/__free_hook overwrites

**CRITICAL: Full RELRO typically blocks BOTH GOT and .fini_array** (standard linker
scripts place them in the same RELRO segment). Many hours have been wasted trying to
write to .fini_array when Full RELRO is enabled. The mitigation analysis will NOT
list .fini_array as an alternative target if Full RELRO is on.
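
The blocking rules above can be sketched as a filter over candidate techniques. This assumes chain breaks arrive as plain strings containing the phrases quoted above; the technique labels and the `viable_techniques` helper are hypothetical, and the real analysis output may be structured differently.

```python
# Phrase found in a chain break -> technique labels it rules out.
# Labels are illustrative names, not identifiers from mantishack.
BLOCKED_BY = {
    "%n format specifier disabled": {"format_string_write"},
    "Full RELRO": {"got_overwrite", "fini_array_overwrite"},
    "hooks removed": {"malloc_hook_overwrite", "free_hook_overwrite"},
}


def viable_techniques(candidates, chain_breaks):
    """Drop every candidate that a listed chain break rules out."""
    blocked = set()
    for entry in chain_breaks:
        for phrase, techniques in BLOCKED_BY.items():
            if phrase.lower() in entry.lower():
                blocked |= techniques
    # Preserve the caller's ordering of the remaining candidates.
    return [t for t in candidates if t not in blocked]


candidates = ["got_overwrite", "fini_array_overwrite", "rop_chain"]
print(viable_techniques(candidates, ["Full RELRO enabled"]))
```

Note that a single "Full RELRO" entry removes both the GOT and the `.fini_array` candidates, matching the warning above.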

**DO follow the suggested paths.** For example:
- If "alternative_targets" lists .fini_array, focus on that approach
- If "what_would_help" suggests an info leak, develop that first
- If verdict is Difficult, explain the challenges honestly to the user

## Output Structure

When presenting exploitation strategy:
1. State the verdict from mitigation analysis
2. List what IS possible (from "What you CAN still do")
3. List what is NOT possible (from chain breaks)
4. Propose a path using only viable techniques
5. **Always offer next steps** (see below)
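
As one possible rendering of the five-part structure above, the sketch below assembles the sections in the listed order. The function name, parameter names, and section labels are hypothetical; the source specifies only the order of the content, not its exact wording.

```python
def format_strategy(verdict, possible, blocked, proposed_path, next_steps):
    """Render an exploitation strategy in the order the list above prescribes."""
    lines = [f"Verdict: {verdict}", "", "What IS possible:"]
    lines += [f"- {item}" for item in possible]
    lines += ["", "What is NOT possible (chain breaks):"]
    lines += [f"- {item}" for item in blocked]
    lines += ["", f"Proposed path: {proposed_path}", "", "Next steps:"]
    lines += [f"- {item}" for item in next_steps]
    return "\n".join(lines)


# Illustrative values only; real content comes from the mitigation analysis.
print(format_strategy(
    verdict="Difficult",
    possible=["info leak via format string read"],
    blocked=["GOT overwrite (Full RELRO)"],
    proposed_path="develop the info leak first",
    next_steps=["Try alternative targets from the analysis"],
))
```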

## IMPORTANT: Always Offer Next Steps

**Even when verdict is Difficult or Unlikely, offer the user choices:**

For Difficult verdict:
- "Try alternative targets from the analysis (check which ones are actually viable)"
- "Focus on info leaks only (useful for chaining with other vulns)"
- "Run in older environment (Docker with Ubuntu 20.04)"
- "Mo