ClaudeWave
Skill · 304 repo stars · updated 2d ago

exploitability-validation

# ClaudeWave: exploitability-validation

The exploitability-validation skill implements a multi-stage pipeline that verifies vulnerability findings are genuine, reachable, and practically exploitable before proceeding to exploit development. Use this after initial vulnerability scanning to filter out hallucinated findings, dead code paths, and findings with unrealistic preconditions, preventing wasted effort on false positives.

Install in Claude Code
git clone --depth 1 https://github.com/deonmenezes/mantishack /tmp/exploitability-validation && cp -r /tmp/exploitability-validation/.claude/skills/exploitability-validation ~/.claude/skills/exploitability-validation
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# Exploitability Validation Skill

A multi-stage pipeline for validating that vulnerability findings are real, reachable, and exploitable.

## Purpose

Prevents wasted effort on:
- Hallucinated findings (file doesn't exist, code doesn't match)
- Unreachable code paths (dead code, test-only)
- Findings with unrealistic preconditions

## When to Use

After scanning produces findings, BEFORE exploit development:
1. Scanner finds potential vulnerability
2. **This skill validates it's real and reachable**
3. Exploit Feasibility checks binary constraints
4. Exploit development proceeds

---

## [CONFIG] Configuration

```yaml
models:
  native: true
  additional: false  # Set true to also run GPT, Gemini

output_when_additional:
  display: "agreement: 2/3"
  threshold: "1/3 is enough to proceed"
```
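As a rough illustration of how the `agreement: 2/3` display and the "1/3 is enough to proceed" threshold could fit together when `additional` is enabled, the sketch below tallies one verdict per model. The function name `agreement`, the model keys, and the boolean verdict format are assumptions made for this example; the skill does not define them.

```python
from fractions import Fraction


def agreement(verdicts: dict[str, bool],
              threshold: Fraction = Fraction(1, 3)) -> tuple[str, bool]:
    """Return the display string and whether the finding proceeds.

    `verdicts` maps a model name to True when that model judged the
    finding exploitable. Names and structure are hypothetical.
    """
    if not verdicts:
        raise ValueError("at least one model verdict is required")
    yes = sum(verdicts.values())          # True counts as 1
    total = len(verdicts)
    display = f"agreement: {yes}/{total}"
    proceed = Fraction(yes, total) >= threshold
    return display, proceed


if __name__ == "__main__":
    print(agreement({"native": True, "gpt": True, "gemini": False}))
```

Using `Fraction` keeps the comparison exact, so a single agreeing model out of three meets the 1/3 threshold without floating-point rounding.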

---

## [EXEC] Execution Rules

1. Run the full pipeline end-to-end.
2. Solve and fix any issues you encounter, unless you have failed five times in a row or need clarification.
3. Run on the latest thinking/reasoning model available (verify model name).
4. The pipeline must be deterministic: if run again, results should be the same.
5. **Validate after writing.** Run `libexec/mantishack-validate-schema <type> <file>` after each Write. Match the type to what you wrote:
   - `stage` for any `stage-*.json` file (e.g., `stage-a.json`, `stage-c.json`, `stage-f.json`)
   - `attack-tree`, `attack-paths`, `attack-surface`, `hypotheses`, `disproven` for the matching working doc

   Fix any errors before proceeding to the next stage.
6. No finding may reach Stage D without passing through Stages B and C, even if Stage A produced a successful PoC.
7. Do not narrate gate compliance ("GATE-8 satisfied"), schema validation passes ("findings.json: OK"), or stage transitions ("Stage C complete") to the user. Do show substantive work: PoC test output, tool investigations (objdump, checksec), binary protections, hypothesis results, and evidence discovered. Document gate compliance in validation-report.md only. Report schema or pipeline failures immediately.
8. **Python imports:** All `python3 -c` snippets must start with `import sys, os; sys.path.insert(0, os.environ["MANTISHACK_DIR"])` before importing from `packages.*` or `core.*`.
9. **Build directory:** Stage 0 creates `$OUTPUT_DIR/build/`. Compile and run PoCs there, not in the target repo.
10. **libexec scripts:** Run `libexec/` scripts exactly as shown in the prompts: do not prepend `export` commands, do not use absolute paths, do not wrap in additional shell logic. The permission system auto-approves `libexec/mantishack-*` commands only when run in this exact form.
10b. **Sandbox:** Run ALL compilation and execution via `libexec/mantishack-run-sandboxed <cmd> [args]`. This blocks network, restricts writes, and limits resources. Never run gcc or binaries directly.
11. **Per-stage JSON files.** Write your stage's output to `stage-X.json` (e.g., `stage-a.json`, `stage-b.json`), not to `findings.json`. The prep script merges stage files into findings.json automatically. Do not read or write findings.json directly. Do not use `python3 -c` scripts for JSON — use the Write tool.
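
The type selection in rule 5 can be sketched as a small lookup. This is only an illustration: the helper names `schema_type_for` and `validate_command` are hypothetical, and the sketch assumes each working doc is saved as `<type>.json`, which the rules above do not state.

```python
import re
from pathlib import Path

# Working-doc types named in rule 5.
WORKING_DOC_TYPES = {
    "attack-tree", "attack-paths", "attack-surface", "hypotheses", "disproven",
}


def schema_type_for(path: str) -> str:
    """Pick the <type> argument for a file that was just written."""
    p = Path(path)
    if re.fullmatch(r"stage-.+\.json", p.name):
        return "stage"                      # any stage-*.json file
    if p.stem in WORKING_DOC_TYPES:         # assumption: file is <type>.json
        return p.stem
    raise ValueError(f"no schema type known for {p.name}")


def validate_command(path: str) -> list[str]:
    """Build the validator invocation in the relative form rule 10 requires."""
    return ["libexec/mantishack-validate-schema", schema_type_for(path), path]


if __name__ == "__main__":
    print(validate_command("stage-c.json"))
```

The sketch only builds the argument list; it does not run the validator.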

---

## [GATES] MUST-GATEs

Rationale: Without these gates, models sample instead of checking all code, hedge with "if" and "maybe" instead of verifying, and miss exploitable findings.

**GATE-1 [ASSUME-EXPLOIT]:** Your goal is to discover real exploitable vulnerabilities. If you think something isn't exploitable, don't assume so. First, investigate under the assumption that it is.

**GATE-2 [STRICT-SEQUENCE]:** Strictly follow instructions. If you think or try something else, or a new idea comes up, present the results of that analysis separately at the end. Always display the results of the strict criteria first, and only then display the results of the additional methods, if any.

**GATE-3 [CHECKLIST]:** Check the pipeline, update the checklist, and collect evidence of compliance so that you can show at the end that you executed every action these gates require.

**GATE-4 [NO-HEDGING]:** If your Chain-of-Thought or results include "if", "maybe", "uncertain", "unclear", "could potentially", "may be possible", "depending on", "in theory", "in certain circumstances", or similar - immediately verify the claim. Do not leave unverified.
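
One way to make GATE-4 mechanical is to scan draft text for the listed phrases and treat every hit as a claim still needing verification. The sketch below is an assumption about how such a check might look, not something the skill ships; `find_hedges` is a hypothetical name, and the "or similar" part of the gate is not covered.

```python
import re

# Phrases quoted in GATE-4.
HEDGE_PHRASES = [
    "if", "maybe", "uncertain", "unclear", "could potentially",
    "may be possible", "depending on", "in theory",
    "in certain circumstances",
]

# Longest phrases first so multi-word hedges win over shorter ones.
_HEDGE_RE = re.compile(
    r"\b("
    + "|".join(re.escape(p) for p in sorted(HEDGE_PHRASES, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def find_hedges(text: str) -> list[str]:
    """Return each hedging phrase found, in order of appearance."""
    return [m.group(1).lower() for m in _HEDGE_RE.finditer(text)]


if __name__ == "__main__":
    print(find_hedges("The overflow may be possible if len is unchecked."))
```

The word boundaries keep "if" from matching inside words such as "Verified".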

**GATE-5 [FULL-COVERAGE]:** Test the entire code provided (file(s)/code base) against checklist.json, ensuring you checked all functions and lines of code. Do not sample, estimate, or guess.

**GATE-6 [PROOF]:** Always provide proof and show the vulnerable code.

**GATE-7 [CONSISTENCY]:** Before finalizing each finding, verify that `vuln_type`, `severity`, and `status` are consistent with the `description` and `proof` text. A description that explains why a bug is benign must not carry high severity.

**GATE-8 [POC-EVIDENCE]:** A PoC requires observable evidence: a crash, changed output, callback, file read, error message, or measurable state change. "Ran without error" is not evidence. If the expected effect is not observed, either the PoC is wrong or the bug is not triggered — investigate which.
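
The GATE-8 distinction between "ran without error" and observable evidence can be illustrated by diffing a PoC run against a benign baseline run. This is a minimal sketch under stated assumptions: `RunResult` and `observable_evidence` are hypothetical names, the negative return code convention is the one Python's `subprocess` uses on POSIX for a process killed by a signal, and callbacks or file reads would need their own checks.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    returncode: int
    stdout: str
    stderr: str


def observable_evidence(baseline: RunResult, poc: RunResult) -> list[str]:
    """List what the PoC run changed relative to the baseline run."""
    evidence = []
    if poc.returncode < 0:
        # subprocess reports -N when the child was killed by signal N (POSIX).
        evidence.append(f"crash (killed by signal {-poc.returncode})")
    elif poc.returncode != baseline.returncode:
        evidence.append(
            f"exit status changed: {baseline.returncode} -> {poc.returncode}"
        )
    if poc.stdout != baseline.stdout:
        evidence.append("changed output")
    if poc.stderr and poc.stderr != baseline.stderr:
        evidence.append("new error message")
    return evidence


if __name__ == "__main__":
    base = RunResult(0, "ok\n", "")
    print(observable_evidence(base, RunResult(-11, "", "")))
    print(observable_evidence(base, RunResult(0, "ok\n", "")))
```

An empty list corresponds to the "ran without error" case: either the PoC is wrong or the bug was not triggered, and the gate asks you to find out which.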

---

## [STYLE] Output Formatting

**Status values in JSON must be snake_case:**
- `exploitable` not `EXPLOITABLE` or `Exploitable`
- `confirmed` not `CONFIRMED` or `Confirmed`
- `ruled_out` not `RULED_OUT` or `Ruled Out`
- `disproven` not `DISPROVEN` or `Disproven`

**RULE: Any text shown to the user (chat, tables, summaries, stage progress) MUST use Title Case, never snake_case.** This applies at every stage, not just the final report. Convert on output:
- `poc_success` → "PoC Success"
- `not_disproven` → "Not Disproven"
- `buffer_overflow` → "Buffer Overflow"
- `command_injection` → "Command Injection"
- `confirmed_constrained` → "Confirmed (Constrained)"

BAD (snake_case leaked into chat):
```
- FIND-001 (buffer_overflow): poc_success
```
GOOD:
```
- FIND-001 (Buffer Overflow): PoC Success
```
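
The conversions listed above can be sketched as a small helper. Plain word capitalization covers most values, but `poc_success` and `confirmed_constrained` do not follow it, so the sketch treats them as explicit overrides. The names `DISPLAY_OVERRIDES` and `to_display` are hypothetical and not part of the skill.

```python
# Values whose display form is not plain word-by-word capitalization.
DISPLAY_OVERRIDES = {
    "poc_success": "PoC Success",
    "confirmed_constrained": "Confirmed (Constrained)",
}


def to_display(value: str) -> str:
    """Convert a snake_case JSON value to the Title Case shown to the user."""
    if value in DISPLAY_OVERRIDES:
        return DISPLAY_OVERRIDES[value]
    return " ".join(word.capitalize() for word in value.split("_"))


if __name__ == "__main__":
    line = f"- FIND-001 ({to_display('buffer_overflow')}): {to_display('poc_success')}"
    print(line)  # - FIND-001 (Buffer Overflow): PoC Success
```

The JSON on disk keeps the snake_case value; only the string rendered to the user goes through the helper.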

**No colored circles or emojis:**
- Do not use 🔴/🟡/🟢 - they are perspective-dependent (red =

api-abuse-fuzzer (Subagent)

Use this agent when the target is a LIVE REST or GraphQL API you are authorized to test and the question is "can I tamper request bodies, headers, ids, and tokens to read or act on data that isn't mine?" (active, request-driven abuse of the API contract, not static code review). It drives REAL HTTP at the endpoints: BOLA/IDOR object-id enumeration (increment/swap/UUID-shuffle the id and diff the access decision), broken function-level authz (replay an admin verb/path with a low-priv token), mass-assignment (inject role/is_admin/is_verified/owner_id into the JSON body), excessive-data-exposure (the response over-returns fields the UI never shows), GraphQL introspection + alias/batch amplification + nested-query DoS, content-type and HTTP-verb tampering (POST→PUT/PATCH/DELETE, application/json→text/plain→x-www-form-urlencoded), JWT/session/token swap across two users, and rate-limit / idempotency-key bypass. It proves every finding with a behavioral oracle (a status/length/timing/field-set diff between the authorized baseline and the tampered request), never a guess. Prefer this agent over a code reader when you hold a base URL or a schema and want to mutate live traffic methodically.

<example>
Context: The user has a running API with numeric resource ids and two test accounts.
user: "Here's our staging API at https://api.staging.acme.test and tokens for user A and user B. Can user A read user B's orders?"
assistant: "That's textbook BOLA: same endpoint, swap the object id (or the bearer token) and diff the access decision. I'll use the Task tool to launch the api-abuse-fuzzer agent to enumerate /orders/{id} with A's token against B's ids and prove the cross-tenant read with a status + ownership-field oracle."
<agent_launch>
Delegating to api-abuse-fuzzer: a live authorized API + two tokens + object-id enumeration is its core BOLA/IDOR mission.
</agent_launch>
</example>

<example>
Context: The user exposes a GraphQL endpoint and isn't sure introspection or query batching is locked down.
user: "Our /graphql is behind auth but I want to know if a low-priv user can pull admin fields, brute force via aliases, or knock it over with a deep nested query."
assistant: "GraphQL abuse surface: introspect the schema, alias-batch a login/lookup to bypass per-request rate limits, and send a bounded cyclic nested query as a timing oracle. I'll launch the api-abuse-fuzzer agent to tamper the operation and measure the depth/timing oracle."
<agent_launch>
Delegating to api-abuse-fuzzer for GraphQL introspection, alias/batch amplification, and nested-query DoS against the live endpoint.
</agent_launch>
</example>

Proactively suggest using this agent when:
- A live base URL + an OpenAPI/Swagger/GraphQL schema (or a captured request) is in hand and the target is authorized in-scope
- Endpoints take a resource identifier in the path/query/body (/users/{id}, ?account=, {"order_id": ...}), which is BOLA/IDOR territory
- The user holds 2+ accounts or tokens (low-priv + high-priv, tenant A + tenant B) to run an authorization differential
- There are admin/privileged verbs (DELETE, PUT /admin/*, role-changing mutations) and you want to hit them as a non-admin
- A write endpoint accepts a JSON object: test mass-assignment of role/is_admin/verified/balance/owner_id
- A /graphql endpoint exists (introspection, alias/batch abuse, nested-query DoS, field-level authz)
- The user mentions rate limiting, coupon/OTP brute force, idempotency keys, BOLA, BFLA, mass assignment, or "excessive data exposure"

assumption-pressure-test (Subagent)

Use this agent when a codebase, PR, or service needs its IMPLICIT TRUST ASSUMPTIONS enumerated and attacked: every place the code silently trusts a header, an "internal-only" route, an "already-validated-upstream" input, an ID that "belongs to the caller," or a value that "can't be null/negative." It specializes in confused-deputy (CWE-441), mass-assignment (CWE-915), input-validation gaps (CWE-20), and second-order/stored injection (CWE-89/CWE-79). This is the agent for trust-boundary audits, multi-service request paths, ORM/serializer review, and proxy/gateway/header-forwarding code.

<example>
Context: User has a multi-service backend where an edge proxy forwards auth context downstream.
user: "Our gateway sets X-User-Id from the JWT and the internal services read it to authorize. Can you check this?"
assistant: "This is a classic confused-deputy / trust-boundary surface: the downstream services trust a header they can't verify. I'll use the Task tool to launch the assumption-pressure-test agent to enumerate every internal endpoint that reads X-User-Id and prove whether an attacker can reach it directly and forge it."
<agent_launch>
Delegating to assumption-pressure-test: the request hinges on an implicit 'this header is trusted because the gateway set it' assumption that must be attacked.
</agent_launch>
</example>

<example>
Context: User just merged an ORM update endpoint.
user: "Here's the new profile update handler: User.objects.filter(id=request.user.id).update(**request.data)"
assistant: "That `**request.data` spread is a mass-assignment sink: it trusts that the request body only contains the fields you intended. I'll launch the assumption-pressure-test agent to map which model columns (is_admin, balance, role) become attacker-writable and confirm reachability."
<agent_launch>
Delegating to assumption-pressure-test for the CWE-915 mass-assignment and the implicit 'the body only has safe fields' assumption.
</agent_launch>
</example>

Proactively suggest using this agent when:
- Code reads request headers (X-Forwarded-For, X-User-Id, X-Real-IP, X-Internal-*, Host) for trust or authorization decisions
- A serializer/ORM uses bulk binding: `**req.body`, `Object.assign`, `ModelMapper`, `BeanUtils.copyProperties`, `update_attributes`, `params.permit!`
- Comments or names assert trust: "internal only", "already validated", "trusted", "comes from gateway", "sanitized upstream"
- Data is stored then later concatenated into SQL/HTML/shell (second-order injection)
- An endpoint takes an `id`/`uuid`/`account`/`order` param that maps to a resource (IDOR / object ownership)

coverage-analyzer (Subagent)

Generate gcov coverage data for a code repository.

crash-analysis-agent (Subagent)

Analyze security bugs from any C/C++ project with full root-cause tracing.

crash-analyzer (Subagent)

Analyze crashes using rr recordings, function traces, and coverage data to produce root-cause analyses.

crash-analysis-checker (Subagent)

Carefully analyze root-cause analysis reports for crashes to make sure they are correct.

exploitability-validator-agent (Subagent)

Multi-stage pipeline to validate that vulnerability findings are real, reachable, and exploitable.

federated-identity-breaker (Subagent)