ai-pentesting

Install in Claude Code
git clone --depth 1 https://github.com/TerminalSkills/skills /tmp/ai-pentesting && cp -r /tmp/ai-pentesting/skills/ai-pentesting ~/.claude/skills/ai-pentesting
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# AI Pentesting

## Overview

Use AI agents to autonomously conduct penetration tests on web applications. Combine LLM reasoning with security tools (nmap, subfinder, nuclei, sqlmap, browser automation) to find and prove vulnerabilities with minimal human intervention.

## Instructions

### Methodology

AI pentesting follows the same phases as human pentesting, but the AI orchestrates each phase autonomously:

```
Phase 1: RECONNAISSANCE
├── Subdomain enumeration (subfinder)
├── Technology fingerprinting (whatweb, wappalyzer)
├── Port scanning (nmap)
├── API schema discovery (crawling, OpenAPI/GraphQL introspection)
└── Source code analysis (if white-box)
    AI decides: which tools to run, in what order, based on findings

Phase 2: VULNERABILITY ANALYSIS
├── Known CVE scanning (nuclei)
├── Web vulnerability scanning (OWASP ZAP, nikto)
├── API fuzzing (schemathesis)
├── Code-level vulnerability hunting (semgrep, CodeQL)
└── Data flow analysis (input → dangerous function)
    AI decides: which findings are likely exploitable

Phase 3: EXPLOITATION
├── SQL injection (sqlmap, manual payloads)
├── XSS (reflected, stored, DOM)
├── SSRF (internal access, cloud metadata)
├── Authentication bypass (broken auth, privilege escalation)
├── Business logic flaws (price manipulation, race conditions)
└── Browser-based exploitation (Playwright/Puppeteer)
    AI decides: exploitation order, payload selection, chaining

Phase 4: REPORTING
├── Proof-of-concept for each finding
├── Reproducible steps (curl commands, screenshots)
├── Severity rating (CVSS score)
├── Remediation guidance
└── Executive summary
    AI generates: structured, evidence-based report
```
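
One way to keep the "AI decides" steps above bounded is to check every tool the model asks for against a per-phase allowlist. The sketch below is illustrative only: `Phase`, `PHASE_TOOLS`, and `validate_tool_request` are hypothetical names, and the groupings simply mirror the outline above.

```python
# Hypothetical sketch: restrict an AI orchestrator to the tools listed per phase.
from enum import Enum


class Phase(Enum):
    RECON = "reconnaissance"
    ANALYSIS = "vulnerability_analysis"
    EXPLOITATION = "exploitation"
    REPORTING = "reporting"


# Tool names mirror the methodology outline; adjust to your own toolchain.
PHASE_TOOLS: dict[Phase, frozenset[str]] = {
    Phase.RECON: frozenset({"subfinder", "whatweb", "wappalyzer", "nmap"}),
    Phase.ANALYSIS: frozenset(
        {"nuclei", "zap", "nikto", "schemathesis", "semgrep", "codeql"}
    ),
    Phase.EXPLOITATION: frozenset({"sqlmap", "playwright", "puppeteer"}),
    Phase.REPORTING: frozenset(),  # reporting runs no external tools here
}


def validate_tool_request(phase: Phase, tool: str) -> bool:
    """Return True only if the model-requested tool is allowed in this phase."""
    return tool.strip().lower() in PHASE_TOOLS[phase]
```

Rejecting anything outside the allowlist before it reaches `subprocess` means a hallucinated or out-of-phase tool name fails closed instead of being executed.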

### Setting Up Shannon

Shannon is an open-source AI pentester that automates the full lifecycle:

```bash
# Clone and set up Shannon
git clone https://github.com/KeygraphHQ/shannon.git
cd shannon

# Configure credentials
export ANTHROPIC_API_KEY="your-api-key"
export CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000

# Run a pentest against your application
# Requires: Docker, target URL, source code repo
./shannon start URL=https://your-app.com REPO=./your-repo

# Monitor progress
./shannon logs

# View results in Temporal UI
open http://localhost:8233
```
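
Before launching a run, it can help to confirm that the prerequisites named above (Docker and the API key) are actually present. The following is a minimal pre-flight sketch, not part of Shannon; the `preflight` function and its parameters are hypothetical.

```python
# Hypothetical pre-flight check; not part of Shannon itself.
import os
import shutil
from typing import Callable, Mapping, Optional


def preflight(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # The setup commands above export this variable before starting a run.
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    # Docker is listed as a requirement for `./shannon start`.
    if which("docker") is None:
        problems.append("docker was not found on PATH")
    return problems
```

Calling `preflight()` with no arguments inspects the real environment; passing a dict and a stub lookup function makes the check easy to unit-test.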

Shannon's architecture:
- **Reconnaissance agent**: Maps attack surface using nmap, subfinder, whatweb
- **Vulnerability agents**: Specialized per OWASP category (injection, XSS, SSRF, auth bypass)
- **Exploitation agent**: Uses browser automation to prove vulnerabilities with real exploits
- **Reporting agent**: Generates findings with copy-paste PoC commands
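
The data flow between these four roles can be sketched as follows. This is not Shannon's code: `Finding`, `severity`, and `run_pipeline` are hypothetical names used only to illustrate how per-category agents might feed an exploitation step that keeps proven findings. The severity bands follow the CVSS v3.x qualitative rating scale.

```python
# Hypothetical sketch of a recon -> analysis -> exploitation -> report flow.
# This is NOT Shannon's implementation; all names are illustrative.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    category: str         # e.g. "injection", "xss", "ssrf", "auth"
    endpoint: str
    cvss: float           # CVSS v3.x base score, 0.0 to 10.0
    proven: bool = False  # set by the exploitation step
    poc: str = ""         # copy-paste proof-of-concept command


def severity(cvss: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if cvss == 0.0:
        return "None"
    if cvss < 4.0:
        return "Low"
    if cvss < 7.0:
        return "Medium"
    if cvss < 9.0:
        return "High"
    return "Critical"


def run_pipeline(
    surface: list[str],
    vuln_agents: dict[str, Callable[[list[str]], list[Finding]]],
    exploit: Callable[[Finding], Finding],
) -> list[Finding]:
    """Fan recon output out to per-category agents, keep only proven findings."""
    candidates = [f for agent in vuln_agents.values() for f in agent(surface)]
    proven = [f for f in map(exploit, candidates) if f.proven]
    # Highest-severity findings first, as a report would list them.
    return sorted(proven, key=lambda f: f.cvss, reverse=True)
```

Dropping unproven candidates before reporting reflects the stated goal of proving vulnerabilities with real exploits, so hypotheses that could not be demonstrated never reach the report.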

### Building a Custom AI Pentest Pipeline

For cases where Shannon doesn't fit, build a custom pipeline:

```python
# ai_pentester.py
# Custom AI pentesting pipeline using LLM + security tools

import subprocess
import json
from openai import OpenAI

client = OpenAI()

class AIPentester:
    """Autonomous AI penetration tester.
    
    Orchestrates security tools using LLM reasoning
    to find and prove vulnerabilities.
    """
    
    def __init__(self, target_url: str, scope: list[str] = None):
        self.target = target_url
        self.scope = scope or [target_url]
        self.findings = []
        self.recon_data = {}
    
    async def run_pentest(self) -> dict:
        """Execute full penetration test lifecycle.
        
        Returns:
            Dict with findings, evidence, and recommendations
        """
        # Phase 1: Recon
        self.recon_data = await self._recon()
        
        # Phase 2: AI-guided vulnerability analysis
        targets = await self._analyze_attack_surface(self.recon_data)
        
        # Phase 3: AI-guided exploitation
        for target in targets:
            finding = await self._exploit(target)
            if finding:
                self.findings.append(finding)
        
        # Phase 4: Generate report
        report = await self._generate_report()
        return report
    
    async def _recon(self) -> dict:
        """Run reconnaissance tools and aggregate results."""
        recon = {}
        
        # Subdomain enumeration
        result = subprocess.run(
            ['subfinder', '-d', self._get_domain(), '-silent'],
            capture_output=True, text=True, timeout=120
        )
        # split() drops blank lines, so empty output yields [] rather than ['']
        recon['subdomains'] = result.stdout.split()
        
        # Technology fingerprinting
        # -q suppresses WhatWeb's brief log so stdout carries only the JSON log
        result = subprocess.run(
            ['whatweb', self.target, '--log-json=/dev/stdout', '-a', '3', '-q'],
            capture_output=True, text=True, timeout=60
        )
        recon['technologies'] = json.loads(result.stdout) if result.stdout.strip() else {}
        
        # Port scanning
        # nmap has no JSON output mode; '-oX -' writes XML to stdout instead
        result = subprocess.run(
            ['nmap', '-sV', '--top-ports', '1000', '-oX', '-', self._get_domain()],
            capture_output=True, text=True, timeout=300
        )
        recon['ports'] = result.stdout  # raw XML string
        
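        # Nuclei scan for known CVEs
        # Recent nuclei releases emit one JSON object per line with -jsonl;
        # older releases used -json, so adjust to the installed version
        result = subprocess.run(
            ['nuclei', '-u', self.target, '-severity', 'critical,high',
             '-jsonl', '-silent'],
            capture_output=True, text=True, timeout=300
        )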
        recon['known_vulns'] = [
            json.loads(line) for line in result.stdout.strip().split('\n')
            if line.strip()
        ]
        
        return recon
    
    async def _analyze_attack_surface(self, recon: dict) -> list:
        """Use AI to analyze recon data and prioritize attack targets."""
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content":
                 "You are an expert penetration tester. Analyze the "
                 "reconnaissance data and identify the most promising "
                 "attack vectors. Return JSON array of targets."},
                {"role": "user", "content":
                 f"Recon data:\n{json.dumps(recon, indent=2)}\n\n"
                 "Identify attack targets with: endpoint, vulnerability_