offensive-ai-security
This Claude Code skill provides a structured methodology for conducting security assessments of AI and language model systems. It covers offensive techniques including prompt injection, jailbreaking, model extraction, training data poisoning, and adversarial input testing. Use this skill when performing red-team evaluations of LLM systems, assessing AI application security, or researching vulnerabilities in machine learning deployments and their integration points.
git clone --depth 1 https://github.com/SnailSploit/Claude-Red /tmp/offensive-ai-security && cp -r /tmp/offensive-ai-security/Skills/ai/offensive-ai-security ~/.claude/skills/offensive-ai-securitySKILL.md
# SKILL: AI Pentest
## Metadata
- **Skill Name**: ai-security
- **Folder**: offensive-ai-security
- **Source**: https://github.com/SnailSploit/offensive-checklist/blob/main/ai.md
## Description
AI/LLM security offensive checklist: prompt injection, jailbreaking, model extraction, training data poisoning, adversarial inputs, LLM-assisted attack automation, and AI system reconnaissance. Use when assessing AI/ML systems, red-teaming LLMs, or researching AI attack vectors.
## Trigger Phrases
Use this skill when the conversation involves any of:
`AI security, LLM security, prompt injection, jailbreak, model extraction, training data poisoning, adversarial input, AI red team, ML security, RAG poisoning, AI attack`
## Instructions for Claude
When this skill is active:
1. Load and apply the full methodology below as your operational checklist
2. Follow steps in order unless the user specifies otherwise
3. For each technique, consider applicability to the current target/context
4. Track which checklist items have been completed
5. Suggest next steps based on findings
---
## Full Methodology
# AI Pentest
## Shortcut
- Understand the AI system, its components (LLM, APIs, data sources, plugins), and functionalities. Identify critical assets and potential business impacts.
- Collect details about the model, underlying technologies, APIs, and data flow.
- Vulnerability Assessment:
- Use tools like `garak`, `LLMFuzzer` to identify common vulnerabilities.
- Craft prompts to test for injections, jailbreaks, and biased outputs.
- Probe for data leakage and insecure output handling.
- Assess plugin security and excessive agency.
- Attempt to exploit identified vulnerabilities and chain them for greater impact (e.g., prompt injection leading to data exfiltration via excessive agency).
- If access is gained, explore possibilities like model theft, further data exfiltration, or lateral movement.
## Mechanisms
AI/LLM vulnerabilities stem from several core mechanisms:
- **Instruction Following & Ambiguity**: LLMs are designed to follow instructions (prompts). Ambiguous, malicious, or cleverly crafted prompts can trick them into unintended actions. The boundary between instruction and data is often blurry.
- **Data Dependency**: Models learn from vast datasets.
- **Training Data Issues**: Biased, poisoned, or sensitive data in training sets can lead to skewed, insecure, or privacy-violating outputs.
- **Input Data Issues**: Untrusted input data (user prompts, documents, web content) can be a vector for attacks like indirect prompt injection.
- **Complexity and Lack of Transparency ("Black Box" Nature)**: The internal workings of large models are complex and not always fully understood, making it hard to predict all possible outputs or identify all vulnerabilities.
- **Integration with External Systems (Agency & Plugins)**: LLMs are often given "agency" – the ability to interact with other systems, APIs, and tools (plugins). If these integrations are insecure or the LLM has excessive permissions, it can become a powerful attack vector.
- **Output Handling**: How the LLM's output is used by downstream applications is critical. If unvalidated output is fed into other systems, it can lead to code execution, XSS, SSRF, etc.
- **Resource Consumption**: LLMs can be resource-intensive. Specially crafted inputs can lead to denial of service by exhausting computational resources.
- **Supply Chain**: Vulnerabilities can exist in pre-trained models, third-party datasets, or the MLOps pipeline components.
- **Overreliance**: Humans placing undue trust in LLM outputs without verification can lead to the propagation of misinformation or the execution of flawed, AI-generated advice/code.
- **Policy‑Layer Conflicts** – layered provider, vendor and application rules can clash, creating latent bypass windows.
- **Sparse Fine‑Tuning Drift** – lightweight adapter training frequently overrides base‑model safety alignment.
- **Multi‑Modal Expansion** – V‑L and audio‑language models inherit text flaws while adding steganographic channels.
- **Model Extraction via Embeddings** – probing embedding space boundaries through carefully crafted prompts can leak training data membership or approximate model parameters.
- **Virtualization Attacks** – convincing the model it operates in a test/sandbox environment to bypass production safety rules.
- **Constitutional Jailbreaks** – exploiting conflicts between layered safety rules (provider policy vs. developer system prompt vs. user context).
- **Tool Chaining Escalation** – multi-agent frameworks allowing Agent A to delegate to Agent B to reach privileged Agent C, bypassing single-hop restrictions.
- **Memory Poisoning** – injecting persistent malicious instructions into agent memory systems (AutoGPT, CrewAI, LangChain Memory).
- **Tokenization Exploits** – zero-width characters, Unicode normalization mismatches between input sanitizers and model tokenizers.
## Hunt
### Preparation
1. **Understand the Target AI System**:
- What type of model is it (e.g., text generation, code generation, chat)?
- What are its intended functions and capabilities?
- What data does it process (input/output)? Sensitive data?
- What external tools, APIs, or plugins does it interact with?
- Are there any documented security measures or content filters?
2. **Review OWASP Top 10 for LLM Applications**: Familiarize yourself with common attack vectors.
3. **Gather Information/Reconnaissance**:
- Identify API endpoints, input parameters, and output formats.
- Look for publicly available information about the model, its version, and underlying technologies.
- Understand the context in which the LLM operates (e.g., a chatbot on a website, a code assistant in an IDE).
4. **Check Emerging Regulatory/Governance Requirements (EU AI Act 2025, ISO/IEC 42001)** – log any class‑specific controls or audit obligations the target claims to meet.
5. **Map Trust BoundariesActive Directory attack methodology for internal network red team engagements. Covers reconnaissance (BloodHound, PowerView, ADExplorer), credential abuse (Kerberoasting, ASREProasting, NTLM relay, LLMNR/NBT-NS poisoning), privilege escalation (ACL abuse, GPO abuse, unconstrained/constrained delegation), lateral movement (Pass-the-Hash, Pass-the-Ticket, Overpass-the-Hash, WMI/WinRM/PsExec), persistence (Golden/Silver/Diamond Tickets, DCSync, DCShadow, AdminSDHolder, Skeleton Key), forest trust attacks, ADCS abuse (ESC1-ESC15), and modern MDI/Defender for Identity evasion. Use when assessing on-prem AD, hybrid AD/Entra ID environments, or ADCS deployments.
JWT attack methodology for penetration testers. Covers algorithm confusion (alg:none, RS256→HS256), weak HMAC secret brute force, kid parameter injection (SQLi, path traversal), jku/x5u/jwk header injection, JWKS cache poisoning, JWS/JWE confusion, timing attacks, and mobile JWT storage extraction. Use when testing JWT-based authentication, hunting auth bypass via token manipulation, or evaluating JWT implementation security in web or mobile apps.
Cloud security attack methodology covering AWS, Azure, and GCP. Includes credential harvesting (IMDS, ~/.aws, env vars, leaked CI secrets, instance roles), enumeration with cloud-specific tools (pacu, ScoutSuite, Prowler, ROADtools, gcp_enum), privilege escalation paths (IAM PassRole, AssumeRole chains, Lambda/Functions privilege flips, Azure Owner-on-self, GCP serviceAccountTokenCreator), persistence techniques (IAM user/key creation, AAD app registration, GCP svc account key creation, EventBridge/Logic Apps backdoors), data exfiltration (S3/Blob/GCS, snapshot share, RDS/CosmosDB/Cloud SQL exfil), cloud-native lateral movement (cross-account assume, Azure AD multi-tenant, GCP project hierarchy), serverless attacks (Lambda env vars, layer hijack, Step Functions), Kubernetes-on-cloud (EKS/AKS/GKE-specific paths to node and AWS metadata), and CSPM evasion (CloudTrail blind spots, GuardDuty mute, Sentinel rule shaping). Use when the engagement scope is cloud accounts, when you've stolen cloud credentials, or when assessing cloud posture.