genotoxic
Genotoxic analyzes mutation testing results by combining survived mutants with call graph analysis and test statement removal data to categorize findings as false positives, missing unit tests, or fuzzing targets. Use it after running mutation testing to prioritize which survived mutants warrant new tests and which functions need fuzz harnesses instead of traditional unit tests.
git clone --depth 1 https://github.com/trailofbits/skills /tmp/genotoxic && cp -r /tmp/genotoxic/plugins/trailmark/skills/genotoxic ~/.claude/skills/genotoxic
# Genotoxic
Combines mutation testing and necessist (test statement removal) with
code graph analysis to triage findings into actionable categories:
false positives, missing unit tests, and fuzzing targets.
## When to Use
- After mutation testing reveals survived mutants that need triage
- Identifying where unit tests would have the highest impact
- Finding functions that need fuzz harnesses instead of unit tests
- Prioritizing test improvements using data flow context
- Filtering out harmless mutants from actionable ones
- Finding unnecessary test statements that indicate weak assertions (necessist)
## When NOT to Use
- Codebase has no existing test suite (write tests first)
- Pure documentation or configuration changes
- Single-file scripts with trivial logic
## Prerequisites
- **trailmark** installed — if `uv run trailmark` fails, run:
```bash
uv pip install trailmark
```
**DO NOT** fall back to "manual verification" or "manual analysis"
as a substitute for running trailmark. Install it first. If installation
fails, report the error instead of switching to manual analysis.
- A **mutation testing framework** for the target language — if the framework
command fails (not found, not installed), install it using the instructions
in [references/mutation-frameworks.md](references/mutation-frameworks.md).
**DO NOT** fall back to "manual mutation analysis" or skip mutation testing.
Install the framework first. If installation fails, report the error
instead of switching to manual mutation analysis.
- **necessist** (optional, recommended) — if the target language is
supported (Go, Rust, Solidity/Foundry, TypeScript/Hardhat,
TypeScript/Vitest, Rust/Anchor), install with `cargo install necessist`.
See [references/mutation-frameworks.md](references/mutation-frameworks.md)
for details.
- An existing test suite that passes
- **macOS environment**: Run `ulimit -n 1024` before any `mull-runner`
invocation. macOS Tahoe (26+) sets unlimited file descriptors by
default, which crashes Mull's subprocess spawning. See
[references/mutation-frameworks.md](references/mutation-frameworks.md)
for details.
---
## Rationalizations to Reject
| Rationalization | Why It's Wrong | Required Action |
|-----------------|----------------|-----------------|
| "All survived mutants need tests" | Many are harmless or equivalent | Triage before writing tests |
| "Mutation testing is too noisy" | Noise means you're not triaging | Use graph data to filter |
| "Unit tests cover everything" | Complex data flows need fuzzing | Check entrypoint reachability |
| "Dead code mutants don't matter" | Dead code should be removed | Flag for cleanup |
| "Low complexity = low risk" | Boundary bugs hide in simple code | Check mutant location |
| "Tool isn't installed, I'll do it manually" | Manual analysis misses what tooling catches | Install the tool first |
| "Necessist isn't mutation testing, skip it" | Necessist finds what mutation testing misses: weak tests | Run both when the language supports it |
---
## Quick Start
```bash
# 1. Build the code graph
uv run trailmark analyze --language auto --summary {targetDir}
# 2. Run mutation testing (language-dependent)
# Python:
uv run mutmut run --paths-to-mutate {targetDir}/src
uv run mutmut results
# 2b. Run necessist (if language supported)
necessist
# 3. Analyze results with this skill's workflow (Phase 3)
```
---
## Workflow Overview
```
Phase 1: Graph Build → Parse codebase with trailmark
↓
Phase 2: Mutation Run → Execute mutation testing framework
Phase 2b: Necessist Run → Remove test statements (optional, parallel)
↓
Phase 3: Triage → Classify findings using graph data
↓
Output: Categorized Report
├── Corroborated (both tools flag same function — highest value)
├── False Positives (harmless, skip)
├── Missing Tests (write unit tests)
└── Fuzzing Targets (set up fuzz harnesses)
```
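The "Corroborated" bucket can be pictured as a set intersection. The sketch below is illustrative only: it assumes both tools' findings have already been mapped to function names (for example via the code graph), and the helper name `corroborated` and the sample function names are hypothetical.

```python
from typing import Iterable


def corroborated(
    mutant_functions: Iterable[str],
    necessist_functions: Iterable[str],
) -> list[str]:
    """Return functions flagged by both tools, sorted for stable reporting.

    Assumption: each input holds the names of functions that a survived
    mutant (or a removable test statement) has been attributed to.
    """
    return sorted(set(mutant_functions) & set(necessist_functions))


# Hypothetical function names, for illustration only.
both = corroborated(
    ["parse_header", "apply_discount"],
    ["apply_discount", "render_page"],
)
```

Functions that appear in only one input still go through the normal Phase 3 triage; the intersection only marks which ones to look at first.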
---
## Decision Tree
```
├─ Need to set up mutation testing for a language?
│ └─ Read: references/mutation-frameworks.md
│
├─ Need to set up necessist or find weak test statements?
│ └─ Read: references/mutation-frameworks.md (Necessist section)
│
├─ Need to understand the triage criteria in depth?
│ └─ Read: references/triage-methodology.md
│
├─ Need to understand how graph data informs triage?
│ └─ Read: references/graph-analysis.md
│
└─ Already have results + graph? Use Phase 3 below.
```
---
## Phase 1: Build Code Graph and Run Pre-Analysis
Parse the target codebase with trailmark and run pre-analysis **before**
mutation testing. Pre-analysis computes blast radius, entry points, privilege
boundaries, and taint propagation, which Phase 3 uses for triage.
```bash
uv run trailmark analyze --language auto --summary {targetDir}
```
Use the `QueryEngine` API to build the graph and run pre-analysis:
1. `QueryEngine.from_directory("{targetDir}", language="auto")`
2. Call `engine.preanalysis()` — **mandatory** before triage
3. Export with `engine.to_json()` for cross-referencing with mutation results
If auto-detection is wrong for the target, rerun with an explicit language or
comma-separated list such as `python,rust`.
See [references/graph-analysis.md](references/graph-analysis.md) for the
full API: node mapping, reachability queries, blast radius, and
pre-analysis subgraph lookups.
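The three `QueryEngine` steps above could be wrapped as follows. This is a minimal sketch, not trailmark's documented usage: the import path for `QueryEngine` is not stated here, so the class is passed in as a parameter. The helper name `build_graph`, the `out_path` argument, and the assumption that `to_json()` returns a string are all illustrative.

```python
from pathlib import Path
from typing import Any


def build_graph(
    engine_cls: Any,
    target_dir: str,
    out_path: str,
    language: str = "auto",
) -> Path:
    """Build the graph, run pre-analysis, and write the JSON export.

    `engine_cls` is expected to be trailmark's QueryEngine class (see
    references/graph-analysis.md for the real API); it is injected so
    this sketch does not guess at the import path.
    """
    # Step 1: parse the codebase into a graph.
    engine = engine_cls.from_directory(target_dir, language=language)
    # Step 2: pre-analysis is mandatory before triage.
    engine.preanalysis()
    # Step 3: export for cross-referencing with mutation results.
    # Assumption: to_json() returns the serialized graph as a string.
    out = Path(out_path)
    out.write_text(engine.to_json(), encoding="utf-8")
    return out
```

If auto-detection picks the wrong language, pass an explicit value such as `language="python,rust"`.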
---
## Phase 2: Run Mutation Testing
Select and run the appropriate framework. See
[references/mutation-frameworks.md](references/mutation-frameworks.md) for
language-specific setup.
**Capture survived mutants.** Each framework reports differently, but
extract these fields per mutant:
| Field | Description |
|-------|-------------|
| File path | Source file containing the mutant |
| Line number | Line where mutation was applied |
| Mutation type | What was changed (operator, value, etc.) |
| Status | survived, killed, timeout, error |
Filter to **survived** mutants only for Phase 3.
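One way to hold these fields in a framework-neutral form is sketched below. The `Mutant` record and `survived_only` helper are hypothetical names, the sample paths and mutation descriptions are made up, and parsing each framework's actual output into this shape is left out.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Mutant:
    """One mutant, normalized to the four fields in the table above."""

    file_path: str      # source file containing the mutant
    line: int           # line where the mutation was applied
    mutation_type: str  # what was changed (operator, value, etc.)
    status: str         # survived, killed, timeout, or error


def survived_only(mutants: Iterable[Mutant]) -> list[Mutant]:
    """Keep only survived mutants, tolerating case and whitespace."""
    return [m for m in mutants if m.status.strip().lower() == "survived"]


# Illustrative records only; real values come from the framework's report.
records = [
    Mutant("src/pricing.py", 42, "operator: + -> -", "survived"),
    Mutant("src/pricing.py", 57, "value: 0 -> 1", "killed"),
    Mutant("src/auth.py", 10, "operator: == -> !=", "Survived "),
    Mutant("src/auth.py", 31, "value: True -> False", "timeout"),
]
phase3_input = survived_only(records)
```

Keeping file path and line number on every record is what lets Phase 3 map each survived mutant back to a node in the code graph.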
---