Vector Forge uses mutation testing to identify gaps in cryptographic test vector coverage, then generates new vectors targeting uncovered code paths. Use it to measure how well existing test vectors exercise implementations, find escaped mutants, compare before-and-after mutation kill rates, or create cross-implementation test suites for crypto primitives and protocols.
```bash
git clone --depth 1 https://github.com/trailofbits/skills /tmp/vector-forge && cp -r /tmp/vector-forge/plugins/trailmark/skills/vector-forge ~/.claude/skills/vector-forge
```
# Vector Forge
Uses mutation testing to systematically identify gaps in test vector
coverage, then generates new test vectors that close those gaps.
Measures effectiveness by comparing mutation kill rates before and after.
## When to Use
- Generating test vectors for cryptographic algorithms or protocols
- Evaluating how well existing test vectors cover an implementation
- Finding implementation code paths that no test vector exercises
- Creating Wycheproof-style cross-implementation test vectors
- Measuring the concrete coverage value of a test vector suite
## When NOT to Use
- No implementations exist yet (need code to mutate)
- Single trivial implementation with no edge cases
- Testing application logic rather than algorithm implementations
- The algorithm has no public test vectors to compare against
## Prerequisites
- **trailmark** installed — if `uv run trailmark` fails, run:
```bash
uv pip install trailmark
```
- At least one implementation of the target algorithm in a
language with mutation testing support
- A test harness that consumes test vectors and exercises
the implementation
- A mutation testing framework for the target language
---
## Rationalizations to Reject
| Rationalization | Why It's Wrong | Required Action |
|-----------------|----------------|-----------------|
| "We have enough test vectors" | Mutation testing proves otherwise | Run the baseline first |
| "The implementation's own tests are sufficient" | Own tests often share blind spots with the impl | Cross-impl vectors catch different bugs |
| "FFI crates can be mutation tested at the binding layer" | Mutations to wrappers don't affect the underlying impl | Mutate the actual implementation language |
| "Timeouts mean the mutation was caught" | Timeouts are ambiguous — could be killed or alive | Resolve timeouts before drawing conclusions |
| "All mutants are equivalent" | Most aren't — verify by reading the mutation | Classify each escaped mutant individually |
| "Checking valid vectors is enough" | Permissive mutations survive without negative assertions | Assert rejection for every invalid vector |
| "Manual analysis is fine" | Manual analysis misses what tooling catches | Install and run the tools |
---
## Workflow Overview
```
Phase 1: Discovery → Find implementations to test
↓
Phase 2: Harness → Write/adapt test vector harness for each impl
↓
Phase 3: Baseline → Run mutation testing with existing vectors
↓
Phase 4: Escape Analysis → Classify escaped mutants by code path
↓
Phase 5: Vector Gen → Create test vectors targeting escapes
↓
Phase 6: Validation → Re-run mutation testing, compare before/after
↓
Output: Coverage Report + New Test Vectors
```
---
## Phase 1: Discovery
Find implementations of the target algorithm. Look for:
1. **Pure implementations** in high-level languages (Go, Rust, Python)
— these are the best mutation testing targets
2. **FFI wrapper crates** — identify these early so you don't waste
time mutating wrapper glue code
3. **Reference implementations** — useful for cross-verification but
may not be the best mutation targets
For each implementation, note:
- Language and mutation testing framework
- Whether it's pure code or FFI wrappers
- Existing test suite size and coverage
- Which API surface the test vectors will exercise
### Implementation Type Classification
| Type | Mutation Value | Example |
|------|---------------|---------|
| Pure implementation | High | zkcrypto/bls12_381 (Rust), gnark-crypto (Go) |
| FFI bindings to C/asm | Low at binding layer | blst Rust crate |
| C/C++ implementation | High (use Mull) | blst C library |
| Generated code | Medium (mutations may be equivalent) | gnark-crypto generated field arithmetic |
**Key insight:** If an implementation delegates to another language
via FFI, you must mutate the *underlying* implementation, not the
bindings. For C/C++ underneath Rust/Go/Python, use Mull or similar.
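One way to keep the per-implementation notes honest is to record them as data and derive the mutation target from them. The sketch below is a minimal illustration that assumes Python 3.9+. `ImplType`, `Implementation`, `MUTATION_VALUE`, and `mutation_target` are hypothetical names and are not part of trailmark or any mutation framework. The sketch encodes the classification table and the FFI rule: for bindings, it returns the underlying language rather than the binding language.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ImplType(Enum):
    PURE = "pure implementation"
    FFI_BINDING = "FFI bindings to C/asm"
    C_CPP = "C/C++ implementation"
    GENERATED = "generated code"


# Mutation value per implementation type, transcribed from the table above.
MUTATION_VALUE = {
    ImplType.PURE: "high",
    ImplType.FFI_BINDING: "low at binding layer",
    ImplType.C_CPP: "high (use Mull)",
    ImplType.GENERATED: "medium (mutations may be equivalent)",
}


@dataclass
class Implementation:
    name: str
    language: str
    impl_type: ImplType
    # For FFI bindings: the language of the code the bindings call into.
    underlying_language: Optional[str] = None


def mutation_target(impl: Implementation) -> str:
    """Return the language whose source should actually be mutated."""
    if impl.impl_type is ImplType.FFI_BINDING:
        # Mutating wrapper glue does not change the underlying implementation.
        if impl.underlying_language is None:
            raise ValueError(f"{impl.name}: FFI binding with unknown underlying impl")
        return impl.underlying_language
    return impl.language
```

For example, `mutation_target(Implementation("blst Rust crate", "Rust", ImplType.FFI_BINDING, "C"))` returns `"C"`. Raising on an unknown underlying language makes a skipped Discovery step fail loudly instead of silently mutating glue code.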
---
## Phase 2: Harness
For each implementation, create a test harness that:
1. Reads test vectors from JSON files (Wycheproof format recommended)
2. Exercises the implementation's API for each vector
3. Asserts **both acceptance and rejection**:
   - Valid vectors: deserialization succeeds, output matches expected
   - Invalid vectors: deserialization fails or verification rejects
4. Adds **roundtrip assertions** for valid deserialization vectors:
   `serialize(deserialize(bytes)) == bytes`
5. Reports pass/fail per vector with test IDs
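The harness requirements above can be sketched in Python. This is a minimal sketch under stated assumptions:

- The toy `deserialize`/`serialize` pair stands in for the real implementation. It models a 2-byte big-endian field element modulo the prime 65521.
- The `encoded` field name is hypothetical.
- Only `valid` and `invalid` results are handled. Wycheproof also uses `acceptable`.

The sketch follows the Wycheproof layout of `testGroups` containing `tests`, each with a `tcId` and a `result`. Named `test_*.py`, it would be picked up by pytest.

```python
import json

# Toy stand-in for the implementation under test (hypothetical): a field
# element modulo P encoded as exactly 2 big-endian bytes.
P = 65521  # largest prime below 2**16


def deserialize(data: bytes) -> int:
    if len(data) != 2:
        raise ValueError("wrong length")
    value = int.from_bytes(data, "big")
    if value >= P:
        raise ValueError("non-canonical encoding")
    return value


def serialize(value: int) -> bytes:
    return value.to_bytes(2, "big")


# Wycheproof-style vectors; the "encoded" field name is an assumption.
VECTORS = json.loads("""
{
  "testGroups": [{
    "tests": [
      {"tcId": 1, "comment": "zero",        "encoded": "0000", "result": "valid"},
      {"tcId": 2, "comment": "P - 1",       "encoded": "fff0", "result": "valid"},
      {"tcId": 3, "comment": "P itself",    "encoded": "fff1", "result": "invalid"},
      {"tcId": 4, "comment": "short input", "encoded": "00",   "result": "invalid"}
    ]
  }]
}
""")


def run_vectors(vectors: dict) -> dict[int, bool]:
    """Return {tcId: passed} for every vector; never stops at the first failure."""
    results = {}
    for group in vectors["testGroups"]:
        for test in group["tests"]:
            data = bytes.fromhex(test["encoded"])
            try:
                value = deserialize(data)
            except ValueError:
                # Rejection is correct only for invalid vectors.
                results[test["tcId"]] = test["result"] == "invalid"
                continue
            if test["result"] == "invalid":
                # Accepting an invalid vector is a failure: this negative
                # assertion is what kills permissive mutants.
                results[test["tcId"]] = False
            else:
                # Roundtrip assertion: serialize(deserialize(bytes)) == bytes.
                results[test["tcId"]] = serialize(value) == data
    return results


def test_wycheproof_vectors():
    # Report failing tcIds rather than a bare pass/fail.
    failures = [tc for tc, ok in run_vectors(VECTORS).items() if not ok]
    assert not failures, f"failed tcIds: {failures}"
```

Collecting a result per `tcId` before asserting means a single mutation run reports every vector it breaks. This makes escape analysis easier than a harness that aborts at the first failure.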
**Critical:** A harness that only checks valid vectors will miss all
permissive mutations (e.g., `&` → `|` in validation). See
[references/lessons-learned.md](references/lessons-learned.md) §7.
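To see why negative assertions matter, consider a toy validation check and the `&` → `|` mutant named above. In this hypothetical sketch, the modulus, vectors, and function names are illustrative and do not come from any real library. The mutant accepts every valid vector, so a valid-only harness cannot tell it apart from the original. Only rejecting invalid vectors kills it.

```python
P = 65521  # toy modulus, hypothetical


def is_canonical(data: bytes) -> bool:
    # Original validation: correct length AND value below the modulus.
    return (len(data) == 2) & (int.from_bytes(data, "big") < P)


def is_canonical_mutant(data: bytes) -> bool:
    # Permissive mutant: `&` replaced with `|`, so every 2-byte input is
    # accepted, canonical or not.
    return (len(data) == 2) | (int.from_bytes(data, "big") < P)


VALID = [bytes.fromhex("0000"), bytes.fromhex("fff0")]    # 0 and P - 1
INVALID = [bytes.fromhex("fff1"), bytes.fromhex("ffff")]  # P and 2**16 - 1


def valid_only_harness(check) -> bool:
    """Passes if every valid vector is accepted; never looks at invalid ones."""
    return all(check(v) for v in VALID)


def full_harness(check) -> bool:
    """Also asserts that every invalid vector is rejected."""
    return valid_only_harness(check) and not any(check(v) for v in INVALID)
```

`valid_only_harness(is_canonical_mutant)` returns `True`, so the mutant survives. `full_harness(is_canonical_mutant)` returns `False`, so the mutant is killed.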
The harness must be runnable by the mutation testing framework.
For most frameworks this means:
- **Go:** A `_test.go` file in the same package as the implementation
- **Rust:** An integration test in `tests/` or inline `#[test]` functions
- **Python:** A pytest test file
- **C/C++:** A test binary linked against the implementation
### Harness Placement
The harness must live *inside the implementation's package* so the
mutation framework can see it. This usually means:
```bash
# Go: add test file to the package being mutated
cp wycheproof_test.go /path/to/impl/package/
# Rust: add integration test
cp wycheproof.rs /path/to/crate/tests/
# Python: add test to the test directory
cp test_wycheproof.py /path/to/package/tests/
```
### Handling Existing Vectors
If the implementation already has test vectors:
1. Run mutation testing with ONLY the existing vectors (baseline)
2. Run mutation testing with ONLY your new vectors
3. Run mutation testing with BOTH combined
4. The delta between (1) and (3) shows the new vectors' value
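The comparison in steps 1 to 4 reduces to computing a kill rate per run and diffing per-mutant outcomes. This is a minimal sketch, and it rests on two assumptions:

- Each run has already been reduced to a `{mutant_id: outcome}` mapping, with outcomes `killed`, `survived`, or `timeout`. This format is hypothetical; real frameworks report results differently.
- All runs mutate the same source, so mutant IDs line up across runs.

Timeouts are left out of the rate and listed separately, since they are ambiguous until resolved.

```python
from collections import Counter


def kill_rate(outcomes: dict[str, str]) -> float:
    """killed / (killed + survived); timeouts are excluded, not counted as kills."""
    counts = Counter(outcomes.values())
    decided = counts["killed"] + counts["survived"]
    return counts["killed"] / decided if decided else 0.0


def unresolved(outcomes: dict[str, str]) -> list[str]:
    """Mutant IDs that timed out and must be resolved before drawing conclusions."""
    return sorted(m for m, o in outcomes.items() if o == "timeout")


def newly_killed(baseline: dict[str, str], combined: dict[str, str]) -> set[str]:
    """Mutants that survived the baseline but are killed with the combined vectors."""
    if baseline.keys() != combined.keys():
        raise ValueError("runs mutated different code; mutant IDs do not match")
    return {m for m, o in combined.items() if o == "killed" and baseline[m] == "survived"}


# Made-up outcomes for illustration only.
BASELINE = {"m1": "killed", "m2": "survived", "m3": "survived", "m4": "killed"}
COMBINED = {"m1": "killed", "m2": "killed", "m3": "survived", "m4": "killed"}
```

With this illustrative data, the baseline kill rate is 0.5 and the combined rate is 0.75. `newly_killed(BASELINE, COMBINED)` is `{"m2"}`, which names exactly which escapes the new vectors closed, not just that the rate moved.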
---
## Phase 3: Baseline
Run mutation testing with existing test vectors only.
### Framework Selection
See [references/mutation-frameworks.md](references/mutation-frameworks.md)
for language-specific details.