Skill | 314 repo stars | updated 1mo ago

adversarial-review

Adversarial-review presumes newly built code is broken and hunts for flaws using isolated reviewers with opposing lenses (correctness, security, empty-world assumptions) who run the target to reproduce findings empirically. Use it after completing a feature, before shipping, or when analyzing unfamiliar code. It differs from static code review in one requirement: findings must be executed and validated against the live application, not inferred from reading diffs alone.

Repository: Software-Engineer-AI-Agent-Atlas

Install in Claude Code

git clone --depth 1 https://github.com/syahiidkamil/Software-Engineer-AI-Agent-Atlas /tmp/adversarial-review && mkdir -p ~/.claude/skills && cp -r /tmp/adversarial-review/.claude/skills/adversarial-review ~/.claude/skills/adversarial-review

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Adversarial review — presume it broken, find where

A normal review reads code and looks for reasons to approve it. An adversarial review starts from the
opposite prior: **this was just built, it is presumed broken, and the job is to find where.** The
builder believes in their work — that belief is exactly the blind spot. So the reviewer must be
incentivized to find failure, not confirm success, and must work *without the builder's context* so it
cannot inherit the builder's assumptions.

This skill is the canonical adversarial-review protocol for ATLAS. Reach for it autonomously after
building anything non-trivial, before shipping, or when handed unfamiliar code to harden.

## When to use it — and which review

- **Use this** when you want the thing *attacked*: a feature you just finished, a build before
delivery, a flow you suspect is fragile, an unfamiliar codebase you must trust.
- **Use `/code-review` instead** for a fast, conservative pass over a **diff or PR** — it statically
reads the change for bugs and cleanups, scores confidence, and **never runs the app**. That is the
right tool for routine change review.
- **The line between them:** `/code-review` reads; adversarial-review *runs*. This skill presumes
breakage, fans out diverse hostile lenses in clean contexts, and **reproduces findings against the
live target** before believing them. Heavier, empirical, used deliberately.
- **Scale down, don't skip:** for a small change one hostile reviewer is enough. Don't fan out a fleet
to attack a typo fix — deliberating everything ships nothing.

## The four invariants (what makes a review adversarial)

1. **Presumed-broken mandate.** Every reviewer gets one instruction: *"this was just built and is
presumed broken; find where."* Never "check if it looks ok."
2. **Context isolation.** Reviewers run in **clean contexts** (separate subagents) — they get the
target and the mandate, not the build's rationalizations. A separate context window is the closest
thing to a genuine second opinion.
3. **Empirical reproduction.** A finding is a hypothesis until reproduced against the real target.
Run it. A finding that can't be reproduced is an opinion, not a bug.
4. **Senior triage.** Reproduce before fixing, fix what's real, re-verify the fix, and name the false
alarms. The report is only trustworthy if it admits what wasn't real.
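
The first two invariants can be made concrete as the brief a reviewer receives. The sketch below is illustrative only: `ReviewerBrief`, its field names, and `to_prompt` are hypothetical names, not part of ATLAS or Claude Code. It shows one way to guarantee that a reviewer gets the mandate, one lens, and the target, with no slot through which the builder's rationale could leak in.

```python
from dataclasses import dataclass

# The mandate wording is quoted from invariant 1; everything else is illustrative.
MANDATE = "This was just built and is presumed broken; find where."


@dataclass(frozen=True)
class ReviewerBrief:
    """Everything a clean-context reviewer receives.

    There is deliberately no field for the builder's notes or rationale,
    which is how this sketch models context isolation (invariant 2).
    """

    lens: str             # the single hostile lens this reviewer applies
    target: str           # what is under review
    how_to_exercise: str  # commands, URLs, or flows needed to run the target

    def to_prompt(self) -> str:
        # The mandate always comes first so it frames the rest of the brief.
        return "\n".join(
            [
                MANDATE,
                f"Lens: {self.lens}",
                f"Target: {self.target}",
                f"How to exercise it: {self.how_to_exercise}",
                "Reproduce every finding against the running target and attach the evidence.",
            ]
        )


if __name__ == "__main__":
    brief = ReviewerBrief(
        lens="Security",
        target="the checkout flow",
        how_to_exercise="start the app locally and walk the flow end to end",
    )
    print(brief.to_prompt())
```

Keeping the brief frozen and free of any rationale field is a design choice in this sketch: isolation then holds by construction, without relying on the caller to remember it.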

## How to run it

1. **Frame the target and scale.** Name what's under review (a diff, a feature, a running app, a whole
codebase) and how to exercise it (commands to run, URLs/flows to walk, entry points). Pick reviewer
count by stakes: 1 for a small change, 3–5 lenses for a feature or build.

2. **Fan out hostile reviewers in clean contexts.** Spawn subagents, each with the presumed-broken
mandate and **one lens**, blind to each other. Choose lenses that fit the target — typically:
- **Correctness** — edge cases, off-by-one, race conditions, broken flows, state that lies
- **Security** — injection, authz/authn holes, secrets in code, unvalidated input, SSRF/path traversal
- **The empty world** — first run, zero data, no config, expired token: does it still stand up?
- **Data integrity** — partial writes, lost updates, migrations, constraints that don't hold
- **Failure & limits** — network/dependency failure, timeouts, huge inputs, concurrent users
- **UX seams** — error messages, loading/empty states, the run instructions a stranger needs
(One-shot work dies at the seams and the empty world far more than the happy path — weight those.)

3. **Reproduce every finding.** Before fixing anything, run the target and confirm the failure
actually happens. Capture the evidence (output, screenshot, failing command). Unreproducible →
demote to "unconfirmed," don't fix on faith.

4. **Triage like a senior.** Fix what reproduced, re-run to confirm the fix closed it (and opened
nothing new), and record what looked like a bug but wasn't. If a fix opens a real design fork,
invoke `free-will`.
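
Steps 1 and 3 above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not ATLAS's implementation: `pick_lenses`, `Finding`, and `reproduce` are hypothetical names, the lens ordering is a guess that follows the note about seams and the empty world, and the convention that a reproduction command exits non-zero when the bug is real is introduced here purely for the example.

```python
import subprocess
import sys
from dataclasses import dataclass

# Lens names come from step 2. The ordering is an assumption: it puts the
# empty world and UX seams first, as the closing note of step 2 suggests.
DEFAULT_LENSES = [
    "The empty world",
    "UX seams",
    "Correctness",
    "Security",
    "Data integrity",
    "Failure & limits",
]


def pick_lenses(stakes: str, ranked=DEFAULT_LENSES) -> list:
    """Step 1: one lens for a small change, up to five for a feature or build."""
    return list(ranked[:1]) if stakes == "small" else list(ranked[:5])


@dataclass
class Finding:
    lens: str
    claim: str
    repro_cmd: list            # hypothetical convention: exits non-zero if the bug reproduces
    status: str = "hypothesis" # every finding starts as a hypothesis
    evidence: str = ""


def reproduce(finding: Finding, timeout: float = 30.0) -> Finding:
    """Step 3: run the reproduction, capture the evidence, confirm or demote."""
    try:
        result = subprocess.run(
            finding.repro_cmd, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # A hang may itself be worth investigating; this sketch only records it.
        finding.status = "unconfirmed"
        finding.evidence = f"timed out after {timeout}s"
        return finding
    finding.evidence = (result.stdout + result.stderr).strip()
    finding.status = "confirmed" if result.returncode != 0 else "unconfirmed"
    return finding


if __name__ == "__main__":
    # Stand-in reproduction command; a real one would exercise the target.
    demo = Finding(
        lens="Correctness",
        claim="importer crashes on an empty file",
        repro_cmd=[sys.executable, "-c", "import sys; print('boom'); sys.exit(1)"],
    )
    print(reproduce(demo))
```

Only findings whose status ends up `confirmed` would move on to the fix-and-re-verify loop in step 4; the rest stay listed as unconfirmed so nothing is fixed on faith.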

## Output — the report

One honest summary:

- **Confirmed & fixed** — what broke, the reproduction, the fix, the re-verification.
- **Confirmed & deferred** — real but consciously not fixed now, and why (goes on the gap list).
- **False alarms** — flagged then disproven. Listing these is what makes the rest credible.
- **Residual risk** — what this pass could not exercise, and what would attack it next.

Never report "no issues found" from a read alone — that claim requires having *run* the lenses and
reproduced nothing. Report what you actually exercised, not what you assume holds.
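
The four report sections and the rule against reporting from a read alone can be captured in a small data structure. The sketch below is hypothetical (the `Report` class and its field names are not defined by the skill); it shows one way to make the report refuse to render when no lens was actually exercised.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    lenses_run: list = field(default_factory=list)          # lenses actually exercised
    confirmed_fixed: list = field(default_factory=list)     # broke, reproduced, fixed, re-verified
    confirmed_deferred: list = field(default_factory=list)  # real, consciously not fixed now
    false_alarms: list = field(default_factory=list)        # flagged, then disproven
    residual_risk: list = field(default_factory=list)       # what this pass could not exercise

    def render(self) -> str:
        # A report with no exercised lenses would be a claim made from a read alone.
        if not self.lenses_run:
            raise ValueError("no lenses were run; nothing can be reported from a read alone")
        sections = [
            ("Confirmed & fixed", self.confirmed_fixed),
            ("Confirmed & deferred", self.confirmed_deferred),
            ("False alarms", self.false_alarms),
            ("Residual risk", self.residual_risk),
        ]
        lines = ["Lenses exercised: " + ", ".join(self.lenses_run)]
        for title, items in sections:
            lines.append(f"{title}:")
            # Empty sections are printed explicitly so that omissions are visible.
            lines.extend(f"  - {item}" for item in (items or ["(none)"]))
        return "\n".join(lines)


if __name__ == "__main__":
    report = Report(
        lenses_run=["Security", "The empty world"],
        false_alarms=["suspected path traversal in the upload handler; did not reproduce"],
        residual_risk=["concurrent users were not exercised"],
    )
    print(report.render())
```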

More from this repository

code-architect (Subagent)

Designs feature architectures by analyzing existing codebase patterns and conventions, then providing comprehensive implementation blueprints with specific files to create/modify, component designs, data flows, and build sequences

code-explorer (Subagent)

Deeply analyzes existing codebase features by tracing execution paths, mapping architecture layers, understanding patterns and abstractions, and documenting dependencies to inform new development

code-review (Subagent)

Code review a pull request

code-simplifier (Subagent)

Simplifies and refines code for clarity, consistency, and maintainability while preserving all functionality. Focuses on recently modified code unless instructed otherwise.

commit (Slash Command)

Commit what is already staged — runs the commit subagent in the background, following the ATLAS commit convention.

qa-manual-tester (Subagent)

Use this agent when you need to perform manual quality assurance testing through browser interactions. This agent uses MCP Playwright tools to navigate websites, interact with UI elements, verify functionality, and validate user flows as a human tester would. Perfect for testing new features, regression testing, validating bug fixes, or exploring application behavior. Examples:

<example>
Context: The user has just implemented a new login feature and wants to test it.
user: "I've added a new login form, can you test if it works correctly?"
assistant: "I'll use the qa-manual-tester agent to test the login functionality through the browser."
<commentary>
Since the user needs manual testing of a new feature, use the Task tool to launch the qa-manual-tester agent to interact with the browser and verify the login flow.
</commentary>
</example>

<example>
Context: The user wants to verify that a bug fix is working properly.
user: "I fixed the issue where the submit button wasn't working on mobile view. Can you verify?"
assistant: "Let me launch the qa-manual-tester agent to verify the submit button works correctly in mobile view."
<commentary>
The user needs manual verification of a bug fix, so use the qa-manual-tester agent to test the specific functionality through browser interaction.
</commentary>
</example>

<example>
Context: The user wants to perform regression testing after code changes.
user: "I've refactored the checkout flow. Please test that everything still works."
assistant: "I'll use the qa-manual-tester agent to perform comprehensive testing of the checkout flow."
<commentary>
Since the user needs regression testing after refactoring, use the qa-manual-tester agent to manually test the entire checkout flow.
</commentary>
</example>

change-core-self (Slash Command)

Interview Boss about the project, then reason from first principles to design the ideal ATLAS operating identity/system-prompt for it — free to drop KISS/YAGNI/DRY/clean-architecture entirely when the project (and the LLM's own distribution) calls for a different mindset

get-to-know (Slash Command)

Initialize project context — understand the project, configure conventions, and set up project rules