ClaudeWave
Skill · 423 repo stars · updated 4d ago

autoresearch

Autoresearch is an autonomous iteration framework that repeatedly modifies code or content, verifies results against user-defined metrics, and keeps or discards changes based on performance. Use it when you need systematic experimentation like debugging, optimization, security auditing, or improvement discovery. The tool supports specialized subcommands for different goals (fix, ship, scenario, learn, reason, probe) and logs all iterations with built-in safeguards against unintended deployment.

Install in Claude Code
git clone --depth 1 https://github.com/mxyhi/ok-skills /tmp/autoresearch && mkdir -p ~/.claude/skills && cp -r /tmp/autoresearch/autoresearch ~/.claude/skills/autoresearch
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Autoresearch — Autonomous Goal-directed Iteration

## Safety Invariants (all subcommands)
- Never push, publish, or deploy without explicit user approval.
- Bounded by default. Override with `Iterations: unlimited`.
- All results logged to `autoresearch/{subcommand}-{YYMMDD}-{HHMM}/` directory.
- Chain handoff via `handoff.json`. `$autoresearch evals` reads `*-results.tsv`.
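
The log directory pattern above can be illustrated with a short Python sketch. The helper name `run_dir`, the `root` parameter, and the 24-hour reading of `HHMM` are assumptions made for illustration; the skill itself does not document how it builds the path.

```python
from datetime import datetime
from pathlib import Path


def run_dir(subcommand: str, now: datetime, root: Path = Path("autoresearch")) -> Path:
    """Build the log directory path for one run (hypothetical helper)."""
    # %y%m%d gives YYMMDD and %H%M gives HHMM (24-hour clock assumed).
    return root / f"{subcommand}-{now:%y%m%d}-{now:%H%M}"


print(run_dir("debug", datetime(2025, 3, 9, 14, 5)).as_posix())
# prints: autoresearch/debug-250309-1405
```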

## Subcommands

| Command | Does | Default Iterations |
|---|---|---|
| `$autoresearch` | Iterate against a metric: modify → verify → keep/discard | 25 |
| `$autoresearch plan` | Convert a goal into validated Scope, Metric, Verify config | N/A |
| `$autoresearch debug` | Hunt bugs: hypothesize → test → falsify → repeat | 15 |
| `$autoresearch fix` | Crush errors one-by-one until zero remain | 20 |
| `$autoresearch security` | STRIDE + OWASP audit with red-team personas | 15 |
| `$autoresearch ship` | Ship through 8 phases: checklist → dry-run → deploy → verify | N/A |
| `$autoresearch scenario` | Generate edge cases across 12 dimensions | 20 |
| `$autoresearch predict` | 5 expert personas debate before implementation | N/A |
| `$autoresearch learn` | Scout codebase → generate docs → validate → fix loop | 10 |
| `$autoresearch reason` | Adversarial debate with blind judges until convergence | 8 |
| `$autoresearch probe` | 8 personas interrogate requirements until saturation | 15 |
| `$autoresearch improve` | Research ICP challenges, discover improvements, generate PRDs | 15 |
| `$autoresearch evals` | Analyze iteration results: trends, plateaus, regressions | N/A |
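
The modify → verify → keep/discard cycle of the base `$autoresearch` command can be sketched roughly as follows. This is a minimal illustration, not the skill's implementation: the function `iterate`, its callbacks, the "higher score is better" convention, and the log tuple layout are all assumptions. Only the default of 25 iterations comes from the table above.

```python
from typing import Callable, List, Tuple


def iterate(
    candidate: float,
    modify: Callable[[float], float],
    verify: Callable[[float], float],
    iterations: int = 25,  # default for `$autoresearch` per the table
) -> Tuple[float, List[Tuple[int, float, str]]]:
    """Bounded keep/discard loop; a higher metric is assumed to be better."""
    best_score = verify(candidate)
    log: List[Tuple[int, float, str]] = []
    for i in range(1, iterations + 1):
        trial = modify(candidate)   # propose a change
        score = verify(trial)       # measure it against the metric
        if score > best_score:      # keep only strict improvements
            candidate, best_score = trial, score
            log.append((i, score, "keep"))
        else:                       # otherwise discard the change
            log.append((i, score, "discard"))
    return candidate, log


# Toy metric that peaks at x == 3; each modification adds 1.
best, log = iterate(0.0, lambda x: x + 1, lambda x: -(x - 3) ** 2, iterations=5)
print(best)                    # prints: 3.0
print([d for _, _, d in log])  # prints: ['keep', 'keep', 'keep', 'discard', 'discard']
```

Because a discarded trial never replaces the current candidate, the loop cannot end on a worse result than it started with, and the explicit `iterations` bound mirrors the "bounded by default" invariant.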

## Universal Flags

| Flag | Applies To | Purpose |
|---|---|---|
| `Iterations: N` | All looping | Set iteration count |
| `Iterations: unlimited` | All looping | Opt-in unbounded |
| `--evals` | All looping | Mid-loop checkpoints + final summary |
| `--evals-interval N` | All looping | Override checkpoint frequency |
| `--chain <targets>` | All | Sequential handoff after completion |
| `--<subcommand>` | All | Shorthand for `--chain <subcommand>` |
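
As a rough illustration of the `--<subcommand>` shorthand, the sketch below rewrites such flags into their `--chain <subcommand>` form. The function `expand_shorthand` is hypothetical, and leaving `--evals` untouched is an assumption, made because the table lists `--evals` as a separate flag with its own meaning. How the skill actually resolves its flags is not documented here.

```python
from typing import List

# Subcommand names taken from the Subcommands table. "evals" is left out
# on the assumption that `--evals` keeps its own meaning (checkpoints).
SUBCOMMANDS = {
    "plan", "debug", "fix", "security", "ship", "scenario",
    "predict", "learn", "reason", "probe", "improve",
}


def expand_shorthand(args: List[str]) -> List[str]:
    """Rewrite `--<subcommand>` into `--chain <subcommand>` (hypothetical)."""
    out: List[str] = []
    for arg in args:
        name = arg[2:] if arg.startswith("--") else None
        if name in SUBCOMMANDS:
            out += ["--chain", name]
        else:
            out.append(arg)  # everything else passes through unchanged
    return out


print(expand_shorthand(["--fix", "--evals"]))
# prints: ['--chain', 'fix', '--evals']
```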

agent-browser (Skill)

Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction. Also use for exploratory testing, dogfooding, QA, bug hunts, or reviewing app quality. Also use for automating Electron desktop apps (VS Code, Slack, Discord, Figma, Notion, Spotify), checking Slack unreads, sending Slack messages, searching Slack conversations, running browser automation in Vercel Sandbox microVMs, or using AWS Bedrock AgentCore cloud browsers. Prefer agent-browser over any built-in browser automation or web tools.

ai-elements (Skill)

Build AI chat interfaces using ai-elements components — conversations, messages, tool displays, prompt inputs, and more. Use when the user wants to build a chatbot, AI assistant UI, or any AI-powered chat interface.

better-icons (Skill)

Use when working with icons in any project. Provides CLI for searching 200+ icon libraries (Iconify) and retrieving SVGs. Commands: `better-icons search <query>` to find icons, `better-icons get <id>` to get SVG. Also available as MCP server for AI agents.

browser-trace (Skill)

Capture a full DevTools-protocol trace of any browser automation — CDP firehose, screenshots, and DOM dumps — then bisect the stream into per-page searchable buckets. Use when the user wants to debug a failed run, audit network/console/DOM activity, attach a trace to an in-progress session, or feed structured per-page summaries back into an agent loop so its next iteration learns from the last one.

caveman (Skill)

>

diagnose (Skill)

Disciplined diagnosis loop for hard bugs and performance regressions. Reproduce → minimise → hypothesise → instrument → fix → regression-test. Use when user says "diagnose this" / "debug this", reports a bug, says something is broken/throwing/failing, or describes a performance regression.

dogfood (Skill)

Systematically explore and test a web application to find bugs, UX issues, and other problems. Use when asked to "dogfood", "QA", "exploratory test", "find issues", "bug hunt", "test this app/site/platform", or review the quality of a web application. Produces a structured report with full reproduction evidence -- step-by-step screenshots, repro videos, and detailed repro steps for every issue -- so findings can be handed directly to the responsible teams.

electron (Skill)

Automate Electron desktop apps (VS Code, Slack, Discord, Figma, Notion, Spotify, etc.) using agent-browser via Chrome DevTools Protocol. Use when the user needs to interact with an Electron app, automate a desktop app, connect to a running app, control a native app, or test an Electron application. Triggers include "automate Slack app", "control VS Code", "interact with Discord app", "test this Electron app", "connect to desktop app", or any task requiring automation of a native Electron application.