Skill111 estrellas del repoactualizado 2mo ago

skill-creator

# Skill Creator The skill-creator Claude Code skill guides users through the complete lifecycle of building, testing, and refining Claude Code skills defined in SKILL.md files. Use it when you need to create a skill from scratch, validate an existing skill's effectiveness, run evaluations or benchmarks against a SKILL.md, improve skill instructions, or diagnose triggering failures. The skill operates autonomously by default, handling skill drafting, test prompt creation, evaluation execution, and iterative refinement, but pauses for human review when explicitly requested or when ambiguities require clarification. It organizes all evaluation artifacts in timestamped workspace directories and communicates at a technical level appropriate to each user's familiarity with coding concepts.

Ver fuente Repositorio: claude-prime

Instalar en Claude Code

Copiar

git clone --depth 1 https://github.com/avibebuilder/claude-prime /tmp/skill-creator && cp -r /tmp/skill-creator/.claude/skills/skill-creator ~/.claude/skills/skill-creator

Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

Definición

SKILL.md

# Skill Creator

Default operating mode: autonomous — create or update the skill, run evals, improve it, optimize the description, and return with a final report. Only pause for human review if the user explicitly requests it or you hit an ambiguity that can't be resolved from the evidence alone.

At a high level, the process of creating a skill goes like this:

- Decide what you want the skill to do and roughly how it should do it
- Write a draft of the skill
- Create a few test prompts and run claude-with-access-to-the-skill on them
- Evaluate the results both qualitatively and quantitatively
- While the runs happen in the background, draft some quantitative evals if there aren't any (if there are some, you can either use as is or modify if you feel something needs to change about them)
- Use the `eval-viewer/generate_review.py` script to show the results if the user wants to review
- Rewrite the skill based on eval results, benchmark data, and user feedback (if review was requested)
- Repeat until quality thresholds are met or the user is satisfied
- Optimize the description for triggering accuracy parallel

Figure out where the user is in this process and then jump in and help them progress through these stages. Route based on what they need. Of course, you should always be flexible and if the user is like "I don't need to run a bunch of evaluations, just vibe with me", you can do that instead.

**Workspace convention**: All eval artifacts go in `tmp/<skill-name>-workspace/` under the project root (the directory containing `.claude/`). This directory is gitignored. Within the workspace, organize by iteration (`iteration-1/`, `iteration-2/`, etc.).

**Nested invocation**: When invoked as a subagent with an explicit `outputs` directory, use that `outputs/` directory as the root for all inner workspaces — not `tmp/<skill-name>-workspace/`.

Track your progress with tasks/todos — without them, description optimization and the judge step are commonly skipped.

## Communicating with the user

The skill creator is liable to be used by people across a wide range of familiarity with coding jargon. If you haven't heard (and how could you, it's only very recently that it started), there's a trend now where the power of Claude is inspiring plumbers to open up their terminals, parents and grandparents to google "how to install npm". On the other hand, the bulk of users are probably fairly computer-literate.

So please pay attention to context cues to understand how to phrase your communication! In the default case, just to give you some idea:

- "evaluation" and "benchmark" are borderline, but OK
- for "JSON" and "assertion" you want to see serious cues from the user that they know what those things are before using them without explaining them

It's OK to briefly explain terms if you're in doubt, and feel free to clarify terms with a short definition if you're unsure if the user will get it.

---

## Creating a skill

### Capture Intent

Start by understanding the user's intent. The current conversation might already contain a workflow the user wants to capture (e.g., they say "turn this into a skill"). If so, extract answers from the conversation history first — the tools used, the sequence of steps, corrections the user made, input/output formats observed.
**Skip if in autonomous mode**: The user may need to fill the gaps, and should confirm before proceeding to the next step.

1. What should this skill enable Claude to do?
2. When should this skill trigger? (what user phrases/contexts)
3. What's the expected output format?
4. What evaluation strategy fits? Objectively verifiable outputs (file transforms, code generation) → expectations and baselines. Subjective outputs (writing style, art) → qualitative analysis.

**Skip if in autonomous mode**: If a gap is genuinely irresolvable from context and would materially change the skill's correctness, pause and ask. Otherwise infer, state it, and proceed.

### Interview and Research

Proactively ask questions about edge cases, input/output formats, example files, success criteria, and dependencies. Wait to write test prompts until you've got this part ironed out. SKIP this interview in autonomous mode, let make decisions yourself

**Research existing skills** — MUST read `references/available-skill-resources.md` for curated skill repositories, then fetch the README or index of relevant repos to check whether skills for this domain already exist. Don't deep-dive into repos with nothing relevant — a quick scan of the index is enough to know if there's something worth borrowing.

Check available MCPs - if useful for research (searching docs, finding similar skills, looking up best practices), research in parallel via subagents if available, otherwise inline. Come prepared with context to reduce burden on the user.

### Write the SKILL.md

Based on the user interview, fill in these components:

- **name**: Skill identifier
- **description**: When to trigger, what it does. This is the primary triggering mechanism - include both what the skill does AND specific contexts for when to use it. All "when to use" info goes here, not in the body. Note: currently Claude has a tendency to "undertrigger" skills -- to not use them when they'd be useful. To combat this, please make the skill descriptions a little bit "pushy". So for instance, instead of "How to build a simple fast dashboard to display internal Anthropic data.", you might write "How to build a simple fast dashboard to display internal Anthropic data. Make sure to use this skill whenever the user mentions dashboards, data visualization, internal metrics, or wants to display any kind of company data, even if they don't explicitly ask for a 'dashboard.'" **Length: hard limit is 1024 characters (the optimizer enforces this); aim for under ~650 characters in practice — past that, every extra clause competes for attention with the other skills' descriptions and tends to dilute rather than sharpen tri

Del mismo repositorio

agent-browserSkill

Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.

askSkill

Answer questions about code, architecture, and technical decisions — no implementation. Trigger on questions asking 'why', 'what does this do', 'what is the purpose of', 'explain', 'what's the difference', 'compare', or 'what are the tradeoffs' — even when referencing specific files, code snippets, or inline code. The key signal is the user wants to UNDERSTAND something, not change it. Do NOT trigger for requests to build, fix, plan, review, research, or add/modify code.

cookSkill

Implement, build, create, or add any feature, endpoint, page, component, or functionality. Use this skill whenever the user asks you to write new code or make code changes — whether it's adding an API endpoint, building a UI page, creating an export feature, wiring up a webhook, implementing a search/filter, or any other hands-on coding task. This is the default skill for all 'build this', 'add this', 'create this', 'wire up', 'implement' requests. Covers the full cycle: clarify requirements, plan if needed, write code, verify, and review. Do NOT use for pure research, debugging, documentation, or explanation — only when the user wants working code delivered.

create-docSkill

Use when the user wants to save knowledge as a file so others don't have to rediscover it — \"turn this into a doc\", \"write this up\", \"document how X works\", \"we figured this out and want to capture it\", \"nobody should have to figure this out again\". Covers any request to create or update durable written artifacts: onboarding guides, runbooks, ADRs, API docs, architecture notes, postmortems, changelogs, setup guides. The trigger: user wants knowledge captured in a file for future reference, not just a conversation. Do NOT use when still making decisions (→ give-plan), just asking for explanation without a file (→ ask), or writing code (→ cook).

diagnoseSkill

Investigate unexpected behavior and mysterious bugs. Use when the cause of a problem is unknown and the user needs to understand WHY something is happening — symptoms like: sudden unexplained changes in metrics or behavior, works locally but not in staging/production, inconsistent or intermittent failures, correct code producing wrong results, operations succeeding but having no effect, environment-specific failures, duplicate executions, stale data, or any \"why did this change?\" or \"why is this happening?\" situation. Covers infrastructure anomalies (cache hit rates dropping, latency spikes, queue behavior shifts) as well as code bugs. The key signal is confusion about root cause, not a request to implement a known fix. Do NOT use for feature requests, known fixes, planning, or documentation tasks.

discussSkill

Brainstorms and debates approaches, then drives toward an actionable decision. Use whenever someone needs a thinking partner for a decision they're facing: 'discuss', 'debate', 'brainstorm', 'weigh options', 'tradeoffs', 'should I do X or Y', 'help me decide', 'I'm torn between', 'sanity check my thinking', or 'what do you think about'. The user must be asking for help reasoning through a choice — not asking to build, fix, evaluate, plan, or modify something (even if the topic involves this skill itself). Picks the right decision lens, surfaces tradeoffs and blind spots, pushes back when reasoning is genuinely weak, and never implements.

docs-seekerSkill

Fetch up-to-date documentation for any library, framework, API, or service into context. Use when the user wants to look up API references, check function signatures or required fields, find feature-specific docs, or verify how an external tool actually works. Triggers for queries about third-party libraries like Stripe, SQLAlchemy, Tailwind, FastAPI, shadcn, Drizzle, Hono, Better Auth — any time the answer lives in official docs rather than in the project codebase. Use this instead of guessing from trained knowledge, which is stale.

fixSkill

Fix bugs and broken behavior when there is enough evidence to act on a repair path. Use for errors, crashes, incorrect results, API failures (500, 404, 403), CORS problems, database exceptions, broken rendering, duplicated or wrong data, off-by-one mistakes, timezone/date bugs, broken forms, config-caused runtime failures, and regressions. Trigger when the user wants the bug repaired and the conversation already contains a clear failing area, a reproducible failing test, a concrete error path, or a prior diagnosis to implement. Do NOT use for new features, pure explanation, architecture discussion, broad research, or bug reports where the main need is figuring out why the behavior happens — use diagnose for that.