Skill · 57 repo stars · updated 2 months ago

# skill-scorer

Skill Scorer is a Claude Code evaluation tool that rates any SKILL.md file on a 0-100 scale across ten dimensions, including trigger precision, instruction clarity, output predictability, edge case coverage, and anti-hallucination guardrails. Use it when developing or reviewing Claude Code skills to check quality against patterns from high-performing repositories and Anthropic's own guidelines. A score of 80 or higher indicates a production-ready skill.

Repository: simulacra

Install in Claude Code

git clone --depth 1 https://github.com/mturac/simulacra /tmp/skill-scorer && cp -r /tmp/skill-scorer/skills/skill-scorer ~/.claude/skills/skill-scorer

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Skill Scorer — 0-100 Rating for Any SKILL.md

Calibrated against Anthropic's own skill-creator guidelines and the patterns
of 50k+ star repos (Superpowers, Caveman, mattpocock/skills). Average skill
scores 45-55. If yours scores 80+, ship it.

---

## Dependencies

None. Standalone. Reads any SKILL.md content.

---

## CRITICAL: Auto-start

SKILL.md content in the message or just created in conversation → skip to
Step 2. No preamble.

---

## Step 1. Get the skill

Content in context? Use it. Otherwise:

> Paste your SKILL.md or tell me which skill to score.

If the user names a skill in the current environment, read it.

---

## Step 2. Score 10 Dimensions (each 0-10)

Be precise. 6.5 is 6.5, not 7. Every score needs one sentence of evidence
referencing the actual skill content.

### D1: Trigger Precision (0-10)
*Will it fire when it should? Will it stay quiet when it shouldn't?*

Evaluate the YAML `description` field:
- Specific trigger phrases listed? (+2)
- Natural language variations covered? (+2)
- "Pushy" quality per Anthropic guidance? (+2)
- Edge-case triggers (bare paste, implicit intent)? (+2)
- False positive risk controlled? (+2)

| Range | Calibration |
|-------|-------------|
| 0-3 | Vague description, undertriggers or fires on everything |
| 4-6 | Some triggers but gaps in natural phrasing |
| 7-8 | Solid coverage, most natural phrasings caught |
| 9-10 | Comprehensive — includes implicit triggers and edge cases |
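To make the D1 input concrete, here is a minimal Python sketch that pulls the `description` value out of a SKILL.md front matter block. It assumes the front matter is delimited by `---` lines and that the description sits on a single line; `extract_description` is a hypothetical helper, not part of this skill, and a real YAML parser would be needed for folded or multi-line values.

```python
import re


def extract_description(skill_md):
    """Return the single-line `description` value from front matter, or None.

    Hypothetical helper: assumes `---` delimited front matter and does not
    handle multi-line or folded YAML values.
    """
    match = re.match(r"\A---\s*\n(.*?)\n---\s*(?:\n|\Z)", skill_md, re.DOTALL)
    if match is None:
        return None  # no front matter block at the top of the file
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "description":
            # Strip surrounding whitespace and optional quotes.
            return value.strip().strip("\"'") or None
    return None


sample = (
    "---\n"
    "name: demo-skill\n"
    "description: Use when the user asks to score a SKILL.md file.\n"
    "---\n"
    "\n"
    "# Demo\n"
)
description = extract_description(sample)
```

A missing or empty description returns `None`, which under the rubric above would land in the 0-3 calibration range.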

### D2: Instruction Clarity (0-10)
*Can Claude follow this without improvising?*

- Unambiguous step sequence? (+2)
- Decision points with explicit branches? (+2)
- Deterministic enough that two instances produce similar output? (+3)
- No contradictory instructions? (+1.5)
- Clear scope boundaries (what it does AND doesn't do)? (+1.5)
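The point values on each checklist add up to 10 per dimension. As a rough Python sketch of that arithmetic, using the D2 checks: the rubric does not say how partial credit is awarded, so this example assumes each check earns a fraction between 0 and 1 of its points. The key names and `dimension_score` are illustrative, not part of the skill.

```python
# Maximum points per D2 check, copied from the rubric above.
D2_CHECKS = {
    "unambiguous_step_sequence": 2.0,
    "explicit_decision_branches": 2.0,
    "deterministic_output": 3.0,
    "no_contradictions": 1.5,
    "clear_scope_boundaries": 1.5,
}


def dimension_score(checks, credit):
    """Sum max points times the awarded fraction for each check.

    Assumption: partial credit is a fraction in [0, 1]; missing checks earn 0.
    """
    total = 0.0
    for name, max_points in checks.items():
        fraction = credit.get(name, 0.0)
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"credit for {name!r} must be in [0, 1]")
        total += max_points * fraction
    return round(total, 1)


# Example: clear steps, half-specified branches, no scope boundaries.
example_credit = {
    "unambiguous_step_sequence": 1.0,
    "explicit_decision_branches": 0.5,
    "deterministic_output": 1.0,
    "no_contradictions": 1.0,
    "clear_scope_boundaries": 0.0,
}
d2 = dimension_score(D2_CHECKS, example_credit)  # 2 + 1 + 3 + 1.5 + 0 = 7.5
```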

### D3: Output Predictability (0-10)
*Does the user know what they'll get?*

- Output format explicitly defined? (+3)
- Templates or examples provided? (+3)
- Length/scope appropriate and specified? (+2)
- Structured data output (JSON, table) available? (+2)

### D4: Edge Case Coverage (0-10)
*What happens with weird input?*

- Empty/missing input handled? (+2)
- Extremely long input addressed? (+2)
- Invalid/unexpected input types? (+2)
- Multi-language or mixed input? (+2)
- Graceful degradation path? (+2)

### D5: Anti-Hallucination Guardrails (0-10)
*Does it prevent Claude from fabricating?*

- Explicit "don't fabricate" rules? (+2.5)
- "If unsure, say so" instruction? (+2.5)
- Failure modes section (what skill does NOT do)? (+2.5)
- Separation of known vs. needs-verification? (+2.5)

### D6: Developer Experience (0-10)
*Is it pleasant to use?*

- Engaging interaction style? (+2.5)
- Personality without being annoying? (+2.5)
- Appropriate for target audience? (+2.5)
- Memorable enough to recommend? (+2.5)

### D7: Composability (0-10)
*Does it play well with other tools?*

- Output format consumable by other skills/tools? (+3)
- Explicit handoff points? (+3)
- Modular (not monolithic)? (+2)
- Avoids duplicating existing skills? (+2)

### D8: Open-Source Readiness (0-10)
*Would a stranger install this from a README?*

- Self-contained (no private dependencies)? (+2)
- Documented well enough for strangers? (+2)
- Solves a real, recognized problem? (+2)
- Clear naming and discoverability? (+2)
- Cross-agent compatibility mentioned? (+2)

### D9: Wow Factor (0-10)
*Does it make people go "oh damn"?*

- Novel approach or angle? (+3)
- Demo-able in 30 seconds? (+3)
- Would someone tweet/share this? (+2)
- Solves a problem in a way no one else has? (+2)

### D10: Real-World Utility (0-10)
*Would someone use this weekly?*

- Solves a recurring problem? (+3)
- Problem is painful enough people seek solutions? (+3)
- Faster/better than manual alternative? (+2)
- Would someone miss it if removed? (+2)

---

## Step 3. Calculate Composite Score

### Formula

```
RAW = (D1 × 1.2) + (D2 × 1.2) + (D3 × 1.0) + (D4 × 0.8) +
      (D5 × 1.0) + (D6 × 1.0) + (D7 × 0.6) + (D8 × 1.0) +
      (D9 × 1.2) + (D10 × 1.0)

SCORE = RAW  // max possible = 100
```

Weight rationale: Trigger precision (D1), instruction clarity (D2), and wow factor (D9) are weighted highest (1.2 each). A skill that never fires, that Claude cannot follow, or that nobody cares about is dead on arrival. Composability (D7) is weighted lowest (0.6) because standalone value matters more for v1.
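The formula can be checked with a short Python sketch. The weights are copied from the formula above and sum to 10, so ten perfect dimension scores give exactly 100. The function name, the validation, and the example scores are illustrative assumptions.

```python
# Weights copied from the composite formula; they sum to 10.0.
WEIGHTS = {
    "D1": 1.2, "D2": 1.2, "D3": 1.0, "D4": 0.8, "D5": 1.0,
    "D6": 1.0, "D7": 0.6, "D8": 1.0, "D9": 1.2, "D10": 1.0,
}


def composite_score(scores):
    """Weighted sum of the ten dimension scores (each 0-10), rounded to 0.1."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for dim in WEIGHTS:
        if not 0 <= scores[dim] <= 10:
            raise ValueError(f"{dim} must be between 0 and 10")
    return round(sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS), 1)


# Hypothetical scores for illustration only.
example = {
    "D1": 7, "D2": 8, "D3": 6.5, "D4": 5, "D5": 6,
    "D6": 7, "D7": 4, "D8": 8, "D9": 6, "D10": 7.5,
}
composite = composite_score(example)  # 66.6
```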

### Verdict

| Score | Verdict | Meaning |
|-------|---------|---------|
| 80-100 | 🟢 **SHIP IT** | Publish. Polish the README and go. |
| 60-79 | 🟡 **REWORK** | Good bones. Fix the weak dimensions first. |
| 40-59 | 🟠 **MAJOR REWORK** | Concept may be sound. Execution needs rewrite. |
| 0-39 | 🔴 **SCRAP** | Start over or reconsider if this should be a skill. |
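Because dimension scores can be fractional, a composite such as 79.6 falls between the printed bands. The Python sketch below treats each band's lower bound as a threshold, so 79.6 maps to REWORK. That tie-breaking rule is an assumption, not something the table states, and `verdict` is a hypothetical helper.

```python
def verdict(score):
    """Map a composite score (0-100) to its verdict label.

    Assumption: bands are half-open, so a score must reach the lower
    bound (80, 60, 40) to earn that band's verdict.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 80:
        return "🟢 SHIP IT"
    if score >= 60:
        return "🟡 REWORK"
    if score >= 40:
        return "🟠 MAJOR REWORK"
    return "🔴 SCRAP"


label = verdict(79.6)  # falls below 80, so "🟡 REWORK"
```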

### Calibration Anchors

- **Anthropic's docx skill** = ~72 (solid execution, low wow factor)
- **Superpowers** = ~88 (methodology is the product, cross-agent, massive adoption)
- **Caveman** = ~85 (extreme wow, simple concept, measurable benchmark)
- **A typical first-draft skill** = ~35-45

These are reference points, not dogma. Score based on the rubric, not by
anchoring to these numbers.

---

## Step 4. Output Format

```
## 🎯 SKILL SCORE: [Name]

| Dim | Dimension | Score | Evidence |
|-----|-----------|-------|----------|
| D1 | Trigger Precision | X/10 | [one line referencing the skill] |
| D2 | Instruction Clarity | X/10 | ... |
| D3 | Output Predictability | X/10 | ... |
| D4 | Edge Case Coverage | X/10 | ... |
| D5 | Anti-Hallucination | X/10 | ... |
| D6 | Developer Experience | X/10 | ... |
| D7 | Composability | X/10 | ... |
| D8 | Open-Source Readiness | X/10 | ... |
| D9 | Wow Factor | X/10 | ... |
| D10 | Real-World Utility | X/10 | ... |

## COMPOSITE: XX/100 — [VERDICT EMOJI] [VERDICT]

### 🔥 Top 3 Strengths
1. [Specific, with evidence]
2. ...
3. ...

### 💀 Top 3 Weaknesses
1. [Specific, with actionable fix]
2. ...
3. ...

### 🛠️ Priority Fixes
1. [Highest-impact change — specific enough to implement now]
2. [Second]
3. [Third]
```
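For anyone post-processing scores outside the conversation, the dimension table in the template can be generated mechanically. This is a minimal Python sketch; `render_score_table` and the `(score, evidence)` row shape are assumptions for illustration, and the dimension names follow the template above.

```python
# Dimension ids and labels as they appear in the output template.
DIMENSIONS = [
    ("D1", "Trigger Precision"), ("D2", "Instruction Clarity"),
    ("D3", "Output Predictability"), ("D4", "Edge Case Coverage"),
    ("D5", "Anti-Hallucination"), ("D6", "Developer Experience"),
    ("D7", "Composability"), ("D8", "Open-Source Readiness"),
    ("D9", "Wow Factor"), ("D10", "Real-World Utility"),
]


def render_score_table(rows):
    """Build the markdown score table from {dim: (score, evidence)}."""
    lines = [
        "| Dim | Dimension | Score | Evidence |",
        "|-----|-----------|-------|----------|",
    ]
    for dim, name in DIMENSIONS:
        score, evidence = rows[dim]
        # Escape pipes so evidence text cannot break the table layout.
        safe_evidence = evidence.replace("|", "\\|")
        lines.append(f"| {dim} | {name} | {score:g}/10 | {safe_evidence} |")
    return "\n".join(lines)


# Placeholder rows for illustration only.
rows = {dim: (6.5, "placeholder evidence") for dim, _ in DIMENSIONS}
table = render_score_table(rows)
```

The `:g` format keeps 6.5 as `6.5` and prints 7.0 as `7`, which matches the rule that scores are reported as given, not rounded.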

### Structured Output

Always append after the narra
