bernstein-quality

Bernstein Quality Metrics analyzes and compares the reliability of code generation across AI models. It runs quality assessment scripts and reports success rates, pass rates for linting, type-checking, and testing, and completion time distributions. Use this skill when users ask about agent reliability, model performance comparisons, or test failure analysis, or when they want a quality metrics dashboard to inform model routing and optimization decisions.

Repository: bernstein

Install in Claude Code

git clone --depth 1 https://github.com/sipyourdrink-ltd/bernstein /tmp/bernstein-quality && mkdir -p ~/.claude/skills && cp -r /tmp/bernstein-quality/packages/cursor-plugin/skills/bernstein-quality ~/.claude/skills/bernstein-quality

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Bernstein Quality Metrics

Analyze quality and reliability of agent-generated code.

## When to Use

- User asks "how reliable are the agents?" or "which model is best?"
- User wants success rates, pass rates, or completion time stats
- User asks about test failures or lint issues across models
- User says "show me quality metrics"

## Instructions

1. Run `scripts/quality.sh metrics` for overall quality metrics.
2. Run `scripts/quality.sh pass-rates` for lint/typecheck/test pass rates by model.
3. Run `scripts/quality.sh times` for completion time distributions.

4. Present a quality dashboard in the format below (the values shown are illustrative):

```
## Quality Dashboard

### Success Rate by Model
| Model | Tasks | Success | Fail | Rate |
|-------|-------|---------|------|------|
| claude-sonnet-4 | 24 | 22 | 2 | 91.7% |
| gpt-4.1 | 12 | 10 | 2 | 83.3% |

### Pass Rates
| Check | Overall | claude-sonnet-4 | gpt-4.1 |
|-------|---------|-----------------|---------|
| Lint | 96% | 98% | 92% |
| Type-check | 88% | 91% | 83% |
| Tests | 85% | 89% | 75% |

### Completion Times
| Percentile | Time |
|------------|------|
| p50 | 3m 20s |
| p90 | 8m 45s |
| p99 | 15m 12s |
```

5. Highlight any models with significantly lower pass rates.
6. Recommend model routing adjustments if one model consistently underperforms.
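
Steps 4 to 6 can be sketched in Python as follows. This is a minimal illustration, not part of the skill: the output format of `scripts/quality.sh` is not documented here, so the per-model counts are assumed to be already parsed into a hypothetical `ModelStats` record. The function names and the 5-point gap threshold are likewise assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelStats:
    """Per-model task counts (hypothetical shape, not the script's real output)."""
    model: str
    tasks: int
    success: int

    @property
    def fail(self) -> int:
        return self.tasks - self.success

    @property
    def rate(self) -> float:
        # Success rate as a percentage; 0.0 when no tasks were recorded.
        return 100.0 * self.success / self.tasks if self.tasks else 0.0


def render_success_table(stats: list[ModelStats]) -> str:
    """Render the 'Success Rate by Model' table as Markdown."""
    lines = [
        "| Model | Tasks | Success | Fail | Rate |",
        "|-------|-------|---------|------|------|",
    ]
    for s in stats:
        lines.append(
            f"| {s.model} | {s.tasks} | {s.success} | {s.fail} | {s.rate:.1f}% |"
        )
    return "\n".join(lines)


def underperformers(stats: list[ModelStats], gap: float = 5.0) -> list[str]:
    """Return models whose rate trails the best model by more than `gap` points.

    The 5-point default is an arbitrary illustrative threshold.
    """
    if not stats:
        return []
    best = max(s.rate for s in stats)
    return [s.model for s in stats if best - s.rate > gap]


if __name__ == "__main__":
    sample = [
        ModelStats("claude-sonnet-4", 24, 22),
        ModelStats("gpt-4.1", 12, 10),
    ]
    print(render_success_table(sample))
    print("Flag for routing review:", underperformers(sample))
```

With the sample counts from the dashboard above, the table rows match the example, and `gpt-4.1` is flagged because it trails the best model by more than the assumed 5 points. In practice the threshold should reflect how much variation is expected at the given task counts.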