proof-of-work
This Claude Code skill provides patterns and tools for generating validation artifacts that demonstrate autonomous task completion. Use it when you need to capture screenshots across device sizes, collect test results, parse deployment logs, and calculate confidence scores to verify that work like bug fixes, feature implementations, or UI changes meet quality standards before auto-approval.
git clone --depth 1 https://github.com/MadAppGang/claude-code /tmp/proof-of-work && cp -r /tmp/proof-of-work/plugins/autopilot/skills/proof-of-work ~/.claude/skills/proof-of-workSKILL.md
plugin: autopilot
updated: 2026-01-20
# Proof-of-Work
**Version:** 0.1.0
**Purpose:** Generate validation artifacts for autonomous task completion
**Status:** Phase 1
## When to Use
Use this skill when you need to:
- Generate proof artifacts after task completion
- Capture screenshots for UI verification
- Parse and report test results
- Calculate confidence scores for task validation
- Determine if a task can be auto-approved
## Overview
Proof-of-work is the mechanism that validates task completion. Every finished task must include verifiable artifacts that demonstrate the work was done correctly.
## Proof Types by Task
### Bug Fix Proof
| Artifact | Required | Purpose |
|----------|----------|---------|
| Git diff | Yes | Show minimal, focused changes |
| Test results | Yes | All tests passing |
| Regression test | Yes | Specific test for the bug |
| Error log (before/after) | Optional | Visual evidence |
### Feature Proof
| Artifact | Required | Purpose |
|----------|----------|---------|
| Screenshots | Yes | Visual verification |
| Test results | Yes | Functionality works |
| Coverage report | Yes | >= 80% coverage |
| Build output | Yes | Builds successfully |
| Deployment URL | Optional | Live demo |
### UI Change Proof
| Artifact | Required | Purpose |
|----------|----------|---------|
| Desktop screenshot | Yes | 1920x1080 view |
| Mobile screenshot | Yes | 375x667 view |
| Tablet screenshot | Yes | 768x1024 view |
| Accessibility score | Yes | >= 80 Lighthouse |
| Visual regression | Optional | BackstopJS diff |
## Screenshot Capture
**Playwright Pattern:**
```typescript
import { chromium } from 'playwright';
async function captureScreenshots(url: string, outputDir: string) {
const browser = await chromium.launch({ headless: true });
const context = await browser.newContext();
const page = await context.newPage();
// Desktop
await page.setViewportSize({ width: 1920, height: 1080 });
await page.goto(url);
await page.waitForLoadState('networkidle');
await page.screenshot({
path: `${outputDir}/desktop.png`,
fullPage: true,
});
// Mobile
await page.setViewportSize({ width: 375, height: 667 });
await page.goto(url);
await page.waitForLoadState('networkidle');
await page.screenshot({
path: `${outputDir}/mobile.png`,
fullPage: true,
});
// Tablet
await page.setViewportSize({ width: 768, height: 1024 });
await page.goto(url);
await page.waitForLoadState('networkidle');
await page.screenshot({
path: `${outputDir}/tablet.png`,
fullPage: true,
});
await browser.close();
}
```
## Confidence Scoring
**Algorithm:**
```typescript
interface ProofArtifacts {
testResults?: { passed: number; total: number };
buildSuccessful?: boolean;
lintErrors?: number;
screenshots?: string[];
testCoverage?: number;
performanceScore?: number;
}
function calculateConfidence(artifacts: ProofArtifacts): number {
let score = 0;
// Tests (40 points)
if (artifacts.testResults) {
if (artifacts.testResults.passed === artifacts.testResults.total) {
score += 40;
}
}
// Build (20 points)
if (artifacts.buildSuccessful) {
score += 20;
}
// Coverage (20 points)
if (artifacts.testCoverage) {
if (artifacts.testCoverage >= 80) score += 20;
else if (artifacts.testCoverage >= 60) score += 15;
else if (artifacts.testCoverage >= 40) score += 10;
else score += 5;
}
// Screenshots (10 points)
if (artifacts.screenshots) {
if (artifacts.screenshots.length >= 3) score += 10;
else if (artifacts.screenshots.length >= 1) score += 5;
}
// Lint (10 points)
if (artifacts.lintErrors === 0) {
score += 10;
}
return score;
}
```
## Confidence Thresholds
| Confidence | Action |
|------------|--------|
| >= 95% | Auto-approve (In Review -> Done) |
| 80-94% | Manual review required |
| < 80% | Validation failed, iterate |
## Proof Summary Template
```markdown
# Proof of Work
**Task**: {issue_id}
**Type**: {task_type}
**Confidence**: {score}%
## Test Results
- Total: {total}
- Passed: {passed}
- Failed: {failed}
- Coverage: {coverage}%
## Build
- Status: {status}
- Duration: {duration}
## Screenshots
- Desktop: proof/desktop.png
- Mobile: proof/mobile.png
- Tablet: proof/tablet.png
## Artifacts
- test-results.txt
- coverage.json
- build-output.txt
```
## Examples
### Example 1: Feature Proof Generation
```typescript
const proof = {
testResults: { passed: 15, total: 15 },
buildSuccessful: true,
lintErrors: 0,
screenshots: ['desktop.png', 'mobile.png', 'tablet.png'],
testCoverage: 85,
};
const confidence = calculateConfidence(proof);
// 40 (tests) + 20 (build) + 20 (coverage) + 10 (screenshots) + 10 (lint) = 100%
```
### Example 2: Partial Proof
```typescript
const proof = {
testResults: { passed: 12, total: 15 }, // Some failing
buildSuccessful: true,
lintErrors: 2,
screenshots: ['desktop.png'],
testCoverage: 65,
};
const confidence = calculateConfidence(proof);
// 0 (tests fail) + 20 (build) + 15 (coverage) + 5 (1 screenshot) + 0 (lint errors) = 40%
// Result: Validation failed, must iterate
```
## Best Practices
- Always capture screenshots for UI work
- Run full test suite, not just affected tests
- Include coverage report for features
- Build must pass before any proof is valid
- Store proofs in session directory for debugging
- Generate proof summary in markdown for Linear comments|
|
|
Common agent patterns and templates for Claude Code. Use when implementing agents to follow proven patterns for Tasks integration, quality checks, and external model invocation via claudish CLI.
YAML frontmatter schemas for Claude Code agents and commands. Use when creating or validating agent/command files.
XML tag structure patterns for Claude Code agents and commands. Use when designing or implementing agents to ensure proper XML structure following Anthropic best practices.
YAML format for Claude Code agent definitions as alternative to markdown. Use when creating agents with YAML, converting markdown agents to YAML, or validating YAML agent schemas. Trigger keywords - "YAML agent", "agent YAML", "YAML format", "agent schema", "YAML definition", "convert to YAML".
Linear API patterns and examples for autopilot. Includes authentication, webhooks, issue CRUD, state transitions, file attachments, and comment handling.