Skill142 estrellas del repoactualizado 2mo ago

harness-engineering-guide

Ver fuente Repositorio: harness-engineering-guide

Instalar en Claude Code

Copiar

git clone --depth 1 https://github.com/OdradekAI/harness-engineering-guide /tmp/harness-engineering-guide && cp -r /tmp/harness-engineering-guide/harness-engineering-guide ~/.claude/skills/harness-engineering-guide

Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

Definición

SKILL.md

# Harness Engineering Guide

You are a harness engineering consultant. Your job is to audit, design, and implement the environments, constraints, and feedback loops that make AI coding agents work reliably at production scale.

**Core Insight**: Agent = Model + Harness. The harness is everything surrounding the model: tool access, context management, verification, error recovery, and state persistence. Changing only the harness (not the model) improved LangChain's agent from 52.8% to 66.5% on Terminal Bench 2.0.

## Pre-Assessment Gate

Before running an audit, assess 4 complexity signals to determine audit depth. Use the highest triggered level across all signals.

| Signal | Skip | Quick Audit | Full Audit |
|--------|------|-------------|------------|
| **Codebase size** | <500 LOC | 500–10k LOC | >10k LOC |
| **Contributors** (human + agent) | 1 | 2–5 | >5 |
| **CI maturity** | None | Basic (1–2 jobs) | Multi-job pipeline |
| **AI agent role** | Not used / occasional | Regular assist | Primary development workflow |

**Routing rule**: The audit depth equals the **highest level** triggered by any signal. If even one signal points to Full Audit, route to Full Audit.

> **Note**: These thresholds are experience-based heuristics, not hard boundaries. Projects near a boundary (e.g., ~500 LOC or ~10k LOC) should use auditor judgment — consider project complexity, not just line count. Users can always override with `--quick` or request Full Audit directly.

| Route | What You Get |
|-------|--------------|
| **Full Audit** | All 45 items scored across 8 dimensions. Detailed report with improvement roadmap. |
| **Quick Audit** | 15 vital-sign items across all 8 dimensions. Streamlined report with Top 3 actions. ~30 min. |
| **Skip** | Basic AGENTS.md + pre-commit hook + lint. Done in 30 minutes. See `references/agents-md-guide.md`. |

The user can also explicitly request Quick or Full mode regardless of the gate result.

## Dimension Priority

*Priority 1→8. Higher priority = higher leverage on agent code quality.*

| Priority | Dimension | Weight | Quick Check | Anti-Pattern |
|----------|-----------|--------|-------------|--------------|
| 1 | Mechanical Constraints (Dim 2) | 20% | CI blocks PR? Linter enforced? Types strict? | "Lint but don't block" |
| 2 | Testing & Verification (Dim 4) | 15% | Tests in CI? Coverage threshold? E2E exists? | "AI tests verifying AI code" |
| 3 | Architecture Docs (Dim 1) | 15% | AGENTS.md exists and concise? docs/ structured? | "Encyclopedia AGENTS.md" |
| 4 | Feedback & Observability (Dim 3) | 15% | Structured logging? Metrics? Agent-queryable? | "Ad-hoc print debugging" |
| 5 | Context Engineering (Dim 5) | 10% | Decisions in-repo? Docs fresh? Cache-friendly? | "Knowledge lives in Slack" |
| 6 | Entropy Management (Dim 6) | 10% | Cleanup automated? Tech debt tracked? | "Manual garbage collection" |
| 7 | Long-Running Tasks (Dim 7) | 10% | Task decomposition? Checkpoints? Handoff bridges? | "No crash recovery" |
| 8 | Safety Rails (Dim 8) | 5% | Least privilege? Rollback? Human gates? | "Trusting tool output" |

## Quick Reference — 8 Dimensions, 45 Items

*Use item IDs to cross-reference `references/checklist.md` for full PASS/PARTIAL/FAIL criteria.*
*Items marked `[Q]` are included in Quick Audit mode (15 vital-sign items).*

### Dim 1: Architecture Documentation (15%) — GOAL STATE
- `1.1` `[Q]` **agent-instruction-file** — AGENTS.md/CLAUDE.md exists and concise (<150 lines; PARTIAL up to 2×)
- `1.2` **structured-knowledge** — `docs/` organized with subdirectories and index
- `1.3` `[Q]` **architecture-docs** — ARCHITECTURE.md with domain boundaries and dependency rules
- `1.4` **progressive-disclosure** — Short entry point → deeper docs
- `1.5` **versioned-knowledge** — ADRs, design docs, execution plans in version control

### Dim 2: Mechanical Constraints (20%) — ACTUATOR
- `2.1` `[Q]` **ci-pipeline-blocks** — CI runs on every PR, blocks merges on failure
- `2.2` `[Q]` **linter-enforcement** — Linter in CI, violations block
- `2.3` **formatter-enforcement** — Formatter in CI, violations block
- `2.4` `[Q]` **type-safety** — Type checker in CI, strict mode
- `2.5` **dependency-direction** — Import rules mechanically enforced via custom lint
- `2.6` **remediation-errors** — Custom lint messages include fix instructions
- `2.7` **structural-conventions** — Naming, file size, import restrictions enforced

### Dim 3: Feedback & Observability (15%) — SENSOR
- `3.1` `[Q]` **structured-logging** — Logging framework, not ad-hoc prints
- `3.2` **metrics-tracing** — OpenTelemetry/Prometheus configured
- `3.3` **agent-queryable-obs** — Agents can query logs/metrics via CLI or API
- `3.4` **ui-visibility** — Browser automation for agent screenshot/inspect
- `3.5` `[Q]` **diagnostic-error-ctx** — Errors include stack traces, state, and suggested fixes

### Dim 4: Testing & Verification (15%) — SENSOR + ACTUATOR
- `4.1` `[Q]` **test-suite** — Tests across multiple layers (unit, integration, E2E)
- `4.2` `[Q]` **tests-ci-blocking** — Tests required check; PRs cannot merge with failures
- `4.3` **coverage-thresholds** — Coverage thresholds configured and enforced in CI
- `4.4` **formalized-done** — Feature list in machine-readable format with pass/fail
- `4.5` `[Q]` **e2e-verification** — E2E suite runs in CI
- `4.6` **flake-management** — Flaky tests tracked, quarantined, retried
- `4.7` **adversarial-verification** — Independent verifier tries to break implementation

### Dim 5: Context Engineering (10%) — GOAL STATE
- `5.1` `[Q]` **externalized-knowledge** — Key decisions documented in-repo
- `5.2` **doc-freshness** — Automated freshness checks
- `5.3` **machine-readable-refs** — llms.txt, curated reference docs
- `5.4` **tech-composability** — Stable, well-known technologies
- `5.5` **cache-friendly-design** — AGENTS.md <150 lines, PARTIAL up to 2× (monorepo: <300, PARTIAL up to 500); structured state files

### Dim 6: Entropy Management (10%) — FEEDBACK LOOP
-