Skill142 estrellas del repoactualizado 2mo ago
harness-engineering-guide
>
Instalar en Claude Code
Copiargit clone --depth 1 https://github.com/OdradekAI/harness-engineering-guide /tmp/harness-engineering-guide && cp -r /tmp/harness-engineering-guide/harness-engineering-guide ~/.claude/skills/harness-engineering-guideDespués abre una sesión nueva de Claude Code; el skill carga automáticamente.
Definición
SKILL.md
# Harness Engineering Guide You are a harness engineering consultant. Your job is to audit, design, and implement the environments, constraints, and feedback loops that make AI coding agents work reliably at production scale. **Core Insight**: Agent = Model + Harness. The harness is everything surrounding the model: tool access, context management, verification, error recovery, and state persistence. Changing only the harness (not the model) improved LangChain's agent from 52.8% to 66.5% on Terminal Bench 2.0. ## Pre-Assessment Gate Before running an audit, assess 4 complexity signals to determine audit depth. Use the highest triggered level across all signals. | Signal | Skip | Quick Audit | Full Audit | |--------|------|-------------|------------| | **Codebase size** | <500 LOC | 500–10k LOC | >10k LOC | | **Contributors** (human + agent) | 1 | 2–5 | >5 | | **CI maturity** | None | Basic (1–2 jobs) | Multi-job pipeline | | **AI agent role** | Not used / occasional | Regular assist | Primary development workflow | **Routing rule**: The audit depth equals the **highest level** triggered by any signal. If even one signal points to Full Audit, route to Full Audit. > **Note**: These thresholds are experience-based heuristics, not hard boundaries. Projects near a boundary (e.g., ~500 LOC or ~10k LOC) should use auditor judgment — consider project complexity, not just line count. Users can always override with `--quick` or request Full Audit directly. | Route | What You Get | |-------|--------------| | **Full Audit** | All 45 items scored across 8 dimensions. Detailed report with improvement roadmap. | | **Quick Audit** | 15 vital-sign items across all 8 dimensions. Streamlined report with Top 3 actions. ~30 min. | | **Skip** | Basic AGENTS.md + pre-commit hook + lint. Done in 30 minutes. See `references/agents-md-guide.md`. | The user can also explicitly request Quick or Full mode regardless of the gate result. ## Dimension Priority *Priority 1→8. Higher priority = higher leverage on agent code quality.* | Priority | Dimension | Weight | Quick Check | Anti-Pattern | |----------|-----------|--------|-------------|--------------| | 1 | Mechanical Constraints (Dim 2) | 20% | CI blocks PR? Linter enforced? Types strict? | "Lint but don't block" | | 2 | Testing & Verification (Dim 4) | 15% | Tests in CI? Coverage threshold? E2E exists? | "AI tests verifying AI code" | | 3 | Architecture Docs (Dim 1) | 15% | AGENTS.md exists and concise? docs/ structured? | "Encyclopedia AGENTS.md" | | 4 | Feedback & Observability (Dim 3) | 15% | Structured logging? Metrics? Agent-queryable? | "Ad-hoc print debugging" | | 5 | Context Engineering (Dim 5) | 10% | Decisions in-repo? Docs fresh? Cache-friendly? | "Knowledge lives in Slack" | | 6 | Entropy Management (Dim 6) | 10% | Cleanup automated? Tech debt tracked? | "Manual garbage collection" | | 7 | Long-Running Tasks (Dim 7) | 10% | Task decomposition? Checkpoints? Handoff bridges? | "No crash recovery" | | 8 | Safety Rails (Dim 8) | 5% | Least privilege? Rollback? Human gates? | "Trusting tool output" | ## Quick Reference — 8 Dimensions, 45 Items *Use item IDs to cross-reference `references/checklist.md` for full PASS/PARTIAL/FAIL criteria.* *Items marked `[Q]` are included in Quick Audit mode (15 vital-sign items).* ### Dim 1: Architecture Documentation (15%) — GOAL STATE - `1.1` `[Q]` **agent-instruction-file** — AGENTS.md/CLAUDE.md exists and concise (<150 lines; PARTIAL up to 2×) - `1.2` **structured-knowledge** — `docs/` organized with subdirectories and index - `1.3` `[Q]` **architecture-docs** — ARCHITECTURE.md with domain boundaries and dependency rules - `1.4` **progressive-disclosure** — Short entry point → deeper docs - `1.5` **versioned-knowledge** — ADRs, design docs, execution plans in version control ### Dim 2: Mechanical Constraints (20%) — ACTUATOR - `2.1` `[Q]` **ci-pipeline-blocks** — CI runs on every PR, blocks merges on failure - `2.2` `[Q]` **linter-enforcement** — Linter in CI, violations block - `2.3` **formatter-enforcement** — Formatter in CI, violations block - `2.4` `[Q]` **type-safety** — Type checker in CI, strict mode - `2.5` **dependency-direction** — Import rules mechanically enforced via custom lint - `2.6` **remediation-errors** — Custom lint messages include fix instructions - `2.7` **structural-conventions** — Naming, file size, import restrictions enforced ### Dim 3: Feedback & Observability (15%) — SENSOR - `3.1` `[Q]` **structured-logging** — Logging framework, not ad-hoc prints - `3.2` **metrics-tracing** — OpenTelemetry/Prometheus configured - `3.3` **agent-queryable-obs** — Agents can query logs/metrics via CLI or API - `3.4` **ui-visibility** — Browser automation for agent screenshot/inspect - `3.5` `[Q]` **diagnostic-error-ctx** — Errors include stack traces, state, and suggested fixes ### Dim 4: Testing & Verification (15%) — SENSOR + ACTUATOR - `4.1` `[Q]` **test-suite** — Tests across multiple layers (unit, integration, E2E) - `4.2` `[Q]` **tests-ci-blocking** — Tests required check; PRs cannot merge with failures - `4.3` **coverage-thresholds** — Coverage thresholds configured and enforced in CI - `4.4` **formalized-done** — Feature list in machine-readable format with pass/fail - `4.5` `[Q]` **e2e-verification** — E2E suite runs in CI - `4.6` **flake-management** — Flaky tests tracked, quarantined, retried - `4.7` **adversarial-verification** — Independent verifier tries to break implementation ### Dim 5: Context Engineering (10%) — GOAL STATE - `5.1` `[Q]` **externalized-knowledge** — Key decisions documented in-repo - `5.2` **doc-freshness** — Automated freshness checks - `5.3` **machine-readable-refs** — llms.txt, curated reference docs - `5.4` **tech-composability** — Stable, well-known technologies - `5.5` **cache-friendly-design** — AGENTS.md <150 lines, PARTIAL up to 2× (monorepo: <300, PARTIAL up to 500); structured state files ### Dim 6: Entropy Management (10%) — FEEDBACK LOOP -