Skip to main content
ClaudeWave
Skill3.7k estrellas del repoactualizado today

n-agentic-harnesses

n-agentic-harnesses addresses the architectural layer where AI agents typically fail: tool permissions, workflow state, context management, approval controls, and operator visibility. Use this skill when designing or evaluating agentic systems, diagnosing unexplained tool calls or session failures, or building durability into copilots and AI-powered products that require clear boundaries and operational safety.

Instalar en Claude Code
Copiar
git clone --depth 1 https://github.com/NateBJones-Projects/OB1 /tmp/n-agentic-harnesses && cp -r /tmp/n-agentic-harnesses/skills/n-agentic-harnesses ~/.claude/skills/n-agentic-harnesses
Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

SKILL.md

# N Agentic Harnesses

## Problem

Most AI products do not break because the model is too weak. They break at the harness layer: unclear tool boundaries, missing approval policy, brittle state, sloppy context assembly, no evaluation loop, and weak operator visibility. This skill turns those vague issues into concrete primitives, boundaries, phases, and checks.

## Trigger Conditions

- The user is designing or rebuilding an agent, assistant, copilot, or AI workflow
- The request mentions harness architecture, tool-use architecture, tool registries, permission layers, approval gates, workflow state, session persistence, retries, resumability, memory, evals, observability, or multi-agent design
- The user wants to evaluate an existing harness for risks, missing primitives, UX gaps, or operational weakness
- The symptoms point to harness problems even if the word "harness" never appears:
  - tools fire without clear permission
  - sessions fail on crash or long waits
  - context gets stale or bloated
  - operators cannot see what happened or why
  - costs, retries, or handoffs are drifting out of control

## Default Posture

- Bias toward lean, solo-maintainable architecture.
- Start with a single-agent design unless clear constraints justify more.
- Require an evaluation plan even for greenfield builds.
- Prefer explicit system boundaries, permission policy, and workflow state over prompt cleverness.
- Translate ideas into implementation phases, success criteria, and failure tests.

## Step 0: Gather Context

Before routing, make sure you have enough to work with.

For design work, confirm:

- what product or system the harness serves
- what actions the agent will take
- who the users are
- any known constraints such as solo maintenance, existing stack, or timeline

For evaluation work, inspect the harness itself:

- read the codebase, agent config, skills, hooks, or architecture docs
- if evidence is missing, ask for the narrowest missing input and keep moving
- do not evaluate from vibes alone

## Step 1: Classify The Request

Choose one mode before reading reference files.

### `design`

Use when the user is creating a new harness, planning a major rebuild, or asking for architecture, MVP shape, or implementation sequencing.

Default reads:

- `references/01-principles-and-solo-dev-defaults.md`
- `references/02-harness-shapes-and-architecture.md`
- `references/08-design-and-build-playbook.md`

### `evaluation`

Use when the user already has a harness and wants gaps, risks, missing primitives, UX upgrades, or architectural cleanup.

Default reads:

- `references/01-principles-and-solo-dev-defaults.md`
- `references/09-evaluation-and-improvement-playbook.md`

### `design + evaluation`

Use when the user wants a target architecture and a way to verify it, compare it with an existing system, or define acceptance criteria before building.

Default reads:

- `references/01-principles-and-solo-dev-defaults.md`
- `references/02-harness-shapes-and-architecture.md`
- `references/08-design-and-build-playbook.md`
- `references/09-evaluation-and-improvement-playbook.md`

## Step 2: Classify The Product Shape

Determine the closest product shape before going deeper:

- code agent
- chat assistant
- workflow orchestrator
- internal copilot
- embedded AI product feature
- hybrid system

If the request is ambiguous, pick the closest shape and state the assumption.

## Step 3: Read The Smallest Useful Reference Set

Read only the files the request actually needs:

- `references/01-principles-and-solo-dev-defaults.md`
  Use first for almost every request. It defines the default decision posture.
- `references/02-harness-shapes-and-architecture.md`
  Read when choosing system shape, boundaries, lifecycle, transports, or deployment structure.
- `references/03-tools-execution-and-permissions.md`
  Read when the request involves tool registries, tool calling, approval gates, sandboxes, or trust tiers.
- `references/04-state-sessions-and-durability.md`
  Read when the request involves sessions, resumability, retries, idempotency, approval waits, or long-running work.
- `references/05-context-memory-and-evaluation.md`
  Read when the request involves context windows, retrieval, memory, provenance, evals, replay tests, or regression detection.
- `references/06-agents-and-extensibility.md`
  Read when the request involves multi-agent design, plugins, hooks, skills, or extension surfaces.
- `references/07-ux-observability-and-operations.md`
  Read when the request involves streaming UX, health checks, logs, analytics, budgets, or supportability.
- `references/08-design-and-build-playbook.md`
  Read when the user needs a build-ready plan from idea to implementation.
- `references/09-evaluation-and-improvement-playbook.md`
  Read when the user needs findings, missing primitives, upgrade priorities, or acceptance tests.
- `references/10-example-requests-and-output-patterns.md`
  Read when you need prompt examples or response structure examples.
- `references/11-codex-translation-notes.md`
  Read only when adapting the shared skill into a Codex-oriented variant or mapping between client environments.

Do not rely on reference-to-reference chains. This file is the index.

## Operating Rules

- Convert vague ambitions into concrete harness primitives.
- Push back on unnecessary complexity.
- Treat workflow state, permissions, context assembly, and evaluation as first-class architecture, not cleanup tasks.
- Separate universal harness primitives from product-specific manifestation.
- For evaluation requests, present findings first and improvement sequence second.
- For design requests, include how the design will be tested before calling it done.

## Output Contract

### For `design`

Return:

- recommended harness shape
- core primitives and subsystem boundaries
- MVP boundary
- phased implementation plan
- verification and acceptance criteria

### For `evaluation`

Return:

- findings ordered by severity or leverage
-