n-agentic-harnesses
n-agentic-harnesses addresses the architectural layer where AI agents typically fail: tool permissions, workflow state, context management, approval controls, and operator visibility. Use this skill when designing or evaluating agentic systems, diagnosing unexplained tool calls or session failures, or building durability into copilots and AI-powered products that require clear boundaries and operational safety.
git clone --depth 1 https://github.com/NateBJones-Projects/OB1 /tmp/n-agentic-harnesses && cp -r /tmp/n-agentic-harnesses/skills/n-agentic-harnesses ~/.claude/skills/n-agentic-harnesses
SKILL.md
# N Agentic Harnesses

## Problem

Most AI products do not break because the model is too weak. They break at the harness layer: unclear tool boundaries, missing approval policy, brittle state, sloppy context assembly, no evaluation loop, and weak operator visibility. This skill turns those vague issues into concrete primitives, boundaries, phases, and checks.

## Trigger Conditions

- The user is designing or rebuilding an agent, assistant, copilot, or AI workflow
- The request mentions harness architecture, tool-use architecture, tool registries, permission layers, approval gates, workflow state, session persistence, retries, resumability, memory, evals, observability, or multi-agent design
- The user wants to evaluate an existing harness for risks, missing primitives, UX gaps, or operational weakness
- The symptoms point to harness problems even if the word "harness" never appears:
  - tools fire without clear permission
  - sessions fail on crash or long waits
  - context gets stale or bloated
  - operators cannot see what happened or why
  - costs, retries, or handoffs are drifting out of control

## Default Posture

- Bias toward lean, solo-maintainable architecture.
- Start with a single-agent design unless clear constraints justify more.
- Require an evaluation plan even for greenfield builds.
- Prefer explicit system boundaries, permission policy, and workflow state over prompt cleverness.
- Translate ideas into implementation phases, success criteria, and failure tests.

## Step 0: Gather Context

Before routing, make sure you have enough to work with.

For design work, confirm:

- what product or system the harness serves
- what actions the agent will take
- who the users are
- any known constraints such as solo maintenance, existing stack, or timeline

For evaluation work, inspect the harness itself:

- read the codebase, agent config, skills, hooks, or architecture docs
- if evidence is missing, ask for the narrowest missing input and keep moving
- do not evaluate from vibes alone

## Step 1: Classify The Request

Choose one mode before reading reference files.

### `design`

Use when the user is creating a new harness, planning a major rebuild, or asking for architecture, MVP shape, or implementation sequencing.

Default reads:

- `references/01-principles-and-solo-dev-defaults.md`
- `references/02-harness-shapes-and-architecture.md`
- `references/08-design-and-build-playbook.md`

### `evaluation`

Use when the user already has a harness and wants gaps, risks, missing primitives, UX upgrades, or architectural cleanup.

Default reads:

- `references/01-principles-and-solo-dev-defaults.md`
- `references/09-evaluation-and-improvement-playbook.md`

### `design + evaluation`

Use when the user wants a target architecture and a way to verify it, compare it with an existing system, or define acceptance criteria before building.

Default reads:

- `references/01-principles-and-solo-dev-defaults.md`
- `references/02-harness-shapes-and-architecture.md`
- `references/08-design-and-build-playbook.md`
- `references/09-evaluation-and-improvement-playbook.md`

## Step 2: Classify The Product Shape

Determine the closest product shape before going deeper:

- code agent
- chat assistant
- workflow orchestrator
- internal copilot
- embedded AI product feature
- hybrid system

If the request is ambiguous, pick the closest shape and state the assumption.

## Step 3: Read The Smallest Useful Reference Set

Read only the files the request actually needs:

- `references/01-principles-and-solo-dev-defaults.md`
  Use first for almost every request. It defines the default decision posture.
- `references/02-harness-shapes-and-architecture.md`
  Read when choosing system shape, boundaries, lifecycle, transports, or deployment structure.
- `references/03-tools-execution-and-permissions.md`
  Read when the request involves tool registries, tool calling, approval gates, sandboxes, or trust tiers.
- `references/04-state-sessions-and-durability.md`
  Read when the request involves sessions, resumability, retries, idempotency, approval waits, or long-running work.
- `references/05-context-memory-and-evaluation.md`
  Read when the request involves context windows, retrieval, memory, provenance, evals, replay tests, or regression detection.
- `references/06-agents-and-extensibility.md`
  Read when the request involves multi-agent design, plugins, hooks, skills, or extension surfaces.
- `references/07-ux-observability-and-operations.md`
  Read when the request involves streaming UX, health checks, logs, analytics, budgets, or supportability.
- `references/08-design-and-build-playbook.md`
  Read when the user needs a build-ready plan from idea to implementation.
- `references/09-evaluation-and-improvement-playbook.md`
  Read when the user needs findings, missing primitives, upgrade priorities, or acceptance tests.
- `references/10-example-requests-and-output-patterns.md`
  Read when you need prompt examples or response structure examples.
- `references/11-codex-translation-notes.md`
  Read only when adapting the shared skill into a Codex-oriented variant or mapping between client environments.

Do not rely on reference-to-reference chains. This file is the index.

## Operating Rules

- Convert vague ambitions into concrete harness primitives.
- Push back on unnecessary complexity.
- Treat workflow state, permissions, context assembly, and evaluation as first-class architecture, not cleanup tasks.
- Separate universal harness primitives from product-specific manifestation.
- For evaluation requests, present findings first and improvement sequence second.
- For design requests, include how the design will be tested before calling it done.

## Output Contract

### For `design`

Return:

- recommended harness shape
- core primitives and subsystem boundaries
- MVP boundary
- phased implementation plan
- verification and acceptance criteria

### For `evaluation`

Return:

- findings ordered by severity or leverage
-
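The routing described in Step 1 and Step 3 (pick a mode, start from its default reads, then add only the topic files the request needs) can be sketched as a small lookup. This is an illustrative sketch, not part of the skill: the function name `select_references`, the two dictionaries, and the short topic labels (`"tools"`, `"state"`, and so on) are hypothetical. Only the mode names and file paths come from the text above.

```python
# Hypothetical sketch of the Step 1 / Step 3 routing. Mode names and file
# names are taken from the skill text; everything else is an assumption.
REFS_DIR = "references/"

MODE_DEFAULT_READS = {
    "design": [
        "01-principles-and-solo-dev-defaults.md",
        "02-harness-shapes-and-architecture.md",
        "08-design-and-build-playbook.md",
    ],
    "evaluation": [
        "01-principles-and-solo-dev-defaults.md",
        "09-evaluation-and-improvement-playbook.md",
    ],
    "design + evaluation": [
        "01-principles-and-solo-dev-defaults.md",
        "02-harness-shapes-and-architecture.md",
        "08-design-and-build-playbook.md",
        "09-evaluation-and-improvement-playbook.md",
    ],
}

# Topic labels are invented shorthand for the "Read when..." conditions.
TOPIC_READS = {
    "tools": "03-tools-execution-and-permissions.md",
    "state": "04-state-sessions-and-durability.md",
    "context": "05-context-memory-and-evaluation.md",
    "agents": "06-agents-and-extensibility.md",
    "operations": "07-ux-observability-and-operations.md",
}


def select_references(mode, topics=()):
    """Return the ordered, de-duplicated reference files for a request."""
    if mode not in MODE_DEFAULT_READS:
        raise ValueError(f"unknown mode: {mode!r}")
    files = list(MODE_DEFAULT_READS[mode])  # copy so the table is not mutated
    for topic in topics:
        if topic not in TOPIC_READS:
            raise ValueError(f"unknown topic: {topic!r}")
        name = TOPIC_READS[topic]
        if name not in files:  # keep the read set as small as possible
            files.append(name)
    return [REFS_DIR + name for name in files]
```

For example, `select_references("evaluation", ["tools"])` would return the two evaluation defaults followed by `references/03-tools-execution-and-permissions.md`. Keeping the table flat mirrors the instruction not to rely on reference-to-reference chains.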
Use Nate Jones OB1 Agent Memory from OpenClaw with provenance, scope, review, and use-policy discipline.
Continuous learning system that extracts reusable knowledge from work sessions. Triggers: (1) /aiception command, (2) 'save this as a skill' or 'extract a skill from this', (3) 'what did we learn?', (4) after non-obvious debugging or trial-and-error discovery. Creates new skills when valuable reusable knowledge is identified. Integrates with Open Brain to prevent duplicates.
Morning digest of yesterday's Open Brain thoughts, drafted to Gmail
Generate infographic images from any research doc, Open Brain thoughts, or analysis. Auto-chunks content, writes prompts, generates images via Gemini API (free tier), and saves to media/. Use --premium for better text rendering.
Use when processing voice transcripts, brain dumps, stream-of-consciousness notes, or any raw multi-topic capture. Extracts every idea thread, then evaluates each one with deep brainstorming, then captures results to Open Brain. Trigger on transcripts, exports, "process this", "pan for gold", "brain dump", "what did I say", or multi-topic markdown files.