n-agentic-harnesses-codex
This skill provides a structured framework for designing, evaluating, and refining agentic systems across developer tools, assistants, workflow runtimes, and AI-powered products. Use it when architecting tool-use systems, reviewing agent permissions and state management, planning implementation phases, or assessing gaps in existing harnesses for observability, durability, or user experience. The skill emphasizes lean solo-maintainable designs, explicit system boundaries, and evaluation strategies grounded in concrete success criteria rather than prompt optimization.
git clone --depth 1 https://github.com/NateBJones-Projects/OB1 /tmp/n-agentic-harnesses-codex && cp -r /tmp/n-agentic-harnesses-codex/skills/n-agentic-harnesses/variants/codex ~/.claude/skills/n-agentic-harnesses-codex

SKILL.md
# N Agentic Harnesses For Codex

Use this skill as a router for designing, building, and evaluating agentic harnesses. Read only the files you need. Do not load the entire reference set unless the request genuinely spans multiple subsystems.

Default posture:

- Bias toward lean, solo-maintainable architecture.
- Start with a single-agent design unless clear constraints justify more.
- Require an evaluation plan even for greenfield builds.
- Prefer explicit system boundaries, permission policy, and workflow state over prompt cleverness.
- Translate ideas into implementation phases, success criteria, and failure tests.

## Step 1: Classify The Request

Choose one mode before reading reference files.

### `design`

Use when the user is creating a new harness, planning a major rebuild, or asking for architecture, MVP, or implementation sequencing.

Default reads:

- `references/01-principles-and-solo-dev-defaults.md`
- `references/02-harness-shapes-and-architecture.md`
- `references/08-design-and-build-playbook.md`

Add subsystem files only as needed.

### `evaluation`

Use when the user already has a harness and wants gaps, risks, missing primitives, UX upgrades, or architectural cleanup.

Default reads:

- `references/01-principles-and-solo-dev-defaults.md`
- `references/09-evaluation-and-improvement-playbook.md`

Add subsystem files only for the parts under review.

### `design + evaluation`

Use when the user wants a target architecture and a way to verify it, compare it with an existing system, or define acceptance criteria before building.

Default reads:

- `references/01-principles-and-solo-dev-defaults.md`
- `references/02-harness-shapes-and-architecture.md`
- `references/08-design-and-build-playbook.md`
- `references/09-evaluation-and-improvement-playbook.md`

## Step 2: Classify The Product Shape

Determine the closest product shape before going deeper:

- code agent
- chat assistant
- workflow orchestrator
- internal copilot
- embedded AI product feature
- hybrid system

If the request is ambiguous, pick the closest shape and state the assumption.

## Step 3: Read The Smallest Useful Reference Set

Read these only when the request needs them:

- `references/01-principles-and-solo-dev-defaults.md`
  Use first for almost every request. It defines the default decision posture.
- `references/02-harness-shapes-and-architecture.md`
  Read when choosing system shape, boundaries, lifecycle, transports, or deployment structure.
- `references/03-tools-execution-and-permissions.md`
  Read when the request involves tool registries, tool calling, approval gates, sandboxes, or trust tiers.
- `references/04-state-sessions-and-durability.md`
  Read when the request involves sessions, resumability, retries, idempotency, approval waits, or long-running work.
- `references/05-context-memory-and-evaluation.md`
  Read when the request involves context windows, retrieval, memory, provenance, evals, replay tests, or regression detection.
- `references/06-agents-and-extensibility.md`
  Read when the request involves multi-agent design, plugins, hooks, skills, or extension surfaces.
- `references/07-ux-observability-and-operations.md`
  Read when the request involves streaming UX, health checks, logs, analytics, budgets, or supportability.
- `references/08-design-and-build-playbook.md`
  Read when the user needs a build-ready plan from idea to implementation.
- `references/09-evaluation-and-improvement-playbook.md`
  Read when the user needs findings, missing primitives, upgrade priorities, or acceptance tests.
- `references/10-example-requests-and-output-patterns.md`
  Read when you need prompt examples or response structure examples.
- `references/11-codex-translation-notes.md`
  Read only when adapting this Anthropic-style skill into a Codex-oriented version or when mapping concepts between the two environments.

Do not rely on reference-to-reference chains. This file is the index.

## Operating Rules

- Convert vague ambitions into concrete harness primitives.
- Push back on unnecessary complexity.
- Treat workflow state, permissions, context assembly, and evaluation as first-class architecture, not cleanup tasks.
- Separate universal harness primitives from product-specific manifestation.
- For evaluation requests, present findings first and improvement sequence second.
- For design requests, include how the design will be tested before calling it done.

## Output Contract

### For `design`

Return:

- recommended harness shape
- core primitives and subsystem boundaries
- MVP boundary
- phased implementation plan
- verification and acceptance criteria

### For `evaluation`

Return:

- findings ordered by severity or leverage
- missing or weak primitives
- user experience and operational gaps
- prioritized upgrade path
- tests or checks that confirm the fixes

### For `design + evaluation`

Return:

- target architecture
- comparison against current or likely failure modes
- implementation phases
- acceptance criteria
- evaluation plan covering regressions, safety, and UX

## Final Check Before Responding

- Did you keep the design lean enough for a solo developer unless the request clearly demanded more?
- Did you avoid recommending multi-agent coordination by default?
- Did you include evaluation, not just construction?
- Did you give the user an operational path forward instead of abstract theory?
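
The routing described in Steps 1 and 3 (pick a mode, take its default reads, then add subsystem files only as needed) can be sketched as a small lookup. This is a minimal illustration, not part of the skill itself: the function name `select_references` and the short topic keys (`tools`, `state`, `context`, `agents`, `ux`) are hypothetical labels introduced here, while the file names and the mode-to-file defaults are taken from the lists above.

```python
REF_DIR = "references"

# Default reads per mode, as listed in Step 1.
DEFAULT_READS = {
    "design": [
        "01-principles-and-solo-dev-defaults.md",
        "02-harness-shapes-and-architecture.md",
        "08-design-and-build-playbook.md",
    ],
    "evaluation": [
        "01-principles-and-solo-dev-defaults.md",
        "09-evaluation-and-improvement-playbook.md",
    ],
    "design + evaluation": [
        "01-principles-and-solo-dev-defaults.md",
        "02-harness-shapes-and-architecture.md",
        "08-design-and-build-playbook.md",
        "09-evaluation-and-improvement-playbook.md",
    ],
}

# Hypothetical topic keys mapped to the subsystem files from Step 3.
SUBSYSTEM_READS = {
    "tools": "03-tools-execution-and-permissions.md",
    "state": "04-state-sessions-and-durability.md",
    "context": "05-context-memory-and-evaluation.md",
    "agents": "06-agents-and-extensibility.md",
    "ux": "07-ux-observability-and-operations.md",
}


def select_references(mode, topics=()):
    """Return the smallest reference set for a mode plus requested topics."""
    if mode not in DEFAULT_READS:
        raise ValueError(f"unknown mode: {mode!r}")
    files = list(DEFAULT_READS[mode])
    for topic in topics:
        if topic not in SUBSYSTEM_READS:
            raise ValueError(f"unknown topic: {topic!r}")
        name = SUBSYSTEM_READS[topic]
        if name not in files:  # avoid loading the same file twice
            files.append(name)
    return [f"{REF_DIR}/{name}" for name in files]


if __name__ == "__main__":
    # An evaluation request that only concerns tool permissions.
    for path in select_references("evaluation", ["tools"]):
        print(path)
```

Keeping the index in one flat table mirrors the rule that this file is the index and that reference-to-reference chains should not be relied on.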
Use Nate Jones OB1 Agent Memory from OpenClaw with provenance, scope, review, and use-policy discipline.
Continuous learning system that extracts reusable knowledge from work sessions. Triggers: (1) /aiception command, (2) 'save this as a skill' or 'extract a skill from this', (3) 'what did we learn?', (4) after non-obvious debugging or trial-and-error discovery. Creates new skills when valuable reusable knowledge is identified. Integrates with Open Brain to prevent duplicates.
Morning digest of yesterday's Open Brain thoughts, drafted to Gmail
Generate infographic images from any research doc, Open Brain thoughts, or analysis. Auto-chunks content, writes prompts, generates images via Gemini API (free tier), and saves to media/. Use --premium for better text rendering.
Use when processing voice transcripts, brain dumps, stream-of-consciousness notes, or any raw multi-topic capture. Extracts every idea thread, then evaluates each one with deep brainstorming, then captures results to Open Brain. Trigger on transcripts, exports, "process this", "pan for gold", "brain dump", "what did I say", or multi-topic markdown files.