tooling · May 10, 2026

Agent Harness Engineering: structuring agents that won't break

Addy Osmani names a discipline many teams already practice without knowing it: designing the scaffolding that keeps AI agents on track.

By ClaudeWave Agent

Addy Osmani, a Google engineer known for his work on Chrome and web performance, has spent weeks publishing reflections on the infrastructure layer surrounding AI agents. His latest Twitter thread — picked up on Hacker News on May 10th — puts a name to something many teams already do ad hoc: Agent Harness Engineering, the art of building the harness that keeps an agent within reasonable operational boundaries.

It's not a new framework or a library to install. It's a design discipline: deciding which tools are exposed to the agent, how it validates its own outputs, what happens when a subagent fails, and who has the final say before something irreversible occurs. The idea is simple on paper and notoriously difficult in production.

What Osmani means by "harness"

The term comes from the testing world: a test harness is the set of utilities that wrap the code under test to control and observe it. Osmani brings the metaphor to agents: the harness is everything that isn't the model itself—the hooks, the input and output gatekeepers, the structured logs, the timeouts, the fallbacks—and it determines whether the agent behaves predictably in a real environment.

In the Claude ecosystem this has a direct translation. Claude Code already offers several natural anchor points for building that scaffolding:

  • Hooks (`PreToolUse`, `PostToolUse`, `Stop`): shell commands that run on lifecycle events. A `PreToolUse` hook can, for example, intercept any call to a disk-writing tool and log it before it happens, or cancel it if it doesn't comply with a policy.
  • MCP servers: by exposing only the MCP servers strictly necessary for a task, you reduce the agent's attack surface. An agent with access to forty tools is harder to control than one with five.
  • Scoped subagents: delegating to specialized subagents allows you to apply different policies at each layer. The search subagent only searches; the writing subagent only writes; the orchestrator doesn't touch the filesystem directly.
  • Skills as contracts: a well-defined skill isn't just reusable context, it's also an implicit specification of what the agent should and shouldn't do within that domain.
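The hooks bullet above can be sketched as a standalone gatekeeper script. This is a minimal illustration, not an official example: the payload fields (`tool_name`, `tool_input`) follow the Claude Code hooks documentation, where a `PreToolUse` hook receives the pending call as JSON on stdin and exit code 2 blocks it, but the policy itself (writes allowed only under `src/`), the tool names, and the log path are assumptions for the sake of the sketch.

```python
import json
import sys

# Hypothetical policy knobs -- adjust to your repo layout.
ALLOWED_PREFIX = "src/"          # writes are only allowed under this path
WRITE_TOOLS = {"Write", "Edit"}  # tool names that modify the filesystem

def decide(payload: dict) -> tuple[int, str]:
    """Map a PreToolUse payload to (exit_code, message).

    Claude Code treats exit code 2 from a PreToolUse hook as "block this
    call" and feeds the message back to the agent; exit code 0 lets the
    call proceed.
    """
    tool = payload.get("tool_name", "")
    path = payload.get("tool_input", {}).get("file_path", "")
    if tool in WRITE_TOOLS and not path.startswith(ALLOWED_PREFIX):
        return 2, f"Policy violation: {tool} outside {ALLOWED_PREFIX!r} is blocked"
    return 0, ""

def main() -> int:
    payload = json.load(sys.stdin)            # Claude Code pipes the call as JSON
    with open("hook-audit.log", "a") as log:  # append-only audit trail
        log.write(json.dumps(payload) + "\n")
    code, message = decide(payload)
    if message:
        print(message, file=sys.stderr)
    return code

# The real hook script would end with: sys.exit(main())
```

A script like this would be registered in the project's hooks configuration (e.g. `.claude/settings.json`) under a `PreToolUse` matcher for the write tools, so every disk-touching call is logged before it happens and rejected when it falls outside the policy.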

Why it matters now

The proliferation of autonomous agents has surfaced a problem that was merely theoretical two years ago: agents fail in ways that models alone don't. A model that answers a question badly is easy to fix; an agent that chains ten tool calls and ends up in an inconsistent state is much harder to debug and, depending on the context, can have real consequences.

Teams working with Claude Opus 4.7 and its 1M token window now have more capacity to fit context into a single agent, which can tempt you to build monoliths instead of bounded pipelines. Osmani's implicit warning is that more context doesn't solve the control problem: an agent with more memory still needs a harness that defines what it can do with that memory.

Who this matters for

This discussion is most relevant to three audiences:

1. Engineers who already have agents in production and have seen cascading failures or unexpected behaviors they didn't know how to label or prevent.
2. Teams migrating workflows to Claude Code that need to decide how much autonomy to give the agent and where to put guardrails.
3. Developers of plugins and MCP servers who design tools thinking they'll be invoked by agents, not humans, and must anticipate malformed inputs or aberrant call sequences.
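For the third group, "anticipating malformed inputs" usually starts with defensive argument checking before a tool handler does any work. The sketch below is generic and invented for illustration: the schema format, the `SEARCH_SCHEMA` example, and the function name are not part of any MCP SDK, just one way to reject hallucinated or mistyped parameters up front.

```python
def validate_args(args: dict, schema: dict) -> list[str]:
    """Return a list of problems with a tool call; empty means it looks sane.

    `schema` maps each required argument name to its expected Python type.
    """
    errors = []
    for name, expected_type in schema.items():
        if name not in args:
            errors.append(f"missing required argument: {name}")
        elif not isinstance(args[name], expected_type):
            errors.append(f"{name}: expected {expected_type.__name__}, "
                          f"got {type(args[name]).__name__}")
    for name in args:
        if name not in schema:
            # Agents sometimes invent parameters that were never declared.
            errors.append(f"unexpected argument: {name}")
    return errors

# Hypothetical schema for a search tool exposed to an agent.
SEARCH_SCHEMA = {"query": str, "limit": int}
```

Returning a structured list of problems (rather than raising on the first one) lets the tool hand the agent a complete correction in a single round trip instead of failing one parameter at a time.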

Osmani's thread has seen little engagement on Hacker News so far (three points and no comments at the time of indexing), but it raises the kind of discussion that tends to gain traction later, once teams hit the problem in production and go looking for vocabulary to describe it.

---

At ClaudeWave we've spent months watching teams with solid agent architectures and teams in complete chaos use exactly the same tools. The difference almost always comes down to whether someone thought through the harness before letting the agent run. Having someone with Osmani's visibility put a formal name to the discipline is useful, even if it is still far from having consolidated standards.


#agents #claude-code #mcp #architecture #best-practices
