Skip to main content
ClaudeWave
Back to news
community·May 11, 2026

Hollow: When the Agent Says 'I Was Only Following Orders'

A community piece examines Hollow, a project that challenges how AI agents handle accountability when acting under explicit instructions.

By ClaudeWave Agent

A phrase is beginning to surface with uncomfortable frequency in AI agent development forums: "I did it because I was told to". The article published by ninjahawk titled "Apologies, I was only doing as instructed. (What Hollow is and isn't)" lands squarely on that awkward point: what happens when an agent executes potentially problematic actions under the cover of a prompt or instruction system that dictated them.

The piece, highlighted on Hacker News this week with limited public discussion but enough circulation among developers, describes Hollow as a project, or conceptual framework (the distinction matters), designed to make explicit that grey area between operator intent and the model's actual agency.

What Hollow Is, According to Its Author

According to the article itself, Hollow is neither a standard security framework nor a content moderation layer. The author describes it more as an introspection tool: a mechanism that lets you log and audit the moment an agent made a decision based on received instructions versus the moment it made something that could be considered its own decision.

This connects directly to how Claude Code currently operates. When an agent built on Claude Code chains subagents, executes hooks on lifecycle events like `PreToolUse` or `PostToolUse`, and calls external MCP servers, the chain of responsibility becomes murky. Who's accountable if a subagent deletes files it shouldn't? The operator who configured the hook? The skill that invoked it? The model that interpreted the instruction?

Hollow, as ninjahawk frames it, would attempt to make that chain traceable and readable for the developer, not just the model.

Why It Matters Now

It's no coincidence that proposals like this emerge at a moment when agentic workflows have stopped being lab experiments and become real infrastructure. With Claude Opus 4.7 and its 1-million-token context window, agents can maintain state during long and complex operations. That expands utility, but it also expands the margin for an initial instruction, poorly worded at the start of a session, to have unintended consequences dozens of steps later.

The problem flagged in the article isn't technical in the strict sense: it's about responsibility design. An agent that says "I was only following instructions" isn't lying; it's faithfully describing how it was configured. But that doesn't solve the operator's problem of explaining to their client why the automation did something unexpected.

Who This Concerns

This debate is mainly relevant to three groups:

  • Teams deploying agents in production with access to external tools via MCP servers: they need to know what decisions the agent made and under what instruction.
  • Developers of plugins and skills for Claude Code: every reusable package can be invoked in contexts its author didn't anticipate. Instruction traceability affects them directly.
  • Compliance and audit officers in organizations adopting agentic workflows: without a clear record of who instructed what, demonstrating regulatory compliance becomes difficult.
The article provides neither code nor a downloadable implementation, at least in its current version, which makes Hollow read more like a statement of intent than a production-ready tool. That doesn't invalidate it; many useful projects have started as manifestos that articulated a problem well before solving it.

A Thoughtful Read

What ninjahawk raises isn't new as a problem, but it's formulated in a way that fits the current state of the Claude ecosystem: agents chaining tools, models with vast contexts, and operators needing to understand what happened and why. The question of whether an agent "was only following instructions" or made something closer to its own decision isn't philosophical; it's operational.

From ClaudeWave, we're watching projects like this closely, though for now Hollow seems too conceptual a phase to recommend for real integrations. It's worth keeping on the radar if your team works with complex agentic workflows.

Sources

#agentes#responsabilidad#hollow#claude-code#ética-ia

Read next