Claude Opus 5 Refuses Basic Biology Questions

Anthropic launched Claude Opus 5 on June 10, presenting it as the most capable model it has released to the general public to date. Its official announcement explicitly highlighted advanced scientific reasoning among the model's capabilities and cited biology as one of the example fields. Days later, The Verge documented that the model refuses to answer biology questions at the level of a high school student.

These are not questions about pathogen synthesis or controversial genetic modification. They are the kind of questions any secondary school textbook answers in two paragraphs. When a user poses one of them to Opus 5, the model does not respond: it silently routes the query to Claude Opus 4.8, the previous flagship, which answers it without issue.

What is actually happening

The described behaviour points to a content filter applied too broadly in Opus 5. The model appears to have inherited, or newly received, restrictions designed to block sensitive information in life sciences, but the activation threshold is calibrated so low that it triggers false positives on completely ordinary biological terminology.

The most striking detail is the fallback mechanism: rather than answering, or telling the user that it cannot, Opus 5 silently hands the query to Opus 4.8. This has practical consequences for anyone who pays for the latest model expecting its supposedly improved scientific capabilities. It also raises an uncomfortable question about transparency: does the user know when their query is being answered by a different model from the one they are paying for?

Why this matters beyond the anecdote

This type of incident is not new in the industry, but the launch context makes this case especially significant. Anthropic did not cite biology as a minor example: it highlighted it as one of the areas where Opus 5 represents a qualitative leap. That the model blocks elementary questions in that same field suggests a notable disconnect between the internal evaluation process and the model's actual behaviour in production.

For teams integrating Claude into educational, research, or scientific technical support applications, this behaviour is directly problematic. An assistant that reroutes queries without warning adds a layer of opacity that is difficult to manage: pipelines that depend on Opus 5's capabilities may be receiving answers from a model with a different performance profile, with no explicit signal that this has happened.

It also affects developers who call the API directly and have configured their requests to target Opus 5. If the fallback to Opus 4.8 happens at the product level and is not documented in the API, the developer loses control over which model is executing their logic at any given moment.

The tension between safety and utility

The underlying problem is not that Anthropic applies restrictions in life sciences. It is reasonable to do so, and the risk scenario in synthetic biology is real. The problem is the granularity of these filters.

A model that blocks toxin synthesis and one that blocks an explanation of the Krebs cycle are making safety decisions at completely different levels. Conflating the two protects no one: it simply degrades the product for legitimate users while probably doing nothing to deter anyone genuinely intent on obtaining dangerous information, who will find other channels.

Anthropic has historically communicated more carefully about these trade-offs than other industry players. That is why this case stands out: the misalignment between the launch narrative and the observed behaviour is hard to ignore.

---

Editorial view: A model marketed for its advanced scientific capabilities that cannot answer high school biology questions does not have a minor calibration problem; it shows a failure of coherence between product and communication. We expect Anthropic to document the fallback behaviour and adjust the thresholds before this becomes an entrenched characteristic.