Corporate AI Costs Begin to Outweigh Performance Gains

Two years ago, the conversation in technology departments centred on which model was most powerful. Today, according to TechCrunch, the real question has shifted: does this specific use case need the most expensive model, or will a cheaper one do the same job without users noticing the difference?

The logic is straightforward. If a company processes millions of API calls per month for text classification, internal summaries, or FAQ responses, the per-token price difference between a high-end and a mid-tier model can add up to hundreds of thousands of euros annually. That is a difficult argument to ignore in executive meetings.
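To make the arithmetic concrete, here is a minimal sketch of the calculation. Every figure in it (call volume, tokens per call, and both prices) is an assumption chosen for illustration, not a real vendor price or a number from the TechCrunch piece.

```python
# All figures below are hypothetical, chosen only to illustrate the arithmetic.
CALLS_PER_MONTH = 5_000_000          # assumed monthly API calls
TOKENS_PER_CALL = 1_000              # assumed input + output tokens per call
HIGH_END_EUR_PER_MTOK = 15.0         # assumed price per million tokens
MID_TIER_EUR_PER_MTOK = 3.0          # assumed price per million tokens


def annual_cost(calls_per_month, tokens_per_call, eur_per_million_tokens):
    """Yearly spend in euros for a fixed monthly volume and a flat token price."""
    tokens_per_year = calls_per_month * tokens_per_call * 12
    return tokens_per_year / 1_000_000 * eur_per_million_tokens


if __name__ == "__main__":
    high = annual_cost(CALLS_PER_MONTH, TOKENS_PER_CALL, HIGH_END_EUR_PER_MTOK)
    mid = annual_cost(CALLS_PER_MONTH, TOKENS_PER_CALL, MID_TIER_EUR_PER_MTOK)
    print(f"High-end: {high:,.0f} EUR/year")
    print(f"Mid-tier: {mid:,.0f} EUR/year")
    print(f"Difference: {high - mid:,.0f} EUR/year")
```

With these assumed inputs the gap lands in the hundreds of thousands of euros per year. Real pricing usually distinguishes input from output tokens, so a production estimate would need to split the two.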

Pressure comes from CFOs, not engineering teams

During 2024 and 2025, many companies deployed their first AI integrations by choosing the most capable model available. It was an understandable decision: they wanted a safety margin, they wanted to impress internal stakeholders, and they didn't want to risk quality becoming a problem. Performance was almost the sole criterion.

Now, with these integrations in production and real invoices on the table, finance teams are asking the questions that engineers didn't ask at the outset. What percentage of those calls actually require maximum reasoning capacity? How many are routine, well-defined tasks that a cheaper model would solve just as well?

The answer, in many cases, is that the distribution of tasks doesn't justify paying the premium price for one hundred percent of interactions.

Intelligent routing: the technical solution gaining traction

One approach that's been gaining ground is query routing: classifying each incoming request by its complexity and directing it to the cheapest model that can handle it reliably. Simple requests go to lightweight models; those requiring complex reasoning or extensive context go to the most capable models.

In the Claude ecosystem, this translates to decisions like using Claude Haiku 4.5 for structured extraction tasks or brief summaries, Claude Sonnet 4.6 for medium-complexity workflows, and reserving Claude Opus 4.8, with its optional 1M token context window, for cases that truly justify it: analysis of lengthy documents, complex chain-of-thought reasoning, or tasks demanding maximum accuracy.
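A rule-based router of this kind might be sketched as follows. This is an illustrative sketch, not a description of any vendor's routing feature: the `Request` fields, the task names, and the token threshold are all hypothetical, and the tier constants are plain labels taken from the examples above, not API model identifiers.

```python
from dataclasses import dataclass

# Tier labels follow the article's examples; they are display labels only,
# not API model identifiers.
LIGHT = "Claude Haiku 4.5"
MID = "Claude Sonnet 4.6"
TOP = "Claude Opus 4.8"

# Hypothetical routing rules; a real system would derive these from
# production data and evaluation results.
SIMPLE_TASKS = {"extraction", "brief_summary", "classification", "faq"}
LONG_CONTEXT_TOKENS = 100_000


@dataclass
class Request:
    task_type: str                      # hypothetical task label
    prompt_tokens: int                  # size of the incoming context
    needs_deep_reasoning: bool = False  # set by an upstream classifier


def route(req: Request) -> str:
    """Return the cheapest tier assumed to handle the request reliably."""
    # Long documents and complex reasoning go to the most capable tier.
    if req.needs_deep_reasoning or req.prompt_tokens > LONG_CONTEXT_TOKENS:
        return TOP
    # Routine, well-defined tasks go to the lightweight tier.
    if req.task_type in SIMPLE_TASKS:
        return LIGHT
    # Everything else defaults to the middle tier.
    return MID


if __name__ == "__main__":
    print(route(Request("extraction", 500)))
    print(route(Request("multi_step_workflow", 4_000)))
    print(route(Request("document_analysis", 250_000)))
```

The checks run from most to least demanding so that a simple task type with an unusually long context still reaches the top tier. Defaulting unknown task types to the middle tier is one possible design choice; a more cautious team might default to the top tier instead.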

This type of architecture isn't conceptually new, but its adoption in real enterprise environments has accelerated in recent months as engineering teams have accumulated enough production data to make informed decisions about which model tier each query type needs.

The real risk: over-optimising and losing quality where it matters

The obvious pitfall of this trend is over-optimisation. Cutting costs indiscriminately, without first analysing which tasks can tolerate slightly lower quality and which cannot, can degrade the user experience and cost more in lost trust than is saved on tokens.

Not all tasks are equal. An internal meeting summary has a different quality threshold than an AI-assisted legal contract or a technical support response that a customer receives directly. Conflating these cases for the sake of savings is the most common mistake we see when companies make this transition without a methodology.

The TechCrunch article points precisely to this: the cost shift is a net gain only if quality isn't compromised at the points the end user perceives. Savings on invisible workflows are legitimate; savings on critical touchpoints are a risk.
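One way to encode that distinction is a per-task quality floor that a cheaper model must clear before any traffic is moved to it. The sketch below assumes a team already has an evaluation score between 0 and 1 for the cheaper model on each task; the task names, the floor values, and the `can_downgrade` helper are all hypothetical.

```python
# Hypothetical minimum acceptable evaluation scores (0 to 1) per task.
# User-facing and high-stakes tasks get stricter floors than internal ones.
QUALITY_FLOOR = {
    "internal_meeting_summary": 0.80,
    "customer_support_reply": 0.95,
    "legal_contract_review": 0.99,
}


def can_downgrade(task, cheaper_model_score, floors=QUALITY_FLOOR):
    """Return True only if the cheaper model meets the task's quality floor."""
    floor = floors.get(task)
    if floor is None:
        # Tasks that have not been evaluated stay on their current model.
        return False
    return cheaper_model_score >= floor


if __name__ == "__main__":
    score = 0.85  # assumed evaluation score for a cheaper model
    for task in QUALITY_FLOOR:
        print(task, can_downgrade(task, score))
```

Under these assumed floors, the same score that clears the bar for internal summaries fails it for customer-facing and legal tasks, which is the asymmetry the argument above depends on.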

Who this trend affects

This conversation is especially relevant for medium to large companies that have had AI integrations in production for one to two years and are at the natural moment to review their architectures. It also matters for engineering teams that until now faced no pressure to justify model choice but are beginning to experience it.

For model providers, the dynamics are interesting: pressure on premium model pricing grows, while mid-tier and lightweight models become strategic pieces of their catalogue, not second options.

---

From our perspective, what seems most relevant about this moment isn't that companies have discovered that cheap models exist (they've known that from the start) but that they finally have real production data to make informed decisions rather than guessing blindly. That is, in essence, adoption maturity, and it typically leads to better architectures.
