What CFOs Cannot Measure: The Real Use of AI in Their Companies

Companies have been signing contracts with Anthropic, OpenAI and equivalent infrastructure providers for two years now. The invoices arrive on time. What doesn't arrive is the answer to a basic question: who is using what, for what purpose, and with what result? According to this week's WSJ, chief financial officers are increasingly struggling to track how AI is actually used within their organizations, which leaves AI spending as an opaque line item in the budget.

The problem has direct budget consequences. When the cost of a tool is visible but its impact is not, the next budgeting decision, whether to cut or to scale, becomes a shot in the dark. And in an environment where engineering, legal, marketing and operations teams may be using models in completely different ways, without central coordination, the CFO's picture is usually fragmented.

Why measuring this is so difficult

Enterprise AI usage is not concentrated at a single access point. One team may use Claude directly through the API, another through Claude Code with custom MCP servers, a third through integrations embedded in third-party SaaS tools. Each channel generates different data, in different formats, and much of it never reaches the finance department in consolidated form.

The nature of consumption adds to the difficulty: models are billed per token, not per hour or per active user. An analyst generating a comprehensive report might consume more tokens in one session than an entire team in a week of light use. The relationship between usage volume and value generated is not linear, and traditional financial dashboards are not designed to capture that granularity.
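The token arithmetic behind that comparison can be sketched in a few lines. The prices and token counts below are hypothetical, chosen only to illustrate the shape of the calculation; real rates depend on the model and the contract.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens (assumption, not a real rate card).
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


@dataclass(frozen=True)
class Session:
    user: str
    input_tokens: int
    output_tokens: int


def session_cost(s: Session) -> float:
    """Cost of one session in USD under the assumed per-token prices."""
    return (s.input_tokens * INPUT_PRICE_PER_MTOK
            + s.output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


# One analyst producing a long report in a single session (made-up volumes).
heavy = Session("analyst", input_tokens=2_000_000, output_tokens=400_000)
# Ten colleagues, each with a week of light use (made-up volumes).
light_team = [Session(f"user{i}", 50_000, 10_000) for i in range(10)]

heavy_cost = session_cost(heavy)
team_cost = sum(session_cost(s) for s in light_team)
print(f"heavy session: ${heavy_cost:.2f} | light team, one week: ${team_cost:.2f}")
```

A per-seat view would count eleven users here; a per-token view shows that one of them accounts for most of the bill.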

Another factor is the proliferation of autonomous agents and subagents. When a task is delegated to a subagent that in turn calls multiple MCP servers and generates multiple model calls in the background, the real cost of that operation is difficult to attribute to a specific department, project or person. The observability of agentic workflows remains an open problem even for technical teams, let alone for those managing budgets.
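One way to reason about attribution is to treat a delegated task as a tree of model calls and roll every descendant's tokens up to the department that started it. The sketch below assumes each call's token count has already been captured somewhere; the `Call` structure, the call names and the figures are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Call:
    name: str
    tokens: int  # tokens billed for this call alone
    children: list[Call] = field(default_factory=list)


def total_tokens(call: Call) -> int:
    """Tokens for a call plus everything it triggered in the background."""
    return call.tokens + sum(total_tokens(child) for child in call.children)


def attribute(tasks_by_department: dict[str, list[Call]]) -> dict[str, int]:
    """Charge each root task, with all its descendants, to the originating department."""
    return {
        dept: sum(total_tokens(task) for task in tasks)
        for dept, tasks in tasks_by_department.items()
    }


# A task that delegates to two subagents, one of which makes further
# model calls after consulting MCP servers (illustrative numbers).
task = Call("review-contract", 1200, [
    Call("subagent:research", 800, [
        Call("model call after mcp:search", 300),
        Call("model call after mcp:fetch", 500),
    ]),
    Call("subagent:summarize", 600),
])

print(attribute({"legal": [task]}))
```

The hard part in practice is not the sum but propagating the originating department down the tree so that each background call can be linked to its root.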

What companies that do have visibility are doing

Organizations that have achieved some visibility into their AI usage share a common pattern: they centralized access configuration before scaling deployment. In practice, this means all model calls pass through an intermediate layer (a proxy, API gateway or management platform) that logs metadata for each call: department, project, user, tokens consumed, model used.
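The idea of an intermediate layer can be illustrated with a thin wrapper that tags and logs every call before returning the reply. This is a minimal sketch, not a production gateway: `send`, its return shape and `fake_send` are hypothetical stand-ins for whatever client an organization actually uses.

```python
import time
from typing import Callable

# Hypothetical provider call: takes (model, prompt) and returns a dict with
# "text", "input_tokens" and "output_tokens". A real client has its own shape.
SendFn = Callable[[str, str], dict]


def make_gateway(send: SendFn, usage_log: list[dict]):
    """Wrap `send` so that every call is logged with its attribution metadata."""

    def call(*, department: str, project: str, user: str,
             model: str, prompt: str) -> str:
        response = send(model, prompt)
        usage_log.append({
            "timestamp": time.time(),
            "department": department,
            "project": project,
            "user": user,
            "model": model,
            "input_tokens": response["input_tokens"],
            "output_tokens": response["output_tokens"],
        })
        return response["text"]

    return call


def fake_send(model: str, prompt: str) -> dict:
    """Offline stand-in for the real API so the sketch runs without a network."""
    return {"text": "ok", "input_tokens": len(prompt.split()), "output_tokens": 1}


usage_log: list[dict] = []
ask = make_gateway(fake_send, usage_log)
reply = ask(department="legal", project="contract-review", user="a.lopez",
            model="example-model", prompt="Summarize this clause")
print(usage_log[0]["department"], usage_log[0]["input_tokens"])
```

Making the metadata keyword-only and required is a deliberate choice in this sketch: a call that cannot say which department it belongs to does not go through.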

This approach has an initial implementation cost, but it allows building the metrics CFOs need: cost per department, cost per use case, consumption trends over time, and in the most mature cases, correlation with business KPIs. Without that instrumentation layer, retrospective analysis is nearly impossible.
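Once such records exist, the metrics finance asks for are simple aggregations. The sketch below assumes records carrying `department`, `use_case` and token counts, and uses a single blended price per million tokens for simplicity; the field names, the price and the sample data are all hypothetical.

```python
from collections import defaultdict


def cost_by(records: list[dict], key: str, price_per_mtok: float) -> dict[str, float]:
    """Sum the cost in USD of all records, grouped by the given metadata key."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        tokens = record["input_tokens"] + record["output_tokens"]
        totals[record[key]] += tokens * price_per_mtok / 1_000_000
    return dict(totals)


# Made-up usage records, as an intermediate logging layer might produce them.
records = [
    {"department": "legal", "use_case": "contract-review",
     "input_tokens": 900_000, "output_tokens": 100_000},
    {"department": "engineering", "use_case": "code-assist",
     "input_tokens": 2_500_000, "output_tokens": 500_000},
    {"department": "legal", "use_case": "research",
     "input_tokens": 400_000, "output_tokens": 100_000},
]

BLENDED_PRICE_PER_MTOK = 10.0  # assumption for illustration only

print(cost_by(records, "department", BLENDED_PRICE_PER_MTOK))
print(cost_by(records, "use_case", BLENDED_PRICE_PER_MTOK))
```

The same function serves both cuts, by department and by use case, because the grouping key is just another field in the log. That only works if the field was recorded at call time.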

Some teams have begun using hooks in Claude Code to log session lifecycle events (PreToolUse, PostToolUse, Stop) and send that information to internal observability systems. It's not a universal solution, but it's an example of how technical teams are responding to demand coming from finance.
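A hook of this kind can be a small script that reads the event from standard input and appends one line to a log file. The sketch below assumes the hook receives a JSON payload on stdin with fields such as `hook_event_name`, `session_id` and `tool_name`; those field names and the log location are assumptions to verify against the Claude Code hooks documentation, and missing fields are simply recorded as null.

```python
import json
import sys
import time
from pathlib import Path

# Hypothetical log location; a real setup would point at whatever the
# internal observability pipeline ingests.
DEFAULT_LOG = Path.home() / "ai-usage" / "hook-events.jsonl"


def record_event(raw: str, log_path: Path) -> dict:
    """Parse one hook payload and append a compact entry as a JSON line."""
    payload = json.loads(raw)
    entry = {
        "ts": time.time(),
        # Field names below are assumed; .get() tolerates their absence.
        "event": payload.get("hook_event_name"),
        "session_id": payload.get("session_id"),
        "tool_name": payload.get("tool_name"),
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


if __name__ == "__main__" and not sys.stdin.isatty():
    raw_input = sys.stdin.read()
    if raw_input.strip():  # ignore empty input rather than fail the hook
        record_event(raw_input, DEFAULT_LOG)
```

The script records only identifiers and event names, not tool inputs or outputs, which keeps the log small and avoids copying potentially sensitive content into a second system.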

Who this matters for

This tension primarily affects mid-sized and large companies that have already moved beyond the pilot phase and are consolidating AI spending as a stable budget line. For engineering teams building on Claude, whether via the API, Claude Code or MCP integrations, the practical message is clear: instrumentation is not optional if you want to preserve your budget in the next planning cycle.

For CFOs, the more uncomfortable conclusion is that the metric they need won't be generated automatically by any AI tool. It requires architectural decisions that must be made before usage becomes too dispersed to audit.

---

At ElephantPink, we have seen this pattern repeat across several integration projects: the conversation about metrics arrives late, when the internal ecosystem is already fragmented. It's not a technically difficult problem to solve, but it does require the will to address it before spending scales up, not after.
