Subquadratic Claims to Have Solved LLMs' Mathematical Bottleneck

Last May, Miami-based startup Subquadratic emerged from stealth with a claim that raised eyebrows across the research community: it had solved a mathematical problem that has limited the performance of large language models for nearly a decade. Initial details were sparse, and the reception among researchers was sceptical at best. Now, according to MIT Technology Review, the company has begun sharing concrete evidence to back up its claim.

The Problem They Say They've Solved

The bottleneck in question is the quadratic complexity of the attention mechanism underlying virtually all modern transformers. Put plainly: when a model processes a text sequence, the computational cost grows quadratically with the length of that sequence. Double the sequence length, and the cost quadruples. This has direct consequences for context windows, inference speed, and training cost.
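To make the scaling concrete, here is a minimal sketch of standard single-head scaled dot-product attention in plain Python, instrumented to count multiply-add operations. It illustrates the textbook mechanism only, not Subquadratic's method, and the names naive_attention and make_inputs are hypothetical helpers introduced for this example. Every query is compared against every key, so the work grows with the square of the sequence length.

```python
import math


def naive_attention(q, k, v):
    """Single-head scaled dot-product attention in plain Python.

    q, k, v are lists of n vectors, each of dimension d.
    Returns (outputs, ops), where ops counts multiply-adds in the two
    n-by-n steps (scores and weighted sum); softmax work is not counted.
    """
    n, d = len(q), len(q[0])
    ops = 0
    outputs = []
    for i in range(n):
        # Score query i against every key: n dot products of length d.
        scores = []
        for j in range(n):
            dot = sum(q[i][t] * k[j][t] for t in range(d))
            scores.append(dot / math.sqrt(d))
            ops += d
        # Numerically stable softmax over the n scores.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of all n value vectors: another n * d multiply-adds.
        row = [0.0] * d
        for j in range(n):
            for t in range(d):
                row[t] += weights[j] * v[j][t]
            ops += d
        outputs.append(row)
    return outputs, ops


def make_inputs(n, d):
    """Hypothetical helper: deterministic toy vectors, for illustration only."""
    return [[((i + t) % 7) / 7.0 for t in range(d)] for i in range(n)]


short = make_inputs(8, 4)
long_ = make_inputs(16, 4)
_, ops_short = naive_attention(short, short, short)
_, ops_long = naive_attention(long_, long_, long_)
print(ops_long / ops_short)  # → 4.0
```

The count works out to 2·n²·d multiply-adds, so doubling n from 8 to 16 multiplies the work by four regardless of the vector dimension d.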

It's not a new problem. Since 2017, when the original "Attention Is All You Need" paper was published, the community has sought ways around that complexity. Approximations such as Linformer, Performer, and linear attention in its various forms reduce it at some cost in quality, while FlashAttention keeps attention exact and speeds it up through better use of memory without changing the quadratic amount of computation. All offer partial improvements or involve quality trade-offs. Subquadratic claims to have found a mathematically sound solution, not an approximation.

Why Initial Scepticism Was Reasonable

Announcing that a problem of this magnitude has been "solved" without a peer-reviewed publication or public technical details is a legitimate red flag. AI history is full of grand claims that evaporate under scrutiny. In this particular case, the company made its announcement first and supplied the details later, which did not help its initial credibility.

However, the fact that the company is now sharing verifiable results changes the picture. The MIT Technology Review article suggests that Subquadratic is showing benchmarks and, presumably, enough implementation detail for external researchers to begin evaluating the robustness of its approach. That is the step that separates a marketing claim from a genuine contribution.

What It Would Mean If Confirmed

If the solution is mathematically robust and generalisable, the implications are substantial, though it's worth tempering expectations:

Longer context windows at lower cost: models like Claude Opus 4.8, which already offers an optional 1M-token context, could benefit from significant inference cost reductions at the longer end of the context range.
More efficient training: quadratic cost also hits during training. Reducing it would make it possible to train on longer sequences without the compute budget growing with the square of the sequence length.
More modest hardware requirements: if complexity decreases, the barrier to entry for deploying large models on in-house infrastructure drops.

That said, there's a considerable distance between a mathematical solution and production adoption. The work in between includes compatibility with CUDA kernels, integration into existing frameworks like PyTorch or JAX, and validation on real-scale models.

Who Should Follow This Closely

Primarily, architecture research teams and any organisation operating its own models at scale where inference cost is a material expense. For most teams working with APIs, the impact would be indirect and arrive, if at all, with some delay.

For those building on the Claude ecosystem (integrations with MCP servers, pipelines with Claude Code, agents handling long contexts), the interest is more prospective: if this type of breakthrough makes its way into base models, behaviour on extended-context tasks could improve without changes at the integration layer.

---

Our position is one of watchful waiting: the claim deserves attention precisely because it's beginning to be backed by evidence, but the standard in architecture research is independent replication. Until that happens, "bottleneck solved" remains a hypothesis with promising data, not an established fact.
