Subquadratic Claims to Have Solved LLMs' Mathematical Bottleneck
The Miami startup emerged from stealth with a bold claim: solving a mathematical problem that has constrained LLMs for nearly a decade. Now it's starting to provide evidence.
Last May, Miami-based startup Subquadratic emerged from stealth with a claim that raised eyebrows across the research community: it had solved a mathematical problem that has limited the performance of large language models for nearly a decade. Initial details were sparse, and the reception among researchers was, at best, sceptical. Now, according to MIT Technology Review, the company has begun sharing concrete evidence to back up its claim.
The Problem They Say They've Solved
The bottleneck in question is the quadratic complexity of the attention mechanism underlying virtually all modern transformers. Put plainly: when a model processes a text sequence, the computational cost grows quadratically with the length of that sequence. Double the sequence length, and the cost quadruples. This has direct consequences for context windows, inference speed, and training cost.
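To make the "double the length, quadruple the cost" relationship concrete, here is a minimal back-of-the-envelope sketch. The function name `attention_score_ops` and the default `head_dim` of 64 are illustrative assumptions, not figures from Subquadratic or any particular model, and the count deliberately ignores softmax, projections, and feed-forward layers.

```python
def attention_score_ops(seq_len: int, head_dim: int = 64) -> int:
    """Rough multiply-add count for one attention head in one layer.

    Assumption: only the two seq_len x seq_len steps are counted,
    the query-key scores and the weighted sum over values.
    """
    query_key_scores = seq_len * seq_len * head_dim
    weighted_values = seq_len * seq_len * head_dim
    return query_key_scores + weighted_values


baseline = attention_score_ops(1_000)
for tokens in (1_000, 2_000, 4_000, 8_000):
    ops = attention_score_ops(tokens)
    # The ratio grows with the square of the length increase.
    print(f"{tokens:>6} tokens: {ops:>16,} multiply-adds ({ops // baseline}x baseline)")
```

Each doubling of `tokens` multiplies the count by four, so an eightfold longer sequence costs 64 times as much in this simplified model.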
It's not a new problem. Since 2017, when the original "Attention Is All You Need" paper was published, the community has sought ways to reduce that complexity or blunt its practical cost: Linformer, Performer, FlashAttention, linear attention in its various forms. All offer partial improvements or involve quality trade-offs. Subquadratic claims to have found a mathematically sound solution, not an approximation.
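The trade-off behind the linear attention family can be illustrated with a small pure-Python sketch. It is not Subquadratic's method, whose details the article does not describe. It shows one common formulation: replace the softmax with a feature map (here `elu(x) + 1`, an assumed choice), then reorder the computation so keys and values are summarised once. The reordering is exact for the kernelised form, but the result no longer matches softmax attention, which is where the quality trade-off comes from. All function names and the toy matrices are hypothetical.

```python
import math


def phi(x):
    # Assumed feature map: elu(v) + 1, which keeps every entry positive.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_average(weights, V):
    total = sum(weights)
    return [sum(w * v[c] for w, v in zip(weights, V)) / total
            for c in range(len(V[0]))]


def softmax_attention(Q, K, V):
    """Standard attention: each query is scored against every key (n x n)."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    return [weighted_average([math.exp(dot(q, k) * scale) for k in K], V)
            for q in Q]


def kernel_attention_quadratic(Q, K, V):
    """Kernelised attention computed naively: still n x n scores."""
    return [weighted_average([dot(phi(q), phi(k)) for k in K], V) for q in Q]


def kernel_attention_linear(Q, K, V):
    """Same kernelised attention, reordered to one pass over keys and values."""
    d, dv = len(K[0]), len(V[0])
    kv = [[0.0] * dv for _ in range(d)]  # running sum of phi(k_j) * v_j^T
    k_sum = [0.0] * d                     # running sum of phi(k_j)
    for k, v in zip(K, V):
        fk = phi(k)
        for a in range(d):
            k_sum[a] += fk[a]
            for c in range(dv):
                kv[a][c] += fk[a] * v[c]
    out = []
    for q in Q:
        fq = phi(q)
        denom = dot(fq, k_sum)
        out.append([sum(fq[a] * kv[a][c] for a in range(d)) / denom
                    for c in range(dv)])
    return out


# Toy inputs, chosen arbitrarily for illustration.
Q = [[1.0, 0.0], [0.0, 2.0], [-1.0, 0.5]]
K = [[0.5, -1.0], [2.0, 1.0], [0.0, 0.3]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

naive = kernel_attention_quadratic(Q, K, V)
linear = kernel_attention_linear(Q, K, V)
exact = softmax_attention(Q, K, V)
print("reordering error:", max(abs(x - y) for a, b in zip(naive, linear)
                               for x, y in zip(a, b)))
print("gap to softmax:  ", max(abs(x - y) for a, b in zip(exact, linear)
                               for x, y in zip(a, b)))
```

The first printed value is at the level of floating-point rounding, while the second is clearly non-zero: the linear form is cheap, but it computes a different function from softmax attention.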
Why Initial Scepticism Was Reasonable
Announcing that a problem of this magnitude has been "solved" without a peer-reviewed publication or public technical details is a legitimate red flag. AI history is full of grand claims that evaporate under scrutiny. In this particular case, the company communicated first and detailed later, which didn't help its initial credibility.
However, the fact that the company is now sharing verifiable results changes the picture. The MIT Technology Review article suggests that Subquadratic is showing benchmarks and, presumably, sufficient implementation details for external researchers to begin evaluating the robustness of its approach. That's the step that separates a marketing claim from a genuine contribution.
What It Would Mean If Confirmed
If the solution is mathematically robust and generalisable, the implications are substantial, though it's worth tempering expectations:
- Longer context windows at lower cost: models like Claude Opus 4.8, which already offers optional 1M token context, could benefit from significant inference cost reductions at higher context ranges.
- More efficient training: quadratic cost also hits during training. Reducing it would make it possible to train with longer sequences without the compute budget growing with the square of the sequence length.
- More modest hardware requirements: if complexity decreases, the barrier to entry for deploying large models on an organisation's own infrastructure falls.
Who Should Follow This Closely
Primarily, architecture research teams and any organisation operating its own models at scale where inference cost is a material expense. For most teams working with APIs, the impact would be indirect and arrive, if at all, with some delay.
For those building on the Claude ecosystem (integrations with MCP servers, pipelines with Claude Code, agents handling long contexts), the interest is more prospective: if this type of breakthrough translates to base models, behaviour on extended-context tasks could improve without changes at the integration layer.
---
Our position is one of watchful waiting: the claim deserves attention precisely because it's beginning to be backed by evidence, but the standard in architecture research is independent replication. Until that happens, "bottleneck solved" remains a hypothesis with promising data, not an established fact.