What 10 tokens per second actually feels like

When a provider announces that its model generates "30 tokens per second," the number may look impressive in a benchmark. It tells you little about what using the model actually feels like. Is that fast? Slow? Does it read like text already on the page, or like watching sentences assemble word by word while you wait? Mike Veerman has built a practical answer: a small web app that simulates different output speeds so you can see the difference for yourself.

Simon Willison linked the tool on his blog on May 20, via Hacker News. It does nothing fancy. It plays back text at whatever speed you choose, from 5 to 800 tokens per second, so you can watch the difference. The source code is on GitHub and fits in a single HTML file.
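
The core idea fits in a few lines. The following is a minimal Python sketch, not Veerman's implementation. It assumes the text is already split into tokens (here, crudely, by whitespace, which real tokenizers do not do) and waits 1/tps seconds before emitting each token. The name `paced_tokens` and the clamping to the 5-800 range are this sketch's own choices.

```python
import time
from typing import Callable, Iterable, Iterator

# Speed range the article says the tool offers (tokens per second).
MIN_TPS, MAX_TPS = 5, 800


def paced_tokens(
    tokens: Iterable[str],
    tps: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield tokens one at a time, waiting 1/tps seconds before each one.

    `sleep` is injectable so the pacing can be tested without real waiting.
    """
    rate = min(max(tps, MIN_TPS), MAX_TPS)  # clamp to the supported range
    delay = 1.0 / rate
    for tok in tokens:
        sleep(delay)
        yield tok
```

For example, `for tok in paced_tokens("a short sample reply".split(), tps=10): print(tok, end=" ", flush=True)` prints four words at roughly a tenth of a second apiece. A real player would probably schedule against a clock so that loop overhead does not accumulate. This sketch ignores that drift.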

Why tokens per second is a slippery metric

The problem with tokens per second (TPS) as a marketing figure is that it lumps together two very different experiences. For interactive tasks, streaming speed directly determines whether the experience feels smooth or frustrating. Examples include a chat, a pair programming session with Claude Code, or a quick query. For batch processing, perceived latency matters far less than total throughput. Examples include summarizing hundreds of documents, generating embeddings, or running overnight pipelines.

Moreover, tokens don't map one-to-one to words. In Spanish, many common words split into two or three tokens. As a result, 30 tokens per second can yield noticeably fewer readable words per second than the same rate does in English. Veerman's tool doesn't go into that nuance and simulates generic output instead. Still, simply watching the speed in real time puts the number in perspective.
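
The arithmetic behind that gap is a single division: words per second equals tokens per second divided by the average number of tokens per word. The ratios below are assumptions for illustration only. About 1.3 tokens per word is a commonly quoted rough figure for English, and the Spanish value is hypothetical. Measure with your own model's tokenizer before relying on either.

```python
def words_per_second(tps: float, tokens_per_word: float) -> float:
    """Convert a token rate into an approximate word rate."""
    if tokens_per_word <= 0:
        raise ValueError("tokens_per_word must be positive")
    return tps / tokens_per_word


# ASSUMED ratios, not measurements: ~1.3 is a rough rule of thumb for
# English; 2.0 for Spanish is a hypothetical value reflecting that many
# common words split into two or three tokens.
ASSUMED_TOKENS_PER_WORD = {"English": 1.3, "Spanish": 2.0}

for lang, ratio in ASSUMED_TOKENS_PER_WORD.items():
    print(f"{lang}: 30 TPS ~ {words_per_second(30, ratio):.1f} words/s")
```

With these assumed ratios, the same 30 TPS works out to about 23 words per second in English and 15 in Spanish.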

What the simulation shows

Trying the tool confirms something that those of us who work with LLMs regularly sense but rarely spell out:

Below 15-20 TPS, reading becomes uncomfortable. You notice a choppy rhythm, like the letter-by-letter text of many early LLM chatbots.
Between 30 and 60 TPS, the experience feels smooth for most users. Text appears at roughly a fast reader's pace.
Above 100 TPS, visible streaming loses practical value. Whole blocks of text appear so quickly that the response feels almost instant.
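
To reuse these observations in code, such as when deciding how to present a response, a minimal sketch can map a measured rate onto the bands above. The exact cut-offs are an assumption. The text gives ranges (15-20, 30-60, 100+), so this sketch picks 15, 30, 60 and 100. It also gives its own hypothetical labels to the gaps in between.

```python
def perceived_band(tps: float) -> str:
    """Map a streaming speed to the rough perceptual bands in the text.

    Cut-offs (15, 30, 60, 100) are this sketch's reading of the article's
    ranges; "borderline" and "fast" are hypothetical labels for the gaps.
    """
    if tps < 15:
        return "choppy"        # below the 15-20 comfort threshold
    if tps < 30:
        return "borderline"    # between "uncomfortable" and "smooth"
    if tps <= 60:
        return "smooth"        # the 30-60 range
    if tps <= 100:
        return "fast"          # between "smooth" and "near-instant"
    return "near-instant"      # above 100 TPS
```

For example, `perceived_band(45)` returns `"smooth"` and `perceived_band(200)` returns `"near-instant"`.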

This puts public benchmark numbers in perspective. Modest local models delivering 15-20 TPS on consumer hardware can offer a usable chat experience. Meanwhile, some cloud inference services boasting 200+ TPS are optimizing beyond what the average user can perceive in a standard streaming interface.
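
Another way to translate a benchmark figure is total wait time: how long a complete answer takes to finish streaming. The following is a minimal sketch. It assumes a steady rate and a hypothetical 500-token response, and it ignores any delay before the first token appears.

```python
def seconds_to_finish(response_tokens: int, tps: float) -> float:
    """Time to stream a whole response at a steady rate.

    Ignores any delay before the first token arrives.
    """
    if tps <= 0:
        raise ValueError("tps must be positive")
    return response_tokens / tps


# Hypothetical 500-token answer; the rates echo figures from the text.
for rate in (15, 30, 60, 200):
    print(f"{rate:>3} TPS -> {seconds_to_finish(500, rate):.1f} s")
```

Under these assumptions, the answer finishes in about 33 seconds at 15 TPS and in 2.5 seconds at 200 TPS. Because the reader is already reading while text streams in, the longer figure is less painful than it looks on paper.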

Who this is useful for

The tool has value in at least three concrete scenarios:

1. Teams evaluating inference providers: before signing a contract or choosing an API tier, simulating the advertised speed helps you judge whether the price jump between plans is justified for your use case.
2. Developers designing interfaces: knowing that streaming starts to feel smooth at around 25-30 TPS lets you make UX decisions with evidence rather than guesswork. One such decision is whether to stream output or show a different loading indicator.
3. People choosing hardware for local inference: when comparing GPUs on TPS benchmarks, a concrete perceptual reference is more useful than abstract numbers.

It's not a tool that solves anything complex, but that's precisely its merit. Sometimes the value lies in making visible what a number only implies.

---

At ClaudeWave, we see this as a healthy reminder that benchmarks need a perceptual translation before they become decision criteria. A single-file HTML tool that does exactly that deserves more visibility than tools like it usually get.
