What 10 tokens per second actually feels like
A small web tool by Mike Veerman simulates LLM output speeds between 5 and 800 tokens per second. It is a useful way to understand what a tokens-per-second marketing figure really means.
When a provider announces their model generates "30 tokens per second," the number sounds impressive in a benchmark but tells you little about real-world experience. Is that fast? Slow? Like reading text on screen, or more like watching sentences build word by word while you wait? Mike Veerman has built a practical answer: a small web app that simulates different output speeds so you can see the difference yourself.
Linked from Simon Willison's blog on May 20th and shared via Hacker News, the tool does nothing fancy: it plays back text at whatever speed you choose, between 5 and 800 tokens per second, and lets you watch the result. The source code is available on GitHub and fits in a single HTML file.
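The playback idea can be sketched in a few lines. This is not Veerman's code (his tool is a single HTML file); it is a minimal Python illustration that assumes whitespace-separated words stand in for tokens. The function names `playback_schedule` and `play` are hypothetical.

```python
import time
from typing import Iterator, List, Tuple


def playback_schedule(tokens: List[str], tokens_per_second: float) -> Iterator[Tuple[float, str]]:
    """Yield (seconds_from_start, token) pairs for a fixed output rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    interval = 1.0 / tokens_per_second  # seconds between consecutive tokens
    for index, token in enumerate(tokens):
        yield index * interval, token


def play(text: str, tokens_per_second: float) -> None:
    """Print text to the terminal at roughly the requested rate."""
    # Assumption: words approximate tokens; a real tokenizer would split differently.
    tokens = text.split()
    start = time.monotonic()
    for offset, token in playback_schedule(tokens, tokens_per_second):
        # Wait until this token's scheduled time so timing drift does not accumulate.
        delay = start + offset - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        print(token, end=" ", flush=True)
    print()
```

Calling `play("some sample text to stream", tokens_per_second=10)` in a terminal gives a rough feel for that rate. Scheduling against a start time, rather than sleeping a fixed interval per token, keeps the average rate close to the requested one.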
Why tokens per second is a slippery metric
The problem with "tokens per second" (TPS) as a marketing measure is that the same number describes two very different user experiences depending on context. For interactive tasks (a chat, a pair programming session with Claude Code, a quick query), streaming speed directly affects whether the experience feels smooth or frustrating. For batch processing (summarizing hundreds of documents, generating embeddings, running overnight pipelines), perceived latency matters far less than total throughput.
Moreover, tokens don't map uniformly to words. In Spanish, many common words tokenize into two or three pieces, so 30 tokens per second can produce significantly fewer readable words than the same rate would in English. Veerman's tool doesn't dive into that nuance, simulating generic output instead, but simply seeing the speed in real time puts the number in perspective.
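To make the arithmetic concrete, here is a small sketch that converts a token rate into an approximate word rate. The tokens-per-word ratios below are illustrative assumptions, not measurements; real values depend on the tokenizer and the text.

```python
def words_per_second(tokens_per_second: float, tokens_per_word: float) -> float:
    """Convert a token rate into an approximate word rate."""
    if tokens_per_word <= 0:
        raise ValueError("tokens_per_word must be positive")
    return tokens_per_second / tokens_per_word


# Hypothetical ratios for illustration only; measure them with the
# tokenizer of the model you actually use.
ASSUMED_TOKENS_PER_WORD = {"english": 1.3, "spanish": 2.0}

for language, ratio in ASSUMED_TOKENS_PER_WORD.items():
    rate = words_per_second(30, ratio)
    print(f"{language}: about {rate:.1f} words per second at 30 TPS")
```

Under these assumed ratios, the same 30 TPS comes out at roughly 23 words per second in English and 15 in Spanish.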
What the simulation shows
Trying the tool confirms something those of us who work with LLMs regularly sense but rarely articulate clearly:
- Below 15-20 TPS, reading becomes uncomfortable. You notice the choppy rhythm, similar to text appearing letter by letter in many early production LLM chatbots.
- Between 30 and 60 TPS, the experience feels smooth for most users. Text appears at a speed similar to natural fast reading.
- Above 100 TPS, visible streaming loses practical value: blocks of text complete so quickly that the response feels almost instant.
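The bands above can be written down as a small lookup. This is a sketch, not a standard: the thresholds are this article's subjective impressions, the 20 TPS cutoff picks the upper end of the "15-20" range, and the ranges the list does not describe (20 to 30 and 60 to 100) are reported as such rather than guessed. The function name `perceived_band` is hypothetical.

```python
def perceived_band(tokens_per_second: float) -> str:
    """Map an output rate to the rough perceptual bands listed above."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if tokens_per_second < 20:  # assumption: upper end of the "15-20" range
        return "choppy"
    if tokens_per_second < 30:  # not described in the list above
        return "not characterized"
    if tokens_per_second <= 60:
        return "smooth"
    if tokens_per_second <= 100:  # not described in the list above
        return "not characterized"
    return "near-instant"


for rate in (5, 25, 45, 800):
    print(rate, perceived_band(rate))
```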
Who this is useful for
The tool has value in at least three concrete scenarios:
1. Teams evaluating inference providers: before signing a contract or picking an API tier, simulating the advertised speed helps decide whether the price jump between plans is justified for your specific use case.
2. Developers designing interfaces: knowing that 25 TPS feels smooth lets you make UX decisions (whether to show streaming or use an alternative loading indicator) with criteria instead of guesswork.
3. People evaluating hardware for local inference: when comparing one GPU to another based on TPS benchmarks, having a concrete perceptual reference is more useful than comparing abstract numbers.
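For any of these comparisons, a quick back-of-the-envelope calculation complements the visual simulation: how long does a response of a given length take to stream at each candidate rate? The sketch below assumes a constant rate and ignores the delay before the first token; the 500-token response and the three rates are hypothetical examples.

```python
from typing import Dict, Iterable


def streaming_seconds(response_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to stream a response at a constant rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return response_tokens / tokens_per_second


def compare_rates(response_tokens: int, rates: Iterable[float]) -> Dict[float, float]:
    """Return {rate: seconds} for each candidate rate."""
    return {rate: streaming_seconds(response_tokens, rate) for rate in rates}


# Hypothetical example: a 500-token answer at three candidate speeds.
for rate, seconds in compare_rates(500, [10, 30, 100]).items():
    print(f"{rate} TPS: {seconds:.1f} s")
```

With these example numbers, the same answer takes 50 seconds at 10 TPS, about 17 seconds at 30 TPS, and 5 seconds at 100 TPS.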
It's not a tool that solves anything complex, but that's precisely its merit. Sometimes utility lies in making visible what was implicit in a number.
---
At ClaudeWave, we see this as a healthy reminder that benchmarks need perceptual translation before they become decision criteria. A single-file HTML tool that does exactly that deserves more visibility than it typically gets.