freellmpool

Name: 0xzr/freellmpool
Author: 0xzr

Free LLM API pool: 19 LLM providers cataloged, 235 routes, 355 cataloged chat models, keyless start when available.


Install in Claude Code / Claude Desktop

Method: pip / Python

Claude Code CLI

claude mcp add freellmpool -- freellmpool mcp

claude_desktop_config.json (Claude Desktop)

{
  "mcpServers": {
    "freellmpool": {
      "command": "freellmpool",
      "args": ["mcp"],
      "env": {
        "ANTHROPIC_BASE_URL": "<anthropic_base_url>",
        "ANTHROPIC_AUTH_TOKEN": "<anthropic_auth_token>",
        "ANTHROPIC_API_KEY": "<anthropic_api_key>",
        "OPENAI_BASE_URL": "<openai_base_url>",
        "OPENAI_API_KEY": "<openai_api_key>"
      }
    }
  }
}

1. Run the command above in your terminal (Claude Code), or paste the JSON config into claude_desktop_config.json (Claude Desktop).

2. Replace any <placeholder> values with your API keys or paths.

3. Restart Claude. The MCP server and its tools appear automatically.

💡 Install first: pip install --upgrade freellmpool

Detected environment variables

ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, ANTHROPIC_API_KEY, OPENAI_BASE_URL, OPENAI_API_KEY

Use cases

AI / ML, Dev Tools, Creative

About


# freellmpool

<!-- mcp-name: io.github.0xzr/freellmpool -->

![freellmpool tokenmax terminal demo](assets/demo.svg)

![235 enabled routes, 19 LLM providers cataloged, keyless start when available](assets/tokenmax-results.svg)

Pool the free tiers of 19 LLM providers cataloged in freellmpool (235 enabled chat routes, 355 cataloged chat models)
behind one OpenAI-compatible endpoint — as a CLI, a Python library, or a local
proxy. Can start without API keys when a keyless provider is up.

[![PyPI](https://img.shields.io/pypi/v/freellmpool.svg)](https://pypi.org/project/freellmpool/)
[![CI](https://github.com/0xzr/freellmpool/actions/workflows/ci.yml/badge.svg)](https://github.com/0xzr/freellmpool/actions/workflows/ci.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Website](https://img.shields.io/badge/docs-0xzr.github.io%2Ffreellmpool-6ea8ff)](https://0xzr.github.io/freellmpool/)

[FAQ](FAQ.md): where prompts go, ToS posture, failover, bans, and comparisons.

## 30-second quickstart

A fresh install to the first free-model reply was measured at about 19 seconds,
under the 30-second target, on a clean Linux/Python 3.12 environment, with no
API keys needed when a keyless provider is up:

```bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install freellmpool
freellmpool ask --max-tokens 32 "Reply with one short sentence: freellmpool is ready."
```

CI runs the same path from this checkout with
`FREELLMPOOL_QUICKSTART_PACKAGE=. scripts/quickstart-test.sh`.

Groq, Cerebras, NVIDIA NIM, Google Gemini, OpenRouter, GitHub Models, Cloudflare,
Mistral, Cohere and others each give away a free tier — but each has its own SDK,
rate limits, and daily cap. freellmpool puts them in one pool: it sends each
request to a provider you have access to, fails over to the next when one is rate
limited or down, and tracks per-day usage so you get the most out of every tier.
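The pooling idea above (route, fail over, count per-day usage) can be sketched in a few lines of standard-library Python. This is an illustrative toy, not freellmpool's implementation: `Provider`, `RateLimited`, `ask`, and the `daily_cap` field are hypothetical names, and the provider calls are stubs rather than real SDK calls.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable


class RateLimited(Exception):
    """Raised by a provider stub when its free tier refuses the request."""


@dataclass
class Provider:
    name: str
    daily_cap: int                       # hypothetical per-day request cap
    call: Callable[[str], str]           # stand-in for the provider's SDK call
    used: dict = field(default_factory=dict)  # date -> requests served that day

    def remaining(self, today: date) -> int:
        return self.daily_cap - self.used.get(today, 0)


def ask(providers: list, prompt: str, today: date) -> tuple:
    """Try providers in order, skipping exhausted ones and failing over on errors."""
    for p in providers:
        if p.remaining(today) <= 0:
            continue                     # daily cap already spent
        try:
            text = p.call(prompt)
        except RateLimited:
            continue                     # fail over to the next provider
        p.used[today] = p.used.get(today, 0) + 1
        return p.name, text
    raise RuntimeError("all providers exhausted or rate limited")


def limited(prompt: str) -> str:
    raise RateLimited()


pool = [
    Provider("alpha", daily_cap=100, call=limited),
    Provider("beta", daily_cap=1, call=lambda prompt: f"beta says: {prompt}"),
]
today = date(2025, 1, 1)
first = ask(pool, "hi", today)           # alpha is rate limited, so beta answers
```

A second call on the same day would raise `RuntimeError` in this toy, because `beta` has spent its cap of one request and `alpha` is still rate limited.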

Several providers (Pollinations, OVHcloud, and Kilo Gateway) need no API key,
and LLM7 works without one, so the quickstart can answer without signup when a
keyless provider is available.

Add keys for the other providers to unlock more models and higher limits.

## Run a coding agent on free models

freellmpool's proxy speaks the OpenAI API and includes an experimental
Anthropic-compatible path, so coding agents can run against pooled free tiers —
just point them at the proxy:

```bash
freellmpool proxy                       # starts http://localhost:8080
freellmpool code claude                 # prints the one-line setup for Claude Code
# (also: codex, aider, cline, continue, cursor, opencode)
```

Claude Code gateway mode can also be launched directly:

```bash
ANTHROPIC_BASE_URL=http://localhost:8080 \
ANTHROPIC_AUTH_TOKEN=dummy \
ANTHROPIC_API_KEY=dummy \
ANTHROPIC_MODEL=auto \
ANTHROPIC_SMALL_FAST_MODEL=auto \
CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1 \
claude
```

Existing OpenAI-compatible apps work the same way: set
`OPENAI_BASE_URL=http://localhost:8080/v1` and keep your code unchanged.
Anthropic-compatible tools can use the experimental bridge with
`ANTHROPIC_BASE_URL=http://localhost:8080`.

**OpenCode** gets a deeper integration: a live in-editor **dashboard** (routing mode,
estimated savings, tokens served free, provider race, latency), per-request
**quality routing** via the model picker (`freellmpool/auto|fast|quality|fair`), and `freellmpool_status`
/ `freellmpool_models` tools — see [integrations/opencode-tui](integrations/opencode-tui)
and the [guide](https://0xzr.github.io/freellmpool/run-opencode-on-free-models.html).

**New in 0.11:** capacity tools — `freellmpool capacity status` shows which free
tiers are usable right now, `freellmpool providers health` live-probes them, and
`freellmpool keys add` walks you through configuring more (see
[Capacity & provider health](#capacity--provider-health) and
[docs/CAPACITY.md](docs/CAPACITY.md)).

**New in 0.10:** an async API (`AsyncPool`), an MCP server (`freellmpool mcp`),
latency-aware routing with `freellmpool benchmark`, observability hooks, and a
plugin system for custom providers. See the [changelog](CHANGELOG.md).

## Install

```bash
pip install freellmpool      # or: pipx install freellmpool
```

Only dependency is `httpx`. Python 3.11+.

## Command line

```bash
freellmpool ask "Write a haiku about sqlite"
git diff | freellmpool ask "Write a commit message for this"
freellmpool tokenmax "Hardest question you've got"  # 🌈 blast models, print answers, optional synthesis
freellmpool providers        # which providers are configured
freellmpool models           # every provider/model id
freellmpool stats            # lifetime tokens served free + estimated cost avoided
freellmpool badge -o badge.svg   # a shareable SVG badge of that total
```

`freellmpool tokenmax` is the tongue-in-cheek maximum-effort mode: it fans your
prompt out to many available models at once and prints each answer. The CLI adds
a synthesized verdict by default unless you pass `--no-synthesize`; the MCP tool
returns the model answers for the calling agent to synthesize. (See
[docs/MCP.md](docs/MCP.md).)
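The fan-out behaviour described above can be pictured with a small thread-pool sketch. It assumes nothing about how `tokenmax` is actually implemented: `fan_out` and the three stub models are hypothetical, and the synthesis step is left out.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(prompt, models):
    """Send one prompt to every model concurrently; return {name: answer or error}."""
    def run(item):
        name, call = item
        try:
            return name, call(prompt)
        except Exception as exc:         # one failing model should not sink the rest
            return name, f"error: {exc}"

    with ThreadPoolExecutor(max_workers=max(1, len(models))) as executor:
        return dict(executor.map(run, models.items()))


def broken(prompt):
    raise TimeoutError("no reply")


# Stub "models": plain callables standing in for real provider calls.
models = {
    "model-a": lambda p: p.upper(),
    "model-b": lambda p: p[::-1],
    "model-c": broken,
}
answers = fan_out("hello", models)
```

Collecting errors as values, rather than letting them propagate, keeps the remaining answers available for whoever synthesizes them afterwards.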

`freellmpool stats` is a running, **persistent** lifetime total (it survives restarts
and upgrades). Embed `freellmpool badge` in a README, or serve it live from the proxy
at `/badge.svg` (set `FREELLMPOOL_PUBLIC_BADGE=1` to make it publicly embeddable).

Pin a provider or model; common OpenAI/Anthropic model names are mapped to a free
equivalent so existing scripts keep working:

```bash
freellmpool ask -m groq/llama-3.3-70b-versatile "hi"
freellmpool ask -p cerebras,groq "hi"
freellmpool ask -m gpt-4o-mini "hi"      # routed to a free model
```
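One way to picture the name mapping is a lookup table in front of the router. The sketch below is an assumption-laden illustration: the `ALIASES` entries, the `resolve_model` helper, and the fallback to `auto` for unknown names are all hypothetical, and freellmpool's real table may look quite different.

```python
# Hypothetical alias table; the real mapping lives inside freellmpool and may differ.
ALIASES = {
    "gpt-4o-mini": "groq/llama-3.3-70b-versatile",
    "claude-3-5-haiku-latest": "cerebras/llama-3.3-70b",
}


def resolve_model(requested, aliases=ALIASES):
    """Return a provider/model id (or "auto") for a requested model name."""
    if "/" in requested:
        return requested                   # already a pinned provider/model id
    return aliases.get(requested, "auto")  # assumed fallback: let the router choose
```

With this table, `resolve_model("gpt-4o-mini")` yields a pooled model id, while an already pinned id such as `groq/llama-3.3-70b-versatile` passes through unchanged.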

## As a proxy

Run a local server that speaks the OpenAI API, then point any OpenAI-compatible
tool at it:

```bash
freellmpool proxy
export OPENAI_BASE_URL=http://localhost:8080/v1
export OPENAI_API_KEY=unused
```

```python
from openai import OpenAI
client = OpenAI()
print(client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "hi"}],
).choices[0].message.content)

# audio → text (Whisper), same client:
print(client.audio.transcriptions.create(
    model="auto", file=open("audio.mp3", "rb"),
).text)
```

Or with `curl` (multipart upload):

```bash
curl -s http://localhost:8080/v1/audio/transcriptions \
  -F file=@audio.mp3 -F model=auto
```

The proxy also implements the OpenAI Responses API (for the Codex CLI) and an
experimental Anthropic Messages API path (for Claude Code), so coding agents can
run on free models too. `freellmpool code <agent>` prints the exact setup:

```bash
freellmpool code aider       # also: claude, codex, cline, continue, cursor, opencode
```

Endpoints: `/v1/chat/completions` (token streaming, tool calling), `/v1/embeddings`,
`/v1/audio/transcriptions` (Whisper, multipart upload), `/v1/responses`,
`/v1/messages` (experimental Anthropic-compatible path), `/v1/models`, and a
`/dashboard` page showing usage.
Setup snippets for specific tools are in [docs/INTEGRATIONS.md](docs/INTEGRATIONS.md)
and [docs/AGENTS.md](docs/AGENTS.md). The repo also includes an experimental
[metaswarm review adapter](integrations/metaswarm) for using `freellmpool` as an
external-tools reviewer/second opinion.
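For tools that do not use an SDK, a request to the experimental `/v1/messages` path can be assembled by hand. The sketch below only builds the request and does not send it. The body follows the public Anthropic Messages API shape; whether the proxy requires the `x-api-key` and `anthropic-version` headers is an assumption, and `build_messages_request` is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"       # default proxy address from the text above


def build_messages_request(prompt, base_url=BASE_URL):
    """Build (but do not send) a POST to the proxy's /v1/messages path."""
    body = {
        "model": "auto",                 # same value as ANTHROPIC_MODEL=auto above
        "max_tokens": 64,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "content-type": "application/json",
            "x-api-key": "dummy",        # placeholder; assumed to be ignored by the proxy
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )


req = build_messages_request("hi")
# With the proxy running, send it with: urllib.request.urlopen(req)
```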

## As a library

```python
from freellmpool import Pool

pool = Pool.from_default_config()
reply = pool.ask("Summarize the plot of Hamlet in 20 words.")
print(reply.text, "—", reply.provider_id)

vectors = pool.embed(["first document", "second document"]).vectors

with open("audio.mp3", "rb") as f:
    text = pool.transcribe(f.read(), "audio.mp3").text   # Whisper, failover across providers
```

Async is the same API with `await`:

```python
import asyncio
from freellmpool import AsyncPool

async def main():
    async with AsyncPool.from_default_config() as pool:
        reply = await pool.aask("Summarize the plot of Hamlet in 20 words.")

asyncio.run(main())  # "async with" needs a running event loop
```

Pass `on_event=...` to either pool to receive structured routing/cache events
(`attempt`/`success`/`error`/`cooldown`/`cache_hit`/`cache_miss`/`exhausted`) for logging or tracing. Add
your own endpoint with `register_provider(...)`, or a new request shape with
`register_adapter(name, fn)`.
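A handler for these events could be as simple as a tally by kind. The source does not show the event object's shape, so this sketch assumes the kind is readable from a `type` key or attribute; `event_kind` and `make_counter` are hypothetical helpers, and the events fed in at the end are hand-made fakes.

```python
from collections import Counter

# Event kinds listed in the text above.
KNOWN_KINDS = {"attempt", "success", "error", "cooldown",
               "cache_hit", "cache_miss", "exhausted"}


def event_kind(event):
    """Read the kind from a dict key or attribute named 'type' (assumed name)."""
    if isinstance(event, dict):
        return str(event.get("type", "unknown"))
    return str(getattr(event, "type", "unknown"))


def make_counter():
    """Return (handler, counts); the handler tallies events by kind."""
    counts = Counter()

    def handler(event):
        kind = event_kind(event)
        counts[kind if kind in KNOWN_KINDS else "unknown"] += 1

    return handler, counts


handler, counts = make_counter()
for fake in ({"type": "attempt"}, {"type": "error"}, {"type": "attempt"},
             {"type": "success"}, {"type": "mystery"}):
    handler(fake)                        # fake events, for illustration only
```

The resulting `handler` is the kind of callable that would be passed as `on_event`; check the actual event fields before relying on the `type` lookup.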

## Benchmark your providers

`freellmpool benchmark` times one call per configured provider and prints
latency and success, so you can see which of your free tiers are fastest right
now. The router learns the same latency/success signal from real traffic as it
runs; set `FREELLMPOOL_ROUTING=fast` to prefer the lowest-latency provider
instead of the default least-used-first.

```
$ freellmpool benchmark
  provider/model            status   latency  note
  cerebras/llama-3.3-70b    ok        180 ms  6 tok
  groq/llama-3.3-70b        ok        240 ms  6 tok
  ovh/Meta-Llama-3_3-70B    FAIL           -  HTTP 429
```
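The difference between the default least-used-first routing and `FREELLMPOOL_ROUTING=fast` can be illustrated with a toy selector. This is a sketch under stated assumptions, not the router's real logic: `ProviderStats`, `pick`, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProviderStats:
    name: str
    used_today: int          # requests already served today
    avg_latency_ms: float    # learned from benchmarks or real traffic
    healthy: bool = True


def pick(stats, mode="least_used"):
    """Choose a provider name among healthy candidates for the given mode."""
    candidates = [s for s in stats if s.healthy]
    if not candidates:
        raise RuntimeError("no healthy provider")
    if mode == "fast":
        return min(candidates, key=lambda s: s.avg_latency_ms).name
    return min(candidates, key=lambda s: s.used_today).name


stats = [
    ProviderStats("cerebras", used_today=40, avg_latency_ms=180),
    ProviderStats("groq", used_today=5, avg_latency_ms=240),
    ProviderStats("ovh", used_today=0, avg_latency_ms=90, healthy=False),
]
```

With these sample numbers, the default mode picks `groq` (fewest requests today) and `fast` picks `cerebras` (lowest latency); `ovh` is skipped in both because it is marked unhealthy.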

## Capacity & provider health

Free tiers drift through the day — keys expire, providers go down, daily caps
fill. These commands tell you what's usable right now and what to set up next:

```bash
freellmpool capacity status --target 5   # who's healthy / near quota / missing a key
freellmpool providers health             # send one tiny request to each, time it
freellmpool keys checklist --target 5    # which keys to add to reach N healthy providers
freellmpool keys add groq                # configure a key (and record metadata)
```

`capacity status` is local-first: it reads your catalog, environment, and
per-day quota counters and labels each provider `healthy`, `low_quota`,
`exhausted`, `invalid_key`, or `missing`. It also syncs an advisory external
catalog ([mnfst/awesome-free-llm-apis](https://github.com/mnfst/awesome-free-llm-apis))
to suggest free providers you could add — advisory only; your `providers.toml`
stays the source of truth for routing. `keys add <name>` can even import a
suggested provider from that catalog or create an OpenAI-compatible stub and
autodiscover its models. The proxy `/dashboard` shows the same capacity at a
glance. Full reference:

Topics

anthropic, claude, codex, cursor, failover, free-llm, free-llm-api, gemini, groq, llm-gateway, llm-router, mcp, mcp-server, model-context-protocol, openai, openai-proxy, openrouter, python, rate-limiting, speech-to-text

Frequently asked

What people ask about freellmpool

What is 0xzr/freellmpool?

0xzr/freellmpool is an MCP server for the Claude AI ecosystem. It is a free LLM API pool: 19 LLM providers cataloged, 235 routes, 355 cataloged chat models, keyless start when available. It has 13 GitHub stars and was last updated today.

How do I install freellmpool?

You can install freellmpool from PyPI with `pip install freellmpool` (or `pipx install freellmpool`), or by cloning the repository (https://github.com/0xzr/freellmpool) and following the README instructions on GitHub. ClaudeWave also provides quick install blocks on this page.

Is 0xzr/freellmpool safe to use?

0xzr/freellmpool has not been audited yet by our security agent. Review the original repository on GitHub before using it in production.

Who maintains 0xzr/freellmpool?

0xzr/freellmpool is maintained by 0xzr. The last recorded GitHub activity is from today, with 8 open issues.

Are there alternatives to freellmpool?

Yes. On ClaudeWave you can browse similar MCP servers at /categories/mcp, sorted by popularity or recent activity.

