OpenMythos

Name: kyegomez/OpenMythos
Author: kyegomez

A theoretical reconstruction of the Claude Mythos architecture, built from first principles using the available research literature.

Plugins · 14.8k stars · 3.3k forks · Python · MIT · Updated 2mo ago

Editorial note

OpenMythos is a PyTorch implementation of a speculative reconstruction of the architecture suspected to underlie Anthropic's Claude Mythos model, built entirely from publicly available research rather than any proprietary information. The core design is a Recurrent-Depth Transformer consisting of three stages: a Prelude of standard transformer blocks, a looped Recurrent Block that runs for a configurable number of iterations up to `max_loop_iters`, and a final Coda, enabling compute-adaptive reasoning where inference depth can vary at generation time. Attention can be switched between Multi-head Latent Attention (MLA) and Grouped Query Attention (GQA), and the feed-forward layer uses a sparse Mixture-of-Experts setup with routed and shared experts. Pre-configured factory functions cover scales from 1B to 1T parameters, and a training script for the 3B variant targets 30 billion tokens on FineWeb-Edu using PyTorch DDP. The library is installable via pip and carries no affiliation with Anthropic. Researchers and ML engineers exploring looped transformer architectures are its primary audience.

ClaudeWave Trust Score

100/100

✓ Verified

Passed

✓ Open-source license (MIT)
✓ Actively maintained (<30d)
✓ Healthy fork ratio
✓ Clear description
✓ Topics declared
✓ Documented (README)

Last scanned: 6/11/2026

Install as a Claude Code plugin

Method: Clone

Claude Code

/plugin marketplace add kyegomez/OpenMythos
/plugin install openmythos

1. Inside Claude Code, add the marketplace and install the plugin with the commands above.

2. Follow any post-install configuration from the README.

3. Restart the session if commands or hooks do not show up immediately.

Use cases

AI / ML Social Productivity

About the repo

Plugins overview

# OpenMythos

<p align="left">
  <a href="https://pypi.org/project/open-mythos/" target="_blank">
    <picture>
      <source srcset="https://img.shields.io/pypi/v/open-mythos?style=for-the-badge&color=3670A0" media="(prefers-color-scheme: dark)">
      <img alt="Version" src="https://img.shields.io/pypi/v/open-mythos?style=for-the-badge&color=3670A0">
    </picture>
  </a>
  <a href="https://twitter.com/kyegomezb/">
    <picture>
      <source srcset="https://img.shields.io/badge/Twitter-Follow-1DA1F2?style=for-the-badge&logo=twitter&logoColor=white" media="(prefers-color-scheme: dark)">
      <img src="https://img.shields.io/badge/Twitter-Follow-1DA1F2?style=for-the-badge&logo=twitter&logoColor=white" alt="Twitter">
    </picture>
  </a>
  <a href="https://discord.gg/3keGBK9Pvr" target="_blank">
    <picture>
      <source srcset="https://img.shields.io/badge/Discord-Join-5865F2?style=for-the-badge&logo=discord&logoColor=white" media="(prefers-color-scheme: dark)">
      <img alt="Discord" src="https://img.shields.io/badge/Discord-Join-5865F2?style=for-the-badge&logo=discord&logoColor=white">
    </picture>
  </a>
  <a href="https://pytorch.org" target="_blank">
    <picture>
      <source srcset="https://img.shields.io/badge/PyTorch-Implemented-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white" media="(prefers-color-scheme: dark)">
      <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-Implemented-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white">
    </picture>
  </a>
</p>

> **Disclaimer:** OpenMythos is an independent, community-driven theoretical reconstruction based solely on publicly available research and speculation. It is not affiliated with, endorsed by, or connected to Anthropic or any of their proprietary systems.

OpenMythos is an open-source, theoretical implementation of the suspected Claude Mythos architecture. It implements a Recurrent-Depth Transformer (RDT) with three stages: a **Prelude** of standard transformer blocks, a looped **Recurrent Block** (run up to `max_loop_iters` times), and a final **Coda**. Attention is switchable between MLA and GQA, and the feed-forward layer is a sparse MoE with routed and shared experts, which makes the model well suited to exploring compute-adaptive, depth-variable reasoning.


## Installation

```bash
pip install open-mythos

# or, with uv: uv pip install open-mythos
```

To enable Flash Attention 2 in `GQAttention` (requires CUDA and build tools):

```bash
pip install "open-mythos[flash]"  # quotes stop zsh from expanding the brackets
```
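To check ahead of time whether `GQAttention` can take the Flash Attention path, you can test whether the `flash_attn` module (the import name of the `flash-attn` package) is importable and read its installed version, without actually importing it. This is a minimal standard-library sketch; the helper name `flash_attn_status` is hypothetical, and it only checks installation, not whether a CUDA GPU is present. Compare the reported version against the `flash-attn>=2.8.3` requirement listed under Attention Implementations.

```python
import importlib.util
from importlib import metadata
from typing import Optional, Tuple


def flash_attn_status() -> Tuple[bool, Optional[str]]:
    """Return (importable, installed_version) for the flash-attn package.

    Hypothetical helper: checks installation only, not GPU/CUDA availability.
    """
    importable = importlib.util.find_spec("flash_attn") is not None
    try:
        version: Optional[str] = metadata.version("flash-attn")  # distribution name on PyPI
    except metadata.PackageNotFoundError:
        version = None
    return importable, version


if __name__ == "__main__":
    ok, ver = flash_attn_status()
    print(f"flash_attn importable: {ok}, version: {ver}")
```

If `importable` is `False`, expect the manual scaled dot-product attention fallback.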

## Usage

```python
import torch
from open_mythos.main import OpenMythos, MythosConfig


attn_type = "mla"  # or "gqa"

base = {
    "vocab_size": 1000,
    "dim": 256,
    "n_heads": 8,
    "max_seq_len": 128,
    "max_loop_iters": 4,
    "prelude_layers": 1,
    "coda_layers": 1,
    "n_experts": 8,
    "n_shared_experts": 1,
    "n_experts_per_tok": 2,
    "expert_dim": 64,
    "lora_rank": 8,
    "attn_type": attn_type,
}

if attn_type == "gqa":
    cfg = MythosConfig(**base, n_kv_heads=2)
else:
    cfg = MythosConfig(
        **base,
        n_kv_heads=8,
        kv_lora_rank=32,
        q_lora_rank=64,
        qk_rope_head_dim=16,
        qk_nope_head_dim=16,
        v_head_dim=16,
    )

model = OpenMythos(cfg)
total = sum(p.numel() for p in model.parameters())
print(f"\n[{attn_type.upper()}] Parameters: {total:,}")

ids = torch.randint(0, cfg.vocab_size, (2, 16))
logits = model(ids, n_loops=4)
print(f"[{attn_type.upper()}] Logits shape: {logits.shape}")

out = model.generate(ids, max_new_tokens=8, n_loops=8)
print(f"[{attn_type.upper()}] Generated shape: {out.shape}")

A = model.recurrent.injection.get_A()
rho = torch.linalg.eigvals(A).abs().max().item()
print(
    f"[{attn_type.upper()}] Spectral radius ρ(A) = {rho:.4f} (must be < 1)"
)
```
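The final check above requires ρ(A) < 1. The reason: the linear part of the update rule, `h ← A·h + c` with a fixed input term `c` (standing in for `B·e`), converges to the fixed point `(I − A)⁻¹c` only when the spectral radius of `A` is below 1; otherwise repeated loops blow up the hidden state. The standalone sketch below uses random matrices and plain PyTorch. It ignores the nonlinear Transformer term and is independent of how OpenMythos actually builds `A` in `get_A()`.

```python
import torch


def spectral_radius(A: torch.Tensor) -> float:
    # Largest eigenvalue magnitude; eigvals returns complex values for general matrices.
    return torch.linalg.eigvals(A).abs().max().item()


def iterate_linear(A: torch.Tensor, c: torch.Tensor, steps: int) -> torch.Tensor:
    # Repeatedly apply h <- A h + c, starting from h = 0.
    h = torch.zeros_like(c)
    for _ in range(steps):
        h = A @ h + c
    return h


torch.manual_seed(0)
dim = 8
M = torch.randn(dim, dim, dtype=torch.float64)
c = torch.randn(dim, dtype=torch.float64)

# Rescale the same matrix so its spectral radius is 0.9 (stable) or 1.1 (unstable).
rho_M = spectral_radius(M)
A_stable = 0.9 * M / rho_M
A_unstable = 1.1 * M / rho_M

# Fixed point h* solves h* = A h* + c, i.e. (I - A) h* = c.
fixed_point = torch.linalg.solve(torch.eye(dim, dtype=torch.float64) - A_stable, c)
h_stable = iterate_linear(A_stable, c, steps=500)
h_unstable = iterate_linear(A_unstable, c, steps=500)

print(f"rho = {spectral_radius(A_stable):.3f}: "
      f"distance to fixed point = {(h_stable - fixed_point).norm().item():.2e}")
print(f"rho = {spectral_radius(A_unstable):.3f}: "
      f"norm of h = {h_unstable.norm().item():.2e}")
```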



## Model Variants

Pre-configured scales from 1B to 1T parameters:

```python
from open_mythos import (
    mythos_1b,
    mythos_3b,
    mythos_10b,
    mythos_50b,
    mythos_100b,
    mythos_500b,
    mythos_1t,
    OpenMythos,
)

cfg = mythos_1b()  # smallest preset; returns a MythosConfig
model = OpenMythos(cfg)

total = sum(p.numel() for p in model.parameters())
print(f"Parameters: {total:,}")
```

| Variant | `dim` | Experts | `expert_dim` | Loop iters | Context | Max output |
|---|---|---|---|---|---|---|
| `mythos_1b` | 2048 | 64 | 2048 | 16 | 4k | 4k |
| `mythos_3b` | 3072 | 64 | 4096 | 16 | 4k | 4k |
| `mythos_10b` | 4096 | 128 | 5632 | 24 | 8k | 4k |
| `mythos_50b` | 6144 | 256 | 9728 | 32 | 8k | 4k |
| `mythos_100b` | 8192 | 256 | 13568 | 32 | 1M | 128k |
| `mythos_500b` | 12288 | 512 | 23040 | 48 | 1M | 128k |
| `mythos_1t` | 16384 | 512 | 34560 | 64 | 1M | 128k |
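The Experts and `expert_dim` columns refer to the sparse MoE feed-forward layer. As an illustration only, here is a minimal top-k routed MoE with always-on shared experts. The class `ToyMoE`, the softmax-then-top-k gating with renormalization, and the two-layer SiLU experts are assumptions; `open_mythos` may route and shape its experts differently. The constructor arguments mirror the Usage config (`n_experts`, `n_shared_experts`, `n_experts_per_tok`, `expert_dim`).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyMoE(nn.Module):
    """Hypothetical MoE FFN: top-k routed experts plus shared experts applied to every token."""

    def __init__(self, dim: int, n_experts: int, n_shared_experts: int,
                 n_experts_per_tok: int, expert_dim: int):
        super().__init__()

        def ffn() -> nn.Module:
            return nn.Sequential(nn.Linear(dim, expert_dim), nn.SiLU(), nn.Linear(expert_dim, dim))

        self.experts = nn.ModuleList(ffn() for _ in range(n_experts))
        self.shared = nn.ModuleList(ffn() for _ in range(n_shared_experts))
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.k = n_experts_per_tok

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = x.reshape(-1, x.shape[-1])                      # (N, dim), N = batch * seq
        scores = F.softmax(self.router(tokens), dim=-1)          # (N, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)               # (N, k) each
        weights = weights / weights.sum(dim=-1, keepdim=True)    # renormalize over chosen experts
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
            if token_ids.numel() > 0:
                gate = weights[token_ids, slot].unsqueeze(-1)    # (m, 1)
                out.index_add_(0, token_ids, gate * expert(tokens[token_ids]))
        for expert in self.shared:                               # shared experts see every token
            out = out + expert(tokens)
        return out.reshape(x.shape)


moe = ToyMoE(dim=256, n_experts=8, n_shared_experts=1, n_experts_per_tok=2, expert_dim=64)
y = moe(torch.randn(2, 16, 256))
print(y.shape)  # torch.Size([2, 16, 256])
```

Only `n_experts_per_tok` routed experts (plus the shared ones) run per token, which is why total parameters can grow with the expert count while per-token compute stays roughly fixed.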

---

## Training

The training script for the 3B model on FineWeb-Edu is at [`training/3b_fine_web_edu.py`](training/3b_fine_web_edu.py).

**Single GPU:**
```bash
python training/3b_fine_web_edu.py
```

**Multi-GPU (auto-detects GPU count):**
```bash
torchrun --nproc_per_node=$(python -c "import torch; print(torch.cuda.device_count())") training/3b_fine_web_edu.py
```
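`torchrun` starts one process per GPU and sets the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables in each. One simple way to shard a streaming dataset across those processes is to have every rank read the same stream in the same order and keep every `WORLD_SIZE`-th example, offset by its rank. This is a sketch under that assumption; the helper names are hypothetical, and the actual training script may shard differently (for example, at the file or dataset-library level).

```python
import itertools
import os
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def dist_info() -> Tuple[int, int]:
    # torchrun sets these; the defaults cover a plain `python script.py` single-GPU run.
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    return rank, world_size


def shard_stream(stream: Iterable[T], rank: int, world_size: int) -> Iterator[T]:
    """Yield items rank, rank + world_size, rank + 2 * world_size, ... from the stream."""
    return itertools.islice(stream, rank, None, world_size)


if __name__ == "__main__":
    rank, world_size = dist_info()
    first = list(shard_stream(range(20), rank, world_size))[:5]
    print(f"rank {rank}/{world_size} sees examples {first}")
```

The shards are disjoint and together cover the stream, but every rank still reads and skips the other ranks' examples, so download and decode cost is not divided.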

Key design choices:

| Feature | Detail |
|---|---|
| Optimizer | AdamW |
| Dataset | `HuggingFaceFW/fineweb-edu` (`sample-10BT` by default, swap to `sample-100BT` or `default` for full run) |
| Tokenizer | `openai/gpt-oss-20b` via `MythosTokenizer` |
| Parallelism | PyTorch DDP via `torchrun`, sharded streaming dataset |
| Precision | bfloat16 on H100/A100, float16 + GradScaler on older GPUs |
| Schedule | Linear warmup (2000 steps) → cosine decay |
| Target | 30B tokens (~Chinchilla-adjusted for looped architecture) |
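The schedule row (linear warmup for 2000 steps, then cosine decay) can be written as a plain function of the step. In this sketch, the peak and minimum learning rates and the total step count are placeholders, not the values the training script uses, and `lr_at` is a hypothetical name.

```python
import math


def lr_at(step: int, peak_lr: float, warmup_steps: int, total_steps: int,
          min_lr: float = 0.0) -> float:
    """Linear warmup from ~0 to peak_lr, then cosine decay to min_lr at total_steps."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 down to 0
    return min_lr + (peak_lr - min_lr) * cosine


# Placeholder hyperparameters for illustration only.
PEAK, WARMUP, TOTAL, MIN = 3e-4, 2000, 100_000, 3e-5
for s in (0, 999, 1999, 2000, 51_000, 100_000):
    print(s, f"{lr_at(s, PEAK, WARMUP, TOTAL, MIN):.2e}")
```

In PyTorch, the same function can drive `torch.optim.lr_scheduler.LambdaLR` by returning `lr_at(step, ...) / PEAK` as the multiplier, with the optimizer's base learning rate set to the peak.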

---

## Documentation

| Page | Description |
|---|---|
| [`docs/open_mythos.md`](docs/open_mythos.md) | Full API reference for the `OpenMythos` class — constructor, `forward`, `generate`, all sub-modules, configuration reference, and usage examples |
| [`docs/datasets.md`](docs/datasets.md) | Recommended training datasets with token budget guidance per model size |

---

## The Central Hypothesis

Claude Mythos is suspected to be a **Recurrent-Depth Transformer (RDT)**, also called a Looped Transformer (LT). Rather than stacking hundreds of unique layers, it reuses a subset of layers and runs them multiple times within a single forward pass. Same weights. More loops. Deeper thinking.

This is not chain-of-thought. There is no intermediate token output. All of this reasoning happens **silently, inside a single forward pass**, in continuous latent space.
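Weight sharing is what makes extra depth cheap in parameters: looping one block T times reuses a single set of weights, while stacking T distinct blocks multiplies them. The toy comparison below uses `nn.TransformerEncoderLayer` as a stand-in for the recurrent block; that is an assumption for illustration, since OpenMythos uses its own blocks with MoE and MLA/GQA.

```python
import torch
import torch.nn as nn


def n_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())


dim, heads, T = 64, 4, 8
block = nn.TransformerEncoderLayer(dim, heads, dim_feedforward=4 * dim, batch_first=True)
stack = nn.ModuleList(
    nn.TransformerEncoderLayer(dim, heads, dim_feedforward=4 * dim, batch_first=True)
    for _ in range(T)
)

x = torch.randn(2, 10, dim)
h_loop = x
for _ in range(T):          # looped: the same weights applied T times
    h_loop = block(h_loop)
h_stack = x
for layer in stack:         # stacked: T independent layers
    h_stack = layer(h_stack)

print(f"looped  x{T}: {n_params(block):,} parameters")
print(f"stacked x{T}: {n_params(stack):,} parameters")  # exactly T times the looped count
```

Both paths spend the same compute per forward pass; only the stacked one pays for it in parameters.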

---

## Architecture

A looped transformer divides its layers into three functional blocks:

```
Input
  ↓
[Prelude P]        — standard transformer layers, run once
  ↓
[Recurrent Block R] — looped T times
  ↑_______↓         (hidden state h updated each loop with input injection e)
  ↓
[Coda C]           — standard transformer layers, run once
  ↓
Output
```

The recurrent block update rule at each loop step t:

```
h_{t+1} = A·h_t + B·e + Transformer(h_t, e)
```

Where:
- `h_t` is the hidden state after loop t
- `e` is the encoded input (from the Prelude), injected at every loop
- `A` and `B` are learned injection parameters
- The Transformer blocks apply attention and MLP as usual

The injection of `e` at every step is what prevents the model from drifting — it keeps the original input signal alive throughout the entire recurrence depth.
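A minimal sketch of the three-stage forward pass with input injection, under several assumptions that are not taken from `open_mythos/main.py`: `A` is diagonal with entries `sigmoid(a_logit)` in (0, 1), which guarantees ρ(A) < 1 by construction; `B` is a bias-free linear map; the Transformer term receives `h + e`; and the hypothetical `Mixer` (attention + MLP with no outer residual, so only `A·h` carries `h` forward) stands in for the recurrent block. `ToyLoopedLM`, `Mixer`, `a_logit`, and this `get_A` are illustrative names, and causal masking is omitted for brevity.

```python
import torch
import torch.nn as nn


class Mixer(nn.Module):
    """Stand-in for Transformer(h, e): attention + MLP without an outer residual.
    Causal masking is omitted for brevity."""

    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.norm1(x)
        y = self.attn(y, y, y, need_weights=False)[0]
        return self.mlp(self.norm2(y))


class ToyLoopedLM(nn.Module):
    """Hypothetical Prelude -> looped Recurrent Block -> Coda, following
    h_{t+1} = A·h_t + B·e + Transformer(h_t, e)."""

    def __init__(self, vocab: int, dim: int, heads: int):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.prelude = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.recurrent = Mixer(dim, heads)
        self.coda = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.a_logit = nn.Parameter(torch.zeros(dim))  # A = diag(sigmoid(a_logit)), entries in (0, 1)
        self.B = nn.Linear(dim, dim, bias=False)
        self.head = nn.Linear(dim, vocab)

    def get_A(self) -> torch.Tensor:
        return torch.diag(torch.sigmoid(self.a_logit))

    def forward(self, ids: torch.Tensor, n_loops: int) -> torch.Tensor:
        e = self.prelude(self.embed(ids))       # Prelude runs once; e is re-injected every loop
        a = torch.sigmoid(self.a_logit)         # diagonal of A, applied elementwise
        h = torch.zeros_like(e)
        for _ in range(n_loops):                # same weights on every iteration
            h = a * h + self.B(e) + self.recurrent(h + e)
        return self.head(self.coda(h))          # Coda runs once


torch.manual_seed(0)
model = ToyLoopedLM(vocab=100, dim=32, heads=4)
ids = torch.randint(0, 100, (2, 12))
for n in (1, 4, 16):                            # depth is chosen per call
    print(n, tuple(model(ids, n_loops=n).shape))
print(f"rho(A) = {torch.linalg.eigvals(model.get_A()).abs().max().item():.3f}")  # 0.500 at init
```

Because the `Mixer` output is bounded (its inputs are layer-normalized) and every entry of `A` is below 1, the hidden state stays bounded however many loops are requested.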

The full implementation is in [`open_mythos/main.py`](open_mythos/main.py). See the [`OpenMythos` class reference](docs/open_mythos.md) for a detailed API walkthrough, configuration options, and usage examples.

### Attention Implementations

The attention layer is switchable via `cfg.attn_type`:

| Option | Class | Description |
|---|---|---|
| `"gqa"` | `GQAttention` | Grouped Query Attention (Ainslie et al., 2023): uses fewer KV heads than query heads (`n_kv_heads < n_heads`), reducing KV-cache memory by a factor of `n_heads / n_kv_heads`. When `flash-attn>=2.8.3` is installed it uses **Flash Attention 2** (Dao, 2023), which handles GQA natively (no KV head expansion) and is IO-aware; when the package is absent it falls back transparently to manual scaled dot-product attention. |
| `"mla"` | `MLAttention` | Multi-head Latent Attention (DeepSeek-V2): caches a compressed KV latent of size `kv_lora_rank` instead of full K/V, with separate RoPE and non-RoPE head dims (`qk_rope_head_dim`, `qk_nope_head_dim`) for position-aware compression. |
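To make the GQA row concrete: each group of `n_heads / n_kv_heads` query heads shares one K/V head, so only `n_kv_heads` heads have to be cached. The sketch below expands K/V with `repeat_interleave` and then calls PyTorch's `F.scaled_dot_product_attention`, roughly what a non-Flash fallback must do. The function name `gqa` and the `(batch, heads, seq, head_dim)` layout are assumptions, not `GQAttention`'s internals; the sizes match the Usage example (`dim=256`, `n_heads=8`, `n_kv_heads=2`).

```python
import torch
import torch.nn.functional as F


def gqa(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """q: (batch, n_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim)."""
    n_heads, n_kv_heads = q.shape[1], k.shape[1]
    assert n_heads % n_kv_heads == 0, "n_heads must be a multiple of n_kv_heads"
    group = n_heads // n_kv_heads
    # Query head i uses KV head i // group; expanding here costs compute, not cache memory.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)


batch, seq, head_dim, n_heads, n_kv_heads = 2, 16, 32, 8, 2
q = torch.randn(batch, n_heads, seq, head_dim)
k = torch.randn(batch, n_kv_heads, seq, head_dim)
v = torch.randn(batch, n_kv_heads, seq, head_dim)
out = gqa(q, k, v)
print(out.shape)  # torch.Size([2, 8, 16, 32])

# KV cache per token per layer: 2 (K and V) * heads * head_dim values.
mha_cache = 2 * n_heads * head_dim
gqa_cache = 2 * n_kv_heads * head_dim
print(f"KV-cache reduction: {mha_cache // gqa_cache}x")  # n_heads / n_kv_heads = 4
```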

RoPE is applied to Q and K before caching, so cached values do not need to be re-rotated on retrieval.
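Cached keys never need re-rotation because RoPE makes the query-key dot product depend only on the position offset: a key rotated once at its own absolute position stays valid for every later query. A compact sketch, assuming the common "rotate-half" pairing of dimensions and base 10000 (the variant OpenMythos uses may differ; `rope` is a hypothetical helper):

```python
import torch


def rope(x: torch.Tensor, pos: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate x: (seq, d) by absolute positions pos: (seq,), pairing dim i with dim i + d/2."""
    d = x.shape[-1]
    inv_freq = base ** (-torch.arange(0, d // 2, dtype=torch.float64) / (d // 2))
    angles = pos[:, None].to(torch.float64) * inv_freq[None, :]   # (seq, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., : d // 2], x[..., d // 2 :]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


torch.manual_seed(0)
d = 16
q = torch.randn(1, d, dtype=torch.float64)
k = torch.randn(1, d, dtype=torch.float64)

# Key rotated once at position 3 (as it would sit in the cache), query at position 10.
score = (rope(q, torch.tensor([10])) * rope(k, torch.tensor([3]))).sum()
# Shift both positions by 100: same offset (7), same score.
shifted = (rope(q, torch.tensor([110])) * rope(k, torch.tensor([103]))).sum()
print(torch.allclose(score, shifted))  # True
```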

---

## Why This Explains Mythos

### 1. Systematic Generalization

Vanilla transformers fail to combine knowledge in ways they never saw during training; looped transformers pass this test of systematic generalization. The ability emerges through a **three-stage grokking process**:

1. Memorization: the model fits the training distribution.
2. In-distribution generalization: the model handles known compositions.
3. Systematic generalization: the model handles novel, out-of-distribution (OOD) compositions, and this stage arrives abruptly.

This is why Mythos feels qualitatively different from other models on novel questions: the capability appears through a phase transition rather than emerging gradually.

### 2. Depth Extrapolation

Train on 5-hop reasoning chains. Test on 10-hop. Vanilla transformer fails. Looped transformer succeeds — by running more inference-time loops. This maps directly to the observation that Mythos handles deeply compositional problems (multi-step math, long-horizon planning, layered arguments) without explicit chain-of-thought.

More loops at inference = deeper reasoning chains = harder problems solved.
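A hand-built toy illustration of the depth-extrapolation idea (not a trained model, and not evidence about Mythos): if one loop iteration implements one "hop" of a fixed relation, the same weights answer longer chains simply by running more loops. Here the "weights" are a permutation matrix `P` over 6 nodes and the latent state is a one-hot vector; `next_node`, `run_loops`, and `follow` are hypothetical names.

```python
import torch

# Fixed "weights": node i points to next_node[i] (one hop of the relation).
next_node = torch.tensor([2, 0, 4, 5, 1, 3])
n = len(next_node)
P = torch.zeros(n, n)
P[next_node, torch.arange(n)] = 1.0   # column i has a single 1, in row next_node[i]


def run_loops(start: int, n_loops: int) -> int:
    h = torch.zeros(n)
    h[start] = 1.0                    # latent state: one-hot over nodes
    for _ in range(n_loops):          # one hop per loop, same P every time
        h = P @ h
    return int(h.argmax())


def follow(start: int, hops: int) -> int:
    # Reference answer by explicit pointer chasing.
    for _ in range(hops):
        start = int(next_node[start])
    return start


for hops in (1, 5, 10):               # the operator encodes one hop; more loops give longer chains
    print(hops, run_loops(0, hops), follow(0, hops))
```

Whether trained looped transformers actually learn such iterable per-hop operators is the empirical claim of the looped-transformer literature; the toy only shows why that structure would extrapolate with more loops.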

### 3. Latent Thoughts as Implicit Chain-of-Thought

Each loop iteration is the functional equivalent of one step of chain-of-thought, but operating in continuous latent space rather than token space. A looped model running T loops implicitly simulates T steps of CoT reasoning. This has been formally proven (Saunshi et al., 2025).

Furthermore, continuous latent thoughts — unlike discrete token outputs — can encode **multiple alternative next steps simultaneously**. This allows something closer to breadth-first search over the reasoning space, rather than a single committed reasoning path. The model is effectively exploring many possible

Topics

ai, anthropic, attention, claude, claude-ai, claude-code, claude-code-plugin, claude-mythos, claude-sonnet, deepmind, gpt-5, gpt-7, jax, looped-transformers, ml, pytorch, sonnet, torch

Frequently asked questions

What people ask about OpenMythos

What is kyegomez/OpenMythos?

kyegomez/OpenMythos is a plugin for the Claude AI ecosystem. It is a theoretical reconstruction of the Claude Mythos architecture, built from first principles using the available research literature. It has 14.8k stars on GitHub and was last updated 2mo ago.

How do I install OpenMythos?

You can install OpenMythos by cloning the repository (https://github.com/kyegomez/OpenMythos) or by following the README instructions on GitHub. ClaudeWave also provides quick-install blocks on this page.

Is kyegomez/OpenMythos safe to use?

Our security agent has analyzed kyegomez/OpenMythos and assigned it a Trust Score of 100/100 (tier: Verified). Review the full breakdown of passed checks and flags on this page.

Who maintains kyegomez/OpenMythos?

kyegomez/OpenMythos is maintained by kyegomez. The most recent recorded activity on GitHub is from 2mo ago, with 59 open issues.

Are there alternatives to OpenMythos?

Yes. On ClaudeWave you can browse similar plugins at /categories/plugins, sorted by popularity or recent activity.

