
math-olympiad

This Claude Code skill solves competition mathematics problems from olympiad-level contests including IMO, Putnam, USAMO, and AIME. Use it for rigorous proof problems or numeric competition answers. Its workflow relies on five-pass adversarial verification, prioritizes honest abstention over confident errors, separates reasoning from verification, and exposes logical gaps through counterexample testing and the search for a unified argument instead of sprawling case analysis.

Repository: claude-plugins-official

Install in Claude Code


```
git clone --depth 1 https://github.com/anthropics/claude-plugins-official /tmp/math-olympiad && cp -r /tmp/math-olympiad/plugins/math-olympiad/skills/math-olympiad ~/.claude/skills/math-olympiad
```

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Math Olympiad Solver

## The five things that change outcomes

1. **Strip thinking before verifying**: a verifier that sees the reasoning is
biased toward agreement. Give it a fresh context and the cleaned proof only.
2. **"Does this prove RH?"**: if your theorem, specialized to ζ, would settle a
famous open problem such as the Riemann Hypothesis, you have a gap. This is the
most reliable red flag.
3. **Short proof → extract the general lemma**: try 2×2 counterexamples against
it. If the general form is false, find what's special about THIS instance.
4. **Same gap twice → step back**: the case split may be obscuring a unified
argument. Three lines sometimes do what twelve pages couldn't.
5. **Say "no confident solution"**: a confident wrong answer is worse than an
honest abstention.
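When computation is allowed (the tool policy reserves it for deep mode), the 2×2 check in item 3 can be mechanized. A minimal Python sketch: brute-force every pair of 2×2 matrices with entries in {-1, 0, 1} and return the first pair that breaks a claimed identity. The lemma tested here, (AB)² = A²B², is a hypothetical stand-in for whatever general form you extracted; it fails for non-commuting matrices.

```python
from itertools import product

def matmul(a, b):
    """Multiply two 2x2 matrices stored as ((w, x), (y, z)) tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def small_matrices(entries=(-1, 0, 1)):
    """Yield every 2x2 matrix whose entries all come from `entries`."""
    for w, x, y, z in product(entries, repeat=4):
        yield ((w, x), (y, z))

def find_counterexample(claim, entries=(-1, 0, 1)):
    """Return the first pair (A, B) with claim(A, B) False, or None if all pass."""
    for a in small_matrices(entries):
        for b in small_matrices(entries):
            if not claim(a, b):
                return a, b
    return None

def lemma(a, b):
    """Hypothetical extracted lemma: (AB)^2 == A^2 B^2 for all 2x2 matrices."""
    ab = matmul(a, b)
    return matmul(ab, ab) == matmul(matmul(a, a), matmul(b, b))

if __name__ == "__main__":
    print(find_counterexample(lemma))
```

If the search returns a pair, the general form is false and the proof must be using something special about the instance. A `None` over small entries is evidence, not a proof.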

---

**Tool policy**: In the tight-budget workflow, solvers and verifiers use THINKING
ONLY. Competition math is reasoning. Computation is reserved for deep mode
(§6c), and even there it is bounded: a doubly exponential recurrence can't be
computed exactly past n ≈ 30, so work mod 2^m instead.
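A minimal Python sketch of "work mod 2^m", using the hypothetical recurrence a_{k+1} = a_k² + 1 with a_0 = 1 as a stand-in. The exact value roughly doubles its bit length every step, while the reduced version keeps only the low m bits and stays cheap for any n. The reduction is valid because (x mod 2^m)² + 1 ≡ x² + 1 (mod 2^m).

```python
def recurrence_exact(n, a0=1):
    """Exact a_n for the hypothetical recurrence a_{k+1} = a_k**2 + 1."""
    a = a0
    for _ in range(n):
        a = a * a + 1
    return a

def recurrence_mod(n, m=64, a0=1):
    """a_n mod 2**m, reducing after every step so values never exceed m bits."""
    mask = (1 << m) - 1          # x & mask == x % 2**m for non-negative x
    a = a0 & mask
    for _ in range(n):
        a = (a * a + 1) & mask   # keep only the low m bits
    return a

if __name__ == "__main__":
    for k in range(0, 17, 4):
        # Exact: bit length roughly doubles each step, so this explodes quickly.
        print(k, recurrence_exact(k).bit_length())
    # Reduced: 100,000 steps on 64-bit values is cheap.
    print(recurrence_mod(100_000, m=64))
```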

---

## When to use which approach

| Problem | Approach | Verification |
| ---------------------------------------------------- | ------------------------------------------------------------------------------ | ------------------------- |
| AIME numeric answer | Best-of-N → majority vote | Answer check only |
| Olympiad proof (IMO/Putnam/USAMO) | Full workflow below | 5-pass adversarial |
| "Is this proof correct?" | Skip to verification (step 4) | Adversarial + spec-gaming |
| **Full problem set** (e.g. all 6 from a competition) | Sequential: one full workflow per problem, collect results, compile single PDF | Per-problem adversarial |
| "Simplify this proof" | Skip to presentation (step 8) | n/a |

**Batch in one Workflow**: Set `opts.label` on every `agent()` call to include
the problem ID (e.g., `label: "P3:solver:2"`). Without labels, 36 results come
back with no problem association. Run problems in parallel; the labels, not the
ordering, tie each result to its problem.

### For a full problem set

Launch one solver workflow per problem (same VERBATIM prompt, different
statement) and run them in parallel. When all have returned, run adversarial
verification per problem. Problems that pass get their proof in the PDF;
problems that abstain get "No confident solution" with partial notes.

Don't try to solve all N problems in one agent's context: each problem needs
its own thinking budget and its own fresh-context verifier. The composition is
mechanical: collect the per-problem outputs, fill in the LaTeX sections, and
compile once.
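A minimal Python sketch of that mechanical composition step. It assumes each returned result is a dict with hypothetical `label`, `verified`, and `text` keys, and that labels follow the `P3:solver:2` shape with the problem ID first. Grouping by the label prefix is what makes completion order irrelevant; the problem order in the document comes from an explicit list. The output is a bare LaTeX string, and compiling it is left out.

```python
from collections import defaultdict

def problem_id(label):
    """Extract the problem ID from a label shaped like 'P3:solver:2'."""
    return label.split(":", 1)[0]

def group_by_problem(results):
    """Group result dicts by the problem ID embedded in their label."""
    groups = defaultdict(list)
    for r in results:
        groups[problem_id(r["label"])].append(r)
    return dict(groups)

def compose_latex(results, problem_ids):
    """Build one LaTeX document with a section per problem, in the given order."""
    groups = group_by_problem(results)
    body = []
    for pid in problem_ids:
        entries = groups.get(pid, [])
        verified = [r for r in entries if r.get("verified")]
        body.append(f"\\section*{{Problem {pid}}}")
        if verified:
            # A proof that passed adversarial verification goes in verbatim.
            body.append(verified[0]["text"])
        else:
            # Abstain: say so explicitly and keep whatever partial notes exist.
            body.append("No confident solution.")
            notes = [r["text"] for r in entries if r.get("text")]
            if notes:
                body.append("Partial notes: " + " ".join(notes))
    return "\n".join(
        ["\\documentclass{article}", "\\begin{document}", *body, "\\end{document}"]
    )
```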

---

## The Workflow

### 1. Interpretation check (30 seconds; targets the misreading behind 50 of 63 "technically correct" failures)

Before solving anything, identify the interpretation.

> Read the problem statement. List 2-3 ways it could be interpreted. For each:
> is this reading TRIVIAL? If one reading makes the problem easy and another
> makes it hard, the hard one is almost certainly intended. State which
> interpretation you're solving and WHY you believe it's the intended one.

The Aletheia case study found that 50 of 63 "technically correct" solutions
solved the wrong interpretation. Olympiad problems often set a trap: an easy
reading that is not the intended one.

### 2. Generate candidates with internal refinement (parallel, thinking only)

Launch 8-12 attempt agents in parallel. **Each agent iterates internally**:
solve → self-improve → self-verify → correct → repeat. This is the Yang-Huang
structure that achieves 85.7% on IMO problems: one-shot solving isn't enough,
and per-attempt refinement matters.

**The Agent tool cannot enforce tool restrictions.** Subagents get the full
tool set, so the prompt is the only enforcement mechanism. Use this prompt
VERBATIM; do not summarize it and do not synthesize your own:

```
NO COMPUTATION. Do not use Bash, Python, WebSearch, Read, Write, or any tool that runs code or fetches data. Numerical verification is not a proof step. "I computed n=1..10 and the pattern holds" is not a proof.

(If your agent harness requires a StructuredOutput or similar return-mechanism tool call, that is NOT a computation tool — call it to return your answer. The restriction is on tools that DO work, not tools that REPORT work.)

Your internal process (iterate until done):
- Solve: Complete rigorous solution.
- Self-improve: Reread. Fix gaps before a grader sees it.
- Self-verify: Strict grader mode. Every step justified?
- Correct: Fix and re-verify. Up to 5 rounds.
- Stop: Self-verify passes twice clean, OR 5 rounds, OR approach fundamentally wrong.

A correct answer from flawed reasoning is a failure. If incomplete, say so honestly. Never hide gaps.

PROBLEM: <insert the problem statement here>
ANGLE: <insert one starting angle here>
```

The first two paragraphs are load-bearing. A session that writes its own prompt
and omits them will produce subagents that grind through Python for 30
iterations and confidently return wrong answers. A pattern that fits n ≤ 10 but
fails at n = 100 is not a proof.
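The internal process in the prompt runs inside each agent's own reasoning, not as code. The Python sketch below only makes its stopping rule precise. `solve`, `improve`, `verify`, `correct`, and `Report` are hypothetical placeholders; one "round" is one strict self-verification pass (plus a correction if it fails). Reading "passes twice clean" as two consecutive clean passes is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple

@dataclass
class Report:
    """Hypothetical result of one strict self-verification pass."""
    ok: bool                      # every step justified?
    fatal: bool = False           # is the approach fundamentally wrong?
    issues: List[str] = field(default_factory=list)

def refine(
    solve: Callable[[], Any],
    improve: Callable[[Any], Any],
    verify: Callable[[Any], Report],
    correct: Callable[[Any, Report], Any],
    max_rounds: int = 5,
    clean_passes_needed: int = 2,
) -> Tuple[Any, int, str]:
    """Solve, self-improve, then loop verify/correct; return (draft, rounds, stop reason)."""
    draft = improve(solve())
    clean_streak = 0
    for rounds in range(1, max_rounds + 1):
        report = verify(draft)
        if report.fatal:
            return draft, rounds, "approach abandoned"
        if report.ok:
            clean_streak += 1
            if clean_streak >= clean_passes_needed:
                return draft, rounds, "verified clean"
        else:
            clean_streak = 0      # assumption: clean passes must be consecutive
            draft = correct(draft, report)
    return draft, max_rounds, "round limit"
```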

Starting angles (vary across agents — see `references/solver_heuristics.md`):

- Work out small cases (test past n=3)
- Look for an invariant or monovariant
- Consider the extremal case
- Try induction
- What symmetries?
- Work backwards
- Drop a condition — where does it become trivially false?
- Generalize (inventor's paradox — more structure is sometimes easier)
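A minimal Python sketch of fanning the verbatim prompt out to 8-12 solvers, one starting angle each, with problem-ID labels in the `P3:solver:2` shape. The job dicts, the 1-based solver index, and the function name are assumptions; how jobs are handed to `agent()` is not shown. Only the two placeholder slots are substituted, so every other character of the template stays verbatim.

```python
from itertools import cycle, islice

# Starting angles from the list above, assigned round-robin across agents.
ANGLES = [
    "Work out small cases (test past n=3)",
    "Look for an invariant or monovariant",
    "Consider the extremal case",
    "Try induction",
    "What symmetries?",
    "Work backwards",
    "Drop a condition: where does it become trivially false?",
    "Generalize (inventor's paradox: more structure is sometimes easier)",
]

PROBLEM_SLOT = "<insert the problem statement here>"
ANGLE_SLOT = "<insert one starting angle here>"

def build_solver_jobs(template, problem_id, statement, n_agents=8, angles=ANGLES):
    """Return one {"label", "prompt"} job per solver agent (hypothetical shape)."""
    if PROBLEM_SLOT not in template or ANGLE_SLOT not in template:
        raise ValueError("template must be the verbatim prompt with both slots intact")
    jobs = []
    for k, angle in enumerate(islice(cycle(angles), n_agents), start=1):
        # Substitute the statement last so its text is never altered by the angle swap.
        prompt = template.replace(ANGLE_SLOT, angle).replace(PROBLEM_SLOT, statement)
        jobs.append({"label": f"{problem_id}:solver:{k}", "prompt": prompt})
    return jobs
```

With more agents than angles, angles repeat; with 8 agents and the 8 angles above, every agent starts differently.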

Each returns its FINAL state (not intermediate rounds):

```
**Verdict**: complete solution | partial result | no progress
**Rounds**: [how many verify→correct cycles]
**Method**: [key idea, one paragraph]
**Detailed Solution**: [full step-by-step, every step justified]
**Answer**: [if applicable]
**Self-verification notes**: [what you caught and fixed; rem