codex-retrospective
Use when you want Codex to review its own recent history (last N days or specific period) and improve its behavior. Produces minimal, high-signal updates to AGENTS.md and tiny reusable skills. The goal is long-term fluency — Codex gradually becomes better at your specific style, constraints, and workflows.
git clone --depth 1 https://github.com/majiayu000/spellbook /tmp/codex-retrospective && cp -r /tmp/codex-retrospective/skills/codex-retrospective ~/.claude/skills/codex-retrospectiveSKILL.md
# Codex Retrospective
A structured self-improvement loop for Codex.
This skill turns Codex from a one-off collaborator into a system that **gets meaningfully better at working with you** over time by systematically eating its own usage history.
## Why This Exists
Most people improve their agent usage by manually maintaining `AGENTS.md` or skills when they notice problems. This skill makes that process deliberate, regular, and high-leverage.
It is directly inspired by strong practices from heavy users (especially Greg Brockman's emphasis on updating the "constitution" from real failures and friction), but executed as a repeatable arsenal-style workflow.
## Core Principle
**Minimal effective change, grounded in evidence from actual history.**
Every output must be:
- The smallest possible useful addition or edit
- Backed by specific patterns observed in recent sessions
- Designed to prevent repetition of the same friction
## When to Trigger
- After a painful or repetitive session ("Codex should have known better by now")
- Periodically (weekly or monthly): "Do a retrospective on the last 7/30 days of my Codex usage"
- When `codex-fluent` reports that Codex is constantly re-asking for the same context or preferences
- After you have manually corrected the same class of mistake multiple times
## The Retrospective Process
### 1. Scope Definition
You specify the time window or focus area:
- "Last 14 days"
- "The three big auth + payments threads from last month"
- "All sessions involving the new Go service"
### 2. History Analysis (The Skill Guides Codex)
Codex is instructed to look for:
- Recurring mistakes or inefficient patterns
- Things the user had to explain or correct repeatedly
- High-friction moments (lots of back-and-forth, context loss, wrong assumptions)
- Successful patterns worth encoding so they happen by default
- Opportunities to extract tiny, reusable skills
### 3. Output — Strict Format
The skill forces Codex to produce output in this order:
1. **Retrospective Summary** (short, evidence-based)
2. **Proposed AGENTS.md Updates** (exact diff or append text only)
3. **New or Refined Minimal Skills** (at most 1-2 tiny ones, full SKILL.md with frontmatter)
4. **Rationale + Evidence** (which sessions/patterns drove each proposal)
5. **Application Plan** (how to safely apply the changes)
### 4. Human Gate + Application
You review. The skill then helps you apply the minimal changes cleanly (never blindly overwriting large sections of AGENTS.md).
## Relationship with codex-fluent
These two skills are designed to be used together:
- `codex-retrospective` finds **behavioral and knowledge** improvements (better defaults, new rules, extracted skills).
- `codex-fluent` finds **state and context** hygiene improvements (session bloat, missing handoffs, archive opportunities).
A good monthly ritual for serious users:
1. Run `codex-retrospective` on the last 30 days.
2. Run `codex-fluent` diagnosis.
3. Apply the best changes from both.
## Hard Constraints on Output
- Never propose large rewrites of AGENTS.md.
- Never create big new skills. Tiny, focused, high-ROI only.
- Every proposal must reference concrete history ("In the payment retry thread on May 3rd and the similar incident on May 18th...").
- If nothing high-confidence was found, say so clearly instead of manufacturing improvements.
## Gotchas
- Do not turn a single annoying session into a permanent rule unless it was
unusually high impact. Prefer "no change" over a weak constitution update.
- Do not rely on memory vibes. Each proposed update needs a concrete session,
date range, file, PR, or repeated correction pattern that can be checked.
- Keep generated AGENTS.md text short. If the proposed rule needs multiple
paragraphs, it probably belongs in a tiny skill or a project doc instead.
- Application is gated by the user. The retrospective can propose exact diffs,
but it must not silently rewrite AGENTS.md or existing skills.
## References
- `references/retrospective-prompt.md` — The core prompt template used to drive Codex's self-analysis
- `references/agents-md-update-rules.md` — Strict rules for what kind of changes are acceptable
- `references/minimal-skill-criteria.md` — What qualifies as a "tiny useful skill" worth extracting
- `references/examples/` — Real (sanitized) retrospective outputs and the resulting AGENTS.md diffs
## Success Looks Like
After 4–8 weeks of regular use:
- Codex makes fewer basic assumption errors in your domain
- You spend less time re-explaining preferences and constraints
- Your AGENTS.md and skills folder feel like they were written by someone who has worked with you for a long time (because they were)
- New projects ramp up faster because the constitution already encodes hard-won lessons
Start with a focused 7- or 14-day retrospective on a project where you've felt the most friction recently. The pattern will become natural quickly.Senior backend TypeScript architect specializing in Bun/Node.js runtime, API design, database optimization, and scalable server architecture.
Expert at exploring and understanding legacy and unfamiliar codebases. Maps dependencies, identifies patterns, and creates documentation for complex systems.
Kubernetes architect specializing in cluster design, manifests, Helm charts, GitOps workflows, security policies, and production operations.
Systematic open source contributor that analyzes projects, finds suitable issues, implements fixes, and creates high-quality PRs with high acceptance probability.
Application security expert specializing in SAST, vulnerability assessment, OWASP Top 10, compliance auditing, and security architecture review.
Fullstack code reviewer with 15+ years experience analyzing code for security vulnerabilities, performance bottlenecks, architectural decisions, and best practices.
Senior technical lead who analyzes complex projects and coordinates multi-step development tasks. Delegates to specialized agents and ensures quality delivery.
Use when the user explicitly asks to stage all current changes, create a commit, and push to the remote after safety checks.