Skip to main content
ClaudeWave
Skill730 repo starsupdated 15d ago

mongodb-schema-design

This Claude Code skill provides MongoDB schema design patterns and anti-patterns to guide data modeling decisions. Use it when designing new schemas, migrating from relational databases, reviewing existing models for performance issues, deciding between embedding and referencing data, modeling relationships, implementing hierarchical structures, hitting document size limits, or adding schema validation to collections.

Install in Claude Code
Copy
git clone --depth 1 https://github.com/fcakyon/claude-codex-settings /tmp/mongodb-schema-design && cp -r /tmp/mongodb-schema-design/plugins/mongodb-skills/skills/mongodb-schema-design ~/.claude/skills/mongodb-schema-design
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# MongoDB Schema Design

Data modeling patterns and anti-patterns for MongoDB, maintained by MongoDB. Bad schema is the root cause of most MongoDB performance and cost issues—queries and indexes cannot fix a fundamentally wrong model.

## When to Apply

Reference these guidelines when:
- Designing a new MongoDB schema from scratch
- Migrating from SQL/relational databases to MongoDB
- Reviewing existing data models for performance issues
- Troubleshooting slow queries or growing document sizes
- Deciding between embedding and referencing
- Modeling relationships (one-to-one, one-to-many, many-to-many)
- Implementing tree/hierarchical structures
- Seeing Atlas Schema Suggestions or Performance Advisor warnings
- Hitting the 16MB document limit
- Adding schema validation to existing collections

## Quick Reference

### 1. Schema Anti-Patterns - 3 rules

- [antipattern-unnecessary-collections](references/antipattern-unnecessary-collections.md) - Splitting homogeneous data into multiple collections is often an anti-pattern; consult this reference to validate whether this is the case.
- [antipattern-excessive-lookups](references/antipattern-excessive-lookups.md) - When encountering overly normalized collections that reference each other or frequent and possibly slow $lookup operations, consult this reference to validate whether this is problematic and how to fix it.
- [antipattern-unnecessary-indexes](references/antipattern-unnecessary-indexes.md) - Consult this reference when indexes overlap or are not used by queries, to identify and remove unnecessary indexes that add overhead without benefit.

### 2. Schema Fundamentals - 4 rules

- [fundamental-embed-vs-reference](references/fundamental-embed-vs-reference.md) - Consult this reference for approaches to modeling different types of relationships (1:1, 1:few, 1:many, many:many, tree/hierarchical data) and how to decide between embedding and referencing based on access patterns.
- [fundamental-document-model](references/fundamental-document-model.md) - Fundamentals of the document model. Consult this reference when migrating from SQL or other normalized data to a document database like MongoDB.
- [fundamental-schema-validation](references/fundamental-schema-validation.md) - Consult this reference when creating new collections, or adding validation to existing collections, for example in response to finding inconsistent document structures or data quality issues.
- [fundamental-document-size](references/fundamental-document-size.md) - Consult this reference when documents hit the hard 16MB limit, or when accesses are slower than expected as a result of large documents.

### 3. Design Patterns - 11 rules

- [pattern-approximation](references/pattern-approximation.md) - Use approximate values for high-frequency counters
- [pattern-archive](references/pattern-archive.md) - Move historical data to separate/cold storage for performance
- [pattern-attribute](references/pattern-attribute.md) - Collapse many optional fields into key-value attributes
- [pattern-bucket](references/pattern-bucket.md) - Group time-series or IoT data into buckets
- [pattern-computed](references/pattern-computed.md) - Pre-calculate expensive aggregations
- [pattern-document-versioning](references/pattern-document-versioning.md) - Track document changes to enable historical queries and audit trails
- [pattern-extended-reference](references/pattern-extended-reference.md) - Cache frequently-accessed data from related entities
- [pattern-outlier](references/pattern-outlier.md) - Handle collections in which a small subset of documents are much larger than the rest, to prevent outliers from dominating memory and index costs
- [pattern-polymorphic](references/pattern-polymorphic.md) - Store different types of entities in the same collection, often when they are different types of the same base entity (e.g. different types of users or different types of products)
- [pattern-schema-versioning](references/pattern-schema-versioning.md) - Schema evolution, preventing drift, and safe online migrations. Consult when encountering inconsistent document structures, or when planning a schema change that cannot be applied atomically.
- [pattern-time-series-collections](references/pattern-time-series-collections.md) - Use native time series collections for high-frequency time series data

## Key Principle

> **"Data that is accessed together should be stored together."**

This is MongoDB's core philosophy. Embedding related data eliminates joins, reduces round trips, and enables atomic updates. Reference only when you must.

A core way to implement this philosophy is the fact that MongoDB exposes **flexible schemas**. This means you can have different fields in different documents, and even different structures. This allows you to model data in the way that best fits your access patterns, without being constrained by a rigid schema. For example, if different documents have different sets of fields, that is perfectly fine as long as it serves your application's needs. You can also use schema validation to enforce certain rules while still allowing for flexibility.

Another implication of the key principle is that information about the expected read and write workload becomes very relevant to schema design. If pieces of information from different entities are often queried or updated together, that means that prioritizing co-location of that data in the same document can lead to significant performance benefits. On the other hand, if certain pieces of information are rarely accessed together, it may make sense to store them separately to avoid loading more data than necessary.

#### Schema Fundamentals Summary

- **Embed vs Reference**: Choose embedding or referencing based on access patterns: embed when data is always accessed together (1:1, 1:few, bounded arrays, atomic updates needed); reference when data is accessed independently, relationships are many-to-many, or arrays can grow without b
agent-browserSkill

Agent-browser usage guide. Read this before running any agent-browser commands. Covers the snapshot-and-ref workflow, navigating pages, interacting with elements (click, fill, type, select), extracting text and data, taking screenshots, managing tabs, handling forms and auth, waiting for content, running multiple browser sessions in parallel, and troubleshooting common failures. Use when the user asks to interact with a website, fill a form, click something, extract data, take a screenshot, log into a site, test a web app, or automate any browser task.

electronSkill

Automate Electron desktop apps (VS Code, Slack, Discord, Figma, Notion, Spotify, etc.) using agent-browser via Chrome DevTools Protocol. Use when the user needs to interact with an Electron app, automate a desktop app, connect to a running app, control a native app, or test an Electron application. Triggers include "automate Slack app", "control VS Code", "interact with Discord app", "test this Electron app", "connect to desktop app", or any task requiring automation of a native Electron application.

docxSkill

Use this skill whenever the user wants to create, read, edit, or manipulate Word documents (.docx files). Triggers include: any mention of 'Word doc', 'word document', '.docx', or requests to produce professional documents with formatting like tables of contents, headings, page numbers, or letterheads. Also use when extracting or reorganizing content from .docx files, inserting or replacing images in documents, performing find-and-replace in Word files, working with tracked changes or comments, or converting content into a polished Word document. If the user asks for a 'report', 'memo', 'letter', 'template', or similar deliverable as a Word or .docx file, use this skill. Do NOT use for PDFs, spreadsheets, Google Docs, or general coding tasks unrelated to document generation.

pdfSkill

Use when tasks involve reading, creating, or reviewing PDF files where rendering and layout matter; prefer visual checks by rendering pages (Poppler) and use Python tools such as `reportlab`, `pdfplumber`, and `pypdf` for generation and extraction.

pptxSkill

Use this skill any time a .pptx file is involved in any way — as input, output, or both. This includes: creating slide decks, pitch decks, or presentations; reading, parsing, or extracting text from any .pptx file (even if the extracted content will be used elsewhere, like in an email or summary); editing, modifying, or updating existing presentations; combining or splitting slide files; working with templates, layouts, speaker notes, or comments. Trigger whenever the user mentions \"deck,\" \"slides,\" \"presentation,\" or references a .pptx filename, regardless of what they plan to do with the content afterward. If a .pptx file needs to be opened, created, or touched, use this skill.

xlsxSkill

Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like \"the xlsx in my downloads\") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.

azure-usageSkill

This skill should be used when user asks to "query Azure resources", "list storage accounts", "manage Key Vault secrets", "work with Cosmos DB", "check AKS clusters", "use Azure MCP", or interact with any Azure service.

setupSkill

This skill should be used when user encounters "Tavily MCP error", "Tavily API key invalid", "web search not working", "Tavily failed", or needs help configuring Tavily integration.