skill-extract
The extract skill reverse-engineers undocumented design systems and product architectures by analyzing codebases, screenshots, and live products to automatically generate structured documentation. Use this skill to map design tokens and component variants from React/Vue/Svelte frameworks, detect microservice boundaries and API endpoints, generate C4 architecture diagrams, and create implementation-ready Storybook documentation without manual documentation effort.
git clone --depth 1 https://github.com/nyldn/claude-octopus /tmp/skill-extract && cp -r /tmp/skill-extract/.claude/skills/extract-skill ~/.claude/skills/skill-extractSKILL.md
# Extract Skill - Implementation Guide
## Overview
The `extract` skill provides comprehensive reverse-engineering capabilities for design systems and product architectures. It transforms undocumented codebases into structured, implementation-ready documentation.
## Capabilities
### Design System Extraction
- **Token Extraction**: Colors, typography, spacing, shadows from code or CSS
- **Component Analysis**: Props, variants, usage patterns across React/Vue/Svelte
- **Pattern Detection**: Layout patterns, design rules, accessibility guidelines
- **Storybook Generation**: Auto-generated stories with variants and controls
### Product Architecture Extraction
- **Service Detection**: Microservice boundaries, modules, domain boundaries
- **API Mapping**: REST, GraphQL, tRPC, gRPC endpoint cataloging
- **Data Modeling**: ORM schema extraction (Prisma, TypeORM, Sequelize)
- **Feature Inventory**: Route-based and domain-based feature detection
- **C4 Diagrams**: Automated architecture visualization (Mermaid)
## Technical Implementation
### Token Extraction Pipeline
**Priority Order** (High to Low Confidence):
1. **Code-Defined** (95%): `theme.ts`, `tokens.json`, Tailwind config
2. **CSS Variables** (90%): `:root` declarations
3. **Computed Styles** (60%): DOM analysis
4. **Inferred** (40-60%): Color clustering, scale detection
**Color Clustering Algorithm**:
- Uses CIEDE2000 for perceptually-accurate color distance
- K-means++ initialization for stable clustering
- Default k=8 clusters for primary palettes
- ΔE < 2 threshold for duplicate detection
### Component Analysis
**Detection Strategies**:
- AST parsing for TypeScript/JavaScript
- Prop extraction from interfaces and PropTypes
- Variant detection from union types
- Usage tracking across codebase
**Supported Frameworks**:
- React (functional, class, hooks)
- Vue (SFC, Composition API, Options API)
- Svelte (script/template separation)
### Architecture Detection
**Service Boundary Heuristics**:
- Package.json in subdirectories
- Independent deployment configs
- Team ownership boundaries
- Communication pattern analysis
**API Endpoint Detection**:
- Decorator-based routing (NestJS, routing-controllers)
- Express/Fastify route definitions
- GraphQL resolver classes
- tRPC router procedures
- Protocol Buffer (.proto) files
## Multi-AI Orchestration
When enabled, the extract feature uses multiple AI providers for higher accuracy:
**Provider Roles**:
- **Claude**: Synthesis, conflict resolution, documentation
- **Codex**: Code-level analysis, type extraction, architecture
- **Gemini**: Pattern recognition, alternative interpretations, UX insights
**Consensus Mechanism**:
- Threshold: 67% (2/3 providers must agree)
- Disagreements logged in `90_evidence/disagreements.md`
- Confidence scores attached to all outputs
## Output Structure
```
octopus-extract/
└── project-name/
└── timestamp/
├── README.md # Navigation and summary
├── metadata.json # Extraction parameters
│
├── 00_intent/
│ ├── answers.json # User intent responses
│ ├── intent-contract.md # Human-readable summary
│ └── detection-report.md # Stack auto-detection results
│
├── 10_design/
│ ├── tokens.json # W3C Design Tokens format
│ ├── tokens.css # CSS custom properties
│ ├── tokens.md # Human-readable token docs
│ ├── components.csv # Component inventory (tabular)
│ ├── components.json # Structured component data
│ ├── patterns.md # Layout and design patterns
│ └── storybook/ # Storybook scaffold (optional)
│ ├── .storybook/
│ └── stories/
│
├── 20_product/
│ ├── product-overview.md # What, who, key journeys
│ ├── feature-inventory.md # Features by domain
│ ├── architecture.md # C4 text description
│ ├── architecture.mmd # Mermaid C4 diagrams
│ ├── PRD.md # AI-agent executable PRD
│ ├── user-stories.md # Gherkin-style scenarios
│ ├── api-contracts.md # Endpoint specifications
│ ├── data-model.md # Entity relationships
│ └── implementation-plan.md # Phased milestones
│
└── 90_evidence/
├── quality-report.md # Coverage and confidence metrics
├── disagreements.md # Multi-AI conflicts
├── extraction-log.md # Timestamped progress log
└── references.json # File paths per claim
```
## Quality Gates
Automated validation ensures extraction quality:
1. **Token Coverage**: Fail if 0 tokens in design mode
2. **Component Coverage**: Warn if < 50% of component files detected
3. **Architecture Completeness**: Warn if no services detected in product mode
4. **Multi-AI Consensus**: Fail if < 50% agreement on key outputs
## Usage Patterns
### Basic Extraction
```bash
/octo:extract ./my-app
```
### Design-Only Extraction
```bash
/octo:extract ./my-app --mode design --storybook true
```
### Deep Analysis with Multi-AI
```bash
/octo:extract ./my-app --depth deep --multi-ai force
```
### URL Extraction
```bash
/octo:extract https://example.com --mode design --depth quick
```
## Integration with Other Skills
- **/octo:review**: Review extracted outputs for quality
- **/octo:deliver**: Validate extraction completeness
- **/octo:docs**: Generate additional documentation from extractions
## Error Handling
Common error codes:
- `ERR-001`: Invalid input (path/URL not found)
- `ERR-002`: Network timeout (URL extraction)
- `ERR-003`: Permission denied
- `ERR-004`: Out of memory (use `--depth quick`)
- `VAL-001`: Validation failed (no tokens detected)
- `VAL-004`: Low multi-AI consensus
## Performance Targets
| DepthBackend architect for scalable API design, microservices, and distributed systems
Cloud architect for AWS/Azure/GCP infrastructure, IaC, FinOps, and multi-cloud strategies
Code review expert for quality analysis, security vulnerabilities, and production reliability
Database architect for data modeling, technology selection, schema design, and migration planning
Debugging specialist for errors, test failures, and unexpected behavior
Technical documentation architect for comprehensive system docs and architecture guides
Frontend developer for React, Next.js, responsive layouts, and accessible UI components
Performance engineer for optimization, observability, and scalable system performance