agentic-engineering
Agentic Engineering is a Claude Code skill for managing AI-driven implementation workflows where agents handle most technical work while humans maintain quality oversight. Use it when decomposing engineering tasks into independently verifiable 15-minute units, establishing eval-first testing loops to measure progress against baselines, routing work to appropriate Claude model tiers based on complexity, and conducting focused human reviews prioritizing invariants, edge cases, and security rather than style.
git clone --depth 1 https://github.com/affaan-m/ECC /tmp/agentic-engineering && cp -r /tmp/agentic-engineering/.kiro/skills/agentic-engineering ~/.claude/skills/agentic-engineering
SKILL.md
# Agentic Engineering

Use this skill for engineering workflows where AI agents perform most implementation work and humans enforce quality and risk controls.

## Operating Principles

1. Define completion criteria before execution.
2. Decompose work into agent-sized units.
3. Route model tiers by task complexity.
4. Measure with evals and regression checks.

## Eval-First Loop

1. Define capability eval and regression eval.
2. Run baseline and capture failure signatures.
3. Execute implementation.
4. Re-run evals and compare deltas.

**Example workflow:**

```
1. Write test that captures desired behavior (eval)
2. Run test → capture baseline failures
3. Implement feature
4. Re-run test → verify improvements
5. Check for regressions in other tests
```

## Task Decomposition

Apply the 15-minute unit rule:

- Each unit should be independently verifiable
- Each unit should have a single dominant risk
- Each unit should expose a clear done condition

**Good decomposition:**

```
Task: Add user authentication
├─ Unit 1: Add password hashing (15 min, security risk)
├─ Unit 2: Create login endpoint (15 min, API contract risk)
├─ Unit 3: Add session management (15 min, state risk)
└─ Unit 4: Protect routes with middleware (15 min, auth logic risk)
```

**Bad decomposition:**

```
Task: Add user authentication (2 hours, multiple risks)
```

## Model Routing

Choose model tier based on task complexity:

- **Haiku**: Classification, boilerplate transforms, narrow edits
  - Example: Rename variable, add type annotation, format code
- **Sonnet**: Implementation and refactors
  - Example: Implement feature, refactor module, write tests
- **Opus**: Architecture, root-cause analysis, multi-file invariants
  - Example: Design system, debug complex issue, review architecture

**Cost discipline:** Escalate model tier only when lower tier fails with a clear reasoning gap.
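The routing list and the escalation rule above can be sketched as a small lookup plus a guard. This is a minimal illustration, not part of the skill itself: the task-kind keys, the `initial_tier` and `next_tier` helpers, and the choice to default unknown kinds to Sonnet are all assumptions made for the example.

```python
from enum import IntEnum


class Tier(IntEnum):
    HAIKU = 1
    SONNET = 2
    OPUS = 3


# Hypothetical task kinds, named after the examples in the routing list.
ROUTING = {
    "classification": Tier.HAIKU,
    "boilerplate": Tier.HAIKU,
    "narrow_edit": Tier.HAIKU,
    "implementation": Tier.SONNET,
    "refactor": Tier.SONNET,
    "architecture": Tier.OPUS,
    "root_cause": Tier.OPUS,
    "multi_file_invariant": Tier.OPUS,
}


def initial_tier(task_kind: str) -> Tier:
    """Pick the starting tier; unknown kinds fall back to Sonnet (an assumption)."""
    return ROUTING.get(task_kind, Tier.SONNET)


def next_tier(current: Tier, failed: bool, reasoning_gap: bool) -> Tier:
    """Escalate one tier only when the attempt failed with a clear reasoning gap."""
    if failed and reasoning_gap and current < Tier.OPUS:
        return Tier(current + 1)
    return current
```

A failure without a reasoning gap (for example a flaky test or missing context) keeps the same tier, which is what keeps escalation from becoming the default retry strategy.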
## Session Strategy

- **Continue session** for closely-coupled units
  - Example: Implementing related functions in same module
- **Start fresh session** after major phase transitions
  - Example: Moving from implementation to testing
- **Compact after milestone completion**, not during active debugging
  - Example: After feature complete, before starting next feature

## Review Focus for AI-Generated Code

Prioritize:

- Invariants and edge cases
- Error boundaries
- Security and auth assumptions
- Hidden coupling and rollout risk

Do not waste review cycles on style-only disagreements when automated format/lint already enforce style.

**Review checklist:**

- [ ] Edge cases handled (null, empty, boundary values)
- [ ] Error handling comprehensive
- [ ] Security assumptions validated
- [ ] No hidden coupling between modules
- [ ] Rollout risk assessed (breaking changes, migrations)

## Cost Discipline

Track per task:

- Model tier used
- Token estimate
- Retries needed
- Wall-clock time
- Success/failure outcome

**Example tracking:**

```
Task: Implement user login
Model: Sonnet
Tokens: ~5k input, ~2k output
Retries: 1 (initial implementation had auth bug)
Time: 8 minutes
Outcome: Success
```

## When to Use This Skill

- Managing AI-driven development workflows
- Planning agent task decomposition
- Optimizing model tier selection
- Implementing eval-first development
- Reviewing AI-generated code
- Tracking development costs

## Integration with Other Skills

- **tdd-workflow**: Combine with eval-first loop for test-driven development
- **verification-loop**: Use for continuous validation during implementation
- **search-first**: Apply before implementation to find existing solutions
- **coding-standards**: Reference during code review phase
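The per-task fields listed under Cost Discipline could be kept in a small record type so that totals are easy to compute across a session. The sketch below is one possible shape under stated assumptions: `TaskRecord` and `summarize` are hypothetical names, and the sample values simply mirror the example tracking entry in that section.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskRecord:
    """One tracked task; field names are illustrative, not defined by the skill."""
    task: str
    model: str
    input_tokens: int
    output_tokens: int
    retries: int
    minutes: float
    success: bool


def summarize(records: List[TaskRecord]) -> Dict[str, float]:
    """Aggregate token, retry, time, and success totals across tracked tasks."""
    total = len(records)
    successes = sum(1 for r in records if r.success)
    return {
        "tasks": total,
        "success_rate": successes / total if total else 0.0,
        "total_tokens": sum(r.input_tokens + r.output_tokens for r in records),
        "total_retries": sum(r.retries for r in records),
        "total_minutes": sum(r.minutes for r in records),
    }


# Values taken from the example tracking entry (approximate token counts).
record = TaskRecord("Implement user login", "Sonnet", 5000, 2000, 1, 8.0, True)
summary = summarize([record])
```

Keeping retries and outcome next to the model tier makes it possible to check whether a cheaper tier is failing often enough to justify escalation.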
Structured self-debugging workflow for AI agent failures using capture, diagnosis, contained recovery, and introspection reports.
Build an evidence-backed ECC install plan for a specific repo by sorting skills, commands, rules, hooks, and extras into DAILY vs LIBRARY buckets using parallel repo-aware review passes. Use when ECC should be trimmed to what a project actually needs instead of loading the full bundle.
Write articles, guides, blog posts, tutorials, newsletter issues, and other long-form content in a distinctive voice derived from supplied examples or brand guidance. Use when the user wants polished written content longer than a paragraph, especially when voice consistency, structure, and credibility matter.
Build a source-derived writing style profile from real posts, essays, launch notes, docs, or site copy, then reuse that profile across content, outreach, and social workflows. Use when the user wants voice consistency without generic AI writing tropes.
Bun as runtime, package manager, bundler, and test runner. When to choose Bun vs Node, migration notes, and Vercel support.