Skill · 316 repo stars · updated 25d ago

ai-llm-engineering

The ai-llm-engineering skill is an operational reference hub for building, validating, and scaling production LLM systems using modern standards. It covers data preparation, fine-tuning, evaluation frameworks, deployment optimization, LLMOps monitoring, and safety implementation. Use this skill when designing LLM architectures, selecting deployment tools like vLLM or quantization methods, deciding between RAG and agents, evaluating system quality, or implementing monitoring for production systems.

Source repository: ordinary-claude-skills

Install in Claude Code

git clone --depth 1 https://github.com/Microck/ordinary-claude-skills /tmp/ai-llm-engineering && cp -r /tmp/ai-llm-engineering/skills_all/ai-llm-engineering ~/.claude/skills/ai-llm-engineering

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# LLM Engineering – Operational Skill Hub

A single resource for executing, validating, and scaling LLM systems with **modern production standards**, while delegating domain depth to specialized skills.

This skill provides quick reference, decision frameworks, and navigation to detailed operational patterns for:

- Data, training, fine-tuning (PEFT/LoRA standard)
- Evaluation (automated testing, metrics, rollout gates)
- Deployment (vLLM with up to 24x throughput over Hugging Face Transformers per vLLM's own benchmarks, FP8/FP4 quantization)
- LLMOps (automated drift detection, retraining)
- Safety (multi-layered defenses, AI-powered guardrails)

**For detailed patterns:** See [Resources](#resources-best-practices--operational-patterns) and [Templates](#templates-copy-paste-ready) sections below.

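The "rollout gates" item under Evaluation can be illustrated with a minimal sketch. It assumes a gate is simply a metric threshold that a candidate release must pass before promotion. The `Gate` class, the `failed_gates` helper, the metric names, and every threshold value are hypothetical and chosen for illustration; they are not taken from this skill's resources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """One threshold a candidate release must satisfy (hypothetical structure)."""
    metric: str
    threshold: float
    higher_is_better: bool = True


def failed_gates(metrics: dict[str, float], gates: list[Gate]) -> list[str]:
    """Return the metric names of gates that fail; an empty list means all gates pass."""
    failures = []
    for gate in gates:
        value = metrics.get(gate.metric)
        if value is None:
            # A metric that was never measured is treated as a failed gate.
            failures.append(gate.metric)
            continue
        if gate.higher_is_better:
            passed = value >= gate.threshold
        else:
            passed = value <= gate.threshold
        if not passed:
            failures.append(gate.metric)
    return failures


# Illustrative thresholds only; real targets depend on the project.
GATES = [
    Gate("answer_accuracy", 0.80),
    Gate("hallucination_rate", 0.05, higher_is_better=False),
    Gate("cost_per_request_usd", 0.02, higher_is_better=False),
]

candidate = {"answer_accuracy": 0.90, "hallucination_rate": 0.01, "cost_per_request_usd": 0.01}
blocked = failed_gates(candidate, GATES)  # empty list: the candidate may be promoted
```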
---

## Quick Reference

| Task | Tool/Framework | Command/Pattern | When to Use |
|------|----------------|-----------------|-------------|
| RAG Pipeline | LlamaIndex, LangChain | Page-level chunking + hybrid retrieval | Dynamic knowledge, 0.648 accuracy |
| Agentic Workflow | LangGraph, AutoGen, CrewAI | ReAct, multi-agent orchestration | Complex tasks, tool use required |
| Prompt Design | Anthropic, OpenAI guides | CoT, few-shot, structured | Task-specific behavior control |
| Evaluation | LangSmith, W&B, RAGAS | Multi-metric (hallucination, bias, cost) | Quality validation, A/B testing |
| Production Deploy | vLLM, TensorRT-LLM | FP8/FP4 quantization, up to 24x throughput (vLLM vs. Hugging Face Transformers) | High-throughput serving, cost optimization |
| Monitoring | Arize Phoenix, LangFuse | Drift detection, 18-second response | Production LLM systems |

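The "hybrid retrieval" pattern in the RAG row combines a keyword ranking with a vector ranking. The table does not say how the two are merged, so the sketch below assumes reciprocal rank fusion (RRF), one common choice. The function name, the page IDs, and the two input rankings are hypothetical; in practice the rankings would come from a BM25 index and an embedding index.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["page_3", "page_1", "page_7"]  # hypothetical keyword (e.g. BM25) results
vector_hits = ["page_1", "page_4", "page_3"]   # hypothetical embedding-search results
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```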
---

## Decision Tree: LLM System Architecture

```text
Building LLM application: [Architecture Selection]
    ├─ Need current knowledge?
    │   ├─ Simple Q&A? → Basic RAG (page-level chunking + hybrid retrieval)
    │   └─ Complex retrieval? → Advanced RAG (reranking + contextual retrieval)
    │
    ├─ Need tool use / actions?
    │   ├─ Single task? → Simple agent (ReAct pattern)
    │   └─ Multi-step workflow? → Multi-agent (LangGraph, CrewAI)
    │
    ├─ Static behavior sufficient?
    │   ├─ Quick MVP? → Prompt engineering (CI/CD integrated)
    │   └─ Production quality? → Fine-tuning (PEFT/LoRA)
    │
    └─ Best results?
        └─ Hybrid (RAG + Fine-tuning + Agents) → Comprehensive solution
```
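The tree above can also be read as a small function. This is a sketch only: the function name and the boolean flags are hypothetical, and the tree does not state any precedence between branches, so checking the "best results" branch first is an assumption.

```python
def select_architecture(
    *,
    needs_current_knowledge: bool = False,
    complex_retrieval: bool = False,
    needs_tool_use: bool = False,
    multi_step_workflow: bool = False,
    production_quality: bool = False,
    want_best_results: bool = False,
) -> str:
    """Map project requirements to one leaf of the decision tree."""
    # Assumption: an explicit request for best results overrides the other branches.
    if want_best_results:
        return "Hybrid (RAG + fine-tuning + agents)"
    if needs_current_knowledge:
        if complex_retrieval:
            return "Advanced RAG (reranking + contextual retrieval)"
        return "Basic RAG (page-level chunking + hybrid retrieval)"
    if needs_tool_use:
        if multi_step_workflow:
            return "Multi-agent (LangGraph, CrewAI)"
        return "Simple agent (ReAct pattern)"
    # Static behavior is sufficient: pick by required quality level.
    if production_quality:
        return "Fine-tuning (PEFT/LoRA)"
    return "Prompt engineering (CI/CD integrated)"


choice = select_architecture(needs_tool_use=True, multi_step_workflow=True)
```

Keyword-only arguments keep call sites readable, since every flag is a boolean and positional calls would be easy to misorder.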

**See [Decision Matrices](resources/decision-matrices.md) for detailed selection criteria.**

---

## When to Use This Skill

Claude should invoke this skill when the user asks about:

- LLM preflight/project checklists, production best practices, or data pipelines
- Building or deploying RAG, agentic, or prompt-based LLM apps
- Prompt design, chain-of-thought (CoT), ReAct, or template patterns
- Troubleshooting LLM hallucination, bias, retrieval issues, or production failures
- Evaluating LLMs: benchmarks, multi-metric eval, or rollout/monitoring
- LLMOps: deployment, rollback, scaling, resource optimization
- Technology stack selection (models, vector DBs, frameworks)
- Production deployment strategies and operational patterns

---

## Scope Boundaries (Use These Skills for Depth)

- **Prompt design & CI/CD** → [ai-prompt-engineering](../ai-prompt-engineering/SKILL.md)
- **RAG pipelines & chunking** → [ai-llm-rag-engineering](../ai-llm-rag-engineering/SKILL.md)
- **Search tuning (BM25, HNSW, hybrid)** → [ai-llm-search-retrieval](../ai-llm-search-retrieval/SKILL.md)
- **Agent architectures & tools** → [ai-agents-development](../ai-agents-development/SKILL.md)
- **Serving optimization/quantization** → [ai-llm-ops-inference](../ai-llm-ops-inference/SKILL.md)
- **Production deployment/monitoring** → [ai-ml-ops-production](../ai-ml-ops-production/SKILL.md)
- **Security/guardrails** → [ai-ml-ops-security](../ai-ml-ops-security/SKILL.md)

---

## Resources (Best Practices & Operational Patterns)

Comprehensive operational guides with checklists, patterns, and decision frameworks:

### Core Operational Patterns

- **[Project Planning Patterns](resources/project-planning-patterns.md)** - Stack selection, FTI pipeline, performance budgeting
  - AI engineering stack selection matrix
  - Feature/Training/Inference (FTI) pipeline blueprint
  - Performance budgeting and goodput gates
  - Progressive complexity (prompt → RAG → fine-tune → hybrid)

- **[Production Checklists](resources/production-checklists.md)** - Pre-deployment validation and operational checklists
  - LLM lifecycle checklist (modern production standards)
  - Data & training, RAG pipeline, deployment & serving
  - Safety/guardrails, evaluation, agentic systems
  - Reliability & data infrastructure (DDIA-grade)
  - Weekly production tasks

- **[Common Design Patterns](resources/common-design-patterns.md)** - Copy-paste ready implementation examples
  - Chain-of-Thought (CoT) prompting
  - ReAct (Reason + Act) pattern
  - RAG pipeline (minimal to advanced)
  - Agentic planning loop
  - Self-reflection and multi-agent collaboration

- **[Decision Matrices](resources/decision-matrices.md)** - Quick reference tables for selection
  - RAG type decision matrix (naive → advanced → modular)
  - Production evaluation table with targets and actions
  - Model selection matrix (GPT-4, Claude, Gemini, self-hosted)
  - Vector database, embedding model, framework selection
  - Deployment strategy matrix

- **[Anti-Patterns](resources/anti-patterns.md)** - Common mistakes and prevention strategies
  - Data leakage, prompt dilution, RAG context overload
  - Agentic runaway, over-engineering, ignoring evaluation
  - Hard-coded prompts, missing observability
  - Detection methods and prevention code examples

### Domain-Specific Patterns

- **[LLMOps Best Practices](resources/llmops-best-practices.md)** - Operational lifecycle and deployment patterns
- **[Evaluation Patterns](resources/eval-patterns.md)** - Testing, metrics, and quality validation
- **[Prompt Engineering Patterns](resources/prompt-engineering-patterns.md)** - Quick reference (canonical skill: [ai-prompt-engineering](../ai-prompt-engineer

From the same repository

activitypub-testing (Skill)

Testing patterns for PHPUnit and Playwright E2E tests. Use when writing tests, debugging test failures, setting up test coverage, or implementing test patterns for ActivityPub features.

adaptyv (Skill)

Cloud laboratory platform for automated protein testing and validation. Use when designing proteins and needing experimental validation including binding assays, expression testing, thermostability measurements, enzyme activity assays, or protein sequence optimization. Also use for submitting experiments via API, tracking experiment status, downloading results, optimizing protein sequences for better expression using computational tools (NetSolP, SoluProt, SolubleMPNN, ESM), or managing protein design workflows with wet-lab validation.

add-uint-support (Skill)

Add unsigned integer (uint) type support to PyTorch operators by updating AT_DISPATCH macros. Use when adding support for uint16, uint32, uint64 types to operators, kernels, or when user mentions enabling unsigned types, barebones unsigned types, or uint support.

Agent Development (Skill)

This skill should be used when the user asks to "create an agent", "add an agent", "write a subagent", "agent frontmatter", "when to use description", "agent examples", "agent tools", "agent colors", "autonomous agent", or needs guidance on agent structure, system prompts, triggering conditions, or agent development best practices for Claude Code plugins.

AgentDB Advanced Features (Skill)

Master advanced AgentDB features including QUIC synchronization, multi-database management, custom distance metrics, hybrid search, and distributed systems integration. Use when building distributed AI systems, multi-agent coordination, or advanced vector search applications.

AgentDB Learning Plugins (Skill)

Create and train AI learning plugins with AgentDB's 9 reinforcement learning algorithms. Includes Decision Transformer, Q-Learning, SARSA, Actor-Critic, and more. Use when building self-learning agents, implementing RL, or optimizing agent behavior through experience.

AgentDB Memory Patterns (Skill)

Implement persistent memory patterns for AI agents using AgentDB. Includes session memory, long-term storage, pattern learning, and context management. Use when building stateful agents, chat systems, or intelligent assistants.

AgentDB Performance Optimization (Skill)

Optimize AgentDB performance with quantization (4-32x memory reduction), HNSW indexing (150x faster search), caching, and batch operations. Use when optimizing memory usage, improving search speed, or scaling to millions of vectors.