ClaudeWave
Skill · 66 repo stars · updated 29d ago

llm-app-patterns

Provides architectural patterns for LLM-powered applications and AI assistants, including prompt engineering, RAG, agent loops, conversation management, and evaluation. Use when building AI-based features, chatbots, or complex AI system architectures.

Install in Claude Code
git clone --depth 1 https://github.com/tranhieutt/software_development_department /tmp/llm-app-patterns && cp -r /tmp/llm-app-patterns/.claude/skills/llm-app-patterns ~/.claude/skills/llm-app-patterns
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# LLM Application & AI Assistant Patterns

## Resources



## Architecture decision matrix

| Pattern | Use when | Cost |
|---|---|---|
| Simple RAG | FAQ, docs Q&A | Low |
| Hybrid RAG (semantic + BM25) | Mixed query types | Medium |
| Function calling | Structured tool use | Low |
| ReAct agent | Multi-step reasoning | Medium |
| Plan-and-execute | Complex decomposable tasks | High |
| Multi-agent | Research, critique-refine | Very High |

## RAG: critical config numbers

```python
CHUNK_CONFIG = {
    "chunk_size": 512,       # tokens; a common starting point, tune per corpus
    "chunk_overlap": 50,     # tokens shared by adjacent chunks to limit context loss at boundaries
    "separators": ["\n\n", "\n", ". ", " "],
}
# Hybrid search alpha: 1.0=semantic only, 0.0=BM25 only, 0.5=balanced
```
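
To make the size and overlap numbers concrete, here is a minimal sketch of a sliding-window chunker. It assumes whitespace-separated words as a stand-in for tokens (a real pipeline would count tokens with the embedding model's tokenizer) and ignores the `separators` list; `chunk_words` is a hypothetical helper name, not part of any library.

```python
def chunk_words(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Split text into overlapping windows of whitespace-separated words."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

With the defaults, each window advances by 462 words, so the last 50 words of one chunk reappear as the first 50 of the next.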

## RAG: retrieval strategies

```python
# Basic: semantic search
results = vector_db.similarity_search(embed(query), top_k=5)

# Better: hybrid (semantic + keyword, merged with alpha-weighted RRF)
def hybrid_search(query, alpha=0.5):
    semantic = vector_db.similarity_search(embed(query), top_k=5)
    return rrf_merge(semantic, bm25_search(query), alpha)

# Best for recall: multi-query (3 variations, deduplicate)
queries = llm.generate_variations(query, n=3)
# Flatten the per-query result lists before deduplicating
results = deduplicate([doc for q in queries for doc in semantic_search(q)])
```
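
The snippet above calls `rrf_merge` without defining it. One possible reading, as a sketch: reciprocal rank fusion over two ranked lists of document IDs, with `alpha` weighting the semantic list and `1 - alpha` the keyword list. The weighting is an assumption layered on top of plain RRF (which treats all lists equally), and `k=60` is the conventional smoothing constant rather than a value taken from this document.

```python
from collections import defaultdict

def rrf_merge(semantic_ids, keyword_ids, alpha=0.5, k=60):
    """Weighted reciprocal rank fusion over two ranked lists of document IDs."""
    scores = defaultdict(float)
    for weight, ranking in ((alpha, semantic_ids), (1.0 - alpha, keyword_ids)):
        for rank, doc_id in enumerate(ranking, start=1):
            # Higher-ranked documents contribute more; k dampens the gap between ranks.
            scores[doc_id] += weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because the fusion uses ranks rather than raw scores, the cosine similarities and BM25 scores never need to be normalized onto a common scale.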

## RAG: generation prompt template

```python
RAG_PROMPT = """Answer based ONLY on the context below.
If insufficient, say "I don't have enough information."

Context: {context}
Question: {question}
Answer:"""
```
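
A small sketch of how retrieved chunks might be slotted into the template. Numbering the chunks is an assumption added here so that answers can be traced back to a source; `build_rag_prompt` is a hypothetical helper.

```python
RAG_PROMPT = """Answer based ONLY on the context below.
If insufficient, say "I don't have enough information."

Context: {context}
Question: {question}
Answer:"""

def build_rag_prompt(question: str, chunks: list[str]) -> str:
    """Fill the template with numbered chunks separated by blank lines."""
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )
    # str.format only interprets braces in the template, so braces inside
    # chunk text pass through unchanged.
    return RAG_PROMPT.format(context=context, question=question)
```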

## Agent: function calling loop

```python
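def run_agent(question, max_steps=10):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):  # cap iterations so a confused model cannot loop forever
        response = llm.chat(messages=messages, tools=TOOLS, tool_choice="auto")
        if not response.tool_calls:
            return response.content
        # Record the assistant turn that requested the tools before adding results;
        # chat APIs generally reject tool messages that lack a preceding tool call.
        messages.append({"role": "assistant", "content": response.content, "tool_calls": response.tool_calls})
        for call in response.tool_calls:
            result = execute_tool(call.name, call.arguments)
            messages.append({"role": "tool", "tool_call_id": call.id, "content": str(result)})
    raise RuntimeError(f"agent did not finish within {max_steps} steps")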
```
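
The loop relies on an `execute_tool` helper that the document does not show. A minimal sketch of one way to write it, assuming tools are plain Python functions kept in a registry and that arguments arrive either as a JSON string or as a dict; `TOOL_REGISTRY`, `tool`, and `add` are hypothetical names for illustration.

```python
import json

TOOL_REGISTRY = {}

def tool(fn):
    """Decorator that registers a function under its own name."""
    TOOL_REGISTRY[fn.__name__] = fn
    return fn

@tool
def add(a: float, b: float) -> float:
    return a + b

def execute_tool(name, arguments) -> str:
    """Run a registered tool; report failures as text so the model can recover."""
    if name not in TOOL_REGISTRY:
        return f"error: unknown tool {name!r}"
    try:
        kwargs = json.loads(arguments) if isinstance(arguments, str) else dict(arguments)
        return json.dumps(TOOL_REGISTRY[name](**kwargs))
    except Exception as exc:  # bad JSON, wrong arguments, or a failure inside the tool
        return f"error: {type(exc).__name__}: {exc}"
```

Returning the error as the tool result, rather than raising, lets the model see what went wrong and retry with corrected arguments.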

## Production: caching (only temperature=0 responses)

```python
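import hashlib
import json

def get_or_generate(prompt, model, **kwargs):
    # Only cache deterministic calls; sampled outputs are expected to vary.
    deterministic = kwargs.get("temperature", 1.0) == 0
    if deterministic:
        raw = f"{model}:{prompt}:{json.dumps(kwargs, sort_keys=True)}"
        key = hashlib.sha256(raw.encode("utf-8")).hexdigest()  # sha256 needs bytes
        if (cached := redis.get(key)) is not None:  # bytes unless decode_responses=True
            return cached
    response = llm.generate(prompt, model=model, **kwargs)
    if deterministic:
        redis.setex(key, 3600, response)  # expire after one hour
    return response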
```
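
Joining fields with `:` can make two different requests hash to the same key, for example when a model name or prompt itself contains a colon. As a hedged alternative, the sketch below serializes all fields as one JSON object before hashing; the `cache_key` name and the `llm:` prefix are assumptions for illustration.

```python
import hashlib
import json

def cache_key(model: str, prompt: str, **kwargs) -> str:
    """Build a stable cache key from the model, prompt, and sampling parameters."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "params": kwargs},
        sort_keys=True,  # keyword order must not change the key
    )
    return "llm:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```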

## Production: retry + fallback

```python
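from tenacity import retry, stop_after_attempt, wait_exponential

# Back off exponentially (4 s to 60 s) for up to 5 attempts, then re-raise the last error
@retry(wait=wait_exponential(multiplier=1, min=4, max=60), stop=stop_after_attempt(5), reraise=True)
def call_llm(prompt, model):
    return llm.generate(prompt, model=model)

# Fallback chain: try each model in order once its retries are exhausted.
# RateLimitError and APIError come from whichever provider SDK is in use.
def generate_with_fallback(prompt, primary, fallbacks):
    for model in [primary] + fallbacks:
        try:
            return call_llm(prompt, model)
        except (RateLimitError, APIError):
            continue
    raise RuntimeError("all models failed")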
```

## LLMOps: key metrics

```
Latency : p50, p99 response time
Quality : satisfaction (thumbs), task completion %, hallucination rate
Cost    : cost_per_request, tokens_per_request, cache_hit_rate
Health  : error_rate, timeout_rate, retry_rate
```
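
A rough sketch of how a few of these metrics could be computed from per-request log records. The record fields (`latency_ms`, `error`, `cache_hit`) are assumed names, and the percentile uses the simple nearest-rank method rather than interpolation.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def summarize(requests):
    """Aggregate latency, error, and cache metrics over a list of request records."""
    latencies = [r["latency_ms"] for r in requests]
    n = len(requests)
    return {
        "p50_ms": percentile(latencies, 50),
        "p99_ms": percentile(latencies, 99),
        "error_rate": sum(r["error"] for r in requests) / n,
        "cache_hit_rate": sum(r["cache_hit"] for r in requests) / n,
    }
```

Tracking p99 alongside p50 matters for LLM calls because a small share of slow generations can dominate the user experience while leaving the median unchanged.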

## Embedding model selection

| Model | Dims | Cost | Use |
|---|---|---|---|
| text-embedding-3-small | 1536 | $0.02/1M | Most cases |
| text-embedding-3-large | 3072 | $0.13/1M | High accuracy |
| bge-large (local) | 1024 | Free | Self-hosted |
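
The per-token prices translate into indexing cost with simple arithmetic. The sketch below uses the prices from the table (which may have changed since it was written) and assumes every chunk is a full 512 tokens, so it gives an upper-bound estimate; `embedding_cost_usd` is a hypothetical helper.

```python
PRICE_PER_1M_TOKENS_USD = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}

def embedding_cost_usd(model: str, num_chunks: int, tokens_per_chunk: int = 512) -> float:
    """Estimate the one-off cost of embedding a corpus of fixed-size chunks."""
    total_tokens = num_chunks * tokens_per_chunk
    return total_tokens / 1_000_000 * PRICE_PER_1M_TOKENS_USD[model]
```

For 100,000 chunks (51.2M tokens) this comes to roughly $1.02 with the small model and $6.66 with the large one, so for most corpora the choice turns on retrieval quality and vector storage size more than on embedding price.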