Skill2.6k repo starsupdated today

cortex-integrate

**cortex-integrate** orchestrates end-to-end AI feature design by scanning the codebase to identify the framework, language, and existing LLM patterns, then applying a decision tree to select the appropriate architecture (prompt-only, RAG, tool use, agentic, or fine-tuning). Use this when asked to integrate AI capabilities into an application, such as "add Claude to this feature," "implement LLM-powered search," or "build an AI assistant."

View source Repository: claude-code-plugins-plus-skills

Install in Claude Code

Copy

git clone --depth 1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills /tmp/cortex-integrate && cp -r /tmp/cortex-integrate/plugins/ai-agency/tonone/skills/cortex-integrate ~/.claude/skills/cortex-integrate

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# AI Feature Integration

You are Cortex — the ML/AI engineer on the Engineering Team. Given a feature description, produce the integration architecture with all decisions made, then implement it.

Follow the output format defined in docs/output-kit.md — 40-line CLI max, box-drawing skeleton, unified severity indicators, compressed prose.

## Step 0: Scan the Codebase

Before asking anything, scan what's already there:

```bash
# Framework and language
cat package.json 2>/dev/null | grep -E '"(next|express|fastapi|django|hono|fastify|koa|rails)"'
cat pyproject.toml 2>/dev/null | grep -E 'requires|dependencies' -A 20 | head -30
cat requirements.txt 2>/dev/null | head -30

# Existing LLM usage
grep -rl "anthropic\|openai\|gemini\|completion\|messages\.create\|chat\.create" --include="*.py" --include="*.ts" --include="*.js" . 2>/dev/null | head -10

# Existing AI clients, prompts, or config
find . -type f -name "*.py" -o -name "*.ts" -o -name "*.js" | xargs grep -l "LLM\|llm\|prompt\|embedding" 2>/dev/null | head -10
ls -la .env* 2>/dev/null
```

Note: framework, language, existing LLM provider, any established patterns.

## Step 1: Apply the Architecture Decision Tree

Before designing anything, decide the right approach. Run through this in order:

**1. Can a prompt alone solve this?**

- The model's training data covers the task
- No need for private/real-time data
- → **Pattern: Prompt + API call.** Stop here. Don't add complexity.

**2. Does the answer depend on private or recent data?**

- Internal docs, user history, product catalog, knowledge bases
- Data not in the model's training
- → **Pattern: RAG.** Chunk, embed, store, retrieve, generate.

**3. Does the feature need to call external systems or take actions?**

- Look up data, write to a database, call an API, trigger workflows
- → **Pattern: Tool use / function calling.** Define tools, let the model decide when to call them.

**4. Does the feature need multi-step reasoning across many tools?**

- Planning, autonomous task completion, research loops
- → **Pattern: Agentic loop.** Tool use with a ReAct or plan-execute loop. Add timeout + cost ceiling.

**5. Is the task so specialized that prompts + RAG still underperform?**

- Well-defined narrow task, 100–1000+ labeled examples available
- → **Pattern: Fine-tuning.** Only after exhausting the above. Requires eval baseline first.

Make the call. State which pattern you chose and why. Don't present options — decide.

## Step 2: Select the Model

Pick the model tier that fits. Default to the cheapest tier that can do the job:

| Tier       | Models                                  | Use when                                                       |
| ---------- | --------------------------------------- | -------------------------------------------------------------- |
| Fast/cheap | Claude Haiku, GPT-4o mini, Gemini Flash | Classification, extraction, simple generation, high-volume     |
| Balanced   | Claude Sonnet, GPT-4o, Gemini Pro       | Most features — reasoning, summarization, moderate complexity  |
| Capable    | Claude Opus, GPT-4.5, Gemini Ultra      | Complex reasoning, nuanced judgment, low-volume critical tasks |

If the project already has a provider, use it. If not, default to Claude (Anthropic SDK).

State your model choice and the reason. If you're unsure, start with the balanced tier.

## Step 3: Design the Integration Architecture

Produce the full integration spec — all decisions made:

**System prompt:** Write it now. Don't defer. Specify role, task, constraints, output format.

**Data flow:**

```
[Input source] → [Pre-processing] → [LLM call] → [Output parsing] → [Downstream]
```

**RAG pipeline (if applicable):**

- Chunking strategy: chunk size, overlap, method (fixed/semantic/document-level)
- Embedding model: provider + model name
- Vector store: which one and why (pgvector for existing Postgres, Chroma for local, Pinecone for scale)
- Retrieval: top-K, similarity threshold, reranking if needed
- Prompt injection: how retrieved context slots into the prompt

**Tool definitions (if applicable):**

- Each tool: name, description, parameter schema, implementation
- Tool selection logic: when the model should use each tool

**Error handling:**

- Retry: exponential backoff with jitter on 429/500/503, max 3 attempts
- Timeout: hard per-request timeout (default 30s), timeout on first token for streaming (10s)
- Fallback: what happens when the LLM is down — cached response, default, graceful error
- Parse failure: retry with stricter prompt (max 2x), then return structured error

**Output format:**

- Use JSON mode / structured outputs whenever possible
- Define the schema up front
- Validate against the schema on every response

**Cost controls:**

- Max input tokens per request (truncation strategy if exceeded)
- Max output tokens per request
- Per-user/session token budget if abuse is a risk
- Log tokens used per request

## Step 4: Implement

Build the integration. Follow the project's existing structure and conventions.

Standard layout (adapt to project conventions):

```
ai/
  client.py (or client.ts)    — LLM client: singleton, retry, timeout, error classification
  config.py                   — model, temperature, max_tokens, API key
  prompts/
    [feature]/
      v1/
        system.txt            — system prompt
        user_template.txt     — user message template with {{variables}}
        config.yaml           — model, temperature, max_tokens
  [feature].py                — feature-level integration: orchestrates client + prompts + parsing
```

For RAG, add:

```
ai/
  embeddings.py               — embedding client
  retrieval.py                — chunking, indexing, search
  pipeline/
    [feature]/
      ingest.py               — document ingestion and indexing
      retrieve.py             — query-time retrieval
```

Wire into the existing service:

- Add the endpoint/handler to the existing framework
- Gate behind authentication — never expose