Skill · 110 repo stars · updated 7mo ago

ml-cv-specialist

The ml-cv-specialist skill provides decision frameworks and technical guidance for selecting machine learning and computer vision models, choosing between API-based and self-hosted solutions, and designing production ML systems. Use it when planning architecture for ML features, evaluating model options across text, vision, audio, or structured data tasks, or determining cost-effective deployment strategies for AI-powered applications.

Repository: claude-cto-team

Install in Claude Code


git clone --depth 1 https://github.com/alirezarezvani/claude-cto-team /tmp/ml-cv-specialist && mkdir -p ~/.claude/skills && cp -r /tmp/ml-cv-specialist/skills/ml-cv-specialist ~/.claude/skills/ml-cv-specialist

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# ML/CV Specialist

Provides specialized guidance for machine learning and computer vision system design, model selection, and production deployment.

## When to Use

- Selecting ML models for specific use cases
- Designing training and inference pipelines
- Optimizing ML system performance and cost
- Evaluating build vs. API for ML capabilities
- Planning data pipelines for ML workloads

## ML System Design Framework

### Model Selection Decision Tree

```
Use Case Identified
    │
    ├─► Text/Language Tasks
    │   ├─► Classification → BERT, DistilBERT, or API (OpenAI, Claude)
    │   ├─► Generation → GPT-4, Claude, Llama (self-hosted)
    │   ├─► Embeddings → OpenAI Ada, sentence-transformers
    │   └─► Search/RAG → Vector DB + Embeddings + LLM
    │
    ├─► Computer Vision Tasks
    │   ├─► Classification → ResNet, EfficientNet, ViT
    │   ├─► Object Detection → YOLOv8, DETR, Faster R-CNN
    │   ├─► Segmentation → SAM, Mask R-CNN, U-Net
    │   ├─► OCR → Tesseract, PaddleOCR, Cloud Vision API
    │   └─► Face Recognition → InsightFace, DeepFace
    │
    ├─► Audio Tasks
    │   ├─► Speech-to-Text → Whisper, DeepSpeech, Cloud APIs
    │   ├─► Text-to-Speech → ElevenLabs, Coqui TTS
    │   └─► Audio Classification → PANNs, AudioSet models
    │
    └─► Structured Data
        ├─► Tabular → XGBoost, LightGBM, CatBoost
        ├─► Time Series → Prophet, ARIMA, Transformer-based
        └─► Recommendations → Two-tower, matrix factorization
```
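If the tree above needs to live in tooling (for example, a planning script), a plain nested mapping is enough. A minimal sketch: the model lists are copied from the tree, while the snake_case keys such as `object_detection` and the `suggest_models` helper are hypothetical names.

```python
# Hypothetical encoding of the decision tree above; keys are illustrative names.
MODEL_CANDIDATES: dict[str, dict[str, list[str]]] = {
    "text": {
        "classification": ["BERT", "DistilBERT", "API (OpenAI, Claude)"],
        "generation": ["GPT-4", "Claude", "Llama (self-hosted)"],
        "embeddings": ["OpenAI Ada", "sentence-transformers"],
        "search_rag": ["Vector DB + Embeddings + LLM"],
    },
    "vision": {
        "classification": ["ResNet", "EfficientNet", "ViT"],
        "object_detection": ["YOLOv8", "DETR", "Faster R-CNN"],
        "segmentation": ["SAM", "Mask R-CNN", "U-Net"],
        "ocr": ["Tesseract", "PaddleOCR", "Cloud Vision API"],
        "face_recognition": ["InsightFace", "DeepFace"],
    },
    "audio": {
        "speech_to_text": ["Whisper", "DeepSpeech", "Cloud APIs"],
        "text_to_speech": ["ElevenLabs", "Coqui TTS"],
        "audio_classification": ["PANNs", "AudioSet models"],
    },
    "structured": {
        "tabular": ["XGBoost", "LightGBM", "CatBoost"],
        "time_series": ["Prophet", "ARIMA", "Transformer-based"],
        "recommendations": ["Two-tower", "matrix factorization"],
    },
}


def suggest_models(domain: str, task: str) -> list[str]:
    """Return a copy of the candidates for (domain, task); raise KeyError listing valid keys."""
    if domain not in MODEL_CANDIDATES:
        raise KeyError(f"unknown domain {domain!r}; options: {sorted(MODEL_CANDIDATES)}")
    tasks = MODEL_CANDIDATES[domain]
    if task not in tasks:
        raise KeyError(f"unknown task {task!r} for {domain!r}; options: {sorted(tasks)}")
    return list(tasks[task])
```

Returning a copy keeps callers from mutating the shared table.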

---

## API vs. Self-Hosted Decision

### When to Use APIs vs. Self-Hosting

| Factor | API Preferred | Self-Hosted Preferred |
|--------|---------------|----------------------|
| **Volume** | < 10K requests/month | > 100K requests/month |
| **Latency** | > 500ms acceptable | < 100ms required |
| **Customization** | General use case | Domain-specific fine-tuning |
| **Data Privacy** | Non-sensitive data | PII, HIPAA, financial |
| **Team Expertise** | No ML engineers | ML team available |
| **Budget** | Predictable per-call costs | High volume justifies infra |
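One way to apply the table is to count how many rows favor each option. A minimal sketch, assuming equal weight per factor and a neutral band for 10K-100K requests/month and 100-500 ms latency; `Requirements` and `recommend_deployment` are hypothetical names, and the Budget row is folded into the volume check because per-call vs. infrastructure cost is driven by volume.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    monthly_requests: int
    max_latency_ms: float
    needs_fine_tuning: bool   # domain-specific customization
    sensitive_data: bool      # PII, HIPAA, financial
    has_ml_team: bool


def recommend_deployment(req: Requirements) -> str:
    """Count table rows favoring each option; equal weights are an assumption."""
    api = self_hosted = 0

    # Volume: 10K-100K requests/month is treated as neutral.
    if req.monthly_requests < 10_000:
        api += 1
    elif req.monthly_requests > 100_000:
        self_hosted += 1

    # Latency: 100-500 ms is treated as neutral.
    if req.max_latency_ms < 100:
        self_hosted += 1
    elif req.max_latency_ms > 500:
        api += 1

    # Customization, privacy, and team expertise are binary in this sketch.
    for favors_self_hosting in (req.needs_fine_tuning, req.sensitive_data, req.has_ml_team):
        if favors_self_hosting:
            self_hosted += 1
        else:
            api += 1

    if api > self_hosted:
        return "api"
    if self_hosted > api:
        return "self-hosted"
    return "no clear winner"
```

In practice data privacy may act as a hard constraint rather than one vote among five.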

### Cost Comparison Framework

```markdown
## API Costs (Example: OpenAI GPT-4, original 8K-context pricing)
- Input: $0.03/1K tokens
- Output: $0.06/1K tokens
- Average request: 500 input + 200 output tokens
- Cost per request: $0.027
- 100K requests/month: $2,700

## Self-Hosted Costs (Example: Llama 70B)
- GPU instance: $3/hour (A100 40GB; 70B FP16 weights alone are ~140GB, so a single card implies ~4-bit quantization)
- Throughput: ~50 requests/minute = 3K/hour
- Cost per request: $0.001
- 100K requests/month: ~$100 GPU (~33 GPU-hours, billed only while serving) + $500 engineering time

## Break-even Analysis
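- Pay-per-use GPU: break-even ≈ $500 / ($0.027 - $0.001) ≈ 19K requests/month
- Always-on GPU (~730 h × $3 ≈ $2,190/month + $500): break-even ≈ $2,690 / $0.027 ≈ 100K requests/month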
- Factor in: engineering time, ops burden, model quality
```
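The cost arithmetic can be rerun with your own prices. A minimal sketch, assuming the example figures ($0.03/$0.06 per 1K tokens, $3 per GPU-hour, ~3K requests/hour) and treating the $500 engineering time as a fixed monthly cost; the function names are hypothetical.

```python
def api_cost_per_request(input_tokens: int, output_tokens: int,
                         input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Token-metered API cost for one request."""
    return (input_tokens / 1000 * input_price_per_1k
            + output_tokens / 1000 * output_price_per_1k)


def self_hosted_cost_per_request(gpu_price_per_hour: float, requests_per_hour: float) -> float:
    """GPU cost per request, assuming the GPU is billed only while serving."""
    return gpu_price_per_hour / requests_per_hour


def break_even_requests(api_per_request: float, self_hosted_per_request: float,
                        fixed_monthly_cost: float) -> float:
    """Monthly volume at which API spend equals self-hosted spend (variable + fixed)."""
    margin = api_per_request - self_hosted_per_request
    if margin <= 0:
        return float("inf")  # self-hosting never catches up
    return fixed_monthly_cost / margin


api = api_cost_per_request(500, 200, 0.03, 0.06)   # ~$0.027
hosted = self_hosted_cost_per_request(3.0, 3000)   # ~$0.001
pay_per_use = break_even_requests(api, hosted, 500.0)
always_on = break_even_requests(api, 0.0, 730 * 3.0 + 500.0)  # ~730 GPU-hours/month
```

With these inputs the break-even is roughly 19K requests/month when the GPU is billed only for the hours it serves, and roughly 100K requests/month when the instance runs around the clock, so utilization matters as much as raw volume.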

---

## Training Pipeline Architecture

### Standard ML Pipeline

```
┌─────────────────────────────────────────────────────────────┐
│                    DATA LAYER                                │
├─────────────────────────────────────────────────────────────┤
│  Data Sources → ETL → Feature Store → Training Data         │
│  (S3, DBs)     (Airflow)  (Feast)     (Versioned)          │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                  TRAINING LAYER                              │
├─────────────────────────────────────────────────────────────┤
│  Experiment Tracking → Training Jobs → Model Registry       │
│  (MLflow, W&B)         (SageMaker)    (MLflow, S3)         │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                  SERVING LAYER                               │
├─────────────────────────────────────────────────────────────┤
│  Model Server → Load Balancer → Monitoring                  │
│  (TorchServe)   (K8s/ELB)      (Prometheus)                │
└─────────────────────────────────────────────────────────────┘
```
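The data and training layers can be prototyped as plain functions before adopting Feast, MLflow, or SageMaker. A minimal sketch, assuming an in-memory list stands in for the model registry and a majority-class baseline stands in for the model; all names are hypothetical, and the serving layer is left out.

```python
import hashlib
import json
from collections import Counter
from dataclasses import dataclass, field


def etl(raw_rows: list[dict]) -> list[dict]:
    """DATA LAYER: stand-in ETL step that drops rows without a label."""
    return [row for row in raw_rows if row.get("label") is not None]


def version_dataset(rows: list[dict]) -> str:
    """Content-hash the training data so each model traces back to its exact inputs."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def train_majority_baseline(rows: list[dict]) -> dict:
    """TRAINING LAYER: toy model that always predicts the most common label."""
    majority, _ = Counter(row["label"] for row in rows).most_common(1)[0]
    return {"type": "majority_baseline", "prediction": majority}


def predict(model: dict, row: dict) -> str:
    return model["prediction"]


@dataclass
class ModelRegistry:
    """In-memory stand-in for a registry such as MLflow or an S3 prefix."""
    versions: list[dict] = field(default_factory=list)

    def register(self, model: dict, data_version: str, metrics: dict) -> int:
        self.versions.append({"model": model, "data_version": data_version, "metrics": metrics})
        return len(self.versions)  # 1-based version number

    def latest(self) -> dict:
        return self.versions[-1]


def run_pipeline(raw_rows: list[dict], registry: ModelRegistry) -> int:
    rows = etl(raw_rows)
    data_version = version_dataset(rows)
    model = train_majority_baseline(rows)
    # Scored on training rows to keep the sketch short; use a held-out split in practice.
    accuracy = sum(predict(model, r) == r["label"] for r in rows) / len(rows)
    return registry.register(model, data_version, {"train_accuracy": accuracy})
```

Recording the data hash next to each registered model is what makes the "Versioned" training data in the diagram useful: any model can be reproduced from the exact rows it saw.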

### Component Selection Guide

| Component | Options | Recommendation |
|-----------|---------|----------------|
| **Feature Store** | Feast, Tecton, SageMaker | Feast (open source), Tecton (enterprise) |
| **Experiment Tracking** | MLflow, Weights & Biases, Neptune | MLflow (free), W&B (best UX) |
| **Training Orchestration** | Kubeflow, SageMaker, Vertex AI | SageMaker (AWS), Vertex (GCP) |
| **Model Registry** | MLflow, SageMaker, custom S3 | MLflow (standard) |
| **Model Serving** | TorchServe, TensorFlow Serving, Triton | Triton (multi-framework) |

---

## Inference Architecture Patterns

### Pattern 1: Synchronous API

Best for: Low-latency requirements, simple integration

```
Client → API Gateway → Model Server → Response
                           │
                      Load Balancer
                           │
                    ┌──────┴──────┐
                    │             │
                Model Pod    Model Pod
```

**Latency targets**:
- P50: < 100ms
- P95: < 300ms
- P99: < 500ms
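Checking measured latencies against these targets needs only a percentile function. A sketch using the nearest-rank method (interpolating definitions can differ slightly on small samples); `percentile` and `check_latency_slo` are hypothetical helpers.

```python
import math

TARGETS_MS = {50: 100.0, 95: 300.0, 99: 500.0}  # percentile -> upper bound in ms


def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile: the value at rank ceil(p/100 * N) in sorted order."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def check_latency_slo(latencies_ms: list[float], targets: dict[int, float] = TARGETS_MS) -> dict[int, bool]:
    """Map each percentile to whether it meets its target."""
    return {p: percentile(latencies_ms, p) < limit for p, limit in targets.items()}
```

A service can pass P50 and P99 yet fail P95, so each percentile is reported separately rather than as one pass/fail flag.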

### Pattern 2: Asynchronous Processing

Best for: Long-running inference, batch processing

```
Client → API → Queue (SQS) → Worker → Result Store → Webhook/Poll
                                          │
                                     S3/Redis
```

**Use when**:
- Inference > 5 seconds
- Batch processing required
- Variable load patterns
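The queue, worker, and result-store flow can be prototyped in-process before wiring up SQS and Redis. A minimal sketch, assuming Python's `queue.Queue` stands in for SQS and a lock-protected dict stands in for the result store; `submit`, `worker`, `poll`, and `slow_inference` are hypothetical names.

```python
import queue
import threading
import time
import uuid

jobs = queue.Queue()               # stand-in for SQS
results: dict[str, dict] = {}      # stand-in for the S3/Redis result store
results_lock = threading.Lock()


def submit(payload: str) -> str:
    """API handler: record the job as pending, enqueue it, return an id immediately."""
    job_id = str(uuid.uuid4())
    with results_lock:
        results[job_id] = {"status": "pending"}
    jobs.put((job_id, payload))
    return job_id


def slow_inference(payload: str) -> str:
    """Placeholder for a multi-second model call."""
    time.sleep(0.05)
    return payload.upper()


def worker() -> None:
    """Pull jobs until a None sentinel arrives; write outcomes to the result store."""
    while True:
        item = jobs.get()
        if item is None:
            jobs.task_done()
            return
        job_id, payload = item
        try:
            record = {"status": "done", "output": slow_inference(payload)}
        except Exception as exc:  # keep the worker alive on bad inputs
            record = {"status": "failed", "error": str(exc)}
        with results_lock:
            results[job_id] = record
        jobs.task_done()


def poll(job_id: str) -> dict:
    """Client-side polling endpoint."""
    with results_lock:
        return dict(results[job_id])


def demo() -> dict:
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    job_id = submit("hello")
    jobs.join()      # demo shortcut; a real client polls or waits for a webhook
    jobs.put(None)   # stop the worker
    t.join()
    return poll(job_id)
```

Writing a `pending` record before enqueueing means a client that polls immediately gets a status instead of a missing key.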

### Pattern 3: Edge Inference

Best for: Privacy, offline capability, ultra-low latency

```
┌─────────────────────────────────────────┐
│              EDGE DEVICE                 │
│  ┌─────────┐    ┌─────────────────────┐ │
│  │ Camera  │───▶│ Optimized Model     │ │
│  └─────────┘    │ (ONNX, TFLite)      │ │
│                 └─────────────────────┘ │
│                          │              │
│                     Local Result        │
└─────────────────────────────────────────┘
                           │
                    Sync to Cloud
                    (non-blocking)
```

**Model optimization for edge**:
- Quantization (INT8): 4x smaller, 2-
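The 4x size figure comes from storing each weight in 1 byte (INT8) instead of 4 (FP32). A minimal sketch of one common scheme, symmetric per-tensor quantization with scale = max|w| / 127; toolchains such as ONNX Runtime or TFLite add per-channel scales, zero points, and calibration, which this omits. The function names are hypothetical.

```python
from array import array


def quantize_int8(weights: list[float]) -> tuple[array, float]:
    """Symmetric per-tensor INT8: q = round(w / scale), scale = max|w| / 127."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Clamp to [-127, 127] so the range stays symmetric around zero.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q: array, scale: float) -> list[float]:
    """Approximate the original weights; per-weight error is at most about scale / 2."""
    return [v * scale for v in q]


weights = [0.5, -1.0, 0.25, 0.0]
fp32 = array("f", weights)          # 4 bytes per value on common platforms
q, scale = quantize_int8(weights)   # 1 byte per value, plus one float scale
ratio = (fp32.itemsize * len(fp32)) / (q.itemsize * len(q))  # 4.0 where a C float is 4 bytes
```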

From the same repository

cto-architect (Subagent)

Use this agent when you need comprehensive technical architecture guidance, strategic technology decisions, or system design for complex web/mobile applications with ML/AI integration. Specifically invoke this agent when

cto-orchestrator (Subagent)

Use this agent when you need strategic technical leadership, complex task orchestration across multiple domains, or help translating business requirements into technical execution. This agent excels at breaking down ambiguous requests, routing work to specialized agents, and maintaining strategic context throughout complex projects.

strategic-cto-mentor (Subagent)

Use this agent when you need strategic technical advice, architectural reviews, roadmap planning, or honest feedback on technical decisions. This includes evaluating project strategies, challenging assumptions, reviewing system designs, planning execution approaches, or getting brutally honest assessment of ideas and proposals.

cto (Slash Command)

Get CTO-level strategic and technical guidance

decide (Slash Command)

Get strategic guidance on build vs buy and technology decisions

design (Slash Command)

Design system architecture with roadmap and technology recommendations

validate (Slash Command)

Validate plans, roadmaps, or proposals with ruthless honesty

antipattern-detector (Skill)

Detect common technical and organizational anti-patterns in proposals, architectures, and plans. Use when strategic-cto-mentor needs to identify red flags before they become problems.