ml-cv-specialist
Deep expertise in ML/CV model selection, training pipelines, and inference architecture. Use when designing machine learning systems, computer vision pipelines, or AI-powered features.
git clone --depth 1 https://github.com/alirezarezvani/claude-cto-team /tmp/ml-cv-specialist && cp -r /tmp/ml-cv-specialist/skills/ml-cv-specialist ~/.claude/skills/ml-cv-specialist
# ML/CV Specialist
Provides specialized guidance for machine learning and computer vision system design, model selection, and production deployment.
## When to Use
- Selecting ML models for specific use cases
- Designing training and inference pipelines
- Optimizing ML system performance and cost
- Evaluating build vs. API for ML capabilities
- Planning data pipelines for ML workloads
## ML System Design Framework
### Model Selection Decision Tree
```
Use Case Identified
│
├─► Text/Language Tasks
│ ├─► Classification → BERT, DistilBERT, or API (OpenAI, Claude)
│ ├─► Generation → GPT-4, Claude, Llama (self-hosted)
│ ├─► Embeddings → OpenAI text-embedding models (Ada-002, text-embedding-3), sentence-transformers
│ └─► Search/RAG → Vector DB + Embeddings + LLM
│
├─► Computer Vision Tasks
│ ├─► Classification → ResNet, EfficientNet, ViT
│ ├─► Object Detection → YOLOv8, DETR, Faster R-CNN
│ ├─► Segmentation → SAM, Mask R-CNN, U-Net
│ ├─► OCR → Tesseract, PaddleOCR, Cloud Vision API
│ └─► Face Recognition → InsightFace, DeepFace
│
├─► Audio Tasks
│ ├─► Speech-to-Text → Whisper, DeepSpeech, Cloud APIs
│ ├─► Text-to-Speech → ElevenLabs, Coqui TTS
│ └─► Audio Classification → PANNs, AudioSet-pretrained models
│
└─► Structured Data
├─► Tabular → XGBoost, LightGBM, CatBoost
├─► Time Series → Prophet, ARIMA, Transformer-based
└─► Recommendations → Two-tower, matrix factorization
```
---
## API vs. Self-Hosted Decision
### When to Use APIs
| Factor | API Preferred | Self-Hosted Preferred |
|--------|---------------|----------------------|
| **Volume** | < 10K requests/month | > 100K requests/month |
| **Latency** | > 500ms acceptable | < 100ms required |
| **Customization** | General use case | Domain-specific fine-tuning |
| **Data Privacy** | Non-sensitive data | PII, HIPAA, financial |
| **Team Expertise** | No ML engineers | ML team available |
| **Budget** | Predictable per-call costs | High volume justifies infra |
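The table works as a checklist. The following is a minimal sketch that assumes the table's thresholds. It uses a hypothetical `Workload` record (not from any library) and counts how many factors point toward self-hosting. The cut-off of three signals is an arbitrary assumption that each team should tune.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workload:
    """Hypothetical description of an ML feature's requirements."""
    requests_per_month: int
    latency_budget_ms: float   # slowest acceptable response time
    needs_fine_tuning: bool    # domain-specific customization required
    sensitive_data: bool       # PII, HIPAA, financial
    has_ml_team: bool


def self_hosting_signals(w: Workload) -> List[str]:
    """Return the table factors whose thresholds favor self-hosting."""
    signals = []
    if w.requests_per_month > 100_000:
        signals.append("volume")
    if w.latency_budget_ms < 100:
        signals.append("latency")
    if w.needs_fine_tuning:
        signals.append("customization")
    if w.sensitive_data:
        signals.append("data_privacy")
    if w.has_ml_team:
        signals.append("team_expertise")
    return signals


def recommend(w: Workload, min_signals: int = 3) -> str:
    # ASSUMPTION: three or more signals justify the ops burden; tune per team.
    return "self-hosted" if len(self_hosting_signals(w)) >= min_signals else "api"
```

For example, `recommend(Workload(5_000, 1_000, False, False, False))` returns `"api"`. The function returns the signal list rather than only a verdict. This leaves room for a single hard constraint, such as HIPAA data, to override the count.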
### Cost Comparison Framework
```markdown
## API Costs (Example: OpenAI GPT-4, 8K-context launch pricing)
- Input: $0.03/1K tokens
- Output: $0.06/1K tokens
- Average request: 500 input + 200 output tokens
- Cost per request: $0.027
- 100K requests/month: $2,700
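## Self-Hosted Costs (Example: Llama 70B)
- GPU instance: $3/hour (illustrative; a 70B model's FP16 weights need ~140 GB, so a single A100 40GB only works with aggressive 4-bit quantization (~35 GB of weights), and a multi-GPU node costs more)
- Throughput: ~50 requests/minute = 3K requests/hour
- Cost per request: $3 / 3,000 = $0.001 (GPU fully busy)
- 100K requests/month: ~33 GPU-hours ≈ $100 if billed only while busy, or ~$2,190 for an always-on instance (730 h × $3), plus ~$500 engineering time
## Break-even Analysis
- Billed only while busy: $500 / ($0.027 - $0.001) ≈ 19K requests/month
- Always-on instance: ($2,190 + $500) / $0.027 ≈ 100K requests/month
- < ~20K requests: API almost certainly cheaper
- > ~100K requests: self-hosted may be cheaper
- Factor in: engineering time, ops burden, model quality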
```
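The break-even arithmetic is easy to get wrong because it depends on how the GPU is billed. The following is a minimal Python sketch that uses the example's prices and throughput as default parameters. All of these figures are illustrative, not current list prices. It computes two cases: paying only for busy GPU hours, and keeping one instance running all month (730 hours).

```python
HOURS_PER_MONTH = 730  # 365 days * 24 h / 12 months


def api_cost_per_request(input_tokens: int, output_tokens: int,
                         in_per_1k: float = 0.03, out_per_1k: float = 0.06) -> float:
    """Per-request API cost from per-1K-token prices (defaults: the GPT-4 example)."""
    return input_tokens / 1000 * in_per_1k + output_tokens / 1000 * out_per_1k


def self_hosted_monthly_cost(requests: int, gpu_hourly: float = 3.0,
                             requests_per_hour: float = 3000,
                             fixed_monthly: float = 500.0,
                             always_on: bool = False) -> float:
    """GPU spend plus a fixed monthly engineering cost."""
    busy_hours = requests / requests_per_hour
    # Always-on bills every hour of the month (more if one GPU cannot keep up).
    billed_hours = max(busy_hours, HOURS_PER_MONTH) if always_on else busy_hours
    return billed_hours * gpu_hourly + fixed_monthly


def break_even_requests(api_per_request: float, gpu_hourly: float = 3.0,
                        requests_per_hour: float = 3000,
                        fixed_monthly: float = 500.0,
                        always_on: bool = False) -> float:
    """Monthly volume at which API spend equals self-hosted spend."""
    if always_on:
        # Flat cost (valid while volume fits on one GPU) divided by the API price.
        return (HOURS_PER_MONTH * gpu_hourly + fixed_monthly) / api_per_request
    marginal = gpu_hourly / requests_per_hour  # GPU cost per request when busy
    if api_per_request <= marginal:
        return float("inf")  # self-hosting never breaks even
    return fixed_monthly / (api_per_request - marginal)


api = api_cost_per_request(500, 200)
print(f"API: ${api:.3f}/request, ${api * 100_000:,.0f} per 100K")  # API: $0.027/request, $2,700 per 100K
print(f"{break_even_requests(api):,.0f}")                  # 19,231
print(f"{break_even_requests(api, always_on=True):,.0f}")  # 99,630
```

Swap in measured throughput and negotiated prices. The defaults only reproduce the example above.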
---
## Training Pipeline Architecture
### Standard ML Pipeline
```
┌─────────────────────────────────────────────────────────────┐
│                         DATA LAYER                          │
├─────────────────────────────────────────────────────────────┤
│  Data Sources → ETL       → Feature Store → Training Data   │
│  (S3, DBs)      (Airflow)   (Feast)         (Versioned)     │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                       TRAINING LAYER                        │
├─────────────────────────────────────────────────────────────┤
│  Experiment Tracking → Training Jobs → Model Registry       │
│  (MLflow, W&B)         (SageMaker)     (MLflow, S3)         │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                        SERVING LAYER                        │
├─────────────────────────────────────────────────────────────┤
│  Model Server → Load Balancer → Monitoring                  │
│  (TorchServe)   (K8s/ELB)       (Prometheus)                │
└─────────────────────────────────────────────────────────────┘
```
### Component Selection Guide
| Component | Options | Recommendation |
|-----------|---------|----------------|
| **Feature Store** | Feast, Tecton, SageMaker | Feast (open source), Tecton (enterprise) |
| **Experiment Tracking** | MLflow, Weights & Biases, Neptune | MLflow (free), W&B (best UX) |
| **Training Orchestration** | Kubeflow, SageMaker, Vertex AI | SageMaker (AWS), Vertex (GCP) |
| **Model Registry** | MLflow, SageMaker, custom S3 | MLflow (standard) |
| **Model Serving** | TorchServe, TFServing, Triton | Triton (multi-framework) |
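In this pipeline, the registry ties each model version to the exact training data and metrics that produced it. The following is a hypothetical in-memory illustration of that idea. It does not mirror MLflow's or SageMaker's APIs. The `dataset_version` content hash is one assumed way to version training data.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def dataset_version(rows: List[dict]) -> str:
    """Content hash of the training rows: identical data yields an identical ID."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass
class ModelVersion:
    name: str
    version: int
    data_version: str          # links the model to the data it was trained on
    metrics: Dict[str, float]
    stage: str = "staging"


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry; real ones persist to a database or S3."""
    versions: Dict[str, List[ModelVersion]] = field(default_factory=dict)

    def register(self, name: str, data_version: str,
                 metrics: Dict[str, float]) -> ModelVersion:
        history = self.versions.setdefault(name, [])
        mv = ModelVersion(name, len(history) + 1, data_version, metrics)
        history.append(mv)
        return mv

    def promote(self, name: str, version: int) -> None:
        """Move one version to production and archive the previous one."""
        for mv in self.versions[name]:
            if mv.version == version:
                mv.stage = "production"
            elif mv.stage == "production":
                mv.stage = "archived"

    def production(self, name: str) -> Optional[ModelVersion]:
        return next((mv for mv in self.versions.get(name, [])
                     if mv.stage == "production"), None)
```

With this pattern, serving loads whatever `production("churn")` returns (the model name is hypothetical) instead of a hard-coded file. A rollback then becomes another `promote` call.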
---
## Inference Architecture Patterns
### Pattern 1: Synchronous API
Best for: Low-latency requirements, simple integration
```
Client → API Gateway → Model Server → Response
                             │
                       Load Balancer
                             │
                      ┌──────┴──────┐
                      │             │
                  Model Pod     Model Pod
```
**Latency targets**:
- P50: < 100ms
- P95: < 300ms
- P99: < 500ms
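Checking these targets means computing percentiles over measured request latencies. The following is a minimal sketch that assumes latencies are collected in milliseconds. It uses the nearest-rank percentile definition. Monitoring systems such as Prometheus estimate percentiles from histogram buckets instead, so their numbers can differ slightly.

```python
import math
from typing import Dict, List, Tuple

# Targets from the list above: percentile -> latency must be below this (ms).
TARGETS_MS = {50: 100, 95: 300, 99: 500}


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with >= p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def check_latency(samples_ms: List[float]) -> Dict[int, Tuple[float, bool]]:
    """Map each target percentile to (observed latency, target met?)."""
    report = {}
    for p, limit in TARGETS_MS.items():
        observed = percentile(samples_ms, p)
        report[p] = (observed, observed < limit)
    return report
```

Consider a service whose median is 80 ms but whose slowest 10% of requests take 400 ms. It passes P50 and P99 but fails P95: `check_latency([80] * 90 + [400] * 10)` returns `{50: (80, True), 95: (400, False), 99: (400, True)}`.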
### Pattern 2: Asynchronous Processing
Best for: Long-running inference, batch processing
```
Client → API → Queue (SQS) → Worker → Result Store → Webhook/Poll
                                            │
                                        S3/Redis
```
**Use when**:
- Inference > 5 seconds
- Batch processing required
- Variable load patterns
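The queue decouples the request from the work. The API returns a job ID at once, a worker drains the queue, and the client polls the result store. The following is a minimal single-process sketch. Python's standard `queue` and `threading` modules stand in for SQS, the worker fleet, and S3/Redis. `slow_model` is a hypothetical placeholder for real inference.

```python
import queue
import threading
import time
import uuid
from typing import Dict, List, Optional

jobs: queue.Queue = queue.Queue()      # stands in for SQS
results: Dict[str, str] = {}           # stands in for S3/Redis
results_lock = threading.Lock()


def slow_model(payload: str) -> str:
    """Hypothetical long-running inference call."""
    time.sleep(0.1)
    return payload.upper()


def worker() -> None:
    """Pull jobs until a None sentinel arrives; store each result by job ID."""
    while True:
        item = jobs.get()
        if item is None:
            jobs.task_done()
            return
        job_id, payload = item
        output = slow_model(payload)
        with results_lock:
            results[job_id] = output
        jobs.task_done()


def submit(payload: str) -> str:
    """API handler: enqueue the work and return a job ID without waiting."""
    job_id = str(uuid.uuid4())
    jobs.put((job_id, payload))
    return job_id


def poll(job_id: str) -> Optional[str]:
    """Client-side poll: None until the worker has stored the result."""
    with results_lock:
        return results.get(job_id)


def run_demo(payloads: List[str]) -> List[Optional[str]]:
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    ids = [submit(p) for p in payloads]
    jobs.join()          # wait until every job has been marked done
    out = [poll(i) for i in ids]
    jobs.put(None)       # sentinel stops the worker
    t.join()
    return out
```

A webhook variant would have the worker call the client after storing the result, instead of waiting for `poll`. With real SQS, the worker must also delete each message after success, or the message reappears when its visibility timeout expires. `task_done` only loosely mirrors that step.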
### Pattern 3: Edge Inference
Best for: Privacy, offline capability, ultra-low latency
```
┌──────────────────────────────────────────┐
│               EDGE DEVICE                │
│  ┌─────────┐    ┌─────────────────────┐  │
│  │ Camera  │───▶│ Optimized Model     │  │
│  └─────────┘    │ (ONNX, TFLite)      │  │
│                 └─────────────────────┘  │
│                            │             │
│                      Local Result        │
└──────────────────────────────────────────┘
                             │
                       Sync to Cloud
                      (non-blocking)
```
**Model optimization for edge**:
- Quantization (INT8): ~4x smaller than FP32, 2-…
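The 4x size reduction follows from storage width: FP32 weights take 4 bytes each, and INT8 weights take 1. The following is a minimal sketch of symmetric per-tensor INT8 quantization in pure Python. Real toolchains, such as the TFLite converter or ONNX Runtime's quantization tools, add calibration and per-channel scales, which this sketch omits.

```python
from array import array
from typing import Tuple


def quantize_int8(weights: array) -> Tuple[array, float]:
    """Symmetric per-tensor quantization: q = round(w / scale), scale = max|w| / 127."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q: array, scale: float) -> array:
    """Approximate reconstruction; each value is off by roughly scale / 2 at most."""
    return array("f", (v * scale for v in q))


weights = array("f", [0.5, -1.27, 0.0, 1.0])   # 4-byte floats on mainstream platforms
q, scale = quantize_int8(weights)
print(list(q))                                               # [50, -127, 0, 100]
print(len(weights) * weights.itemsize, len(q) * q.itemsize)  # 16 4
```

The single scale per tensor keeps the sketch short. A few outlier weights can inflate `scale` and waste precision for the rest, which is why production tools prefer per-channel scales.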