The Context Layer for unstructured data: typed, versioned datasets over S3, GCS, Azure
DataChain is a Python library and MCP server that turns unstructured files stored in S3, GCS, and Azure into typed, versioned datasets queryable at warehouse speed, without copying bytes out of storage. Its three main components are a Compute Engine for parallel and distributed Python processing with async I/O and checkpoint recovery, a Dataset DB backed by Pydantic schemas that tracks versions, file pointers, and lineage in a local SQLite store, and a Knowledge Base that generates markdown summaries from LLM-enriched datasets. The Agent Harness connects all three to Claude Code, Cursor, Codex, GitHub Copilot, and Pi via a single install command such as `datachain skill install --target claude`, exposing tools like `read_storage`, `map`, and `save` so agents can build and reuse named dataset versions like `pets_embeddings@1.0.0` across sessions. Data engineers, MLOps practitioners, and teams running multimodal pipelines benefit most, particularly when agents need persistent data context rather than recomputing from raw files on every run.
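To make the idea of named dataset versions that hold file pointers rather than bytes concrete, here is a minimal standard-library sketch. It is not DataChain's implementation or API: `FilePointer`, `DatasetRef`, and `Registry` are hypothetical names, and the rule that a re-save bumps the patch number is an assumption made only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FilePointer:
    """Hypothetical record: a reference to an object in storage, not its bytes."""
    uri: str
    size: int


@dataclass(frozen=True)
class DatasetRef:
    """Hypothetical parsed form of a reference such as 'pets_embeddings@1.0.0'."""
    name: str
    version: tuple[int, int, int]

    @classmethod
    def parse(cls, ref: str) -> DatasetRef:
        name, sep, raw_version = ref.partition("@")
        parts = raw_version.split(".")
        if not name or not sep or len(parts) != 3 or not all(p.isdigit() for p in parts):
            raise ValueError(f"expected 'name@MAJOR.MINOR.PATCH', got {ref!r}")
        major, minor, patch = (int(p) for p in parts)
        return cls(name, (major, minor, patch))

    def __str__(self) -> str:
        return f"{self.name}@{'.'.join(str(n) for n in self.version)}"


class Registry:
    """Toy in-memory stand-in for a dataset store; every save creates a new version."""

    def __init__(self) -> None:
        self._versions: dict[str, dict[tuple[int, int, int], list[FilePointer]]] = {}

    def save(self, name: str, pointers: list[FilePointer]) -> DatasetRef:
        versions = self._versions.setdefault(name, {})
        if versions:
            # Assumption for this sketch: a re-save bumps the patch number.
            major, minor, patch = max(versions)
            new_version = (major, minor, patch + 1)
        else:
            new_version = (1, 0, 0)
        versions[new_version] = list(pointers)
        return DatasetRef(name, new_version)

    def latest(self, name: str) -> DatasetRef:
        return DatasetRef(name, max(self._versions[name]))

    def read(self, ref: DatasetRef) -> list[FilePointer]:
        return self._versions[ref.name][ref.version]
```

In this sketch, a later session could call `Registry.read(DatasetRef.parse("pets_embeddings@1.0.0"))` to get the stored pointers back without touching the raw files again, which is the reuse pattern the description refers to.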
- ✓ Open-source license (Apache-2.0)
- ✓ Actively maintained (<30d)
- ✓ Healthy fork ratio
- ✓ Clear description
- ✓ Topics declared
- ✓ Mature repo (>1y old)
- ! README contains suspicious pattern: eval\s*\(
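The flagged pattern `eval\s*\(` is a regular expression. The sketch below shows what it matches, assuming the scanner applies it with semantics like Python's `re.search`. The scanner's actual implementation is not described on this page, and `flagged_lines` and the sample text are hypothetical.

```python
import re

# The pattern from the checklist: "eval", optional whitespace, then "(".
SUSPICIOUS = re.compile(r"eval\s*\(")


def flagged_lines(text: str) -> list[str]:
    """Return the lines of `text` that contain a match for the pattern."""
    return [line for line in text.splitlines() if SUSPICIOUS.search(line)]


# Hypothetical README-like sample, not taken from the repository.
sample = "\n".join(
    [
        "result = eval(user_input)",  # direct call: matches
        "model.eval ()",              # method call with a space: matches
        "retrieval(query)",           # 'retrieval(' ends in 'eval(': matches
        "evaluate(model)",            # 'eval' is followed by 'u': no match
    ]
)

hits = flagged_lines(sample)
```

Because the pattern has no word boundary, it also matches harmless text such as a `.eval()` method call or the word `retrieval(`, so a flag of this kind is a prompt to inspect the README rather than proof of a problem.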
git clone https://github.com/datachain-ai/datachain && cp datachain/*.md ~/.claude/agents/
3 items in this repository
Use ONLY for abstract DataChain SDK questions — API usage, method signatures, or code patterns — when no specific dataset or bucket is referenced. If the request mentions creating, saving, listing, exploring datasets or buckets, use datachain-knowledge instead.
Use when asked about Studio job analytics — compute hours, user spend, failure rates, cost estimation, cluster usage. Generates and maintains dc-knowledge/jobs/index.md.
Use whenever datasets, cloud storage buckets, or data pipelines are mentioned — creating, saving, querying, listing, exploring, deleting, or processing data in S3, GCS, Azure Blob, or local storage. Also use when running any script that may create datasets as a side effect. Maintains a knowledge base at dc-knowledge/ (JSON + markdown). ALWAYS use this skill when the user creates a dataset, saves pipeline output, runs a data script, or references any storage bucket.
Subagents summary
What people ask about datachain
What is datachain-ai/datachain?
datachain-ai/datachain is a set of subagents for the Claude AI ecosystem. The Context Layer for unstructured data: typed, versioned datasets over S3, GCS, Azure. It has 2.8k stars on GitHub and was last updated today.
How do you install datachain?
You can install datachain by cloning the repository (https://github.com/datachain-ai/datachain) or by following the instructions in the README on GitHub. ClaudeWave also offers quick-install blocks on this page.
Is datachain-ai/datachain safe to use?
Our security agent has analyzed datachain-ai/datachain and assigned it a Trust Score of 100/100 (tier: Verified). Review the full breakdown of passed checks and flags on this page.
Who maintains datachain-ai/datachain?
datachain-ai/datachain is maintained by datachain-ai. The latest activity recorded on GitHub is from today, with 66 open issues.
Are there alternatives to datachain?
Yes. On ClaudeWave you can explore similar subagents at /categories/agents, sorted by popularity or recent activity.
Deploy datachain to your cloud
Take this repo to production in minutes. Each platform generates its own environment with editable environment variables.
Do you maintain this repo? Add a badge to your README
Paste the badge into your GitHub README to show that it is audited by ClaudeWave. Each badge links back to this page and shows the current Trust Score.
<a href="https://claudewave.com/repo/datachain-ai-datachain"><img src="https://claudewave.com/api/badge/datachain-ai-datachain" alt="Featured on ClaudeWave: datachain-ai/datachain" width="320" height="64" /></a>
More Subagents
The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.
The agent that grows with you
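Java interview and general backend interview guide, covering computer fundamentals, databases, distributed systems, high concurrency, system design, and AI application development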
Production-ready platform for agentic workflow development.
The agent engineering platform.
🤯 LobeHub is your Chief Agent Operator, organizing your agents into 7×24 operations by hiring, scheduling, and reporting on your entire AI team.