datachain

Name: datachain-ai/datachain
Author: datachain-ai

The Context Layer for unstructured data: typed, versioned datasets over S3, GCS, Azure

Subagents · 2.8k stars · 153 forks · Python · Apache-2.0 · Updated today

Editorial note

DataChain is a Python library and MCP server that turns unstructured files stored in S3, GCS, and Azure into typed, versioned datasets queryable at warehouse speed, without copying bytes out of storage. Its three main components are a Compute Engine for parallel and distributed Python processing with async I/O and checkpoint recovery, a Dataset DB backed by Pydantic schemas that tracks versions, file pointers, and lineage in a local SQLite store, and a Knowledge Base that generates markdown summaries from datasets enriched by LLM. The Agent Harness connects all three to Claude Code, Cursor, Codex, GitHub Copilot, and Pi via a single install command such as `datachain skill install --target claude`, exposing tools like `read_storage`, `map`, and `save` so agents can build and reuse named dataset versions like `pets_embeddings@1.0.0` across sessions. Data engineers, MLOps practitioners, and teams running multimodal pipelines benefit most, particularly when agents need persistent data context rather than recomputing from raw files on every run.
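DataChain refers to saved datasets by name and version, as in `pets_embeddings@1.0.0`. The helper below is a hypothetical sketch, not DataChain's own parser. It assumes the `name@MAJOR.MINOR.PATCH` form shown above, splits a reference into its parts, and compares versions numerically. That is why it ranks `1.10.0` above `1.9.0`, whereas a plain string comparison would rank it lower.

```python
from typing import List, NamedTuple, Optional, Tuple


class DatasetRef(NamedTuple):
    """A dataset name plus an optional (major, minor, patch) version."""
    name: str
    version: Optional[Tuple[int, int, int]]


def parse_dataset_ref(ref: str) -> DatasetRef:
    """Split e.g. "pets_embeddings@1.0.0"; a bare name means "no pinned version".

    Hypothetical helper for illustration only; not part of DataChain's API.
    """
    name, sep, version = ref.partition("@")
    if not name:
        raise ValueError(f"missing dataset name in {ref!r}")
    if not sep:
        return DatasetRef(name, None)
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdecimal() for p in parts):
        raise ValueError(f"expected MAJOR.MINOR.PATCH after '@', got {version!r}")
    major, minor, patch = (int(p) for p in parts)
    return DatasetRef(name, (major, minor, patch))


def latest_ref(refs: List[str], name: str) -> Optional[str]:
    """Return the highest version of `name` among `refs`, compared numerically."""
    candidates = [
        r for r in map(parse_dataset_ref, refs)
        if r.name == name and r.version is not None
    ]
    if not candidates:
        return None
    # Tuples compare element by element, so (1, 10, 0) > (1, 9, 0).
    best = max(candidates, key=lambda r: r.version)
    return f"{best.name}@{'.'.join(str(n) for n in best.version)}"
```

For example, `latest_ref(["pets_embeddings@1.9.0", "pets_embeddings@1.10.0"], "pets_embeddings")` returns `"pets_embeddings@1.10.0"`.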

ClaudeWave Trust Score

100/100

✓ Verified

Passed

✓ Open-source license (Apache-2.0)
✓ Actively maintained (<30d)
✓ Healthy fork ratio
✓ Clear description
✓ Topics declared
✓ Mature repo (>1y old)

Flags

! README contains suspicious pattern: eval\s*\(

Last scanned: 6/11/2026

Install as a Claude Code subagent

Method: Clone

Terminal

git clone https://github.com/datachain-ai/datachain && mkdir -p ~/.claude/agents && cp datachain/*.md ~/.claude/agents/

1. Clone the repository and copy the agent .md definitions into ~/.claude/agents (or .claude/agents inside a project).

2. Start a new Claude Code session to load the agents.

3. Delegate work to them with the Task/Agent tool or by name.
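The copy in step 1 can also be scripted from Python, for example when targeting a project-level `.claude/agents` directory that may not exist yet. This is a minimal standard-library sketch. `install_agent_files` is a hypothetical name, and like the shell command it copies every top-level `.md` file from the clone, without checking whether each file is actually an agent definition.

```python
import shutil
from pathlib import Path
from typing import List


def install_agent_files(repo_dir: Path, agents_dir: Path) -> List[Path]:
    """Copy top-level *.md files from a cloned repo into an agents directory.

    Hypothetical helper. Creates `agents_dir` first, then returns the
    destination paths sorted by file name. Subdirectories are not searched.
    """
    agents_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(repo_dir.glob("*.md")):
        if src.is_file():
            dest = agents_dir / src.name
            shutil.copy2(src, dest)  # copy2 also preserves timestamps
            copied.append(dest)
    return copied
```

Call it as `install_agent_files(Path("datachain"), Path.home() / ".claude" / "agents")` for a user-level install, or pass `Path(".claude/agents")` to install into the current project.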

Installable items

3 items in this repository

datachain-core (Skill)

Use ONLY for abstract DataChain SDK questions — API usage, method signatures, or code patterns — when no specific dataset or bucket is referenced. If the request mentions creating, saving, listing, exploring datasets or buckets, use datachain-knowledge instead.


datachain-jobs (Skill)

Use when asked about Studio job analytics — compute hours, user spend, failure rates, cost estimation, cluster usage. Generates and maintains dc-knowledge/jobs/index.md.


datachain-knowledge (Skill)

Use whenever datasets, cloud storage buckets, or data pipelines are mentioned — creating, saving, querying, listing, exploring, deleting, or processing data in S3, GCS, Azure Blob, or local storage. Also use when running any script that may create datasets as a side effect. Maintains a knowledge base at dc-knowledge/ (JSON + markdown). ALWAYS use this skill when the user creates a dataset, saves pipeline output, runs a data script, or references any storage bucket.

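In Claude Code the model picks a skill by reading these descriptions, not by running code. As a rough illustration of the precedence they spell out, with datachain-core used only when no dataset or bucket is referenced, here is a hypothetical keyword router. The keyword lists are distilled from the descriptions, and the jobs-before-knowledge tie-break is an assumption, since the descriptions do not say which wins.

```python
import re

# Hypothetical keyword patterns distilled from the three skill descriptions.
JOBS_PATTERN = re.compile(
    r"\b(studio jobs?|compute hours?|user spend|failure rates?|cost estimat\w*|cluster usage)\b",
    re.IGNORECASE,
)
DATA_PATTERN = re.compile(
    r"\b(datasets?|buckets?|s3|gcs|azure blob|pipelines?|storage)\b",
    re.IGNORECASE,
)


def pick_skill(request: str) -> str:
    """Return the skill name a request would most plausibly route to."""
    # Assumption: job analytics wins ties; the descriptions do not say.
    if JOBS_PATTERN.search(request):
        return "datachain-jobs"
    # Any dataset, bucket, pipeline, or storage mention goes to datachain-knowledge.
    if DATA_PATTERN.search(request):
        return "datachain-knowledge"
    # Only abstract SDK questions with no data reference reach datachain-core.
    return "datachain-core"
```

Note that `read_storage` does not match `storage` here, because the underscore is a word character and so `\b` finds no boundary, which keeps pure API questions routed to datachain-core.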

Use cases

Research, DevOps

About the repo

Subagents summary

README not available. Visit the repo on GitHub for the full documentation.

Topics

ai-agents · claude-code · codex · data-context-layer · data-processing · harness-engineering · knowledge-base · mlops · multimodal · pydantic · unstructured-data

Frequently asked questions

What people ask about datachain

What is datachain-ai/datachain?

datachain-ai/datachain is a Subagents repository for the Claude AI ecosystem, described as "The Context Layer for unstructured data: typed, versioned datasets over S3, GCS, Azure". It has 2.8k stars on GitHub and was last updated today.

How do I install datachain?

You can install datachain by cloning the repository (https://github.com/datachain-ai/datachain) or by following the README instructions on GitHub. ClaudeWave also provides quick-install blocks on this page.

Is datachain-ai/datachain safe to use?

Our security agent has analyzed datachain-ai/datachain and assigned it a Trust Score of 100/100 (tier: Verified). See the full breakdown of passed checks and flags on this page.

Who maintains datachain-ai/datachain?

datachain-ai/datachain is maintained by datachain-ai. The latest recorded GitHub activity is from today, with 79 open issues.

Are there alternatives to datachain?

Yes. On ClaudeWave you can browse similar subagents at /categories/agents, sorted by popularity or recent activity.


Embeddable badge

Maintain this repo? Add a badge to your README

Paste the badge into your GitHub README to show that it has been audited by ClaudeWave. Each badge links back to this page and displays the current Trust Score.

Markdown (README)

[![Featured on ClaudeWave](https://claudewave.com/api/badge/datachain-ai-datachain)](https://claudewave.com/repo/datachain-ai-datachain)

HTML

<a href="https://claudewave.com/repo/datachain-ai-datachain"><img src="https://claudewave.com/api/badge/datachain-ai-datachain" alt="Featured on ClaudeWave: datachain-ai/datachain" width="320" height="64" /></a>

Related

More Subagents

affaan-m

ECC

today

The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.

234.2k stars · 35.7k forks · JavaScript

Subagents · ai-agents · anthropic

NousResearch

hermes-agent

today

The agent that grows with you

221.5k stars · 42.3k forks · Python

Subagents · ai · ai-agent

Snailclimb

JavaGuide

yesterday

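Java interview and general backend interview guide, covering computer science fundamentals, databases, distributed systems, high concurrency, system design, and AI application development

157.3k stars · 46.2k forks · JavaScript

Subagents · agent · ai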

langgenius

dify

today

Build Agentic workflows, RAG pipelines, with rich AI model and tool support on one collaborative workspace. Deploy on cloud, VPC, or self-hosted, so teams move from prototype to production without rebuilding the stack.

150.5k stars · 23.7k forks · TypeScript

Subagents · agent · agentic-ai

langchain-ai

langchain

today

The agent engineering platform.

142.7k stars · 23.8k forks · Python

Subagents · agents · ai

Graphify-Labs

graphify

today

Turn any codebase, with its docs, SQL schemas, configs, and PDFs, into a queryable knowledge graph. A /graphify skill for Claude Code, Cursor, Codex, and Gemini CLI: local deterministic AST parsing, every edge explained, no vector store.

97.2k stars · 9.4k forks · Python

Subagents · ai-agents · antigravity