ClaudeWave

The Context Layer for unstructured data: typed, versioned datasets over S3, GCS, Azure

Subagents · 2.8k stars · 145 forks · Python · Apache-2.0 · Updated today
Editorial note

DataChain is a Python library and MCP server that turns unstructured files stored in S3, GCS, and Azure into typed, versioned datasets queryable at warehouse speed, without copying bytes out of storage. It has three main components: a Compute Engine for parallel and distributed Python processing with async I/O and checkpoint recovery; a Dataset DB, backed by Pydantic schemas, that tracks versions, file pointers, and lineage in a local SQLite store; and a Knowledge Base that generates markdown summaries from LLM-enriched datasets. The Agent Harness connects all three to Claude Code, Cursor, Codex, GitHub Copilot, and Pi via a single install command such as `datachain skill install --target claude`, exposing tools like `read_storage`, `map`, and `save` so agents can build and reuse named dataset versions like `pets_embeddings@1.0.0` across sessions. Data engineers, MLOps practitioners, and teams running multimodal pipelines benefit most, particularly when agents need persistent data context rather than recomputing from raw files on every run.
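To make the `name@version` naming concrete, here is a minimal sketch that splits a reference such as `pets_embeddings@1.0.0` into a name and a comparable version tuple. It uses only the standard library; `DatasetRef`, `parse_dataset_ref`, `latest`, and the pattern itself are hypothetical names chosen for illustration, and DataChain's own parsing rules may differ.

```python
import re
from typing import NamedTuple


class DatasetRef(NamedTuple):
    """Hypothetical container for a versioned dataset reference."""
    name: str
    version: tuple[int, int, int]


# Assumed shape: "<name>@<major>.<minor>.<patch>", as in "pets_embeddings@1.0.0".
_REF_PATTERN = re.compile(
    r"^(?P<name>[A-Za-z0-9_.\-]+)@(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$"
)


def parse_dataset_ref(ref: str) -> DatasetRef:
    """Split a reference into its name and a numeric version tuple."""
    match = _REF_PATTERN.match(ref)
    if match is None:
        raise ValueError(f"not a versioned dataset reference: {ref!r}")
    version = (int(match["major"]), int(match["minor"]), int(match["patch"]))
    return DatasetRef(match["name"], version)


def latest(refs: list[str]) -> str:
    """Return the reference with the highest version, compared numerically."""
    return max(refs, key=lambda r: parse_dataset_ref(r).version)
```

Comparing versions as integer tuples rather than strings keeps `1.10.0` ahead of `1.2.0`, which matters when an agent picks the newest saved version across sessions.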

ClaudeWave Trust Score
100/100
Verified
Passed
  • Open-source license (Apache-2.0)
  • Actively maintained (<30d)
  • Healthy fork ratio
  • Clear description
  • Topics declared
  • Mature repo (>1y old)
Flags
  • README contains suspicious pattern: eval\s*\(
Last scanned: 6/11/2026
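
The flagged pattern `eval\s*\(` is an ordinary regular expression. As a rough illustration only (how ClaudeWave's scanner actually works is not described on this page, and `find_suspicious_lines` is a hypothetical helper), the sketch below shows which lines of a text such a pattern would match:

```python
import re

# The pattern from the flag above: "eval", optional whitespace, then "(".
SUSPICIOUS_EVAL = re.compile(r"eval\s*\(")


def find_suspicious_lines(text: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for every line matching the pattern."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if SUSPICIOUS_EVAL.search(line)
    ]
```

Because the pattern has no word boundary, it also matches method calls such as `model.eval()` and names that merely end in "eval", so a hit is a prompt for manual review rather than proof of unsafe code.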
Install as a Claude Code subagent
Method: Clone
Terminal
git clone https://github.com/datachain-ai/datachain && cp datachain/*.md ~/.claude/agents/
1. Clone the repository and copy the agent .md definitions into ~/.claude/agents (or .claude/agents inside a project).
2. Start a new Claude Code session to load the agents.
3. Delegate work to them with the Task/Agent tool or by name.
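
For readers who prefer to script the copy step, the following is a minimal Python sketch of what the `cp datachain/*.md ~/.claude/agents/` part of the command does. The function name `install_agent_definitions` is hypothetical, and the sketch assumes the repository has already been cloned.

```python
import shutil
from pathlib import Path


def install_agent_definitions(repo_dir: Path, agents_dir: Path) -> list[Path]:
    """Copy every top-level .md file from repo_dir into agents_dir."""
    # Create the agents directory if it does not exist yet.
    agents_dir.mkdir(parents=True, exist_ok=True)
    installed = []
    for source in sorted(repo_dir.glob("*.md")):
        target = agents_dir / source.name
        shutil.copy2(source, target)  # overwrites an existing file of the same name
        installed.append(target)
    return installed


# Example call (not executed here):
# install_agent_definitions(Path("datachain"), Path.home() / ".claude" / "agents")
```

Like the shell glob, this copies every top-level Markdown file, not only agent definitions, so it is worth checking the returned list before starting a new session.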

3 items in this repository

Use ONLY for abstract DataChain SDK questions — API usage, method signatures, or code patterns — when no specific dataset or bucket is referenced. If the request mentions creating, saving, listing, exploring datasets or buckets, use datachain-knowledge instead.

Use when asked about Studio job analytics — compute hours, user spend, failure rates, cost estimation, cluster usage. Generates and maintains dc-knowledge/jobs/index.md.

Use whenever datasets, cloud storage buckets, or data pipelines are mentioned — creating, saving, querying, listing, exploring, deleting, or processing data in S3, GCS, Azure Blob, or local storage. Also use when running any script that may create datasets as a side effect. Maintains a knowledge base at dc-knowledge/ (JSON + markdown). ALWAYS use this skill when the user creates a dataset, saves pipeline output, runs a data script, or references any storage bucket.


Subagents overview

README not available. Visit the repo on GitHub for the full documentation.
Topics: ai-agents, claude-code, codex, data-context-layer, data-processing, harness-engineering, knowledge-base, mlops, multimodal, pydantic, unstructured-data

What people ask about datachain

What is datachain-ai/datachain?

datachain-ai/datachain is a set of subagents for the Claude AI ecosystem. The Context Layer for unstructured data: typed, versioned datasets over S3, GCS, Azure. It has 2.8k stars on GitHub and was last updated today.

How do I install datachain?

You can install datachain by cloning the repository (https://github.com/datachain-ai/datachain) or by following the instructions in the README on GitHub. ClaudeWave also offers quick-install blocks on this page.

Is datachain-ai/datachain safe to use?

Our security agent has analyzed datachain-ai/datachain and assigned it a Trust Score of 100/100 (tier: Verified). Review the full breakdown of passed checks and flags on this page.

Who maintains datachain-ai/datachain?

datachain-ai/datachain is maintained by datachain-ai. The latest activity recorded on GitHub is from today, with 66 open issues.

Are there alternatives to datachain?

Yes. On ClaudeWave you can explore similar subagents in /categories/agents, sorted by popularity or recent activity.

Deploy datachain to your cloud

Take this repo to production in minutes. Each platform generates its own environment with editable environment variables.

Do you maintain this repo? Add a badge to your README

Paste the badge into your GitHub README to show that it has been audited by ClaudeWave. Each badge links back to this page and shows the current Trust Score.

[![Featured on ClaudeWave](https://claudewave.com/api/badge/datachain-ai-datachain)](https://claudewave.com/repo/datachain-ai-datachain)
<a href="https://claudewave.com/repo/datachain-ai-datachain"><img src="https://claudewave.com/api/badge/datachain-ai-datachain" alt="Featured on ClaudeWave: datachain-ai/datachain" width="320" height="64" /></a>
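
The two badge snippets above share one slug, `datachain-ai-datachain`. As a small sketch, assuming the slug is simply the owner and repository name joined by a hyphen (inferred from this single example, not from any documented rule), the Markdown form can be generated like this; `badge_markdown` is a hypothetical helper:

```python
def badge_markdown(owner: str, repo: str) -> str:
    """Build the Markdown badge snippet for an owner/repo pair."""
    # Assumption: the ClaudeWave slug is "<owner>-<repo>".
    slug = f"{owner}-{repo}"
    badge_url = f"https://claudewave.com/api/badge/{slug}"
    page_url = f"https://claudewave.com/repo/{slug}"
    return f"[![Featured on ClaudeWave]({badge_url})]({page_url})"
```

Calling `badge_markdown("datachain-ai", "datachain")` reproduces the Markdown line shown above.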
