ClaudeWave

LLM-Driven Extraction of Unstructured Data — Built for API Deployments & ETL Pipeline Workflows

MCP Servers · 6.6k stars · 630 forks · Python · AGPL-3.0 · Updated today
Editor's note

Unstract is an open-source document intelligence platform that uses LLMs to extract structured JSON output from unstructured documents including PDFs, scanned images, and other formats. Its core workflow centers on Prompt Studio, a visual interface where users define extraction schemas using plain natural language rather than regex or rigid templates, then deploy those schemas either as REST API endpoints or as ETL pipelines that pull documents from folders and load results into data warehouses. The platform connects to Claude and other AI agents through a built-in MCP server, and also supports an n8n custom node for automation workflow integration. LLM providers including Anthropic, OpenAI, AWS Bedrock, and Ollama are all supported as interchangeable backends. The entire platform deploys locally via a single `./run-platform.sh` script using Docker Compose with an 8 GB RAM minimum. Primary audiences are engineering and operations teams in finance, insurance, healthcare, and compliance who need reliable, schema-consistent data extraction from high-volume document workflows without building custom extraction pipelines from scratch.

ClaudeWave Trust Score
100/100
Verified
Passed
  • Open-source license (AGPL-3.0)
  • Actively maintained (<30d)
  • Healthy fork ratio
  • Clear description
  • Topics declared
  • Mature repo (>1y old)
Last scanned: 6/11/2026
Install in Claude Code / Claude Desktop
Method: UVX (Python) · unstract
Claude Code CLI
claude mcp add unstract -- uvx unstract
claude_desktop_config.json (Claude Desktop)
{
  "mcpServers": {
    "unstract": {
      "command": "uvx",
      "args": ["unstract"]
    }
  }
}
1. Run the command above in your terminal (Claude Code), or paste the JSON config into claude_desktop_config.json (Claude Desktop).
2. Replace any <placeholder> values with your API keys or paths.
3. Restart Claude. The MCP server and its tools appear automatically.
💡 Package name inferred from the repository name. Verify it exists on PyPI, or clone https://github.com/Zipstack/unstract and follow its README.
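
Editing the JSON by hand can accidentally drop servers that are already configured. The following is a minimal sketch of a safer merge, assuming the config file is a JSON object with an optional `mcpServers` key, as in the snippet above. The helper name `add_mcp_server` and the path you pass to it are illustrative, and the `uvx unstract` command only works if the package really exists, as the note above cautions.

```python
import json
from pathlib import Path


def add_mcp_server(config_path, name, command, args):
    """Add or overwrite one entry under "mcpServers", keeping all other keys."""
    path = Path(config_path)
    # Treat a missing or empty file as an empty config.
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args)}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `add_mcp_server("claude_desktop_config.json", "unstract", "uvx", ["unstract"])` would write the same entry shown above while leaving any other servers in place.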

<div align="center">
  <img src="docs/assets/unstract_u_logo.png" style="height: 120px">
  <h1>Unstract</h1>
  <h2>Turn Unstructured Documents into Structured Data</h2>
  <p>
    <a href="https://docs.unstract.com">Documentation</a> |
    <a href="https://unstract.com/pricing/">Enterprise</a>
  </p>
  <p>
    <a href="LICENSE"><img src="https://img.shields.io/github/license/Zipstack/unstract" alt="License"></a>
    <a href="https://docs.unstract.com/unstract/unstract_platform/quick_start"><img src="https://img.shields.io/badge/tutorials-docs-brightgreen" alt="Tutorials"></a>
    <a href="https://status.unstract.com"><img src="https://img.shields.io/badge/uptime-status-brightgreen" alt="Uptime Status"></a>
    <a href="https://hub.docker.com/u/unstract"><img src="https://img.shields.io/docker/pulls/unstract/backend" alt="Docker Pulls"></a>
    <a href="https://deepwiki.com/Zipstack/unstract"><img src="https://deepwiki.com/badge.svg" alt="Ask DeepWiki"></a>
    <a href="https://cla-assistant.io/Zipstack/unstract"><img src="https://cla-assistant.io/readme/badge/Zipstack/unstract" alt="CLA assistant"></a>
  </p>
  <p>
    <img src="https://img.shields.io/python/required-version-toml?tomlFilePath=https%3A%2F%2Fraw.githubusercontent.com%2FZipstack%2Funstract%2Frefs%2Fheads%2Fmain%2Fpyproject.toml" alt="Python Version from PEP 621 TOML">
    <a href="https://github.com/astral-sh/uv"><img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json" alt="uv"></a>
    <a href="https://vite.dev/"><img src="https://img.shields.io/badge/Vite-6.x-646CFF?logo=vite&logoColor=white" alt="Vite"></a>
    <a href="https://bun.sh/"><img src="https://img.shields.io/badge/Bun-1.x-000000?logo=bun&logoColor=white" alt="Bun"></a>
    <a href="https://biomejs.dev/"><img src="https://img.shields.io/badge/Biome-2.x-60A5FA?logo=biome&logoColor=white" alt="Biome"></a>
  </p>
  <p>
    <a href="https://results.pre-commit.ci/latest/github/Zipstack/unstract/main"><img src="https://results.pre-commit.ci/badge/github/Zipstack/unstract/main.svg" alt="pre-commit.ci status"></a>
    <a href="https://sonarcloud.io/summary/new_code?id=Zipstack_unstract"><img src="https://sonarcloud.io/api/project_badges/measure?project=Zipstack_unstract&metric=alert_status" alt="Quality Gate Status"></a>
    <a href="https://sonarcloud.io/summary/new_code?id=Zipstack_unstract"><img src="https://sonarcloud.io/api/project_badges/measure?project=Zipstack_unstract&metric=code_smells" alt="Code Smells"></a>
    <a href="https://sonarcloud.io/summary/new_code?id=Zipstack_unstract"><img src="https://sonarcloud.io/api/project_badges/measure?project=Zipstack_unstract&metric=duplicated_lines_density" alt="Duplicated Lines (%)"></a>
  </p>
</div>

## What is Unstract?

Unstract uses LLMs to extract structured JSON from documents — PDFs, images, scans, you name it. Define what you want to extract using natural language prompts, and deploy as an API or ETL pipeline.

Built for teams in **finance**, **insurance**, **healthcare**, **KYC/compliance**, and much more.

## Current State vs. Unstract

| Task | Without Unstract | With Unstract |
|------|------------------|---------------|
| Schema definition | Write regex, build templates per vendor | Write a prompt once, handles variations |
| New document type | Days of development | Minutes in Prompt Studio |
| LLM integration | Build your own pipeline | Plug in any provider (OpenAI, Anthropic, Bedrock, Ollama) |
| Deployment | Custom infrastructure | `./run-platform.sh` or managed cloud |
| Output | Unstructured text blobs | Clean JSON, ready for your database |

> ⭐ If Unstract helps you, star this repo!
>
> ![Star Unstract](docs/assets/github_star.gif)

## ✨ Key Features

**Prompt Studio** — Define document extraction schemas with natural language. [Docs →](https://docs.unstract.com/unstract/unstract_platform/features/prompt_studio/prompt_studio_intro/)

![Prompt Studio](docs/assets/prompt_studio.gif)

**API Deployment** — Send a document over REST API, get JSON back. [Docs →](https://docs.unstract.com/unstract/unstract_platform/api_deployment/unstract_api_deployment_intro/)

![API Deployment](docs/assets/api_deployment.gif)
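
As a rough illustration of "send a document, get JSON back", here is a sketch that uploads one file as `multipart/form-data` using only the Python standard library. The request shape is an assumption, not the documented contract: `api_url`, the bearer-token header, and the `files` form field name are placeholders to check against the API Deployment docs linked above.

```python
import json
import urllib.request
import uuid


def build_extraction_request(api_url, api_key, filename, file_bytes, field_name="files"):
    """Build (but do not send) a multipart/form-data POST carrying one document.

    The URL, auth scheme, and field name are assumptions for illustration.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        (
            f'Content-Disposition: form-data; name="{field_name}"; '
            f'filename="{filename}"\r\n'
        ).encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        file_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(api_url, data=body, headers=headers, method="POST")


def extract(api_url, api_key, filename, file_bytes, timeout=300):
    """Send the document and parse the response body as JSON."""
    request = build_extraction_request(api_url, api_key, filename, file_bytes)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.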

**ETL Pipeline** — Pull documents from a folder, process them, load to your warehouse. [Docs →](https://docs.unstract.com/unstract/unstract_platform/etl_pipeline/unstract_etl_pipeline_intro/)

**MCP Server** — Connect to AI agents (Claude, etc.) via Model Context Protocol. [Docs →](https://docs.unstract.com/unstract/unstract_platform/mcp/unstract_platform_mcp_server/)

**n8n Node** — Drop into existing automation workflows. [Docs →](https://docs.unstract.com/unstract/unstract_platform/api_deployment/unstract_api_deployment_n8n_custom_node/)

## 🚀 Quickstart (~5 mins)

### System Requirements & Prerequisites

- Linux or macOS (Intel or M-series)
- Docker & Docker Compose
- 8 GB RAM minimum
- Git
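
A quick preflight check for the command-line prerequisites can be sketched as follows. It only confirms that `git` and `docker` are on `PATH`; it does not verify the Docker Compose plugin or the 8 GB RAM minimum, and the helper name `missing_tools` is illustrative.

```python
import shutil

# Executables the quickstart relies on; Compose and RAM are not checked here.
REQUIRED_TOOLS = ("git", "docker")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from `missing_tools()` means both executables were found.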

### Run Locally

```bash
# Clone and start
git clone https://github.com/Zipstack/unstract.git
cd unstract
./run-platform.sh
```

That's it!

- Visit [http://frontend.unstract.localhost](http://frontend.unstract.localhost) in your browser
- Log in with username `unstract` and password `unstract`
- Start extracting data!

## 📦 Other Deployment Options

### Docker Compose

```bash
# Pull and run entire Unstract platform with default env config.
./run-platform.sh

# Pull and run docker containers with a specific version tag.
./run-platform.sh -v v0.1.0

# Upgrade existing Unstract platform setup by pulling the latest available version.
./run-platform.sh -u

# Upgrade existing Unstract platform setup by pulling a specific version.
./run-platform.sh -u -v v0.2.0

# Build docker images locally as a specific version tag.
./run-platform.sh -b -v v0.1.0

# Build docker images locally from working branch as `current` version tag.
./run-platform.sh -b -v current

# Display the help information.
./run-platform.sh -h

# Only set up the environment files.
./run-platform.sh -e

# Only pull the docker images for a specific version tag.
./run-platform.sh -p -v v0.1.0

# Only do docker images pull by building locally with a specific version tag.
./run-platform.sh -p -b -v v0.1.0

# Upgrade existing Unstract platform setup with docker images built locally from working branch as `current` version tag.
./run-platform.sh -u -b -v current

# Pull and run docker containers in detached mode.
./run-platform.sh -d -v v0.1.0
```
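
When the script is driven from automation, the flag combinations above can be assembled programmatically. This is a small sketch that uses only the flags listed in this section (`-e`, `-p`, `-u`, `-b`, `-d`, `-v`); the function and parameter names are illustrative, and it does not validate which combinations the script accepts.

```python
def run_platform_args(version=None, upgrade=False, build=False,
                      pull_only=False, env_only=False, detached=False):
    """Return an argv list for ./run-platform.sh from keyword options."""
    args = ["./run-platform.sh"]
    # Boolean switches, emitted in the same order used in the examples above.
    switches = [
        (env_only, "-e"),
        (pull_only, "-p"),
        (upgrade, "-u"),
        (build, "-b"),
        (detached, "-d"),
    ]
    args.extend(flag for enabled, flag in switches if enabled)
    if version is not None:
        args.extend(["-v", version])
    return args
```

The resulting list can be passed to `subprocess.run(args, check=True, cwd=...)` with `cwd` set to the cloned repository.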

## 🔐 Backup Encryption Key

> [!WARNING]
> This key encrypts adapter credentials — losing it makes existing adapters inaccessible!

Copy the value of `ENCRYPTION_KEY` from `backend/.env` or `platform-service/.env` to a secure location.
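
To pull the key out for backup without opening the file in an editor, something like the sketch below may help. It assumes the `.env` files use plain `KEY=value` lines, optionally quoted; the helper name `read_env_value` is illustrative and this is not a full dotenv parser.

```python
from pathlib import Path


def read_env_value(env_path, key="ENCRYPTION_KEY"):
    """Return the value assigned to `key` in a dotenv-style file, or None."""
    for raw_line in Path(env_path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop surrounding quotes but keep any "=" padding in the value.
            return value.strip().strip("'\"")
    return None
```

For example, `read_env_value("backend/.env")` returns the key as a string, which should then be stored in a password manager or secrets vault rather than printed to a shared log.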

## 🏗️ Unstract Architecture

```text
┌────────────────────────────────────────────────────────────┐
│                          Unstract                          │
├─────────────┬─────────────┬─────────────┬──────────────────┤
│  Frontend   │   Backend   │   Worker    │ Platform Service │
│  (React)    │  (Django)   │  (Celery)   │   (FastAPI)      │
├─────────────┴─────────────┴─────────────┴──────────────────┤
│                      Cache (Redis)                         │
├────────────────────────────────────────────────────────────┤
│                  Message Queue (RabbitMQ)                  │
├────────────────────────────────────────────────────────────┤
│                   Database (PostgreSQL)                    │
├────────────────────────────────────────────────────────────┤
│  LLM Adapters    │  Vector DBs    │  Text Extractors       │
│  (OpenAI, etc.)  │ (Qdrant, etc.) │  (LLMWhisperer)        │
└────────────────────────────────────────────────────────────┘
```

Also see [architecture](docs/ARCHITECTURE.md).

## 📄 Document File Formats

| Category | Formats |
|----------|---------|
| Documents | PDF, DOCX, DOC, ODT, TXT, CSV, JSON |
| Spreadsheets | XLSX, XLS, ODS |
| Presentations | PPTX, PPT, ODP |
| Images | PNG, JPG, JPEG, TIFF, BMP, GIF, WEBP |
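
A pipeline that feeds files to Unstract may want to filter by extension first. The sketch below encodes the table above as a lookup. It is not part of Unstract itself, the names `SUPPORTED_FORMATS` and `format_category` are illustrative, and it checks only the file extension, not the actual file contents.

```python
from pathlib import Path

# Extensions taken from the format table above, lowercased.
SUPPORTED_FORMATS = {
    "Documents": {"pdf", "docx", "doc", "odt", "txt", "csv", "json"},
    "Spreadsheets": {"xlsx", "xls", "ods"},
    "Presentations": {"pptx", "ppt", "odp"},
    "Images": {"png", "jpg", "jpeg", "tiff", "bmp", "gif", "webp"},
}


def format_category(filename):
    """Return the table category for a filename, or None if it is not listed."""
    extension = Path(filename).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED_FORMATS.items():
        if extension in extensions:
            return category
    return None
```

Files for which `format_category` returns `None` can be skipped or routed to a manual review queue before upload.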

## 🔌 Connectors & Adapters

### LLM Providers

| Provider | Status | Provider | Status |
|----------|--------|----------|--------|
| OpenAI | ✅ | Azure OpenAI | ✅ |
| OpenAI Compatible | ✅ | Anthropic Claude | ✅ |
| AWS Bedrock | ✅ | Google Gemini | ✅ |
| Ollama (local) | ✅ | Mistral AI | ✅ |
| Anyscale | ✅ | | |

### Vector Databases

| Provider | Status | Provider | Status |
|----------|--------|----------|--------|
| Qdrant | ✅ | Pinecone | ✅ |
| Weaviate | ✅ | PostgreSQL | ✅ |
| Milvus | ✅ | | |

### Text Extractors

| Provider | Status |
|----------|--------|
| LLMWhisperer | ✅ |
| Unstructured.io | ✅ |
| LlamaIndex Parse | ✅ |

### ETL Sources & Destinations

**Sources:** AWS S3, MinIO, Google Cloud Storage, Azure Blob, Google Drive, Dropbox, SFTP

**Destinations:** Snowflake, Amazon Redshift, Google BigQuery, PostgreSQL, MySQL, MariaDB, SQL Server, Oracle

[Full Connector List](https://docs.unstract.com/unstract/unstract_platform/setup_accounts/whats_needed)

## 🛠️ Development

### Change Default Credentials

Follow [these steps](backend/README.md#authentication) to change the default username and password.

### Local Development

```bash
# Install pre-commit hooks
./dev-env-cli.sh -p

# Run pre-commit checks
./dev-env-cli.sh -r
```

[Local Development Guide](https://docs.unstract.com/unstract/unstract_platform/user_guides/run_platform)

## 🏢 Use Cases by Industry

[Finance & Banking →](https://unstract.com/finance-automation/) | [Insurance →](https://unstract.com/insurance-automation/) | [Healthcare →](https://unstract.com/healthcare-automation/) | [Income Tax →](https://unstract.com/ai-income-tax-forms-data-extraction/)

## ☁️ Cloud & Enterprise

For teams that need managed infrastructure, advanced accuracy features, or compliance certifications.

- ✅ **LLMChallenge** — dual-LLM verification
- ✅ **SinglePass & Summarized Extraction** — reduce LLM token costs
- ✅ **Human-in-the-Loop** — review interface with document highlighting
- ✅ **SSO & Enterprise RBAC** — SAML

Topics: ai-agents, data-engineering, document-ai, generative-ai, idp, json-extraction, llm, mcp-server, ocr, pdf-extraction, prompt-engineering, structured-output

What people ask about unstract

What is Zipstack/unstract?

Zipstack/unstract is an open-source project listed under MCP servers for the Claude AI ecosystem. It provides LLM-driven extraction of unstructured data, built for API deployments and ETL pipeline workflows. It has 6.6k GitHub stars and was last updated today.

How do I install unstract?

You can install unstract by cloning the repository (https://github.com/Zipstack/unstract) or following the README instructions on GitHub. ClaudeWave also provides quick install blocks on this page.

Is Zipstack/unstract safe to use?

Our security agent has analyzed Zipstack/unstract and assigned a Trust Score of 100/100 (tier: Verified). See the full breakdown of passed checks and flags on this page.

Who maintains Zipstack/unstract?

Zipstack/unstract is maintained by Zipstack. The last recorded GitHub activity is from today, with 96 open issues.

Are there alternatives to unstract?

Yes. On ClaudeWave you can browse similar MCP servers at /categories/mcp, sorted by popularity or recent activity.
