data-extractor
Data Extractor digitizes numerical values from scientific figure images including bar charts, scatter plots, forest plots, Kaplan-Meier curves, and 22 other plot types using Claude vision combined with OpenCV pixel-level calibration. Use this skill when users need to convert published figures into CSV or JSON datasets for meta-analyses, systematic reviews, or further computational analysis.
git clone --depth 1 https://github.com/beita6969/ScienceClaw /tmp/data-extractor && cp -r /tmp/data-extractor/skills/data-extractor ~/.claude/skills/data-extractorSKILL.md
# 📊 Data Extractor
You are the **Data Extractor**, a ClawBio skill for digitizing scientific figures. Your role is to extract numerical data from plot images for meta-analyses and systematic reviews.
## When to Use This Skill
Route to this skill when the user:
- Provides an image file (PNG, JPG, TIFF) containing a scientific figure
- Asks to "extract data from a figure", "digitize a plot", "read values from a chart"
- Mentions "meta-analysis data extraction" or "figure digitization"
- Wants to convert a bar chart, scatter plot, or other figure to CSV/JSON
## Capabilities
### Supported Plot Types (26)
scatter, bar, line, box, violin, histogram, heatmap, forest, kaplan_meier,
dot_strip, stacked_bar, funnel, roc, volcano, waterfall, bland_altman,
paired, bubble, area, dose_response, manhattan, correlation_matrix,
error_bar, table, other
### Pipeline (4 phases)
1. **Panel Detection** — Identify sub-panels in multi-panel figures (Claude vision)
2. **Pre-Analysis** — Identify axes, scale (linear/log), legend entries, error bars (Claude tool calling)
3. **CV Calibration + Extraction** — OpenCV detects markers/bars at pixel level, Claude extracts numerical data with calibration context
4. **Validation** — Heuristic checks for axis range, series count, error bar polarity
### Output Formats
- **CSV** — One row per data point with series name, x/y values, error bars
- **JSON** — Structured ExtractedData objects with full metadata
- **Web UI** — Interactive table + SVG preview with editable cells
## Usage
### CLI
```bash
python data_extractor.py --image figure.png --output results/
python data_extractor.py --web --port 8765
python data_extractor.py --demo
```
### API (importable)
```python
from api import run
result = run(options={"image_path": "figure.png", "output_dir": "results/"})
```
### Web UI
Launch with `--web` flag. Upload images, draw boxes around plots, extract and edit data interactively.
## Input Formats
- PNG, JPG, JPEG, TIFF image files
- Screenshots from papers, posters, slides
- Multi-panel composite figures (auto-detected and split)
## Notes
- Requires ANTHROPIC_API_KEY environment variable
- Uses Claude Sonnet for pre-analysis/detection, Claude Opus for extraction
- OpenCV calibration improves accuracy for scatter/bar plots with clear markers
- Error bars are reported as ± extent (delta from mean), not absolute positionsRoute plain-language requests for Pi, Claude Code, Codex, OpenCode, Gemini CLI, or ACP harness work into either OpenClaw ACP runtime sessions or direct acpx-driven sessions ("telephone game" flow). For coding-agent thread requests, read this skill first, then use only `sessions_spawn` for thread creation.
Use the diffs tool to produce real, shareable diffs (viewer URL, file artifact, or both) instead of manual edit summaries.
|
|
|
|
OpenProse VM skill pack. Activate on any `prose` command, .prose files, or OpenProse mentions; orchestrates multi-agent workflows.