ClaudeWave
Skill · 12k repo stars · updated today

doc-reader

The doc-reader skill extracts text from a wide range of file formats, including PDF, Word, Excel, PowerPoint, images (via OCR), CSV, plain text, JSON/YAML, HTML/XML, and source code. Use the `read_document` tool to analyze or process document content programmatically instead of opening files manually. It detects the format automatically from the file extension and supports optional page-range selection for large PDFs.

Install in Claude Code
git clone --depth 1 https://github.com/HKUDS/Vibe-Trading /tmp/doc-reader && cp -r /tmp/doc-reader/agent/src/skills/doc-reader ~/.claude/skills/doc-reader
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# Universal Document Reader

## Purpose

Return extracted text from any supported file in a single unified JSON
envelope. The tool dispatches by file extension — you always call the same
tool regardless of format.

### Supported formats

| Category | Extensions | Notes |
|---|---|---|
| PDF | `.pdf` | Text pages are extracted in milliseconds; scanned/image pages fall back to OCR |
| Word | `.docx` | Paragraphs + table cells |
| Excel | `.xlsx`, `.xls` | All sheets, first 100 rows per sheet as preview |
| PowerPoint | `.pptx` | Slide text content |
| Images | `.png/.jpg/.jpeg/.gif/.bmp/.webp/.tiff` | OCR only |
| CSV / TSV | `.csv`, `.tsv` | Raw text with encoding fallback |
| Plain text | `.txt/.md/.log/.rst` | Encoding fallback |
| Config | `.json/.yaml/.yml/.toml/.ini/.cfg/.env` | Raw text |
| Markup | `.html/.htm/.xml` | Raw text (no HTML stripping) |
| Source code | `.py/.js/.ts/.tsx/.go/.rs/.java/.cpp/.c/.sql/.sh/...` | Raw text |
| Unknown extension | anything else | Best-effort read as UTF-8/GBK text |

**Blocked** (rejected at `/upload`): executables (`.exe/.dll/.so/...`) and
archives (`.zip/.tar/...`). Ask the user to unpack archives locally first.
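
The extension-based dispatch described above can be pictured with a small lookup table. This is only a sketch: `FORMAT_BY_EXTENSION`, `BLOCKED_EXTENSIONS`, `guess_format`, and the `"image"` and `"blocked"` labels are hypothetical names, and the mapping is abbreviated from the table rather than taken from the tool's source.

```python
from pathlib import Path

# Hypothetical, abbreviated mapping based on the table above. The real tool's
# internal names and complete extension lists are not shown in this document.
FORMAT_BY_EXTENSION = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".xlsx": "excel",
    ".xls": "excel",
    ".pptx": "pptx",
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
    ".csv": "text",
    ".md": "text",
    ".yaml": "text",
}
BLOCKED_EXTENSIONS = {".exe", ".dll", ".so", ".zip", ".tar"}


def guess_format(file_path: str) -> str:
    """Return the format label a file would likely be routed to."""
    ext = Path(file_path).suffix.lower()  # only the last suffix, lowercased
    if ext in BLOCKED_EXTENSIONS:
        return "blocked"
    # Unknown extensions fall back to a best-effort text read.
    return FORMAT_BY_EXTENSION.get(ext, "text")
```

A helper like this is only useful for predicting what a call will return; the tool itself performs the dispatch, so callers never need to pick a reader.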

## Usage

**Always call the tool directly — do not run Python from bash.**

```
read_document(file_path="uploads/paper.pdf")
read_document(file_path="uploads/annual_report.pdf", pages="1-10")
read_document(file_path="uploads/contract.docx")
read_document(file_path="uploads/sales.xlsx")
read_document(file_path="uploads/deck.pptx")
read_document(file_path="uploads/chart.png")     # image → OCR
read_document(file_path="uploads/config.yaml")
read_document(file_path="uploads/notes.md")
```

The `pages` parameter only applies to PDF; other formats ignore it.

## Return envelope

All formats share this shape:

```json
{
  "status": "ok",
  "file": "paper.pdf",
  "format": "pdf",
  "char_count": 52000,
  "truncated": true,
  "text": "..."
}
```

Format-specific extra fields:

| Format | Extra keys |
|---|---|
| `pdf` | `total_pages`, `pages_read`, `ocr_pages` |
| `docx` | `paragraphs`, `tables` |
| `excel` | `sheets` (array of `{name, rows, cols}`) |
| `pptx` | `slides` |
| `text` | `encoding`, `size` |
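
As a rough illustration of consuming the envelope, the sketch below summarizes a result passed in as a JSON string. The function name `describe_envelope` is hypothetical, and two things are assumed rather than documented here: that a failed read reports a `status` other than `"ok"`, and that `total_pages` is an integer.

```python
import json


def describe_envelope(raw: str) -> str:
    """Summarize a read_document result given as a JSON string."""
    env = json.loads(raw)
    # Assumption: any status other than "ok" signals a failed read.
    if env.get("status") != "ok":
        return f"read failed: status={env.get('status')!r}"
    parts = [f"{env['file']} ({env['format']}), {env['char_count']} chars"]
    if env.get("truncated"):
        parts.append("text truncated")
    # Format-specific keys may be absent, so check before reading them.
    if env["format"] == "pdf" and "total_pages" in env:
        parts.append(f"{env['total_pages']} pages total")
    if not env.get("text", "").strip():
        parts.append("no text extracted")
    return "; ".join(parts)
```

Checking for empty `text` up front makes it easy to report a failed OCR pass to the user instead of guessing at the content.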

Content longer than 15,000 characters is truncated; for PDFs, use the `pages`
parameter to read the document in slices.
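
One way to plan those slices is to derive the `pages` strings from `total_pages`. The helper below is a sketch with a hypothetical name (`page_ranges`); it assumes pages are numbered from 1 and that the `"start-end"` form shown in the usage examples also accepts a single-page range such as `"21-21"`.

```python
def page_ranges(total_pages: int, pages_per_call: int = 10) -> list[str]:
    """Split pages 1..total_pages into "start-end" strings for `pages`."""
    if total_pages < 1 or pages_per_call < 1:
        raise ValueError("total_pages and pages_per_call must be >= 1")
    ranges = []
    for start in range(1, total_pages + 1, pages_per_call):
        # Clamp the last slice so it never runs past the final page.
        end = min(start + pages_per_call - 1, total_pages)
        ranges.append(f"{start}-{end}")
    return ranges
```

For a 25-page PDF this yields `"1-10"`, `"11-20"`, and `"21-25"`, each of which can be passed as the `pages` argument of a separate `read_document` call. If a slice still comes back with `truncated` set, a smaller `pages_per_call` is the obvious next step.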

## Workflows

### Paper / report summary
```
1. read_document(file_path="paper.pdf")  → text (truncated past 15000 chars; use `pages` for slices)
2. Extract abstract, methodology, conclusion → summarize
```

### Contract review
```
1. read_document(file_path="contract.docx")  → paragraphs + tables
2. Flag key clauses (termination, liability, payment, IP)
```

### Spreadsheet quick-look
```
1. read_document(file_path="sales.xlsx")  → all sheet previews
2. If user wants trade journal analysis specifically, pivot to
   `analyze_trade_journal` tool instead (see trade-journal skill).
```

### Chart / screenshot / scanned PDF
```
1. read_document(file_path="scan.png")  → OCR text
2. If OCR returns empty, tell the user; don't fabricate.
```

## Notes

- **Encoding fallback** order for text: utf-8 → utf-8-sig → gbk → gb2312 → big5 → latin-1.
- **OCR** uses RapidOCR; if the package is missing, image/scanned files
  return empty `text` with a `note` field — tell the user to install
  `rapidocr-onnxruntime`.
- **Excel previews** are limited to 100 rows per sheet to stay in budget.
  If the user needs full data (e.g. trade journals), call
  `analyze_trade_journal` instead.
- **Source-code files** are returned raw; do not re-format or re-indent.
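
The encoding fallback listed in the notes above can be sketched as a loop over candidate encodings. This is an illustration of the idea, not the tool's actual code; `FALLBACK_ENCODINGS` and `decode_with_fallback` are hypothetical names.

```python
# Order copied from the note above.
FALLBACK_ENCODINGS = ("utf-8", "utf-8-sig", "gbk", "gb2312", "big5", "latin-1")


def decode_with_fallback(data: bytes) -> tuple[str, str]:
    """Return (text, encoding) using the first encoding that decodes cleanly."""
    for encoding in FALLBACK_ENCODINGS:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Not reachable in practice: latin-1 maps every byte to a code point.
    raise ValueError("no encoding could decode the data")
```

Because `latin-1` accepts any byte sequence, it works as a last resort that always returns something, at the cost of possibly producing mojibake for files in an unlisted encoding.
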

vibe-trading · Skill

Professional finance research toolkit — backtesting (7 engines + benchmark comparison panel), factor analysis, Alpha Zoo (452 pre-built alphas across qlib158/alpha101/gtja191/academic), options pricing, 77 finance skills, 29 multi-agent swarm teams, Trade Journal analyzer, and Shadow Account (extract → backtest → render) across 7 data sources (tushare, yfinance, okx, akshare, mootdx, ccxt, futu).

adr-hshare · Skill

ADR/H-share/A-share cross-listing premium analysis — track pricing gaps between US-listed ADRs, HK-listed H-shares, and A-shares for arbitrage signals, dual-listing valuation, and delisting risk assessment.

akshare · Skill

AKShare financial data aggregator (18k+ stars). Free, no API key. Covers A-shares, US, HK, futures, macro, forex. Primary fallback for tushare and yfinance.

alpha-zoo · Skill

Browse and bench the bundled alpha zoos — prebuilt cross-sectional factor libraries (Kakushadze 101, GTJA 191, Qlib 158, Fama-French / Carhart). Use when the user asks "which alphas exist", wants metadata on a named alpha, or wants to run IC/IR on a whole zoo over a universe.

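ashare-pre-st-filter · Skill

A-share ST/*ST risk prediction framework: based on the latest interim or third-quarter report, or on earnings forecasts and preliminary earnings reports, it predicts whether a company will receive a risk warning in the next fiscal year for failing revenue, profit, net-asset, or dividend requirements, and it incorporates Sina regulatory penalty records into the risk rating as an independent line of evidence. Applies to A-shares only; does not predict financial fraud.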

asset-allocation · Skill

Asset allocation theory and optimizer usage — MPT / Black-Litterman / risk budgeting / all-weather strategy, including guides for 4 optimizers and rebalancing rules.

backtest-diagnose · Skill

Diagnose failed or underperforming backtests, locate the root cause, and fix the issue.

behavioral-finance · Skill

Behavioral finance applications: theories of overreaction and underreaction, behavioral explanations for momentum and reversal, investor sentiment cycles, cognitive-bias checklists, and debiasing quantitative strategies.