heavy-file-ingestion-claude-code
This Claude Code skill optimizes analysis of large files such as PDFs, Word documents, spreadsheets, and presentations by converting them to lightweight markdown or CSV before processing. Use it when a user asks to read or analyze a heavyweight file: run the conversion script first to generate an index and a compressed artifact, then read only the converted output. This preserves context tokens and avoids inefficient direct ingestion of the original file.
git clone --depth 1 https://github.com/NateBJones-Projects/OB1 /tmp/heavy-file-ingestion-claude-code && cp -r /tmp/heavy-file-ingestion-claude-code/skills/heavy-file-ingestion/variants/claude-code ~/.claude/skills/heavy-file-ingestion-claude-code

SKILL.md
# Heavy File Ingestion For Claude Code

## Problem

Claude Code has the tools to convert files locally, so it should not waste context by reading heavyweight files raw.

## Trigger Conditions

- The user asks to read or analyze a PDF, DOCX, PPTX, XLSX, CSV, or TSV
- The file is large enough or structured enough that direct ingestion is a bad trade
- The user wants a markdown working copy, CSV normalization, or a fast map of the file before deeper analysis

## Process

1. Do not read the original heavyweight file directly into context if conversion is possible.
2. Resolve the bundled converter relative to this skill directory: `scripts/convert_heavy_file.py`
3. Run the converter first. Default command:

   ```bash
   python scripts/convert_heavy_file.py /absolute/path/to/file.ext
   ```

4. If dependencies are missing, prefer:

   ```bash
   uv run \
     --with pdfplumber \
     --with python-docx \
     --with python-pptx \
     --with openpyxl \
     python scripts/convert_heavy_file.py /absolute/path/to/file.ext
   ```

5. Read the generated `index.md` before reading any converted artifact.
6. Use the index to decide the cheapest next step:
   - `read_extracted_artifact`: read the markdown or CSV and continue
   - `install_dependency_and_retry`: install the missing deterministic dependency and rerun
   - `cheap_model_or_stronger_converter`: retry with a better converter or use a cheaper model only on the extracted artifact
7. Only escalate to a stronger model after the file has already been compressed into markdown, CSV, or a short sampled subset.

## Client Rules

- Prefer deterministic scripts over model-based conversion.
- Save the converted artifacts next to the source file and work from those files.
- For spreadsheets, use the generated per-sheet CSV files instead of trying to reason over workbook internals directly.
- For PDFs, treat scan detection and low-density warnings as a routing signal, not as a reason to read the original PDF raw.
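
Steps 2 to 4 of the process can be sketched in Python as a small helper that assembles either the default command or the `uv run` fallback. This is an illustrative sketch, not part of the skill: the function name `build_converter_command` and its parameters are hypothetical, and only the script path and package names come from the commands above.

```python
import os
from pathlib import Path

# Packages named in the skill's `uv run` fallback command.
CONVERTER_DEPENDENCIES = ("pdfplumber", "python-docx", "python-pptx", "openpyxl")


def build_converter_command(skill_dir, source_file, use_uv=False):
    """Return the argv list for the bundled converter (hypothetical helper).

    skill_dir is the skill directory that contains scripts/convert_heavy_file.py.
    source_file is made absolute, matching the absolute path used in the skill.
    """
    script = Path(skill_dir) / "scripts" / "convert_heavy_file.py"
    source = os.path.abspath(os.path.expanduser(source_file))
    base = ["python", str(script), source]
    if not use_uv:
        return base
    # Fallback: let uv supply the optional dependencies for this one run.
    uv_prefix = ["uv", "run"]
    for package in CONVERTER_DEPENDENCIES:
        uv_prefix += ["--with", package]
    return uv_prefix + base
```

The returned list can be passed to `subprocess.run(command, check=True)`. How the converter reports a missing dependency is not specified here, so deciding when to set `use_uv=True` is left to the caller.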
## Bundled References

- `references/open-source-stack.md` explains the tool choices and fallback strategy.
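
The routing step in the process (choosing the cheapest next step from `index.md`) might look like the following sketch. The layout of `index.md` is not documented here, so this assumes, as a labeled guess, that the index mentions one of the three next-step names verbatim; `find_next_step` is a hypothetical helper name.

```python
# Next-step names listed in the skill's process section.
NEXT_STEPS = (
    "read_extracted_artifact",
    "install_dependency_and_retry",
    "cheap_model_or_stronger_converter",
)


def find_next_step(index_text):
    """Return the earliest known next-step name in the index text, or None.

    Assumption: index.md contains the step name verbatim somewhere in its text.
    """
    positions = {step: index_text.find(step) for step in NEXT_STEPS}
    found = {step: pos for step, pos in positions.items() if pos >= 0}
    if not found:
        return None
    # If several names appear, prefer the one mentioned first.
    return min(found, key=found.get)
```

A caller would read `index.md` with `Path(...).read_text()`, pass the text in, and treat a `None` result as a reason to inspect the index by hand, not as permission to read the original file raw.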